Analyzing Python tools : usability/activity/statistics/logging

hello,

Do any of you have good techniques and ideas around tracking ‘user activity’ in your tools?

I am interested in learning how artists use my tools: to see whether they use the full potential of the tool, and to see where I made mistakes in usability.

What are good things to track in tools, and how do you track them?
The tools I write mainly run on Python and Qt.

Is this something you write from scratch, or are there existing modules that are a great help?
Are there any good resources online ?

Also would like to know more about :
What to log ?

  • features that are used
  • the widgets that receive signals
  • crashes

How to log it ?

  • a dynamic JSON file
  • one log file that keeps growing
  • a separate text file per user

How to get something out of the log?

  • good way to visualize these results

Do you have more ideas about this, or some experience with these kinds of systems?

thanks !

Adam Pletcher has the granddaddy of all logging systems; I’m sure he’ll have a lot to share.

I’ve done two separate paths:

  1. Logging data to a SQL server has worked out really well. It automatically aggregates the data and makes it easy to do things like run statistics or look for trends over time. Way better than log files. The main drawback is the occasional hang if the server times out due to internet issues. To the extent you can, it’s nice to have a separate thread running that does the actual server communication so the user doesn’t feel any hitches.
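One way to get that separate-thread behavior without writing the threading yourself is the stdlib’s QueueHandler/QueueListener pair (Python 3.2+). A minimal sketch, with a hypothetical in-memory handler standing in for the real SQL write:

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener

# Stand-in for a handler that writes rows to your SQL server; the class
# and its fake "table" are invented for illustration only.
class FakeDBHandler(logging.Handler):
    def __init__(self):
        super().__init__()
        self.rows = []

    def emit(self, record):
        # A real version would INSERT here (and could block on the network).
        self.rows.append((record.levelname, record.getMessage()))

log_queue = queue.Queue(-1)
db_handler = FakeDBHandler()

# QueueHandler enqueues records instantly on the caller's thread;
# QueueListener's background thread does the slow "server" write,
# so the user never feels a hitch if the connection stalls.
listener = QueueListener(log_queue, db_handler)
listener.start()

logger = logging.getLogger("activity_demo")
logger.setLevel(logging.INFO)
logger.addHandler(QueueHandler(log_queue))

logger.info("exporter launched")
listener.stop()  # drains the queue and joins the worker thread
```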

  2. Email is better than logging for bug reports, especially if you include good context information (like a whole stack trace) in the email. That way you can respond when the problem occurs and go talk to the user while they are still dealing with the problem and can help you debug. Plus, users like to know you care.

We do some per-tool logging to files, mainly for local convenience. However, our main studio method of tracking tools telemetry is with a central database. It goes something like this…

I wrote a new handler for the “logging” module in the Python standard library that emits logging events to our MS SQL database. That includes info, warnings, errors and everything else. At any time we can query these records for events we care about, measure the time of various processes in our tools, and track errors. That data can be tossed into graphs or reports, whatever.
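For anyone wanting to try this pattern: a custom handler is just a logging.Handler subclass with an emit() method. The self-contained sketch below uses stdlib sqlite3 in place of the MS SQL/SQLAlchemy backend described here, and the table and column names are made up:

```python
import logging
import sqlite3

# Sketch of a database log handler; sqlite3 keeps it runnable anywhere.
class SQLiteLogHandler(logging.Handler):
    def __init__(self, db_path=":memory:"):
        super().__init__()
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS log "
            "(created REAL, level TEXT, name TEXT, message TEXT)"
        )

    def emit(self, record):
        # Called by the logging machinery for every record that passes
        # the level/filter checks.
        self.conn.execute(
            "INSERT INTO log VALUES (?, ?, ?, ?)",
            (record.created, record.levelname, record.name, record.getMessage()),
        )
        self.conn.commit()

logger = logging.getLogger("db_demo")
logger.setLevel(logging.DEBUG)
handler = SQLiteLogHandler()
logger.addHandler(handler)

logger.info("export started")
logger.warning("missing UVs on mesh_01")

# At any time, query the records for events you care about:
rows = handler.conn.execute(
    "SELECT level, message FROM log WHERE level != 'INFO'"
).fetchall()
```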

I also added a second handler that emits error-level-and-above events as an HTML-formatted email. Those are sent instantly from the user’s machine to a list monitored by our TAs and other tool developers. The messages include the callstack, build number, other diagnostic data, even one-click links to hop us to relevant places in the code. This error info is also in the database of course, but having it instantly go out in email is a very nice convenience.
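The stdlib already ships an email handler for this: logging.handlers.SMTPHandler. It sends plain text rather than HTML (you would subclass it to build an HTML body with the callstack and build number), but the wiring looks roughly like this; all server names and addresses below are placeholders:

```python
import logging
import logging.handlers

# Hypothetical addresses; point these at your studio's mail server and
# the list your TAs monitor.
mail_handler = logging.handlers.SMTPHandler(
    mailhost="smtp.example.com",
    fromaddr="tools@example.com",
    toaddrs=["ta-errors@example.com"],
    subject="[tools] unhandled error",
)
# Only error-level and above should generate an email.
mail_handler.setLevel(logging.ERROR)

logger = logging.getLogger("mail_demo")
logger.addHandler(mail_handler)
# logger.error("...") would now send mail; info/debug messages would not.
```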

This logger gets auto-configured a bit differently based on user location. If they are not on the main Volition network it skips the database handler, logging to a file instead, but typically still does the email emits. This is still very useful for keeping an eye on things at outsourcing studios.
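The auto-configuration step can be as simple as branching on a location check when handlers get attached. A sketch, where the network test and file path are invented for illustration:

```python
import logging
import os
import tempfile

def make_handler(on_network):
    """Pick a handler based on user location, mirroring the idea above."""
    if on_network:
        # Stand-in for the database handler used on the studio network.
        return logging.StreamHandler()
    # Off-network (e.g. an outsourcing studio): fall back to a local file.
    return logging.FileHandler(os.path.join(tempfile.gettempdir(), "tool.log"))

logger = logging.getLogger("location_demo")
logger.addHandler(make_handler(on_network=False))
```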

The database handler I wrote uses SQLAlchemy on the backend, which I’m still a big fan of after all these years. Semi-related, I gave a talk at the TA Bootcamp at GDC 2012 about databases and their use in TA applications. Slides are here:

The audio recording of that Bootcamp is available here, if you have GDC Vault access:
https://store.cmpgame.com/product/6205/Technical-Artist-Boot-Camp

As for what to log, that depends entirely on what you want logging to tell you. I would not recommend trying to log “everything” and assume you can draw useful conclusions from that data later. Have questions in mind that you’d like to answer before you start logging. Your data will be much more useful as a result.

Same goes for the data-analysis method. How/where you log things should be based on how you want to look at the data. Maybe a central DB is a good fit, but that might not be feasible from an IT standpoint, and a network folder full of per-user log files might be a better fit for you.

Here is a shot of one of our typical error emails.

Also, I forgot to mention that most of our Python tools that use this register an excepthook handler that gets fired any time an unhandled exception occurs. Meaning, any exception that’s not caught by a try/except. That handler simply logs the error, then raises the exception again. This ensures that all errors get logged, not just the ones the programmer expected might happen and bothered to manually make a log.error() call for.
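The excepthook trick is tiny to set up; a sketch (the logger name is arbitrary):

```python
import logging
import sys

logger = logging.getLogger("crash_demo")

def logging_excepthook(exc_type, exc_value, exc_tb):
    # Log the unhandled exception with its full traceback, then fall
    # through to the default hook so the user still sees the error.
    logger.error("unhandled exception", exc_info=(exc_type, exc_value, exc_tb))
    sys.__excepthook__(exc_type, exc_value, exc_tb)

sys.excepthook = logging_excepthook
```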

I need to add the log-to-DB thing. What tools do you use to look at it?

Also, we hijacked most of the django error template. It gives you very detailed info on the whole call stack and looks nice:

I use a few things, depending on what I’m after. I have a couple basic Python scripts that just fire some queries at the DB, with filters, via SQLAlchemy ORM and print the results or throw them into a graph. I also sometimes use SQL Server Management Studio for MS SQL, or MySQL Workbench for MySQL. I’ve also used Tableau for a few things, which is very powerful but not great for casual use.

Some very interesting info in this thread! Thanks for sharing, everyone. Out of interest: do you hook into that from non-Python tools (mxs, native, for example) too? And if so, how?

Cheers,
Thorsten

Yes, from MaxScript we can call out to our Python logger. Our log.py module can also be run like a script from the command-line, and our MaxScript logger stub just passes the error details to it as arguments. The stack/exception info isn’t nearly as great as Python’s, but it’s a lot better than nothing.
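A sketch of what such a command-line entry point might look like; the argument names and logger setup are invented here, since the real log.py isn’t shown in the thread:

```python
# log.py -- sketch of a logger that other languages (MaxScript, etc.)
# can invoke as a command-line script, passing event details as arguments.
import argparse
import logging
import sys

def main(argv=None):
    parser = argparse.ArgumentParser(description="Log an event from any language.")
    parser.add_argument("level", choices=["info", "warning", "error"])
    parser.add_argument("message")
    parser.add_argument("--tool", default="unknown")
    args = parser.parse_args(argv)

    # Route to the normal Python logging machinery; real code would
    # attach the database/email handlers here.
    logging.basicConfig(format="%(name)s %(levelname)s %(message)s")
    logger = logging.getLogger("cli_bridge.%s" % args.tool)
    getattr(logger, args.level)(args.message)
    return 0

if __name__ == "__main__":
    sys.exit(main())
```

MaxScript (or anything else) would then shell out along the lines of `python log.py error "render failed" --tool max`.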

You could probably have MaxScript/dotnet call out to databases and/or emailers directly but I didn’t see the point in implementing the same stuff in two places.

Good points, and saving myself from re-implementing this in different languages is what I was hoping for. Piping into a command-line .py file should be reasonable from pretty much every language. Great stuff. I also love the idea of (ab)using Django’s debug output, as it is pretty and extensive. I just stumbled over cgitb in the standard lib; that’s a very quick way to get started for the mails.

I am thinking that I would want this to be as easy to integrate as possible. For error mails, possibly as simple as importing a module that sets things up. For logging, the log-handler stuff is pretty much predefined. For benchmarking and timing, I wonder if it would be worth coming up with a simple module that allows something like bench.start(“nameofseries”) and bench.stop(“nameofseries”) to add timing info easily. How do you handle that?

Cheers everyone,
Thorsten

Our timing strategies vary a bit. For simple “record how long something took, like starting up this tool, etc.” we just use time.time(), then another call when it’s done, and do a log.info() call to record the duration. We also have a “debug timing” class that works a couple different ways… First as a context manager:

with Debug_Timing( 'load lots of data' ):
    load_data( )
    # <more code here>

… which lets us pinpoint specific blocks of code. It automatically notes the timestamp when the block starts, when it ends and produces the difference/duration. It also works as a decorator, for easy timing of entire functions:

@Debug_Timing( 'my long process' )
def start_long_process( ):
    # <code>

By default that class just spits the timing results to stdout, but it would be easy to do something similar that logs somewhere permanently.
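In case anyone wants to experiment, a class supporting both uses can be built on `__enter__`/`__exit__` plus `__call__`. This is a guess at the shape, not the actual Volition implementation:

```python
import functools
import time

class Debug_Timing(object):
    """Timing helper usable as a context manager and as a decorator.
    Prints to stdout; swapping print() for a log call would make the
    results permanent."""

    def __init__(self, label):
        self.label = label

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.duration = time.time() - self.start
        print("%s took %.3f sec" % (self.label, self.duration))
        return False  # never swallow exceptions

    def __call__(self, func):
        # Decorator form: wrap the whole function body in the timer.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with Debug_Timing(self.label):
                return func(*args, **kwargs)
        return wrapper

@Debug_Timing("my long process")
def start_long_process():
    return sum(range(1000))

result = start_long_process()
```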

Thanks again for the info. While looking around a little at what debug pages and the like look like in different frameworks, I stumbled over Sentry. Jeez. That looks lovely and is open source. In short, based on a presentation I just read:

  • Get a high-level error message with a bit of stack trace.
  • Go to the web interface for all the gritty details.
  • Pretty. :wink:

Anyone using that?

Cheers,
Thorsten

Forgot the link to the presentation: http://iromli.github.io/tracing-err-talk/#/

So I tried that myself today and it seems to be really great. After a pretty easy setup in a virtual machine, I got up and running quickly. It comes with predefined handlers for use with the standard-lib logging module, so by simply changing the handler of our Nuke toolset I got log.warning and log.error into Sentry. After setting excepthook I also received all unhandled exceptions, complete with stack trace and locals. You can also inject custom data as you see fit (I will add environment variables, for example).

It automatically groups incidents, which seems to work pretty damn nicely, and you only get a mail for the first occurrence. So far I am in love. Now I just need to find (or roll my own) something to do benchmarking and timing in a similarly hassle-free, convenient and pretty way :slight_smile:
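For reference, the wiring at the time went through raven, the Sentry Python client of that era. A configuration sketch (the DSN is a placeholder from your Sentry project’s settings page; this needs the raven package installed):

```python
import logging
from raven import Client
from raven.conf import setup_logging
from raven.handlers.logging import SentryHandler

# Placeholder DSN from your Sentry project's settings page.
client = Client("http://public_key:secret_key@sentry.example.com/1")

handler = SentryHandler(client)
handler.setLevel(logging.WARNING)  # warning and above go to Sentry
setup_logging(handler)  # attach to the root logger

# From here on, log.warning()/log.error() calls show up in the web UI.
```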

Cheers,
Thorsten

OK, I gave this a shot. First on Windows, just to see, and that failed miserably. I got past a few dependency issues, but it ultimately depends on a POSIX OS. So I put it on a webserver alongside some other stuff and it went a lot smoother. It was really easy to set up and get logs.

I did have a question: how are you setting up excepthook in Maya?


import sys
import logging

logger = logging.getLogger(__name__)

def func(exc_type, exc_value, exc_tb):
    logger.error(exc_value)
    sys.__excepthook__(exc_type, exc_value, exc_tb)

sys.excepthook = func

This doesn’t seem to do anything. Did I miss something?

You need to replace Maya’s GUI exception hook:



import logging
import traceback

import maya.utils as utils

_MAYA_EXCEPT_HOOK = utils.formatGuiException

def logger():
    return logging.getLogger('bootstrap')

def set_verbose():
    logger().setLevel(logging.INFO)

def enable_custom_handler():
    utils.formatGuiException = startup_exception_hook
    logger().info('startup exception handler')

def disable_custom_handler():
    utils.formatGuiException = _MAYA_EXCEPT_HOOK
    logger().info('default exception handler')

def startup_exception_hook(etype, value, tb, detail=2):
    results = ["=" * 80] + traceback.format_exception(etype, value, tb, detail)
    logger().critical("\n%s" % "\n".join(results))
    return _MAYA_EXCEPT_HOOK(etype, value, tb, detail)

Heya,

So the Sentry setup has been requested, and for now I am switching focus to timing. As timing is going to happen in production environments, it needs to be as unobtrusive as possible. Mainly, it should not slow down the tool reporting the timings. I am aware that this will always introduce some slowdown, and that the more granular the level, the bigger the impact of the measurement; I’d still like to reduce it as much as possible. What I have done so far is as follows:

  • create a small flask/flask-peewee application to hold the data (for now the data is just a timestamp and a duration as an int)
  • add a REST API to allow adding entries via HTTP requests

Now I wonder what would be best for getting data in. On the client side, do I want to fire off subthreads? Does anyone have experience with whether I should use peewee directly to write to the DB rather than REST calls?
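For what it’s worth, the shape of such a REST endpoint can be sketched with nothing but the standard library; the flask/flask-peewee version would look similar, and the route, payload fields, and port here are all invented:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

timings = []  # stand-in for the database table

class TimingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Accept a JSON body like {"timestamp": ..., "duration": ...}.
        length = int(self.headers["Content-Length"])
        timings.append(json.loads(self.rfile.read(length)))
        self.send_response(201)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet

server = HTTPServer(("127.0.0.1", 0), TimingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: one small request per timing sample.
payload = json.dumps({"timestamp": 1400000000, "duration": 42}).encode()
url = "http://127.0.0.1:%d/timings" % server.server_port
urlopen(Request(url, data=payload, headers={"Content-Type": "application/json"}))
server.shutdown()
```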

Cheers,
Thorsten

Using a worker thread in Maya works well as long as it doesn’t have to come back to Maya. I have a single worker thread that runs, and a module global containing a Queue that the worker gets data from. Then you just dump things in the queue and they’ll get handled.

As for doing the DB work directly or through a web server, you may just want to do it directly; it removes one more moving part. I have a decorator that will time functions and send the result directly to Mongo. You just have to be extra careful that if the DB goes down for whatever reason, the rest of the tools don’t follow. I suppose that would be an argument for a web request, though. Either way you’ll need to handle it.
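A sketch of such a timing decorator; the real version writes to MongoDB, but here the sink is injectable so the pattern is visible without a database, and the failure case is swallowed so a dead DB never breaks the tool:

```python
import functools
import time

def timed(sink):
    """Time the wrapped function and hand a sample dict to sink()."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                try:
                    sink({"func": func.__name__,
                          "duration": time.time() - start})
                except Exception:
                    pass  # never let telemetry failures break the tool
        return wrapper
    return decorator

samples = []  # stand-in for a Mongo collection

@timed(samples.append)
def build_rig():
    return "done"

result = build_rig()
```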

Good points, thanks a lot. I am not too concerned about going back to the main thread, as I don’t care at all if some timings are lost. I am aiming for averages and fuzzy information rather than exact amounts. I’d take an unobtrusive addition to the tool over making sure everything gets logged properly.

Cheers,
Thorsten

Logging directly to the database every time was a bit too slow for my taste, so I put in a local cache for non-error events. It just pickles each event object and writes it to a local file. When the logging object is deleted (typically when that given tool shuts down), it then posts those to the DB in one transaction. Works well.
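A sketch of that cache-then-flush idea as a logging handler; post_batch() stands in for the one-transaction DB post, and the pickling-of-messages detail is a simplification of the real system:

```python
import logging
import os
import pickle
import tempfile

class CachedDBHandler(logging.Handler):
    """Pickle non-error events to a local file as they happen, then post
    the whole batch in one go when the handler closes. Errors still go
    out immediately."""

    def __init__(self, cache_path, post_batch):
        super().__init__()
        self.cache_path = cache_path
        self.post_batch = post_batch
        self.cache = open(cache_path, "ab")

    def emit(self, record):
        if record.levelno >= logging.ERROR:
            self.post_batch([record.getMessage()])  # errors go out now
        else:
            pickle.dump(record.getMessage(), self.cache)

    def close(self):
        if not self.cache.closed:
            self.cache.close()
            batch = []
            with open(self.cache_path, "rb") as f:
                while True:
                    try:
                        batch.append(pickle.load(f))
                    except EOFError:
                        break
            if batch:
                self.post_batch(batch)  # one transaction for the session
            os.remove(self.cache_path)
        super().close()

posted = []
path = tempfile.NamedTemporaryFile(delete=False, suffix=".cache").name
handler = CachedDBHandler(path, posted.append)
logger = logging.getLogger("cached_demo")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("tool opened")
logger.info("export finished")
handler.close()  # flushes the cached events as a single batch
```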