Death by debug trace

In Asserts Redux, Dan Dunham and Scott Shumaker discuss how testers sometimes need to be able to work around assertion failures, and how allowing them to do so dilutes the power of the assertions. The discussion moves on to how debug traces can get out of hand until, eventually, you drown in debug spew.

I find that in almost all cases debug tracing is overused. The problem is, as Dan succinctly puts it: “When you’re writing the code, you put in a bunch of debug statements to verify things as you go. Over time, these debug statements lose a lot of their meaning”. So why not just take them out at that point? If the code adds no value then remove it; it’s become clutter. It makes both the code that it’s part of and the log that it writes to harder to read, and makes both appear more complex than they really are.

If you put in some debug to help track down a bug, you can remove it when you fix the bug; the removal is part of the cleanup that happens after the fix. If the tracing was simple “I’m here, variable a = %d” type stuff then it’s easy to put back in if you have a similar bug to find later. If it’s more complex, like dumping the state of an object or data structure, then it should be a method that can be called from wherever you want to dump the information; the method can remain but the debug statement can go. Failure to remove this kind of stuff as you go builds up resistance to this kind of change later in the project; it’s harder to justify making people remove useless code when the codebase already contains so much of it.
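For example, a minimal sketch in C++ (the class and all the names are mine, purely for illustration): the dump method stays in the codebase because it has value beyond any one bug hunt, while the temporary call that used it is what gets removed.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Hypothetical class, invented for illustration.
class Order
{
public:
    Order(std::string id, int quantity)
        : id_(std::move(id)), quantity_(quantity) {}

    // The reusable dump method remains; it can be called from a debugger
    // or from the next temporary trace when a similar bug turns up.
    void Dump(std::ostream &out) const
    {
        out << "Order{id=" << id_ << ", quantity=" << quantity_ << "}\n";
    }

private:
    std::string id_;
    int quantity_;
};

void ProcessOrder(const Order &order)
{
    // order.Dump(std::cout);   // temporary trace added while hunting a bug;
    //                          // removed as part of the cleanup after the fix
    (void)order;                // ... actual processing ...
}

int main()
{
    ProcessOrder(Order("A-42", 3));
}
```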

Strangely, most people try to solve the problem of having too much trace information not by removing the unnecessary rubbish from the trace logs but by doing almost everything else. You get complex log file monitors that let you filter out things that you’re not interested in; you get multiple logs or output devices; you get log levels and priorities and log types so that each line of spew can be categorised in a special way. It never works. The requirements of the person viewing the trace are almost always different to the requirements of the person who inserted the trace in the first place.
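To make that concrete, here’s a sketch of the kind of machinery this usually produces (all names hypothetical). Every line of spew now needs a level and a category chosen by the person inserting it, who can’t know what the person eventually viewing the log will want filtered.

```cpp
#include <cstdio>

// Hypothetical level/category machinery of the kind described above.
enum class LogLevel { Error, Warning, Info, Debug };
enum class LogCategory { Network = 0, Database = 1, Parser = 2 };

// Each viewer sets their own filters...
LogLevel g_levelThreshold = LogLevel::Warning;
unsigned g_categoryMask = ~0u;

void Trace(LogLevel level, LogCategory category, const char *message)
{
    // ...but the tags were chosen by whoever inserted the trace line,
    // not by the person now trying to read the log.
    if (level > g_levelThreshold) return;
    if (!(g_categoryMask & (1u << static_cast<unsigned>(category)))) return;
    std::printf("%s\n", message);
}

int main()
{
    Trace(LogLevel::Error, LogCategory::Network, "connection refused"); // shown
    Trace(LogLevel::Debug, LogCategory::Parser, "token = 42");          // filtered out
}
```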

The complexity of these tracing mechanisms always grows over time. You add trace levels so that you can view only the “important stuff”. Then you compose a system out of a few libraries and find that in one of them the “important stuff” is peppered with stuff that you don’t consider to be important. You add trace categories so that each library can be filtered on its own. Great, except now you’ve added coupling between every library you write and the thing that manages trace categories; or you haven’t, and occasionally categories overlap and you get unexpected messages from the library that you’re developing… You find that in a multi-threaded situation things get complex. You make the trace code more complex so that you don’t intermingle trace lines from different threads, and this may create a synchronisation point that otherwise wouldn’t be there… You switch to using a third party logging system, log4j perhaps, and now you have another dependency to manage.
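The threading point is worth making concrete. A sketch, again with illustrative names: to stop trace lines from different threads intermingling you end up locking around the output, and that lock is a synchronisation point that exists only because of the tracing.

```cpp
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

// Illustrative only: serialising trace output across threads forces a lock.
std::mutex g_traceMutex;

void Trace(const std::string &message)
{
    // The lock keeps each trace line intact, but threads that never needed
    // to coordinate with each other now queue up here.
    std::lock_guard<std::mutex> lock(g_traceMutex);
    std::cout << "[thread " << std::this_thread::get_id() << "] "
              << message << '\n';
}

int main()
{
    std::thread a([] { Trace("worker a running"); });
    std::thread b([] { Trace("worker b running"); });
    a.join();
    b.join();
}
```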

Obviously sometimes you need some form of log for post-mortem purposes. You may have a transaction processing system, and operations staff might need to find out why it declined to authorise a transaction at 23:10 last night. This isn’t the same thing, though most programmers seem to think it is, and most operations staff have got used to working with logs that are full of rubbish and never help them solve the problem. If you need a post-mortem log, keep it separate from any debug log you have. Make sure that every line that goes into the post-mortem log helps the user of the log to do their job. Temporary debug traces should go somewhere else and should be removed when no longer required.
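A sketch of that separation, with hypothetical names and file names: the post-mortem log carries only lines that answer operational questions, and the temporary debug trace goes somewhere else entirely.

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical separation of the two logs; names are illustrative.
// The post-mortem log answers operational questions ("why was this
// transaction declined?") and nothing else goes into it.
void PostMortemLog(const std::string &line)
{
    static std::ofstream log("transactions.log", std::ios::app);
    log << line << '\n';
}

// Temporary debug trace goes to a different place and is removed
// when the bug it was added for is fixed.
void DebugTrace(const std::string &line)
{
    std::cerr << line << '\n';
}

int main()
{
    PostMortemLog("23:10:04 txn 12345 DECLINED: card expired");
    // DebugTrace("entering AuthoriseTransaction");  // removed after the fix
}
```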