Reducing what the code could do to just what the code should do

2004-08-03

I’ve got one of those ‘please dig us out of this hole before we ship on Friday’ gigs this week. I’m back with the refactoring project for a few days and, well, when the cat’s away…

So, we’re 80% done but none of it seems to work, no tests, and the design seems a tad overly complex and it just must go live by Friday… Our task is simple, just clean things up, make it work right and finish the other 80%.

The code is the usual mess of highly coupled, overly complex, unrestricted, unplanned, half understood, thought droppings; at least there’s no XML…

It seems that whenever I find myself in this situation the solution always ends up as being one where I systematically reduce what the code could do until it’s doing just what it should do. Step one is usually removing unrequired optionality; replacing pointers that can’t be null with references and removing any, now redundant, checking code. As soon as see a function that takes a pointer and immediately checks it for null you know you can remove the 1 or 0 optionality of a pointer and replace it with the certainty of a reference. Like retrofitting const correctness; once you do it in one place, you can chase back up the call stack until you find the point where the value really is optional. Things like this are simple to do but they vitally important in keeping code under control and understandable. By reducing the amount of things you have to question when you read a piece of code you can make the code easier to understand; no need to worry if this pointer can be null if you’re given a reference that can’t ever be.

I personally find it strange how some people write code that’s loose and floppy by default; most things are public, most things are pointers but can never be null and are only sometimes checked, most things aren’t correctly const, too much inheritance, lots of half thought out, ‘what if it does this’ functionality that’s never used, completed or tested… Masses of undecipherable crap that can usually be boiled down to around a third of its original surface area by applying simple transformations that remove all the ‘what ifs’ that come from a lack of understanding. If you dont understand what the code should do, don’t write code that might do what you might need, think it through and write the right code.

I much prefer reading code that is explicit about what it does. No ambiguity. No ’this is for when we do X in three versions time’ stuff. This ‘vapour design’ is usually just a hand waving that’s unlikely to work and is just there so that someone can tick a box somewhere and other people can get really surprised when implementing X in three versions time takes far longer than everyone says it should… If an object can’t be modified then it’s const; if a pointer can never be null, then it’s a reference; class data is private as are functions that a class uses to help it do its job but that are never called by clients… I don’t want to look at code and wonder, I want to look at code and know.

So, once we’ve dealt with the low hanging fruit we can start on fixing the ‘design’. Some people love inheritance so much that they just can’t get enough; I used doxygen on the code to get it to draw some class and collaboration diagrams for me so I could start clicking through the mess in search of understanding. Lots of long thin class hierarchies that distribute functionality at random amongst the classes. It’s interesting, in a way, it’s like each day they’d come in and write some code to solve some of the problem but each day they’d derive from yesterday’s work…

Not surprisingly there’s lots of duplication. What is it with programmers and duplication? As usual, not all of it is cut and paste, that’s easy to spot and easier to deal with, some of it is ‘I don’t remember how I implemented this last time, I don’t even remember I have implemented this before, let’s try it this way’; this is harder to spot, you need to change code until both areas look similar and only then do you get a chance to see the duplication. What’s the deal with duplication and why do coders do it?

Luckily other transformations can help you spot the non obvious duplication. For example; if all loops look different because someone’s working through ‘my first book of looping constructs’ and trying each one no more than once, you can start to remove noise by refactoring loops into a standard form. There really aren’t that many ways you need to loop and if all your loops that do the same thing look the same you can spot the duplication and pull the code into one place. If you need to do several things before you can do what you actually want to do; such as access data in the ‘one true data structure’, calculate indices and then call one of the many methods of doing the same thing you can remove complexity by always doing these things in the same way and in the same order…

Code with no, or many, coding standards always benefits from having a standard applied, pick any one you like, it doesn’t matter - no, really, it doesn’t matter. Step one is to make code that does the same thing look the same, step two can be making it look like your version of ’nice’. If code that does the same thing looks the same, same block structure, same bracket rules, same names, etc, it’s easy to spot duplication and once you spot it you can remove it.

We’re one day in, we have 3 more to go, we’re making some headway, but at present it’s just low hanging fruit, I wonder if we’ll make it…