Looking back from near the end

My current consulting gig is coming to an end. We’ve been in the hand-over phase for a while now, and I think it’s slowly starting to work. Now seems like a good time to look back at what worked and what didn’t…

This project was heavily focused on testability from the outset, and that’s paid off in the later stages. As we near final integration testing, lots of last-minute requirements keep popping up; some were mentioned earlier in the project and simply got missed, and some are completely new. The tests we have mean that the system is surprisingly malleable. Now that we’re in hand-over we have more people working on the code, and the tests are supporting our changes well. I certainly haven’t felt the usual fear of last-minute changes; in fact, I almost welcome them - bring ’em on, put our tests to the test. We usually know pretty quickly when we break things, and some quite sweeping changes have been made in a safe and controlled manner.

Due to the loosely coupled nature of the resulting code (you can’t easily test tightly coupled code, so the code doesn’t become tightly coupled), we can run small pieces of the code-base through leak-checking and perf tools as and when we need to. What’s more, the interface-based design is flexible and easy to change.

Our mock objects have evolved nicely. We have a lot of test code and, due to the interface-based design, a lot of the code that supports the tests is in the form of mock objects. These have tended to take on a standard form: they implement the interface, do stuff, and write information to a text-based log. The new twist on this project is that, due to the size of the logs they produce, we have taken to storing the ‘expected’ logs on disk as separate files. During the test runs the mocks can compare the current log with the expected file on disk and complain if they don’t match. After a comparison the mock resets its internal state so that we can compare against a new log file later on in the test. This works really well and makes the test code itself more compact.
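To make the shape of these mocks concrete, here’s a minimal sketch of the idea; the interface name, the methods and the file handling are illustrative assumptions rather than the project’s actual code:

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// A hypothetical interface that the production code depends on.
struct IMessageSink
{
   virtual ~IMessageSink() {}

   virtual void OnMessage(const std::string &message) = 0;
};

// A log-comparing mock: it records each call to a text log and can compare
// that log against an 'expected' log stored on disk.
class CMockMessageSink : public IMessageSink
{
   public :

      virtual void OnMessage(const std::string &message)
      {
         m_log += "OnMessage: " + message + "\n";
      }

      // Compare the accumulated log with the expected file, then reset so
      // the next phase of the test starts with an empty log.
      void CheckLogAgainstFile(const std::string &fileName)
      {
         std::ifstream file(fileName.c_str());

         if (!file)
         {
            throw std::runtime_error("Cannot open expected log: " + fileName);
         }

         std::ostringstream expected;

         expected << file.rdbuf();

         if (expected.str() != m_log)
         {
            throw std::runtime_error("Log mismatch against: " + fileName);
         }

         m_log.clear();
      }

   private :

      std::string m_log;
};
```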

Using coverage tools in association with your test harnesses is great for spotting where you don’t have enough tests to force failure conditions. Whilst striving for 100% coverage isn’t necessarily a worthwhile goal, coverage is useful for spotting the obvious omissions.

Documenting one or two of your high-level unit tests as a code walk-through, complete with “set a break point here” and “single step into this bit and you’ll see this happen” type stuff, seems to work really well…

So, that’s the stuff that worked; what about the mistakes…

My first mistake was a lack of version labels in the source code tree. We should have treated the code freeze that I did before writing the first draft of the documentation as an external release: tagged all the code and listed the tag in the document. Unfortunately we now have a document that needs some updating and no way to reproduce the code tree from when the document was accurate. This proved slightly embarrassing when bringing more people onto the project; yes, we have a doc; no, it’s not up to date… When things rely on a particular version of the code, you should make sure you can reproduce that version. This problem will go away once the document update phase is complete…

Another mistake was putting too much code in the ‘exe projects’; we have a collection of libraries, and the libraries all have test harnesses and associated mocks. The ‘exe projects’ pull the code in the libraries together into something that has a main and does real work. I allowed too much code to seep into some of the ‘exe projects’, mainly because, at the time, it seemed like simple composition code that didn’t really need to be tested; it does, but it’s much harder to test when it’s not in a library. Almost all of the code I’m least proud of is in one or other of the ‘exe projects’. I’ve started retrofitting unit test harnesses into the system for these areas, and once the tests are there the hacky nature of the code can vanish as it’s refactored.

Singletons are always a bad idea. Always. Every time. They make testing harder, and they tie you to a fixed implementation of what should be an interface. We have two singletons, only one of which was a true “GetObject() from anywhere” singleton; that was only used from code in the ‘exe projects’, and it has now been adjusted to be an object that’s passed in via an interface so that we can test that we get the correct event log messages when the objects under test are put into particular situations. The other singleton was, in effect, just a globally accessible pluggable object, the debug trace log, and that was easy to mock out as it had been designed with that in mind.
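Here’s a rough sketch, with made-up names, of the kind of change described above: replacing a “GetObject() from anywhere” singleton with an object that’s passed in via an interface so that a test can supply a mock event log:

```cpp
#include <iostream>
#include <string>

// A hypothetical request type, just to make the sketch self-contained.
struct CRequest { std::string data; };

// Before: a classic "GetObject() from anywhere" singleton. Code that uses it
// is tied to this one implementation and can't substitute a mock in tests.
class CEventLog
{
   public :

      static CEventLog &GetObject()
      {
         static CEventLog instance;
         return instance;
      }

      void LogError(const std::string &message)
      {
         std::cerr << "EVENT LOG: " << message << std::endl;
      }
};

void ProcessRequestOld(const CRequest &request)
{
   // hard-wired dependency on the singleton
   CEventLog::GetObject().LogError("bad request: " + request.data);
}

// After: the dependency becomes an interface that's passed in, so tests can
// supply a mock and check the event log messages that the code produces.
struct ILogEvents
{
   virtual ~ILogEvents() {}

   virtual void LogError(const std::string &message) = 0;
};

void ProcessRequest(const CRequest &request, ILogEvents &eventLog)
{
   // pluggable; a test passes a mock, production passes the real event log
   eventLog.LogError("bad request: " + request.data);
}
```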

Time is always a problem. Not that having tests makes things take longer, far from it; the ‘feel’ I’ve got from this project is that it’s moved along at a very controlled pace and that we’ve moved faster than we would have without tests. It’s the code’s use of time that can be hard to manage; it always makes code harder to test. As I showed in my Practical Testing series, the best way around the problem of controlling time from within a test is to treat calls to the system’s time functions as calls to a service provider that can be explicitly supplied. During tests we can give the class a mock time provider that we control, and in the real world the class can use a default provider that simply calls the system’s standard time functions. Being able to control time during tests is surprisingly important: it means you can guarantee how a timestamp in a log message will look, when, exactly, an operation that’s waiting for x seconds will actually happen, and what happens if your time provider wraps around, or goes backwards, or runs very slowly, etc…
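A minimal sketch of the time provider idea, assuming hypothetical interface and class names rather than the actual code from the Practical Testing series:

```cpp
#include <ctime>

// The interface that code under test depends on instead of calling the
// system time functions directly.
struct IProvideTime
{
   virtual ~IProvideTime() {}

   virtual std::time_t Now() const = 0;
};

// The default provider simply wraps the system's standard time function.
class CSystemTimeProvider : public IProvideTime
{
   public :

      virtual std::time_t Now() const
      {
         return std::time(0);
      }
};

// The mock lets a test set, advance, or even rewind time explicitly, so tests
// can pin down timestamps and exercise wrap-around or backwards-running clocks.
class CMockTimeProvider : public IProvideTime
{
   public :

      explicit CMockTimeProvider(std::time_t initialTime) : m_now(initialTime) {}

      void SetTime(std::time_t now)       { m_now = now; }

      void AdvanceBy(std::time_t seconds) { m_now += seconds; }

      virtual std::time_t Now() const     { return m_now; }

   private :

      std::time_t m_now;
};

// Production code constructs the class with a CSystemTimeProvider; the tests
// pass a CMockTimeProvider and drive the clock from the test itself.
```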

When you have a very loosely coupled system and lots of tests for it, you end up with lots of code. This can be intimidating to people just starting to work with the system. I think that keeping the tests and mocks in separate projects from the production code, and having separate workspaces to build the production code and the test harnesses, works. I expect it could be better, but it seems to work and people seem to understand it. Of course, that still doesn’t get you away from the ‘problem’ of having lots of small classes that do one thing and that are composed into larger units to do more meaningful work. I think of it as the opposite of the 4000-line function that takes lots of optional parameters and flags, or a single XML blob or void *, and can be called in numerous ways to do numerous unrelated things… Sure, the 4000-line function looks superficially simple because it’s called DoIt() and it just takes a single blob of XML as input and output; whereas we have 10 classes and only one deals in terms of XML and the rest deal in specific data types… I’ve said before that I prefer my complexity to be obvious, and I do, but I’m still not entirely convinced that I can’t do better. Part of the solution is to compose the small, simple objects into more meaningful lumps of code; plug them together like Lego so that you’re working with larger components. The problem then is working out when to compose, and keeping the larger components testable. We did pretty well with the testability; our main top-level object allows us to plug in all of the external data providers (things that need to talk to other systems for data that might change) so we can still run 90% of the system in a unit test. We could probably have done better with the lower-level composition, but refactoring is helping us here.
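As a rough illustration of that top-level composition, here’s a sketch with hypothetical provider interfaces; the names and the domain are made up, but the shape - pass the external data providers in, keep the composition testable - is what’s described above:

```cpp
#include <string>

// Hypothetical interfaces for the 'external data providers' mentioned above;
// the names and methods are invented for the sake of the sketch.
struct IProvideExternalData
{
   virtual ~IProvideExternalData() {}

   virtual std::string GetData(const std::string &key) const = 0;
};

struct IProvideConfiguration
{
   virtual ~IProvideConfiguration() {}

   virtual std::string GetSetting(const std::string &name) const = 0;
};

// The main top-level object composes the smaller classes and takes everything
// that talks to other systems as constructor arguments, so a unit test can
// plug in mocks for the providers and still drive most of the real code.
class CSystem
{
   public :

      CSystem(IProvideExternalData &data, IProvideConfiguration &config)
         :  m_data(data),
            m_config(config)
      {
      }

      std::string Process(const std::string &key) const
      {
         // the real composition logic lives here; this is just a placeholder
         return m_data.GetData(m_config.GetSetting("prefix") + key);
      }

   private :

      IProvideExternalData &m_data;

      IProvideConfiguration &m_config;
};
```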

In general I’m pretty pleased with the project so far. The one disappointment is that we have yet to put the whole system through its paces in a full integration test. I like to get stuff working end to end as soon as possible, and this project has failed in that respect due to the complexity of the system we’re integrating with and a lack of available bodies to set that system up and do the testing. We’ve tested our direct integration with the next component in the chain, but I’d like to have seen everything working together much earlier. It’s no good having unit tests that show the code is working correctly if that code is solving the wrong problem…