Insufficient coverage

Today the FX engine went into UAT. Well, the nearest thing we have to UAT: a user looked at it… Three bugs; differences between the new code and the current production version. All slipped through our test coverage. :(

I had actually expected more problems. Two of the new bugs look like they’re related, which is good. The third is an easy fix that could be viewed as an unwanted feature… What I found most disturbing about the two related bugs was being shaken out of the safe, controlled world of test-first refactoring into bug-fixing a piece of code that had insufficient test coverage.

The past couple of days have been quite pleasant. I’ve been working down the list of issues that I’d drawn up whilst testing, and fixing lots of niggly things that were existing bugs in the production version of the code. This work was slotted in around the general monkey testing that I was doing with the GUI. Since these items weren’t high priority I had no problem allocating enough time to write tests first for each of them. It was nice, controlled and relaxing. I could see the code getting cleaner, and I was confident my changes didn’t break things because I wrote tests to prove it before I made them.

When I started looking at the new bugs this evening I jumped into the test harnesses to try to work out how they’d slipped through. One of the issues was very closely related to a previous bug that I knew we had a test for, so I was concerned that the test was wrong. I went through a whole load of agonising “a wrong test is worse than no test” thoughts, and then found that it wasn’t wrong, though it could have been tighter. Both the production version and the new version of the code gave the same results for the test case that we had, and that test passed.
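As an aside, the difference between a “loose” test and a “tighter” one can be sketched like this. None of these names come from the real FX code; the calculation and its edge case are entirely hypothetical stand-ins.

```python
import unittest

def calculate_spread(bid, ask):
    # Hypothetical stand-in for the real rates calculation.
    # Returns the spread, or 0.0 when the quotes are crossed.
    spread = ask - bid
    return spread if spread > 0 else 0.0

class SpreadTests(unittest.TestCase):
    def test_normal_quote(self):
        # The original, "loose" test: one representative case
        # that both versions of the code happen to pass.
        self.assertAlmostEqual(calculate_spread(1.2500, 1.2503), 0.0003)

    def test_crossed_quote(self):
        # The "tighter" version also pins down the edge case:
        # a crossed market should yield a zero spread.
        self.assertAlmostEqual(calculate_spread(1.2503, 1.2500), 0.0)

if __name__ == "__main__":
    unittest.main()
```

A test like the first one isn’t wrong, but it leaves room for the two versions of the code to agree on the common case and still diverge on the edge.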

So, we had another edge case to deal with. I started to write a test, but it took a lot longer than the tests I’d been working with recently. The FX tests exercise a lot of functionality at the narrow points where that functionality is exposed to the rest of the application. Testing at inflection points like this is useful whilst trying to make a legacy code base testable, but it’s nowhere near as nice as having decent test coverage.

Choosing where you get the biggest bang for your buck when writing tests for legacy code is hard. Up until now we’d done pretty well having the majority of the tests at the point where the results of the live data ticks flow through to update the screen display. These tests supported us well whilst we refactored the layers of underlying code, and we added several additional test harnesses as we went, but they’re hard to set up compared to the test we actually needed: one that exercised the rates calculation code in isolation, given a known set of input values.
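The difference in setup cost between the two styles might look something like this sketch. All of the classes and functions here are invented for illustration; the real harnesses are nothing like this simple.

```python
# Hypothetical sketch of the two testing styles.

class FakeTickSource:
    """Stands in for the live market-data feed."""
    def __init__(self, ticks):
        self.ticks = ticks

class FakeDisplay:
    """Captures what would have been written to the screen."""
    def __init__(self):
        self.rows = []
    def update(self, row):
        self.rows.append(row)

def convert(amount, rate):
    # The calculation we'd like to test in isolation.
    return round(amount * rate, 2)

def run_pipeline(source, display):
    # Inflection-point style: drive each tick through the whole
    # tick -> calculation -> display path.
    for amount, rate in source.ticks:
        display.update(convert(amount, rate))

# Coarse test: fakes for both ends of the pipeline, all to make
# one assertion about the calculation in the middle.
display = FakeDisplay()
run_pipeline(FakeTickSource([(100.0, 1.25)]), display)
assert display.rows == [125.0]

# Isolated test: no scaffolding at all.
assert convert(100.0, 1.25) == 125.0
```

The coarse test earns its keep while refactoring the layers between feed and screen, but every new edge case pays the full scaffolding cost again.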

So, the plan for next week is to finish writing the tests for the new bugs at the same level as the current tests, fix the bugs, and get the code into UAT again; then try to find time to make the area where the fault was more locally testable. Given that we have tests at the inflection point, we should be able to refactor reasonably quickly to make the calculation code testable in isolation…
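The shape of that refactoring step might be sketched like this, with all names invented rather than taken from the real FX engine: pull the calculation out of the update handler so it no longer needs the pipeline to be exercised.

```python
# Hypothetical before/after sketch of the planned refactoring.

# Before: the calculation is buried inside the quote handler,
# so it can only be reached by driving the whole pipeline.
class RateRowBefore:
    def on_quote(self, bid, ask):
        self.mid = (bid + ask) / 2  # calculation tangled in here
        self.refresh_screen()
    def refresh_screen(self):
        pass  # talks to the GUI in the real code

# After: the calculation is a free function with no
# dependencies, and the handler delegates to it.
def mid_rate(bid, ask):
    return (bid + ask) / 2

class RateRowAfter:
    def on_quote(self, bid, ask):
        self.mid = mid_rate(bid, ask)
        self.refresh_screen()
    def refresh_screen(self):
        pass

# The calculation can now be pinned down directly, with no GUI,
# no tick feed, and no harness setup.
assert mid_rate(1.0, 1.5) == 1.25
```

Because behaviour is pinned by the existing inflection-point tests, an extraction like this can be done in small, safe steps.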