Cypher

2004-03-13

I’m currently working with a corporate client. The plan is that I’ll help them refactor a key component in their system to make it more robust and improve its performance. Right now they’re in the middle of a release and are in ‘slip mode’. I find myself feeling some sympathy for Cypher from The Matrix; there I was, safe in my green-tinted world of TDD, then suddenly I find myself in “the real world” and it’s nasty and messy and there aren’t any tests. Now I wish I could get back to where I was and I don’t care if it isn’t “real”…

Of course that’s why my client needs me. They’ve read the blog, they know about testing, they want some. The component I’ll be working on can usher in a new way; a way where coding without tests is frowned upon and where releases happen when you expect because you know exactly how bad things are at all times… ;) The success of this component will cause the practice to spread, like a virus, to the rest of the team and then onwards into the entire organisation. ;) If we’re lucky…

In order to understand the bits I’ll be working with I’ve been helping the team firefight the current release. They’ve just completed a huge refactoring where they stripped out masses of redundancy and unneeded generic behavior and replaced large chunks of VB with C++ for performance reasons. In general the system is in good shape, but it leaks: memory, handles, you name it, it leaks it. Given that the system has to run for the entire business day, and that interruptions in service would be bad, the leaks are a problem.

Like all good corporate systems this one is built on the work of other teams within the organisation. We get our widgets from the widget team’s widget server component; we use the super secret calculation libraries by way of the super fast, distributed (like God intended) calculation system; we report our status via the official, one true way, status reporting component, etc.

Like most corporate systems the quality of the code that’s being reused is not always as high as you’d like; and like most corporate environments, convincing the team that provided the failing component that it’s their problem and that they need to fix it in time for your deadline is often complex. Most of the issues that the team has found this week have been in other teams’ code.

In “the real world” testing takes a long time. You can’t just run the unit tests for the part of the system that you think is leaking and use your leak detection tools, because you don’t have unit tests and you can’t run one part of your system without running all of it. Running all of it with your leak detection tools can be a time-consuming job, because the instrumented code slows the whole system down. The test, fix, retest cycle is long and slow and frustrating… In a way it’s an architecture thing, but the architecture wouldn’t be like this if the system had been developed test first, as it wouldn’t be as tightly coupled. Even breaking out key pieces and writing test harnesses is non-trivial because, due to unforeseen coupling, you always need to test more than you actually want to test… Much of the testing that can be done is done by hand; we’re finding leaks, but we’re not finding them in a way that can be run as part of the build. It’s a manual process and one that would be hard to automate.
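For contrast, here’s a minimal sketch of the kind of leak check that can run as part of a build once a piece of code can be exercised on its own. Everything in it is hypothetical: `InstanceCounted`, `Widget`, `leaked_by` and the two operations are names invented for illustration, not anything from the client’s system. It only counts objects of classes that opt in, so it says nothing about handles or about memory leaked inside somebody else’s component.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: tracks how many objects of a derived class are alive.
template <typename T>
class InstanceCounted {
public:
    static std::size_t live() { return count_; }

protected:
    InstanceCounted() { ++count_; }
    InstanceCounted(const InstanceCounted&) { ++count_; }
    ~InstanceCounted() { --count_; }

private:
    static std::size_t count_;
};

template <typename T>
std::size_t InstanceCounted<T>::count_ = 0;

// Hypothetical class owned by the team; it opts in to instance counting.
class Widget : public InstanceCounted<Widget> {};

// An operation that cleans up after itself.
void process_clean() {
    std::unique_ptr<Widget> widget(new Widget);
    // ... use the widget; it is destroyed when the function returns ...
}

// An operation with a deliberate leak: the Widget is never deleted.
void process_leaky() {
    Widget* widget = new Widget;
    (void)widget;
}

// Runs an operation and returns how many Widgets it left alive.
std::ptrdiff_t leaked_by(void (*operation)()) {
    const std::ptrdiff_t before = static_cast<std::ptrdiff_t>(Widget::live());
    operation();
    return static_cast<std::ptrdiff_t>(Widget::live()) - before;
}
```

A unit test would assert that `leaked_by(process_clean)` is zero and fail the build when it isn’t; `process_leaky` is there to show that the check notices a leak. The point is that the check takes milliseconds and needs none of the rest of the system.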

Going forward we need to isolate the new components that the team owns from the code that they’re reusing; at present, thanks to the magic of COM, these reusable components are tightly coupled into the fabric of the system. Once that’s done the team could test their code in isolation, which is always useful ammunition in the “it’s not our problem” wars.
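As a rough sketch of the kind of isolation I mean, assume the team’s code talks to the status reporting component only through a small interface that the team owns. The names here (`StatusReporter`, `RecordingReporter`, `OrderProcessor`) are made up for illustration; in the real system a thin adapter, not shown, would implement the same interface by forwarding to the COM component.

```cpp
#include <string>
#include <vector>

// Hypothetical seam: the only view the team's code has of the reused component.
class StatusReporter {
public:
    virtual ~StatusReporter() = default;
    virtual void report(const std::string& message) = 0;
};

// Test double: records what was reported instead of calling the real component.
class RecordingReporter : public StatusReporter {
public:
    void report(const std::string& message) override {
        messages.push_back(message);
    }

    std::vector<std::string> messages;
};

// Hypothetical team-owned component: depends on the interface, not on COM.
class OrderProcessor {
public:
    explicit OrderProcessor(StatusReporter& reporter) : reporter_(reporter) {}

    // Rejects non-positive quantities and reports the outcome either way.
    bool process(int quantity) {
        if (quantity <= 0) {
            reporter_.report("rejected");
            return false;
        }
        reporter_.report("accepted");
        return true;
    }

private:
    StatusReporter& reporter_;
};
```

A test constructs an `OrderProcessor` with a `RecordingReporter`, calls `process`, and checks both the return value and the recorded messages. Nothing from the other teams needs to be installed or running, so if the test passes and the integrated system still misbehaves, the evidence points at the other side of the interface.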