Bug hunt on the refactoring project

The refactoring project rolls on. Mostly it’s been more of the same, so I haven’t bothered boring my reader with the details. This week we had an interesting bug to fix. The bug had appeared in a much earlier version, way back in July, but it had only been reported by one user and we could never duplicate the problem. This week we managed to duplicate it, and then we needed to work out what it was and when it was added to the source…

The bug was a hard one to fix, mainly because it was a hard one to understand. The refactoring project is an OCX and runs inside IE. This wouldn’t have been my first choice of architecture, but that’s what we have to live with. The bug manifested itself as a complete lock-up of the OCX when a user pressed the escape key. The OCX wasn’t dead and no exceptions occurred; it just didn’t get any more messages to process…

It never used to do that… But we couldn’t work out when the problem was introduced. I looked at our release history in CVS and started a binary search of previous releases to try to locate the point where the problem started.
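For anyone who hasn't done this kind of hunt before, here is a minimal sketch of the binary search idea in Python. It is not the procedure we actually used (ours was done by hand, building each release and pressing escape); the function name and the `is_bad` callback are made up for illustration. It assumes the releases are ordered oldest to newest, that the newest one shows the bug, and that once the bug appears it stays in every later release.

```python
from typing import Callable, Sequence


def first_bad_release(releases: Sequence[str],
                      is_bad: Callable[[str], bool]) -> str:
    """Return the earliest release for which is_bad() is true.

    Assumptions (not checked): releases are ordered oldest to newest,
    the newest release shows the bug, and the bug never disappears
    again once it has been introduced.
    """
    if not releases:
        raise ValueError("need at least one release to search")

    lo, hi = 0, len(releases) - 1  # releases[hi] is assumed to be bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid        # bug already present: answer is at or before mid
        else:
            lo = mid + 1    # bug not present yet: answer is after mid
    return releases[lo]
```

In practice `is_bad` would mean "build that release, run it, press escape and see whether it locks up", so each probe is slow. The point of the binary search is that the number of builds grows with the logarithm of the number of releases rather than with the number itself.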

We have a ‘one click’ release procedure, so building the previous versions was just a case of running release Release_5000 and waiting for the correct version of the source to be pulled from CVS and the build to complete. Whilst working through the numerous releases made since I instigated the “it would be nice if we could know what we actually have in production” policy, I noticed that the older releases seemed to take an age to build…

I adjusted the build script so that it reported the time taken by the build and discovered that the refactoring so far has shaved 4 minutes off the build. Release 5000 takes 13 minutes to build and release 8017 (just don’t ask about the version number ‘policy’) takes 9 minutes. Most excellent!
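Timing a build is simple enough to sketch. The snippet below is only an illustration in Python, not our actual build script; the `timed_run` and `format_duration` names are hypothetical, and the build command is whatever you pass on the command line.

```python
import subprocess
import sys
import time
from typing import Sequence, Tuple


def timed_run(cmd: Sequence[str]) -> Tuple[int, float]:
    """Run cmd and return (exit code, elapsed wall-clock seconds)."""
    start = time.monotonic()  # monotonic clock: unaffected by clock changes
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start


def format_duration(seconds: float) -> str:
    """Format a number of seconds as minutes and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: pass the build command as the arguments.
    code, elapsed = timed_run(sys.argv[1:])
    print(f"build finished with exit code {code} in {format_duration(elapsed)}")
    sys.exit(code)
```

Wrapping the whole build like this, rather than timing individual steps, is enough to compare one release against another.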

I eventually located the version where the bug was introduced and fixed it…