Living with continuous integration

Well, it’s been about a month since I started running Cruise Control .Net and things have settled down somewhat; I can now almost go a day or two without tweaking my configuration or being tempted to fix issues in Cruise Control itself.

For those of you who haven’t been following along:

  • First I realised that the latest (1.3) release of Cruise Control .Net wouldn’t work for me without some hacking.

  • Then I found that my idea of ‘integration’ wasn’t quite the same as the simplistic situation for which Cruise Control .Net worked best; I have lots of projects and running CC.Net with lots of project triggers wasn’t fun.

  • Later I added even more projects to the system so that I could build all of my servers and all of the library code that they depend on for all of the compilers and architectures that I support.

  • I hacked at some ‘low hanging fruit’ and seriously questioned the design of CC.Net.

  • I finally identified one of the main causes of the lack of scalability that I was seeing.

  • I submitted some of my patches to CC.Net.

And now I’m just ‘using it’, and still complaining. Here are some random thoughts about my experiences so far.

My latest project trigger hacks have been working well. In the end, the main scalability issue was the fact that a project trigger would retrieve ALL projects from a server and then search the results for the ONE project that it actually wanted; it did this using .Net remoting and it did it often. With my changes, a project trigger now requests only the ONE project that it wants from a server, which means less work on the server being queried and less work in the trigger itself. You can also configure the trigger to be ‘local’, in which case it does all of the query work in-process without using .Net remoting at all. These changes made it possible to run with lots of projects, and the hack around the project integrator and the Sleep() were taken out, as the whole ‘thread per project’ design was far more complex than it first appeared. I have these changes ready to submit as a patch but I’m waiting to get some form of response to my previous patches before doing so (if anyone wants the patches now, just ask).
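
For context, a stock project trigger in ccnet.config looks something like the sketch below (server URI and project names are made up). The ‘local’ variant at the end is from my patch and isn’t in the official release, so treat that attribute as illustrative only:

```xml
<triggers>
  <!-- Stock CC.Net: polls the remote server over .Net remoting.
       Pre-patch, each poll fetched the status of ALL projects on
       the server and then searched for the one it wanted. -->
  <projectTrigger serverUri="tcp://buildserver:21234/CruiseManager.rem"
                  project="LibB-vs2005-x86">
    <triggerStatus>Success</triggerStatus>
    <innerTrigger type="intervalTrigger" seconds="30" buildCondition="ForceBuild" />
  </projectTrigger>

  <!-- Patched variant (illustrative, not in the official release):
       queries just the one project it cares about, in-process,
       with no remoting involved at all. -->
  <projectTrigger project="LibB-vs2005-x86" local="true">
    <triggerStatus>Success</triggerStatus>
    <innerTrigger type="intervalTrigger" seconds="30" buildCondition="ForceBuild" />
  </projectTrigger>
</triggers>
```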

My aim of being able to rebuild all of my projects for all of my supported compilers hit another snag when I found that the box I use to build with VC6, VS2002, VS2003 and VS2005 didn’t have enough disk space to build everything at once. There’s an easy fix for that, but for now I’ve cut down the number of compilers that it builds for. Likewise, my plan to have the system build multiple branches suffers from the same problem. The VS2005 and VS2008, x86 and x64 build setup on my main development box works nicely.

The whole thing runs more slowly than I’d like. This is down to the design of CC.Net and isn’t something I intend to try to change. I run with two build queues on each build machine: build and deploy projects are queued in one, test projects in the other. This is mainly because tests must run sequentially; sometimes (in the case of server tests that start real servers, for example) a test can’t run at the same time as another test on the same box. The test queue has to be sequential, but the build queue could usefully be processed by multiple threads; it would be nice to allow ‘x’ builds to run at once, fed from a single queue, and you can’t do that with the current CC.Net design. The one-thread-per-project design of CC.Net bugs me, but right now we’re scaling OK: 227 threads on the XP build box and 484 on the Vista x64 box. The project trigger polling system is inefficient, but at least with my local project trigger changes it’s no longer a show stopper.
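
To make the two-queue split concrete, here’s roughly what it looks like in ccnet.config; the queue and queuePriority attributes are standard CC.Net, the project names are made up:

```xml
<cruisecontrol>
  <!-- Build and deploy projects share one queue; projects in the
       same queue integrate one at a time, in queue order. -->
  <project name="LibA-vs2005-x86-Build" queue="BuildQueue" queuePriority="1">
    <!-- source control, triggers, tasks, etc. -->
  </project>

  <!-- Tests get their own queue so that a server test never runs
       alongside another test on the same box. -->
  <project name="LibA-vs2005-x86-Test" queue="TestQueue">
    <!-- source control, triggers, tasks, etc. -->
  </project>
</cruisecontrol>
```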

My project dependency design is flawed in that only three projects access the CVS repository; all the others use file system source control providers to copy from the local CVS tree to their own build trees. I’m doing things this way because I use CVS modules a lot and I want to make sure that my projects build in an environment where only the code that I say they depend on is available to them. So, for example, I have lib A which depends on libs B and C; the CVS module for A checks out A, B and C, which is fine, but when you run an update with -d you get everything else that lives in the directory where A, B and C live (such as all of the other library projects in my world). This is BAD for me, so instead I have a CVS project which checks out all of the library projects into one place and each build project knows which dependencies to copy locally. It all works nicely, but it means that when the CVS tree is updated ALL projects are scheduled for a rebuild; most then don’t rebuild (as they haven’t actually been changed), but it takes a while for the queue to clear…

This is made worse by my abundant use of the file system source control provider: every time a CVS update happens for any project, all projects use the file system provider to compare the files they have with the bits of the CVS tree that they use, and this touches a LOT of files and is pretty slow. I’d quite like to write a file system source control provider that uses a single manifest file to store the current state of the file system that it’s watching (file and directory details and an MD5 hash of contents) so that it only has to scan half of the files that it currently scans (but that’s work for another day).
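
Each dependent project pulls its dependencies from the local CVS tree with the stock file system provider, along these lines (paths are made up). The manifest-based provider at the end is just my idea sketched as config; it doesn’t exist, and the element names are invented:

```xml
<!-- Stock filesystem provider: walks the whole watched tree on
     every poll to decide whether anything has changed. -->
<sourcecontrol type="filesystem">
  <repositoryRoot>c:\cvs-tree\libs\LibB</repositoryRoot>
  <autoGetSource>true</autoGetSource>
</sourcecontrol>

<!-- The provider I'd like to write (hypothetical): compares the
     tree against a single manifest of file and directory details
     plus MD5 hashes, so only one side of the comparison needs
     scanning on each poll. -->
<sourcecontrol type="manifestFilesystem">
  <repositoryRoot>c:\cvs-tree\libs\LibB</repositoryRoot>
  <manifestFile>c:\cvs-tree\manifests\LibB.manifest</manifestFile>
</sourcecontrol>
```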

The program that I have that generates the config files for my projects works great. It was well worth writing. It’s still “a small step” away from being something that you can just run on a series of .sln files but it’s easy to add projects and configure dependencies, etc. It’s also very easy to tweak how the project configs are generated. The config file for the 5.2.1 branch of my server framework for VS2005 and VS2008 (x86 and x64 built separately) is here.

When you’re running with this many projects, the ‘polling’ nature of the CC.Net task tray monitoring system becomes a performance issue, and the fact that you can’t group projects into folders for easy display makes knowing what’s going on more complex than it needs to be.

I’m currently contemplating the next stage, which is to build with different environment variables so that I can compile against different versions of third party code (STLPort, OpenSSL and platform SDKs). This requires a new CC.Net task, one that sets up the environment for the tasks that it contains; I’ve sketched what such a task might look like below. Right now, though, this is on hold, as the extra projects and disk space required make it infeasible.
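
Something like this is what I have in mind; the wrapper task is purely hypothetical (the element names are invented), while the inner devenv task is standard CC.Net:

```xml
<tasks>
  <!-- Hypothetical wrapper task (invented for illustration): sets
       environment variables for the tasks it contains and restores
       the previous environment afterwards. -->
  <environmentTask>
    <variables>
      <variable name="STLPORT_ROOT" value="c:\3rdParty\STLport" />
      <variable name="OPENSSL_ROOT" value="c:\3rdParty\OpenSSL" />
    </variables>
    <tasks>
      <!-- Standard CC.Net devenv task; it would build the solution
           with the environment above in effect. -->
      <devenv>
        <solutionfile>c:\build\ServerFramework.sln</solutionfile>
        <configuration>Release</configuration>
      </devenv>
    </tasks>
  </environmentTask>
</tasks>
```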

Continuous integration has worked well to flush out some subtle bugs in some of the more complex unit tests and, in some cases, has flushed out bugs in the code too. Running your tests over and over again, on various machines, with the code built by various compilers, while the machines are running other code and so are ‘randomly busy’, is a great way to force race conditions and threading bugs out into the open. The fact that the failures then prevent further projects from building gives an incentive to fix them. This means that the 5.2.1 release will contain slightly more bug fixes than originally planned… It’s also very useful to have all supported compilers building all changes straight away; it makes it easier to work around VC6 template issues if the latest changes that you just made in VS2008 are built NOW rather than in two months’ time when you do the next release…

All in all I’m pleased with the results, but I can’t help viewing it as a stepping stone to a system that actually works the way that I want it to…