More performance testing

I’m continuing to help my performance-obsessed client squeeze more out of his server. As always, we’ve been using perfmon to view the server’s performance counters and, since we were investigating CPU spikes, we were interested in the thread activity counters that the server exposes.

The various perfmon-enabled example servers that ship with The Server Framework have always exposed counters that show the number of active and processing threads in the various thread pools. It’s always been useful to see these snapshots of thread activity and, with many servers, these counters are enough to tune the server to the hardware. This time around it became clear that simply sampling a ‘how many threads are processing right now’ counter wasn’t good enough. Our servers tend to be designed around work queues and the thread pools that process them, and the counter that we usually find most useful is the one that is incremented when a thread from the pool starts working on an item and decremented when it’s done. The perfmon display then shows us how busy the thread pool is and, in the case of our expanding pools, how it’s growing. This kind of counter is fine when each work item takes a relatively long time; our business logic threads hitting a database, perhaps. With lots of short duration work items, however, it’s less useful. Perfmon only samples the counter periodically and the counter changes many times between samples, so each sample tells us only what the value happened to be at that instant and says little about how much work was done in between.
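To make the problem concrete, here is a minimal sketch of that kind of ‘threads processing right now’ counter. The names (ActiveThreadGauge, ProcessingScope, SampleAfterShortItems) are hypothetical and are not The Server Framework’s actual classes; a plain atomic stands in for the value that would be published to perfmon.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the value that would be exported to perfmon.
class ActiveThreadGauge
{
public:
    // Called when a pool thread starts working on an item.
    void Enter() noexcept { m_active.fetch_add(1, std::memory_order_relaxed); }

    // Called when the pool thread has finished with the item.
    void Leave() noexcept
    {
        const long previous = m_active.fetch_sub(1, std::memory_order_relaxed);
        assert(previous > 0);   // Leave() without a matching Enter() is a bug
        (void)previous;
    }

    // What a periodic sampler would read at the instant it looks.
    long Sample() const noexcept { return m_active.load(std::memory_order_relaxed); }

private:
    std::atomic<long> m_active{0};
};

// RAII helper so the gauge is decremented even if processing throws.
class ProcessingScope
{
public:
    explicit ProcessingScope(ActiveThreadGauge &gauge) noexcept : m_gauge(gauge) { m_gauge.Enter(); }
    ~ProcessingScope() { m_gauge.Leave(); }

    ProcessingScope(const ProcessingScope &) = delete;
    ProcessingScope &operator=(const ProcessingScope &) = delete;

private:
    ActiveThreadGauge &m_gauge;
};

// Processes itemCount very short work items, then returns what a sampler
// arriving afterwards would see.
inline long SampleAfterShortItems(ActiveThreadGauge &gauge, int itemCount)
{
    for (int i = 0; i < itemCount; ++i)
    {
        ProcessingScope scope(gauge);
        // ... a very short piece of work would happen here ...
    }
    return gauge.Sample();
}
```

However many items go through SampleAfterShortItems, a sample taken once it returns reads zero; the gauge has no memory of the work that was done between samples, which is exactly what makes it misleading for short work items.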

Adding a new counter which measures the number of items processed per second is much more useful. This lets us see how much work the threads are doing without relying on perfmon happening to sample at the right moment. By using a counter of type PERF_COUNTER_COUNTER or PERF_COUNTER_BULK_COUNT we simply increment the counter each time we process a work item and the performance counter infrastructure deals with scaling it to a value per second. Whilst we can get something similar for the business logic threads by instrumenting the queue that feeds the thread pool, this isn’t possible for the I/O threads. These are dealing with work items which are, generally, asynchronous I/O completions and, as such, we have no way to instrument when they are added to the IOCP that’s feeding our threads…
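As a rough illustration of why this works, the sketch below keeps a raw count that only ever goes up and derives a rate from two samples of it. The calculation follows the documented formula for PERF_COUNTER_COUNTER, (N1 - N0) / ((D1 - D0) / F), where N is the raw value, D is the timestamp in ticks and F is the tick frequency. The names (ItemsProcessedCounter, RawSample, ItemsPerSecond) are hypothetical, and the code that actually publishes the raw value to perfmon is deliberately left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the raw value behind a rate counter.
// It only ever increases; the consumer turns it into a rate.
class ItemsProcessedCounter
{
public:
    // Called once for each work item that a thread processes.
    void Increment() noexcept { m_total.fetch_add(1, std::memory_order_relaxed); }

    // The raw value that would be handed to the counter infrastructure.
    std::uint64_t Raw() const noexcept { return m_total.load(std::memory_order_relaxed); }

private:
    std::atomic<std::uint64_t> m_total{0};
};

// One sample: the raw counter value and the time, in ticks, at which it was read.
struct RawSample
{
    std::uint64_t value;
    std::uint64_t ticks;
};

// (N1 - N0) / ((D1 - D0) / F): the average number of items per second
// over the interval between the two samples.
inline double ItemsPerSecond(const RawSample &earlier, const RawSample &later, std::uint64_t ticksPerSecond)
{
    assert(ticksPerSecond > 0);
    assert(later.ticks > earlier.ticks);

    const double elapsedSeconds =
        static_cast<double>(later.ticks - earlier.ticks) / static_cast<double>(ticksPerSecond);

    return static_cast<double>(later.value - earlier.value) / elapsedSeconds;
}
```

Because every increment is retained in the raw value, it doesn’t matter when the samples are taken; the difference between any two of them always accounts for all of the work done in between.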

The new counters give graphs which are much more representative of the work being done by the threads, whilst the old counters give a more general overview and are better for longer duration work items. Because the new counters are more resistant to sampling errors, they provide a more reliable, if slightly harder to understand, view of the internals of the server.

Now that we can see more accurately we can tune things a little more…