Practical Testing: 28 - Finishing the timer wheel

Previously on “Practical Testing”… I’m writing a timer wheel which matches the interface used by my timer queue. This new implementation is designed for a particular usage scenario with the intention of trading space for speed and improving performance of some reliable UDP code.

Over the last four entries I’ve implemented various parts of the timer wheel and adjusted the test code so that I could reuse the tests that I already had for the other implementations with my timer wheel. The tests needed to be tweaked quite a bit to take into account the different behavioural characteristics of the wheel and the queues, this was accomplished using traits which determine the detail of how the class under test interacts with its service providers (mainly a tick count provider).

Today we finally get to the point where we have a working timer wheel that is compatible with the interface used by the two timer queues. We can then look at the results of the performance tests and work out where we need to go next.

The wheel that was presented in the previous entry is mostly complete. In fact only three functions are left to implement and these three functions implement a single feature; the handling of timeouts without needing to hold a lock on the wheel during timeout dispatch, this functionality makes it impossible to deadlock due to lock inversions which involve the timer wheel. I described the changes that were required when I added this functionality to the timer queue here, the changes required for the timer wheel are similar.

The main change is in how we store the timer data so that we can be dispatching an expired timer and setting, cancelling or destroying the same timer during the dispatch. We need to allow for this because the lock that should be held whilst you’re updating the timer wheel is not held during timer dispatch. So, rather than holding a single set of timer data within the timer we hold two sets, an active set and a timed out set. When the timeout handling begins the call to BeginTimeoutHandling(), which should be protected by a lock, updates each of the timers to prepare it for timeout dispatch. This means that it needs to walk the list of timers for this time and copy the active set of data to the timed out set and clear the active set. Now when the timer is processed by HandleTimeout() we’re working with the timed out set of data which allows the timer to be manipulated normally using the active data. Having to prepare the timers is a bit of a pain, it means that timer dispatch is O(n) where n is the number of timers to be dispatched, however this is the same as for the timer queue and we’re avoiding the O(log n) (n being the number of timers currently set in this case) of the balanced tree lookup required for the timer queues…

There’s still scope for some refactoring in the code and there’s a need for some more tests to make sure that we’re doing things sensibly when we fail to handle timeouts for a long period of time but this can be done later. I’ve added a few new tests to test the timer wheel with the CThreadedCallbackTimerQueue, I expect that could do with a name change now, but that can also wait.

Since we now have a fully functional timer wheel I can compare the performance results with those from the timer queue implementations. Note that these are the results that I get on my development box, the results that you get are likely to be different but the proportional differences should be similar… All test results are an average of 10 runs of the same test with 100,000 timers in use in each test.

  • Creating timers - Unsurprisingly, since very similar work is being done by both the queues and the wheel, CreateTimer() is roughly similar with the wheel actually taking fractionally longer at 55ms vs the queues which both take 50ms.

  • Setting timers - Again unsurprisingly, the timer wheel is much faster when setting timers at 8ms compared to 88ms for the queue. Note that the 8ms value is a best case scenario where we set and reset the same timer, with different timers the times rise to 15ms vs 130ms and with different timers being set for the same times we get 14ms vs 60ms. What doesn’t show here is that the wheel is also only C(n wheel users) compared to the queue’s C(n wheel users)+2C(n heap users) (see here for details of my ‘big C’ notation for talking about contention).

  • HandleTimeouts - When handling timeouts and holding the lock we get results of 54ms against 94ms for the queues. When not holding the locks the numbers are 57ms and 98ms. Again the wheel has lower contention.

All in all I’m pleased with the performance and contention improvements that have come from using a radically different design. The timer wheel isn’t as general purpose as the timer queues and it’s not going to be a good fit for all of the usage scenarios that I use the queue in but for those situations that it is appropriate for, the performance will be considerably better.

The code can be found here and the previous rules apply.