<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Rambling Comments</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/" />
    <link rel="self" type="application/atom+xml" href="http://www.lenholgate.com/blog/atom.xml" />
    <id>tag:www.lenholgate.com,2010-12-10:/blog//12</id>
    <updated>2011-06-23T11:04:32Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 5.12</generator>

<entry>
    <title>Practical Testing: 31 - A bug in DestroyTimer.</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2011/01/practical-testing-31---a-bug-in-destroytimer.html" />
    <id>tag:www.lenholgate.com,2011:/blog//12.1054</id>

    <published>2011-01-13T16:36:25Z</published>
    <updated>2011-06-23T11:04:32Z</updated>

    <summary>Back in 2004, I wrote a series of articles called &quot;Practical Testing&quot; where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Back in 2004, I wrote a series of articles called <a href="http://www.lenholgate.com/blog/2004/05/practical-testing.html">"Practical Testing"</a> where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test-driven development style to show how writing your tests before your code changes how you design your code. Since the original articles there have been several bug fixes and redesigns, all of which have been supported by the original unit tests and many of which have led to the development of more tests.</p>

Whilst doing some development on a new server design I managed to expose a bug which has been present in the timer queue code for some time and which is also present in the timer wheel implementation. The bug allows memory within the timer queue to be deleted twice if you happen to call <code>DestroyTimer()</code> on a timer from within <code>OnTimer()</code> for that same timer. The problem occurs infrequently as it requires the memory for the timer data that has just been deleted to be set with a specific bit pattern that will then cause the timer queue to think that the data needs to be deleted again. It also requires that you're calling <code>HandleTimeouts()</code> to handle timeouts whilst allowing the timer queue to hold its internal lock; this is something most of my servers don't do. Anyway, circumstances conspired to make this bug visible and so here I am to fix it.]]>
        <![CDATA[<p>Of course, the first thing to do is write a test that exposes the bug. The test is fairly simple to build but it requires that we adjust the mock timer that we use so that it can be told to delete the timer during timeout handling. The new test looks something like this:</p>

<pre class="brush: cpp gutter: false">template &lt;class Q, class T, class P&gt;
void TCallbackTimerQueueTestBase&lt;Q, T, P&gt;::TestDestroyTimerDuringOnTimerInHandleTimeouts()
{
   JetByteTools::Win32::Mock::CMockTimerQueueMonitor monitor;

   P tickProvider;

   tickProvider.logTickCount = false;

   {
      Q timerQueue(monitor, tickProvider);

      CheckConstructionResults(tickProvider);

      CLoggingCallbackTimer timer;

      const Milliseconds timeout = 100;

      const IQueueTimers::UserData userData = 1;

      IQueueTimers::Handle handle = CreateAndSetTimer(
         tickProvider,
         timerQueue,
         timer,
         timeout,
         userData);

      timer.DestroyTimerInOnTimer(timerQueue, handle);

      const Milliseconds expectedTimeout = CalculateExpectedTimeout(timeout);

      THROW_ON_FAILURE_EX(expectedTimeout == timerQueue.GetNextTimeout());

      tickProvider.CheckResult(_T("|GetTickCount|"));

      tickProvider.SetTickCount(expectedTimeout);

      timerQueue.HandleTimeouts();

      tickProvider.CheckResult(_T("|GetTickCount|"));

      timer.CheckResult(_T("|OnTimer: 1|TimerDestroyed|"));

      tickProvider.CheckNoResults();

      THROW_ON_NO_EXCEPTION_EX_1(timerQueue.DestroyTimer, handle);
   }

   THROW_ON_FAILURE_EX(true == monitor.NoTimersAreActive());
}
</pre>
<p>And the change to the mock looks something like this:</p>
<pre class="brush: cpp gutter: false">void CLoggingCallbackTimer::OnTimer(
   UserData userData)
{
   if (logMessage)
   {
      if (logUserData)
      {
         LogMessage(_T("OnTimer: ") + ToString(userData));
      }
      else
      {
         LogMessage(_T("OnTimer"));
      }
   }

   if (m_pTimerQueue)
   {
      m_pTimerQueue-&gt;DestroyTimer(m_handle);

      LogMessage(_T("TimerDestroyed"));
   }

   ::InterlockedIncrement(&amp;m_numTimerEvents);

   m_timerEvent.Set();
}</pre>
<p>The result is that the timer is destroyed during timeout handling, and the test should demonstrate the failure in the code.</p>

<p>Unfortunately, with the latest build of the code the test does NOT demonstrate the problem. This is because the PTMalloc implementation <a href="http://www.lenholgate.com/blog/2010/09/practical-testing-30---reducing-contention.html">that we're currently using</a> doesn't allow you to set it to fill deleted memory with an unlikely bit pattern. The default allocator in the Visual Studio C runtime does allow this in debug builds, and this helps to force the bug into view. Adding the new test to the code that was presented in <a href="http://www.lenholgate.com/blog/2010/09/practical-testing-29---fixing-the-timer-wheel.html">part 29</a> causes the bug to manifest, and Visual Studio pops up a debug assertion message when the memory is double deleted.</p>

<p>The potential for this problem to occur when using the <code>BeginTimeoutHandling()</code>, <code>EndTimeoutHandling()</code> style of "lock free" timeout handling was already identified and fixed back when I first added the "lock free" timeout handling in <a href="http://www.lenholgate.com/blog/2008/08/practical-testing-18---removing-the-potential-to-deadlock.html">part 18</a>. The fix is pretty similar. There's already a flag in the internal timer data structure that's used to delay destruction until after the timer has finished being processed but it doesn't get set when using <code>HandleTimeouts()</code>. The fixes are fairly simple for both the timer wheel and the timer queue.</p>
<pre class="brush: cpp gutter: false">void CCallbackTimerQueueBase::TimerData::OnTimer()
{
   m_processingTimeout = true;

   OnTimer(m_active);

   m_processingTimeout = false;
}

CCallbackTimerWheel::TimerData *CCallbackTimerWheel::TimerData::OnTimer()
{
   m_ppPrevious = 0;

   m_processingTimeout = true;

   OnTimer(m_active);

   m_processingTimeout = false;

   return m_pNext;
}
</pre>
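<p>The idea behind the flag can be shown with a small, self-contained sketch. This is not the library's code: <code>Timer</code>, <code>DestroyTimer()</code>, <code>DispatchTimer()</code> and the <code>deleteAfterTimeout</code> flag are hypothetical stand-ins, assuming that destroying a timer from inside its own callback only marks it, and that the dispatching code deletes it once the callback has returned.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;functional&gt;

// Hypothetical, simplified timer record; not the real TimerData class.
struct Timer
{
   std::function&lt;void(Timer *)&gt; callback;
   bool processingTimeout = false;   // true while the callback is running
   bool deleteAfterTimeout = false;  // set by DestroyTimer() during the callback
};

// Counts deletions so that a double delete would show up in a test.
inline int &amp;DeleteCount()
{
   static int count = 0;
   return count;
}

inline void DeleteTimer(Timer *pTimer)
{
   ++DeleteCount();
   delete pTimer;
}

// Deletes the timer now, or defers deletion if it is being dispatched.
inline void DestroyTimer(Timer *pTimer)
{
   if (pTimer-&gt;processingTimeout)
   {
      pTimer-&gt;deleteAfterTimeout = true;   // the dispatcher deletes it later
   }
   else
   {
      DeleteTimer(pTimer);
   }
}

// Loosely analogous to TimerData::OnTimer() plus the clean up after it.
// pTimer may have been deleted by the time this returns.
inline void DispatchTimer(Timer *pTimer)
{
   pTimer-&gt;processingTimeout = true;

   pTimer-&gt;callback(pTimer);

   pTimer-&gt;processingTimeout = false;

   if (pTimer-&gt;deleteAfterTimeout)
   {
      DeleteTimer(pTimer);   // the one and only delete
   }
}</pre>
<p>Dispatching a timer whose callback calls <code>DestroyTimer()</code> on itself results in exactly one delete; without the flag, the callback would free the timer while the dispatcher still holds a pointer to it.</p>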

<p>The code from part 29 with these fixes applied can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-31a.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-31a.zip']);">here</a>.</p>

<p>The code that uses PTMalloc from part 30 with these fixes applied can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-31b.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-31b.zip']);">here</a>.</p>

<p>Please note that the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.</p>]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 30 - Reducing contention</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/09/practical-testing-30---reducing-contention.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.980</id>

    <published>2010-09-23T14:19:51Z</published>
    <updated>2011-01-02T15:28:45Z</updated>

    <summary>Previously on &quot;Practical Testing&quot;... I&apos;ve been looking at the performance of the timer system that I developed and have built a more specialised and higher performance timer system which is more suitable for some high performance reliable UDP work that...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Previously on <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a>... I've been looking at the performance of the timer system that I developed and have built a <a href="http://www.lenholgate.com/archives/000909.html">more specialised</a> and higher performance timer system which is more suitable for some high performance reliable UDP work that I'm doing. Whilst developing the new timer wheel I began to consider the thread contention issues that the timer system faced and came up with a notation for talking about contention (see <a href="http://www.lenholgate.com/archives/000908.html">here</a>). Both the general purpose timer queue and the new timer wheel suffered from more potential thread contention than they needed to, because the STL containers that I am using a) require memory allocations on insertion and removal and b) use the program heap for those memory allocations. This turned contention between the threads accessing the timer system into contention between the threads accessing the program heap...</p>

I mentioned a while back that a custom STL allocator would be one way to reduce the thread contention; the allocator could use a private heap that only the timer system used and so the potential contention during memory allocation and release would be reduced to the potential contention for the timer object itself. Today I'll present the results of switching to a private heap using a custom STL allocator for the STL collections that I use.]]>
        <![CDATA[<p>I <a href="http://www.lenholgate.com/archives/000919.html">went looking</a> for information about writing my own STL allocator and ended up with some useful code from <a href="http://www.tantalon.com/pete.htm">Pete Isensee</a>. I then boiled this down to something that fitted with my development style and that supported allocations using <code>HeapAlloc()</code>. Actually using the allocator was trivial, if slightly messy.</p>
<pre class="brush: cpp gutter: false">typedef std::deque&lt;TimerData *&gt; Timers;
 
typedef std::pair&lt;size_t, Timers&gt; TimersAtThisTime;
 
typedef std::map&lt;ULONGLONG, TimersAtThisTime *&gt; TimerQueue;
 
typedef std::pair&lt;TimerQueue::iterator, size_t&gt; TimerLocation;
 
typedef std::map&lt;TimerData *, TimerLocation&gt; HandleMap;
</pre>
<p>became</p>
<pre class="brush: cpp gutter: false">typedef std::deque&lt;TimerData *, CAlloc&lt;TimerData *&gt; &gt; Timers;
 
typedef std::pair&lt;size_t, Timers&gt; TimersAtThisTime;
 
typedef std::map&lt;ULONGLONG, TimersAtThisTime *,
   std::less&lt;ULONGLONG&gt;,
   CAlloc&lt;std::pair&lt;ULONGLONG, TimersAtThisTime *&gt; &gt; &gt; TimerQueue;
 
typedef std::pair&lt;TimerQueue::iterator, size_t&gt; TimerLocation;
 
typedef std::map&lt;TimerData *, TimerLocation,
   std::less&lt;TimerData *&gt;,
   CAlloc&lt;std::pair&lt;TimerData *, TimerLocation&gt; &gt; &gt; HandleMap;
</pre>
<p>And I had to add some allocators and a private heap to the class.</p>
<pre class="brush: cpp gutter: false">CSmartHeapHandle m_heap;
 
CAlloc&lt;TimerData *&gt; m_timersAllocator;
 
CAlloc&lt;std::pair&lt;ULONGLONG, TimersAtThisTime *&gt; &gt; m_timerQueueAllocator;
 
CAlloc&lt;std::pair&lt;TimerData *, TimerLocation&gt; &gt; m_handleMapAllocator;</pre>
<p>And then I adjusted the constructor to make use of all of this:</p>
<pre class="brush: cpp gutter: false">CCallbackTimerQueueBase::CCallbackTimerQueueBase()
   :  m_heap(::HeapCreate(HEAP_NO_SERIALIZE, 0,0)),
      m_timersAllocator(m_heap),
      m_timerQueueAllocator(m_heap),
      m_handleMapAllocator(m_heap),
      m_queue(std::less&lt;ULONGLONG&gt;(), m_timerQueueAllocator),
      m_handleMap(std::less&lt;TimerData *&gt;(), m_handleMapAllocator),
      m_monitor(s_monitor),
      m_maxTimeout(s_timeoutMax),
      m_handlingTimeouts(InvalidTimeoutHandleValue)
{
   if (!m_heap.IsValid())
   {
      throw CException(_T("CCallbackTimerQueueBase::CCallbackTimerQueueBase()"), _T("Failed to create private heap"));
   }
}
</pre>
<p>Note that since we are taking responsibility for locking around access to the heap we can tell <code>HeapCreate()</code> not to bother locking internally with the <code>HEAP_NO_SERIALIZE</code> flag.</p>
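<p>The post doesn't show <code>CAlloc</code> itself, so what follows is only a minimal sketch of the general technique, a C++11-style allocator that forwards to a private heap, and not the code from the download. <code>PrivateHeap</code> and <code>PrivateHeapAllocator</code> are hypothetical names; to keep the sketch portable, <code>PrivateHeap</code> simply wraps <code>malloc()</code>/<code>free()</code> where the real code would call <code>HeapAlloc()</code>/<code>HeapFree()</code> on a heap from <code>HeapCreate()</code>. Like a <code>HEAP_NO_SERIALIZE</code> heap, it does no locking of its own.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;
#include &lt;functional&gt;
#include &lt;map&gt;
#include &lt;new&gt;

// Hypothetical private heap standing in for HeapCreate()/HeapAlloc()/HeapFree().
// Like a HEAP_NO_SERIALIZE heap it does no locking; its owner must do that.
class PrivateHeap
{
public:
   void *Allocate(std::size_t bytes)
   {
      void *p = std::malloc(bytes);

      if (!p)
      {
         throw std::bad_alloc();
      }

      ++m_liveBlocks;

      return p;
   }

   void Free(void *p)
   {
      std::free(p);

      --m_liveBlocks;
   }

   std::size_t LiveBlocks() const { return m_liveBlocks; }

private:
   std::size_t m_liveBlocks = 0;
};

// Minimal C++11-style allocator that takes its memory from a PrivateHeap.
template &lt;class T&gt;
class PrivateHeapAllocator
{
public:
   typedef T value_type;

   explicit PrivateHeapAllocator(PrivateHeap &amp;heap) : m_pHeap(&amp;heap) {}

   // Containers rebind the allocator to their internal node types.
   template &lt;class U&gt;
   PrivateHeapAllocator(const PrivateHeapAllocator&lt;U&gt; &amp;other) : m_pHeap(other.Heap()) {}

   T *allocate(std::size_t n)
   {
      return static_cast&lt;T *&gt;(m_pHeap-&gt;Allocate(n * sizeof(T)));
   }

   void deallocate(T *p, std::size_t) { m_pHeap-&gt;Free(p); }

   PrivateHeap *Heap() const { return m_pHeap; }

private:
   PrivateHeap *m_pHeap;
};

// Allocators compare equal if they share a heap.
template &lt;class T, class U&gt;
bool operator==(const PrivateHeapAllocator&lt;T&gt; &amp;lhs, const PrivateHeapAllocator&lt;U&gt; &amp;rhs)
{
   return lhs.Heap() == rhs.Heap();
}

template &lt;class T, class U&gt;
bool operator!=(const PrivateHeapAllocator&lt;T&gt; &amp;lhs, const PrivateHeapAllocator&lt;U&gt; &amp;rhs)
{
   return !(lhs == rhs);
}</pre>
<p>To use it, construct the container with an allocator bound to the heap, for example <code>std::map&lt;int, int, std::less&lt;int&gt;, PrivateHeapAllocator&lt;std::pair&lt;const int, int&gt; &gt; &gt; m(std::less&lt;int&gt;(), alloc);</code>. Note the <code>const</code> key in the allocator's value type, which is what the standard associative containers expect.</p>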

<p>Unfortunately this makes performance worse, though arguably it has reduced contention. The problem is that <code>HeapAlloc()</code> isn't as efficient as the standard implementation of <code>new</code> and so whilst we've reduced contention we've also reduced overall performance. Not good.</p>

<p>I did some research on high performance memory allocators and decided that <a href="http://www.malloc.de/en/">PTMalloc</a> was a good fit for what I needed. PTMalloc supports separate heaps through what it terms "malloc spaces", or <code>mspace</code>s, and it supports multi-threaded use where you're responsible for locking. I wrapped the code in one of my library projects and created some helper code so that it integrated more easily with the rest of my code.</p>

<p>A new STL allocator implementation can then allocate and deallocate from a PTMalloc <code>mspace</code> rather than from a heap created with <code>HeapCreate()</code>. The results were good: faster than the original <code>new</code> implementation and, due to the private heap, the contention of the timer queue was reduced to <b>C(n threads using the queue)</b>.</p>

<p>The STL allocator isn't the only place that dynamic memory is being allocated, though; we're also allocating a timer handle when we create a timer, and in the timer queue we need to allocate the various structures that help us build and manage our queue. If these continue to use the standard program heap then our worst possible contention is still <b>C(n threads using the program heap)</b> rather than <b>C(n threads using the queue)</b>.</p>

<p>Providing custom allocation and deallocation code that uses the PTMalloc private heap for these other memory allocations both reduces the contention and boosts performance.</p>

<p>In the case of the <b>TimerData</b> object we can add a placement new implementation for it that uses our private heap. For the simpler memory objects we allocate and construct them manually using the private heap's allocator.</p>
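<p>A class-specific placement form of <code>operator new</code> is one way to route a class's allocations to a private heap. This is a sketch under assumptions, not the code from the download: <code>Heap</code> is a hypothetical stand-in for a PTMalloc <code>mspace</code> (it just wraps <code>malloc()</code>/<code>free()</code>), and this <code>TimerData</code> is a cut-down, hypothetical version of the real class.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdlib&gt;
#include &lt;new&gt;

// Hypothetical private heap standing in for a PTMalloc mspace.
class Heap
{
public:
   void *Allocate(std::size_t bytes)
   {
      void *p = std::malloc(bytes);

      if (!p)
      {
         throw std::bad_alloc();
      }

      ++m_allocations;

      return p;
   }

   void Free(void *p)
   {
      std::free(p);

      ++m_frees;
   }

   int m_allocations = 0;
   int m_frees = 0;
};

// Hypothetical, cut-down TimerData that can only be created on a Heap.
class TimerData
{
public:
   explicit TimerData(int userData) : m_userData(userData) {}

   // Placement form: new (heap) TimerData(...) takes memory from 'heap'.
   static void *operator new(std::size_t size, Heap &amp;heap)
   {
      return heap.Allocate(size);
   }

   // Matching placement delete; only called if the constructor throws.
   static void operator delete(void *p, Heap &amp;heap)
   {
      heap.Free(p);
   }

   // Counterpart to the placement new: destroy, then return the memory.
   static void Destroy(TimerData *pData, Heap &amp;heap)
   {
      pData-&gt;~TimerData();

      heap.Free(pData);
   }

   int m_userData;
};</pre>
<p>Creation then looks like <code>TimerData *pData = new (heap) TimerData(42);</code> and destruction like <code>TimerData::Destroy(pData, heap);</code>. Because the class only declares the placement form of <code>operator new</code>, a plain <code>new TimerData(42)</code> won't compile, which makes it hard to allocate one from the program heap by accident.</p>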

<p>On my system the performance tests show some pretty nice improvements for the timer queues. Every operation is faster. Timer creation is down from 60ms to 40ms per 100,000 timers, setting timers is down from 130ms to 100ms (again per 100,000), and timer handling is down a little.</p>

<p>Of course, the allocation and STL allocator changes can also be applied to the timer wheel, albeit with lesser results, as the only STL collection it uses is for timer handle validation and the only dynamically allocated data is the timer handle.</p>

<p>The code for the STL allocator using <code>HeapAlloc()</code> can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-30a.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-30a.zip']);">here</a>.</p>

<p>The code for the STL allocator using PTMalloc can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-30b.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-30b.zip']);">here</a>.</p>

<p>And the code for all allocations using PTMalloc can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-30c.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-30c.zip']);">here</a>.</p>

<p>Please note that the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.</p>

There's still scope for improvement. As I've mentioned before (<a href="http://www.lenholgate.com/archives/000907.html">here</a> and <a href="http://www.lenholgate.com/archives/000920.html">here</a>), the STL containers are not intrusive and so memory must be allocated for each item placed in them. Old school intrusive containers wouldn't require memory allocation and release at all and so should improve performance somewhat. What's more, a custom designed intrusive multi-map could allow for the "remove all entries that match this key" operation which I'm currently fudging using more dynamically allocated structures...]]>
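        <![CDATA[<p>As a rough illustration of why intrusive containers avoid these allocations, here is a hypothetical sketch (not code from the series, and not the intrusive multi-map mentioned above): the links live inside the element itself, so linking and unlinking never touch a heap, and an element can be removed in O(1) because it already knows its neighbours. <code>IntrusiveNode</code>, <code>IntrusiveList</code> and <code>Timer</code> are illustrative names only.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstddef&gt;

// Hypothetical intrusive list node: the links are part of the element.
struct IntrusiveNode
{
   IntrusiveNode *pPrev = nullptr;
   IntrusiveNode *pNext = nullptr;
};

// Links and unlinks nodes without allocating or freeing any memory.
class IntrusiveList
{
public:
   void PushBack(IntrusiveNode *pNode)
   {
      pNode-&gt;pPrev = m_pTail;
      pNode-&gt;pNext = nullptr;

      if (m_pTail) { m_pTail-&gt;pNext = pNode; } else { m_pHead = pNode; }

      m_pTail = pNode;
      ++m_size;
   }

   // O(1): the node already knows its neighbours, so no search is needed.
   void Remove(IntrusiveNode *pNode)
   {
      if (pNode-&gt;pPrev) { pNode-&gt;pPrev-&gt;pNext = pNode-&gt;pNext; } else { m_pHead = pNode-&gt;pNext; }
      if (pNode-&gt;pNext) { pNode-&gt;pNext-&gt;pPrev = pNode-&gt;pPrev; } else { m_pTail = pNode-&gt;pPrev; }

      pNode-&gt;pPrev = nullptr;
      pNode-&gt;pNext = nullptr;
      --m_size;
   }

   IntrusiveNode *Head() const { return m_pHead; }
   std::size_t Size() const { return m_size; }

private:
   IntrusiveNode *m_pHead = nullptr;
   IntrusiveNode *m_pTail = nullptr;
   std::size_t m_size = 0;
};

// A timer that carries its own links, so it needs no separate list node.
struct Timer : IntrusiveNode
{
   int userData = 0;
};</pre>]]>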
    </content>
</entry>

<entry>
    <title>Practical Testing: 29 - Fixing the timer wheel</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/09/practical-testing-29---fixing-the-timer-wheel.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.978</id>

    <published>2010-09-09T07:41:33Z</published>
    <updated>2011-01-02T15:24:27Z</updated>

    <summary>Previously on &quot;Practical Testing&quot;... I&apos;m writing a timer wheel which matches the interface used by my timer queue. This new implementation is designed for a particular usage scenario with the intention of trading space for speed and improving performance of...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[Previously on <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a>...  I'm writing a <a href="http://www.lenholgate.com/archives/000909.html">timer wheel</a> which matches the interface used by my timer queue. This  new implementation is designed for a particular usage scenario with the intention of trading space for speed and improving performance of some reliable UDP code. The last entry completed the development of the timer wheel. This time we fix a couple of the bugs that I've discovered since I started to integrate the code with the system that it was developed for.]]>
        <![CDATA[<p>I try to be pragmatic with my testing. I know that I could write tests for the rest of my life and never prove that the code under test was 100% correct and so I try and write the smallest number of tests that give the largest amount of validation and confidence in the code. Because of this I quite expect to discover bugs in my code and find that I need to add a new test that reproduces the bug before I then fix the bug and the test passes.</p>

<p>That said, I still should have included a test for these particular bugs in my original test suite for the timer wheel; it's a fairly obvious hole in the testing. Both bugs are fairly serious, and fairly simple. Luckily, once I integrated the code into the real application the problems showed up quickly and regularly; this meant that they were easy to track down, and once I had an idea of what the likely problems were it was easy to put together some tests that showed them.</p>

<p>The first bug is in how we calculate the next timeout when the first timer has wrapped and is before our current position in the timer wheel rather than after it. None of the existing tests put the code in this position, but real usage caused it almost immediately.</p>

<p><img alt="TimerWheel-4.png" src="http://www.lenholgate.com/blog/images/TimerWheel-4.png" width="434" height="301" border="0" /></p>

<p>The calculation for this situation was wrong; it was this:</p>
<pre class="brush: cpp gutter: false">nextTimeout = static_cast&lt;Milliseconds&gt;((
   (m_pFirstTimerSetHint &gt; m_pNow ?
      (m_pFirstTimerSetHint - m_pNow) :
      (m_pNow - m_pFirstTimerSetHint)) + 1) * m_timerGranularity);
</pre>
<p>and it should be this:</p>
<pre class="brush: cpp gutter: false">nextTimeout = static_cast&lt;Milliseconds&gt;((
   (m_pFirstTimerSetHint &gt;= m_pNow ?
      (m_pFirstTimerSetHint - m_pNow) :
      (m_pTimersEnd - m_pNow + m_pFirstTimerSetHint - m_pTimersStart)) + 1) * m_timerGranularity);
</pre>

<p>The original calculation caused the timer wheel to report an incorrect next timeout, which caused timeout handling to stall until our first timer was once again later than "now"...</p>
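<p>The corrected expression is easier to check with plain slot indices than with pointers. This is a hypothetical sketch, not the wheel's code: slots are numbered <code>0</code> to <code>numSlots - 1</code>, <code>now</code> and <code>first</code> stand in for <code>m_pNow</code> and <code>m_pFirstTimerSetHint</code>, and the wheel size and granularity in the example are made up.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstddef&gt;

// Hypothetical index-based form of the corrected calculation. 'now' and
// 'first' are slot indices in [0, numSlots); 'first' is the earliest timer.
inline unsigned long NextTimeout(
   std::size_t now,
   std::size_t first,
   std::size_t numSlots,
   unsigned long granularity)
{
   const std::size_t slots = (first &gt;= now) ?
      (first - now) :               // still ahead of us on this rotation
      (numSlots - now + first);     // wrapped: to the end, then from the start

   return static_cast&lt;unsigned long&gt;((slots + 1) * granularity);
}

// The original form: when 'first' has wrapped it measures backwards.
inline unsigned long OriginalNextTimeout(
   std::size_t now,
   std::size_t first,
   unsigned long granularity)
{
   const std::size_t slots = (first &gt; now) ? (first - now) : (now - first);

   return static_cast&lt;unsigned long&gt;((slots + 1) * granularity);
}</pre>
<p>With a 10-slot wheel, a 15ms granularity, "now" at slot 8 and the first timer at slot 2, <code>NextTimeout(8, 2, 10, 15)</code> gives 75ms (four slots to wrap round to slot 2, plus one), whereas the original form gives 105ms.</p>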

<p>The second bug is slightly less serious but occurs if there are no timers set and we're using the <code>HandleTimeouts()</code> call for timeout processing. The timer wheel's view of the current time is updated during timer processing. If there are no timers set then the timer processing loop is skipped inside of <code>HandleTimeouts()</code> and the wheel's view of the current time begins to lag. This progressively reduces the value of the timeout that you can set with <code>SetTimer()</code>. The fix is to have <code>SetTimer()</code> reset the wheel if no timers are currently set; in this situation it's safe to set the wheel to its initial state before setting a new timer. The fix is pretty simple; we just add this:</p>
<pre class="brush: cpp gutter: false">Milliseconds CCallbackTimerWheel::CalculateTimeout(
   const Milliseconds timeout)
{
   const Milliseconds now = m_tickCountProvider.GetTickCount();
  
   if (m_numTimersSet == 0)
   {
      m_currentTime = now;
  
      m_pNow = m_pTimersStart;
   }
  
   const Milliseconds actualTimeout = timeout + (now - m_currentTime);
  
   if (actualTimeout &gt; m_maximumTimeout)
   {
      throw CException(
         _T("CCallbackTimerWheel::CalculateTimeout()"),
         _T("Timeout is too long. Max is: ") +
         ToString(m_maximumTimeout) +
         _T(" tried to set: ") + ToString(actualTimeout) +
         _T(" (") + ToString(timeout) + _T(")"));
   }
  
   return actualTimeout;
}
</pre>

<p>Where the fix is the code inside of the <code>if (m_numTimersSet == 0)</code> block.</p>

<p>I've also renamed the <code>#define</code> that's used to enable monitoring; the previous name seemed a little back to front...</p>

The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-29.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-29.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 28 - Finishing the timer wheel</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/08/practical-testing-28---finishing-the-timer-wheel.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.973</id>

    <published>2010-08-04T11:54:16Z</published>
    <updated>2011-01-02T15:19:53Z</updated>

    <summary>Previously on &quot;Practical Testing&quot;... I&apos;m writing a timer wheel which matches the interface used by my timer queue. This new implementation is designed for a particular usage scenario with the intention of trading space for speed and improving performance of...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Previously on <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a>...  I'm writing a <a href="http://www.lenholgate.com/archives/000909.html">timer wheel</a> which matches the interface used by my timer queue. This  new implementation is designed for a particular usage scenario with the intention of trading space for speed and improving performance of some reliable UDP code.</p>

<p>Over the last four entries I've implemented various parts of the timer wheel and adjusted the test code so that I could reuse the tests that I already had for the other implementations with my timer wheel. The tests needed to be tweaked quite a bit to take into account the different behavioural characteristics of the wheel and the queues; this was accomplished using traits which determine the detail of how the class under test interacts with its service providers (mainly a tick count provider).</p>

Today we finally get to the point where we have a working timer wheel that is compatible with the interface used by the two timer queues. We can then look at the results of the performance tests and work out where we need to go next.]]>
        <![CDATA[<p>The wheel that was presented in the <a href="http://www.lenholgate.com/archives/000918.html">previous entry</a> is mostly complete. In fact, only three functions are left to implement, and these three functions implement a single feature: the handling of timeouts without needing to hold a lock on the wheel during timeout dispatch. This functionality makes it impossible to deadlock due to lock inversions that involve the timer wheel. I described the changes that were required when I added this functionality to the timer queue <a href="http://www.lenholgate.com/archives/000795.html">here</a>; the changes required for the timer wheel are similar.</p>

<p>The main change is in how we store the timer data so that we can be dispatching an expired timer and setting, cancelling or destroying the same timer during the dispatch. We need to allow for this because the lock that should be held whilst you're updating the timer wheel is not held during timer dispatch. So, rather than holding a single set of timer data within the timer, we hold two sets: an active set and a timed out set. When timeout handling begins, the call to <code>BeginTimeoutHandling()</code>, which should be protected by a lock, prepares each of the timers that are due for dispatch. This means that it needs to walk the list of timers for this time, copy the active set of data to the timed out set and clear the active set. Now when the timer is processed by <code>HandleTimeout()</code> we're working with the timed out set of data, which allows the timer to be manipulated normally using the active data. Having to prepare the timers is a bit of a pain: it means that timer dispatch is <b>O(n)</b>, where <b>n</b> is the number of timers to be dispatched. However, this is the same as for the timer queue, and we're avoiding the <b>O(log n)</b> balanced tree lookup (n being the number of timers currently set in this case) that the timer queues require...</p>
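<p>The active/timed out split can be sketched as follows. It's a hypothetical simplification rather than the wheel's real <code>TimerData</code>: each timer holds two copies of its data, a <code>BeginTimeoutHandling()</code>-style step (called under the lock) moves the active copy into the timed out copy, and dispatch reads only the timed out copy. <code>Data</code>, <code>PrepareForDispatch()</code> and the <code>set</code>/<code>userData</code> fields are illustrative assumptions.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;vector&gt;

// Hypothetical per-timer data; 'set' says whether the timer is armed.
struct Data
{
   bool set = false;
   int userData = 0;
};

// Hypothetical timer record holding the two sets of data described above.
struct TimerData
{
   Data active;     // changed by SetTimer()/CancelTimer() while holding the lock
   Data timedOut;   // read during dispatch, which runs without the lock

   // Moves the active data aside so that dispatch can use it.
   void PrepareForDispatch()
   {
      timedOut = active;
      active = Data();   // not set again unless someone resets it
   }
};

// Called with the lock held; O(n) in the number of timers being dispatched.
inline void BeginTimeoutHandling(std::vector&lt;TimerData *&gt; &amp;expired)
{
   for (TimerData *pTimer : expired)
   {
      pTimer-&gt;PrepareForDispatch();
   }
}</pre>
<p>Once the preparation step has run, a callback that resets its own timer only changes <code>active</code>, while the dispatch code carries on reading <code>timedOut</code>.</p>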

<p>There's still scope for some refactoring in the code, and there's a need for some more tests to make sure that we're doing things sensibly when we fail to handle timeouts for a long period of time, but this can be done later. I've added a few new tests to test the timer wheel with the <code>CThreadedCallbackTimerQueue</code>; I expect that class could do with a name change now, but that can also wait.</p>

<p>Since we now have a fully functional timer wheel I can compare the performance results with those from the timer queue implementations. Note that these are the results that I get on my development box, the results that you get are likely to be different but the proportional differences should be similar... All test results are an average of 10 runs of the same test with 100,000 timers in use in each test.</p>

<ul>
<li><b>Creating timers</b> - Unsurprisingly, since very similar work is being done by both the queues and the wheel, <code>CreateTimer()</code> is roughly similar with the wheel actually taking fractionally longer at 55ms vs the queues which both take 50ms.</li>
<li><b>Setting timers</b> - Again unsurprisingly, the timer wheel is much faster when setting timers, at 8ms compared to 88ms for the queue. Note that the 8ms value is a best-case scenario where we set and reset the same timer; with different timers the times rise to 15ms vs 130ms, and with different timers being set for the same times we get 14ms vs 60ms. What doesn't show here is that the wheel is also only <b>C(n wheel users)</b> compared to the queue's <b>C(n queue users)+2C(n heap users)</b> (see <a href="http://www.lenholgate.com/archives/000908.html">here</a> for details of my 'big C' notation for talking about contention).</li>
<li><b>HandleTimeouts</b> - When handling timeouts and holding the lock we get results of 54ms against 94ms for the queues. When not holding the locks the numbers are 57ms and 98ms. Again the wheel has lower contention.</li>
</ul>

<p>All in all I'm pleased with the performance and contention improvements that have come from using a radically different design. The timer wheel isn't as general purpose as the timer queues and it's not going to be a good fit for all of the usage scenarios that I use the queue in but for those situations that it <i>is</i> appropriate for, the performance will be considerably better.</p>

The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-28.zip"  onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-28.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 27 - Fixing things...</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/08/practical-testing-27---fixing-things.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.970</id>

    <published>2010-08-03T16:53:50Z</published>
    <updated>2011-01-02T15:16:03Z</updated>

    <summary>Previously on &quot;Practical Testing&quot;... To deal with some specific usage scenarios of a piece of general purpose code I&apos;m in the process of implementing a timer wheel that matches the interface to the timer queue that I previously developed in...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[Previously on <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a>... To deal with some specific usage scenarios of a piece of general purpose code I'm in the process of implementing a <a href="http://www.lenholgate.com/archives/000909.html">timer wheel</a> that matches the interface to the timer queue that I previously developed in this <a href="http://www.lenholgate.com/archives/000306.html">series of articles</a>. Last time I left myself with a failing test. The problem is that setting a new timer on the timer wheel sets a timer that's relative to the time that timer wheel thinks is 'now' and the timer wheel's view of the current time could be slightly behind reality; see the <a href="http://www.lenholgate.com/archives/000913.html">previous entry</a> for a diagram that explains the problem.]]>
        <![CDATA[<p>This kind of problem wouldn't exist if the timer wheel was operating on a hard real-time system where each tick of the hardware clock caused the timer wheel to 'rotate' and process timers that have expired. Unfortunately, since we just have a normal thread to process timers, the wheel can get slightly behind reality. There are two ways to solve this problem and both have drawbacks. The first is to cause the wheel to be processed before any new timer is set; this would mean that the wheel is always up to date and therefore the timer insertion would be correct. Unfortunately this leads to timers being handled on any thread that calls <code>SetTimer()</code>, which may not be ideal for users of the wheel, and it also means that setting a timer is no longer O(1)... The second approach is to simply set the timer based on the current time and allow for the difference between the current time and the timer wheel's view of the current time when the timer is inserted into the wheel. The disadvantage with this approach is that the maximum timeout that can be set will fluctuate with the lag between 'now' and the timer wheel's view of 'now'. You can work around this fluctuation by giving the wheel a maximum timeout that is larger than the sum of the actual maximum timeout that you wish to set and the expected lag...</p>
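<p>In slot terms, the second approach amounts to measuring the new timer's position from the wheel's view of 'now' and adding the lag to the requested timeout. The sketch below is hypothetical and not the wheel's code: <code>WheelState</code>, <code>SlotFor()</code>, the round-up rule and the numbers in the example are all assumptions made for illustration.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;stdexcept&gt;

// Hypothetical wheel state: 'wheelNow' is the wheel's (possibly lagging)
// view of the current time and 'nowSlot' is the slot that time maps to.
struct WheelState
{
   unsigned long wheelNow;
   std::size_t nowSlot;
   std::size_t numSlots;
   unsigned long granularity;
};

// Chooses the slot for a timer of 'timeout' ms set at real time 'realNow'.
inline std::size_t SlotFor(
   const WheelState &amp;wheel,
   unsigned long realNow,
   unsigned long timeout)
{
   // Measure from the wheel's 'now', allowing for how far it lags behind.
   const unsigned long actualTimeout = timeout + (realNow - wheel.wheelNow);

   // Round up so that the timer can't fire before it is due.
   const std::size_t slotsAhead =
      (actualTimeout + wheel.granularity - 1) / wheel.granularity;

   // The usable maximum shrinks as the lag grows.
   if (slotsAhead &gt;= wheel.numSlots)
   {
      throw std::out_of_range("timeout is too long for this wheel");
   }

   return (wheel.nowSlot + slotsAhead) % wheel.numSlots;
}</pre>
<p>With a 10-slot, 15ms wheel, a 60ms timeout lands four slots ahead when there's no lag, but if the wheel has fallen 100ms behind real time the same request needs eleven slots and is rejected; that's the fluctuating maximum timeout described above.</p>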

<p>I've taken the second approach because non-O(1) timer setting and 'random thread timer dispatch' are not desirable qualities for the usage scenarios that I'm currently targeting. This means that the timer wheel now queries the tick count provider when you call <code>SetTimer()</code>, but it's easy to adjust the tests for this thanks to the test traits that I introduced a while back.</p>
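<p>A minimal sketch of the second approach, assuming a simplified wheel described by a hypothetical <code>WheelState</code> struct and a hypothetical <code>SlotForTimeout()</code> helper (neither is part of <code>CCallbackTimerWheel</code>). The timeout is measured from the real 'now', so the lag between 'now' and the wheel's current time is added before indexing into the circular array. With 5ms slots and a 35ms lag, a 10ms timer lands 9 slots past the wheel's current slot rather than 2, and the largest timeout that still fits shrinks by the lag, which is the fluctuation described above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical, simplified wheel state; these are not the real class members.
struct WheelState
{
   std::size_t numSlots;       // size of the circular array
   std::uint32_t granularity;  // milliseconds per slot
   std::size_t currentSlot;    // slot that corresponds to 'currentTime'
   std::uint32_t currentTime;  // the wheel's (possibly stale) view of 'now'
};

// Returns the slot for a timer that should expire 'timeout' ms after the
// real 'now'. Assumes the wheel's view of time is never ahead of 'now';
// unsigned subtraction copes with tick count wrap around.
std::size_t SlotForTimeout(
   const WheelState &wheel,
   const std::uint32_t now,
   const std::uint32_t timeout)
{
   const std::uint32_t lag = now - wheel.currentTime;

   const std::size_t offset = (lag + timeout) / wheel.granularity;

   if (offset >= wheel.numSlots)
   {
      // The usable maximum timeout is reduced by the current lag.
      throw std::out_of_range("timeout plus lag exceeds the wheel's range");
   }

   return (wheel.currentSlot + offset) % wheel.numSlots;
}
```

<p>Sizing the wheel for the largest timeout you need plus the largest lag you expect keeps the <code>std::out_of_range</code> path from triggering in practice.</p>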

<p>Now that <code>SetTimer()</code> works correctly we can move on to implementing the remaining functions. Before we do that, though, we need to deal with a memory leak in the <code>CCallbackTimerQueueBase</code> class which the tests can't detect and which I missed due to <a href="http://www.lenholgate.com/archives/000914.html">not running BoundsChecker</a> after each set of changes... The leak was introduced when I switched from using the <code>std::multimap</code> to using a <code>std::deque</code> back in <a href="http://www.lenholgate.com/archives/000907.html">part 21</a>. I missed out a couple of <code>delete</code> statements to clean up the new structure that we allocate to store in the <code>std::deque</code> when we set timers. This just goes to show that it doesn't matter how many tests you have and how good your coverage is, it's never enough to prove that the code is without bugs. Running BoundsChecker showed the bug quite clearly but it's a pity that it needs to be a separate stage of testing. Instrumenting memory allocation within the tests would help and is something that I might look into... Anyway, the fix is to add a loop to the queue's destructor which cleans up the allocated memory, and to clean up blocks of timers as they expire in <code>HandleTimeouts()</code>.</p>
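<p>A sketch of the shape of the fix, not the actual <code>CCallbackTimerQueueBase</code> code: a hypothetical <code>BlockQueue</code> owns heap allocated <code>TimerBlock</code> objects through raw pointers in a <code>std::deque</code>, so it must delete them both as they expire and in its destructor. The static <code>liveCount</code> is an assumed, very crude form of the in-test allocation instrumentation mentioned above; it lets a test assert that nothing leaked.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <memory>

// Hypothetical block of timers that share an expiry time; the static
// counter is a crude stand-in for instrumenting allocations in the tests.
struct TimerBlock
{
   static int liveCount;

   explicit TimerBlock(const std::uint32_t expiryTime) : expiry(expiryTime) { ++liveCount; }
   ~TimerBlock() { --liveCount; }

   TimerBlock(const TimerBlock &) = delete;
   TimerBlock &operator=(const TimerBlock &) = delete;

   const std::uint32_t expiry;
};

int TimerBlock::liveCount = 0;

// Hypothetical queue that owns heap allocated blocks via raw pointers.
class BlockQueue
{
   public :

      BlockQueue() = default;
      BlockQueue(const BlockQueue &) = delete;
      BlockQueue &operator=(const BlockQueue &) = delete;

      // Fix 1: the destructor releases any blocks that never expired.
      ~BlockQueue()
      {
         for (TimerBlock *pBlock : m_blocks)
         {
            delete pBlock;
         }
      }

      // Assumes blocks are added in expiry order.
      void Add(const std::uint32_t expiry)
      {
         std::unique_ptr<TimerBlock> pBlock(new TimerBlock(expiry));

         m_blocks.push_back(pBlock.get());

         pBlock.release();   // the deque now owns the block
      }

      // Fix 2: each block is released as soon as it has been handled.
      int HandleTimeouts(const std::uint32_t now)
      {
         int handled = 0;

         while (!m_blocks.empty() && m_blocks.front()->expiry <= now)
         {
            delete m_blocks.front();
            m_blocks.pop_front();
            ++handled;
         }

         return handled;
      }

   private :

      std::deque<TimerBlock *> m_blocks;
};
```

<p>Storing <code>std::unique_ptr&lt;TimerBlock&gt;</code> in the deque would make both explicit deletes unnecessary, at the cost of changing the container's element type.</p>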

The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-27.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-27.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 26 - More functionality, more refactoring and a new bug</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-26---more-functionality-more-refactoring-and-a-new-bug.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.965</id>

    <published>2010-07-23T08:56:01Z</published>
    <updated>2011-01-02T13:45:12Z</updated>

    <summary>Previously on &quot;Practical Testing&quot;... To deal with some specific usage scenarios of a piece of general purpose code I&apos;m in the process of implementing a timer wheel that matches the interface to the timer queue that I previously developed in...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Previously on <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a>... To deal with some specific usage scenarios of a piece of general purpose code I'm in the process of implementing a <a href="http://www.lenholgate.com/archives/000909.html">timer wheel</a> that matches the interface to the timer queue that I previously developed in this <a href="http://www.lenholgate.com/archives/000306.html">series of articles</a>. The timer wheel trades space for speed and so far the development has gone well as I've been able to use the tests that I had already developed for the previous implementations to guide the new development.</p>

By the end of <a href="http://www.lenholgate.com/archives/000911.html">last time</a> we'd got to the point where we had four functions left to implement...]]>
        <![CDATA[<p>Today we'll deal with the second style of <code>SetTimer()</code> call. Timers can be set in two ways. The first is useful if you need to repeatedly set and reset a timer: you create a timer handle by calling <code>CreateTimer()</code> and you can then call <code>SetTimer()</code> and <code>CancelTimer()</code> as often as you like. Finally, you call <code>DestroyTimer()</code> when you're done with your timer. The second style is for when you simply want to 'fire and forget' a timer. You call the second variation of <code>SetTimer()</code>, which creates a timer for you, sets it and destroys it once the timer has timed out or when the timer system is destroyed. This second style of timer makes things easy for the caller and slightly more complex for the timer system, since there are now timers that need to be cleaned up once they expire. However, we need to deal with this kind of thing anyway, as a timer could be destroyed during timeout processing or in the gap between a begin/handle/end sequence of timer handling.</p>
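<p>The difference in ownership between the two styles can be modelled with a hypothetical <code>MiniTimers</code> class (not the real interface, and ignoring ordering, locking and the begin/handle/end sequence). Handle based timers live until <code>DestroyTimer()</code> is called; 'fire and forget' timers are flagged at creation and deleted by the timer system, either when they expire or when the system itself is destroyed.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <set>
#include <utility>

// Hypothetical miniature timer system showing both styles of use.
class MiniTimers
{
   public :

      struct Timer
      {
         std::function<void()> callback;
         std::uint32_t expiry = 0;
         bool set = false;
         bool deleteAfterTimeout = false;   // true for 'fire and forget' timers
      };

      using Handle = Timer *;

      MiniTimers() = default;
      MiniTimers(const MiniTimers &) = delete;
      MiniTimers &operator=(const MiniTimers &) = delete;

      // Everything still owned, including unexpired one-shot timers, dies here.
      ~MiniTimers()
      {
         for (Timer *pTimer : m_timers) { delete pTimer; }
      }

      // Style 1: create a handle, set it as often as you like, then destroy it.
      Handle CreateTimer()
      {
         std::unique_ptr<Timer> pTimer(new Timer());
         m_timers.insert(pTimer.get());
         return pTimer.release();
      }

      void SetTimer(Handle handle, const std::uint32_t expiry, std::function<void()> callback)
      {
         handle->callback = std::move(callback);
         handle->expiry = expiry;
         handle->set = true;
      }

      void DestroyTimer(Handle handle)
      {
         m_timers.erase(handle);
         delete handle;
      }

      // Style 2: fire and forget; the caller never sees a handle.
      void SetTimer(const std::uint32_t expiry, std::function<void()> callback)
      {
         Handle handle = CreateTimer();
         handle->deleteAfterTimeout = true;
         SetTimer(handle, expiry, std::move(callback));
      }

      // Callbacks must not call back into MiniTimers in this sketch.
      void HandleTimeouts(const std::uint32_t now)
      {
         for (auto it = m_timers.begin(); it != m_timers.end(); )
         {
            Timer *pTimer = *it;

            if (pTimer->set && pTimer->expiry <= now)
            {
               pTimer->set = false;
               pTimer->callback();

               if (pTimer->deleteAfterTimeout)
               {
                  it = m_timers.erase(it);
                  delete pTimer;
                  continue;
               }
            }

            ++it;
         }
      }

      std::size_t NumTimers() const { return m_timers.size(); }

   private :

      std::set<Timer *> m_timers;
};
```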

<p>Whilst adjusting the tests to make sure that they took the timer wheel's traits into account and generally worked correctly with the new implementation, I decided that although I like my tests rigid (some would say <a href="http://www.lenholgate.com/archives/000910.html">brittle</a>, or fragile), I'd gone slightly too far with the logging coming out of the tick count providers. These were reporting the value of the tick count that was being provided as well as the fact that the call was made. In some of my tests this is useful, but here the test was clearly setting the value itself, so logging it was of no use and made the tests more complex in the presence of variable timer granularities. Adjusting the mocks and the tests makes them a bit cleaner.</p>
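<p>A sketch of the kind of mock adjustment described, with a hypothetical <code>MockTickCountProvider</code> (the real mocks aren't shown in this article): by default it logs only that <code>GetTickCount()</code> was called, and the value is logged only when a test explicitly asks for it.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical mock tick count provider.
class MockTickCountProvider
{
   public :

      explicit MockTickCountProvider(const bool logValues = false) : m_logValues(logValues) {}

      // The test sets the value, so it usually already knows what it is.
      void SetTickCount(const std::uint32_t tickCount) { m_tickCount = tickCount; }

      std::uint32_t GetTickCount()
      {
         // Always record that the call happened; record the value only on request.
         m_log += m_logValues ?
            "|GetTickCount: " + std::to_string(m_tickCount) :
            std::string("|GetTickCount");

         return m_tickCount;
      }

      std::string GetLog() const { return m_log; }

   private :

      const bool m_logValues;
      std::uint32_t m_tickCount = 0;
      std::string m_log;
};
```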

<p>The tests also needed adjusting now that all of the code can be built with monitoring enabled. The <a href="http://www.lenholgate.com/archives/000910.html">traits</a> work pretty well for this and I'm happy with the results.</p>

<p>There's still quite a bit of duplicate code in the tests and the code to create and set a timer is one piece that's easy to slim down by using the helper function that is used by some but not all of the tests. Only the tests that are actually for <code>SetTimer()</code> need to do it long hand to make it clear what we're actually testing.</p>

<p>Since we now have timers that can delete themselves and since I <a href="http://www.lenholgate.com/archives/000906.html">recently added to the timer monitoring interface</a> to allow us to ensure that all timers were always cleaned up it seems about the right time to add monitoring interface support to the timer wheel.</p>

<p>The resulting changes to implement the second <code>SetTimer()</code> overload are as follows. Note that I've adjusted <code>CreateTimer()</code> so that the common code that I need to call when I create a timer in <code>SetTimer()</code> isn't duplicated. The timer data constructor that we're using there sets the timer up appropriately for single use mode.</p>
<pre class="brush: cpp gutter: false">CCallbackTimerWheel::Handle CCallbackTimerWheel::CreateTimer()
{
   TimerData *pData = new TimerData();
  
   return OnTimerCreated(pData);
}

CCallbackTimerWheel::Handle CCallbackTimerWheel::OnTimerCreated(
   TimerData *pData)
{
   m_handles.insert(pData);
  
#if (JETBYTE_PERF_TIMER_WHEEL_MONITORING_DISABLED == 0)
  
   m_monitor.OnTimerCreated();
  
#endif
  
   return reinterpret_cast&lt;Handle&gt;(pData);
}

bool CCallbackTimerWheel::SetTimer(
   const Handle &amp;handle,
   Timer &amp;timer,
   const Milliseconds timeout,
   const UserData userData)
{
   if (timeout &gt; m_maximumTimeout)
   {
      throw CException(
         _T("CCallbackTimerWheel::SetTimer()"), 
         _T("Timeout is too long. Max is: ") + ToString(m_maximumTimeout) + _T(" tried to set: ") + ToString(timeout));
   }
  
   TimerData &amp;data = ValidateHandle(handle);
  
   const bool wasPending = data.CancelTimer();
  
   data.UpdateData(timer, userData);
  
   InsertTimer(timeout, data, wasPending);
  
#if (JETBYTE_PERF_TIMER_WHEEL_MONITORING_DISABLED == 0)
  
   m_monitor.OnTimerSet(wasPending);
  
#endif
  
   return wasPending;
}
  
void CCallbackTimerWheel::SetTimer(
   IQueueTimers::Timer &amp;timer,
   const Milliseconds timeout,
   const IQueueTimers::UserData userData)
{
   if (timeout &gt; m_maximumTimeout)
   {
      throw CException(
         _T("CCallbackTimerWheel::SetTimer()"), 
         _T("Timeout is too long. Max is: ") + ToString(m_maximumTimeout) + _T(" tried to set: ") + ToString(timeout));
   }
  
   TimerData *pData = new TimerData(timer, userData);
  
   OnTimerCreated(pData);
  
   InsertTimer(timeout, *pData);
  
#if (JETBYTE_PERF_TIMER_WHEEL_MONITORING_DISABLED == 0)
  
   m_monitor.OnOneOffTimerSet();
  
#endif
}
</pre>
<p>So that we can delete these "one shot" timers when the timer wheel is destroyed, we need to keep their handles in the handle map. This does, however, open a hole in our handle validation code, as someone could pass in a random invalid handle value that matches one of the "one shot" timers and convince the timer wheel that the handle is valid. To prevent this we now also check that the handle isn't scheduled to be deleted after the timer expires. The only handles in the handle map that will be set like this are the "one shot" timers and these, by definition, don't have a valid handle that you can manipulate.</p>
<pre class="brush: cpp gutter: false">CCallbackTimerWheel::TimerData &amp;CCallbackTimerWheel::ValidateHandle(
   const Handle &amp;handle) const
{
   TimerData *pData = reinterpret_cast&lt;TimerData *&gt;(handle);
  
   Handles::const_iterator it = m_handles.find(pData);
  
   if (it == m_handles.end())
   {
      throw CException(
         _T("CCallbackTimerWheel::ValidateHandle()"), 
         _T("Invalid timer handle: ") + ToString(handle));
   }
  
   if (pData-&gt;DeleteAfterTimeout())
   {
      throw CException(
         _T("CCallbackTimerWheel::ValidateHandle()"), 
         _T("Invalid timer handle: ") + ToString(handle));
   }
  
   return *pData;
}
</pre>
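<p>The two checks in <code>ValidateHandle()</code> can be exercised in isolation with a hypothetical <code>HandleRegistry</code> and a stand-in <code>Data</code> struct that carries only the 'delete after timeout' flag. Because the set lookup happens first, an unknown pointer is rejected without ever being dereferenced, and a 'one shot' timer's address is rejected even though it is in the set.</p>

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical stand-in for TimerData: only the flag that matters here.
struct Data
{
   bool deleteAfterTimeout;
};

class HandleRegistry
{
   public :

      void Add(Data *pData) { m_handles.insert(pData); }

      // The pointer must be one we own, and it must not belong to a one-shot
      // timer. The short-circuit means unknown pointers are never dereferenced.
      Data &Validate(void *handle) const
      {
         Data *pData = static_cast<Data *>(handle);

         if (m_handles.find(pData) == m_handles.end() || pData->deleteAfterTimeout)
         {
            throw std::invalid_argument("Invalid timer handle");
         }

         return *pData;
      }

   private :

      std::set<Data *> m_handles;
};
```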
<p>With all of that done and with some more tests passing, I'm left with just the <code>BeginTimeoutHandling()</code>, <code>HandleTimeout()</code> and <code>EndTimeoutHandling()</code> code to implement. Unfortunately there's a bug in our timer setting code for the timer wheel and the existing tests don't catch it.</p>

<p><img alt="TimerWheel-3.png" src="http://www.lenholgate.com/blog/images/TimerWheel-3.png" width="434" height="301" border="0" /></p>

<p>Let's assume that we're in the situation shown above and we set a timer. The timer wheel has its current time set to 35ms before the actual time because timeouts haven't been handled yet. At present, if we set a timer for 10ms the timer will be set at the point marked 30 on the diagram above rather than at the point marked 65. The existing tests don't show this problem as they all set timers when 'now' is the same as the time at which the wheel was created, or just after timeouts have been handled, which means that the wheel's current time equals 'now' in most of the tests that set timers. A new test that sets a timer when 'now' != 'current' clearly shows the problem. I'll leave this broken test to show me the way for next time.</p>
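<p>The numbers in the diagram are consistent with the wheel's current time being 20 and 'now' being 55 (30 = 20 + 10 and 65 = 55 + 10); those two values are inferred rather than stated. A minimal sketch, with hypothetical helper names, of why the existing tests can't see the bug: the two calculations only differ when 'current' and 'now' differ.</p>

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers that return the absolute time of the slot a timer
// lands in. 'current' is the wheel's view of the time, 'now' is the real time.
std::uint32_t ExpiryRelativeToCurrent(
   const std::uint32_t current,
   const std::uint32_t timeout)
{
   return current + timeout;   // the buggy behaviour
}

std::uint32_t ExpiryRelativeToNow(
   const std::uint32_t now,
   const std::uint32_t timeout)
{
   return now + timeout;       // what the caller actually asked for
}
```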

The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-26.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-26.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 25 - Nothing is free</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-25---nothing-is-free.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.963</id>

    <published>2010-07-21T13:13:57Z</published>
    <updated>2011-01-02T13:41:50Z</updated>

    <summary>I&apos;m in the process of implementing a timer wheel that matches the interface to the timer queue that I previously developed in this series of articles. The idea being that for certain specific usage scenarios the timer wheel will perform...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>I'm in the process of implementing a <a href="http://www.lenholgate.com/archives/000909.html">timer wheel</a> that matches the interface to the timer queue that I previously developed in this <a href="http://www.lenholgate.com/archives/000306.html">series of articles</a>. The idea is that, for certain specific usage scenarios, the timer wheel will perform better than the timer queues. Last time I refactored the tests that I was using for the timer queues to remove duplication and I now have a set of failing tests for the new timer wheel.</p>

As soon as I started to look at making some of the failing tests pass I realised that having a heap of failing tests wasn't such a good idea, at least with my home brew test framework. I had stubbed out the timer wheel's interface and had decided to throw exceptions from the functions that weren't implemented yet. Those exceptions caused the tests that used that functionality to fail; so far so good. Unfortunately there was no differentiation between tests that I knew would fail and tests that just happened to be failing; I discovered this when I realised that some of the failing tests were for the timer queues and not the wheel... Switching the exception that's thrown to one of my testing exceptions, a "test skipped" exception, means that I now have a load of timer wheel tests that are skipped due to lack of implementation code, and the test failures are clearly failures. Once the real failures were fixed I could move on with the new code.]]>
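        <![CDATA[<p>A minimal sketch of the idea, assuming a hypothetical <code>TestSkippedException</code> and <code>RunTest()</code> rather than the home brew framework itself: stubbed out functions throw the 'skipped' exception, and the runner reports it separately from any other exception, so genuine failures stand out.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical exception type used by stubbed out, unimplemented functions.
class TestSkippedException : public std::runtime_error
{
   public :

      explicit TestSkippedException(const std::string &why) : std::runtime_error(why) {}
};

enum class TestResult { Passed, Failed, Skipped };

// Runs one test, keeping 'not implemented yet' separate from real failures.
template <typename Test>
TestResult RunTest(Test test)
{
   try
   {
      test();
      return TestResult::Passed;
   }
   catch (const TestSkippedException &)
   {
      return TestResult::Skipped;
   }
   catch (...)
   {
      return TestResult::Failed;
   }
}

// What a stubbed out function throws instead of a generic exception.
inline void NotImplementedYet()
{
   throw TestSkippedException("not implemented yet");
}
```
]]>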
        <![CDATA[<p>The great thing about a timer wheel is that it has O(1) performance for setting a timer: you simply index into the array and push the new timer onto the head of the list of other timers at that time. It also has O(1) timer cancellation; each timer knows where it is in the wheel and can unset itself directly. For many uses timeout handling can be O(1) as well; if your timer handling code runs from a hardware timer tick then each tick moves the wheel forward by one slot and expires the timers that are present there. My usage is a little more complicated in that I need to be able to query the wheel for the time when the next timer is due. To do that I need to scan the wheel from 'now' forward until I find a timer... Our worst case is O(n), where n is the number of slots, which is determined by the maximum supported timeout and the timer granularity.</p>

<p><img alt="TimerWheel-1.png" src="http://www.lenholgate.com/blog/images/TimerWheel-1.png" width="434" height="301" border="0" /></p>

<p>In the diagram above, if the timer wheel's current time is 0 then we would need to scan forward sequentially from 'start' to the first timer that is set at position 30 to determine that the first timer is set at 30...</p>

<p>It's possible to optimise this. We could manage a 'hint' which points to the earliest timer that has been set; discovering the next timeout could then be O(1) if the hint is set. The hint could be managed by our calls to <code>SetTimer()</code>: if the timer we're setting is earlier than the hint, or if it's the first timer set, then we set the hint to point at it. Unfortunately this scheme falls down in the presence of timer cancellation. If you cancel the earliest timer then you need to scan forward to update the hint, and this forward scan is potentially O(n); so now your cancellation has gone from O(1) to O(n) to keep your next timeout calculation at O(1)... In some usage scenarios this might be acceptable except that, of course, expiring a timer or setting a timer that is already set are also forms of cancellation...</p>
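<p>The problem with the hint can be seen in a hypothetical <code>HintedWheel</code> that tracks a count of timers per slot, indexed by distance from a 'now' that is assumed not to move. Setting a timer only ever moves the hint earlier, which is O(1), but cancelling the last timer in the hinted slot forces a forward scan; the <code>SlotsScanned()</code> counter makes that cost visible.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of an 'earliest timer' hint. Slots are indexed by
// distance from a fixed 'now', so there is no wrap around to deal with.
class HintedWheel
{
   public :

      static constexpr std::size_t kNoTimer = static_cast<std::size_t>(-1);

      explicit HintedWheel(const std::size_t numSlots) : m_counts(numSlots, 0) {}

      // O(1): bump the slot's count and move the hint if this timer is earlier.
      void Set(const std::size_t distance)
      {
         ++m_counts[distance];

         if (m_hint == kNoTimer || distance < m_hint)
         {
            m_hint = distance;
         }
      }

      // O(1) unless we empty the hinted slot; then it's an O(n) rescan.
      void Cancel(const std::size_t distance)
      {
         --m_counts[distance];

         if (distance == m_hint && m_counts[distance] == 0)
         {
            Rescan(distance + 1);   // nothing is set before the old hint
         }
      }

      std::size_t Earliest() const { return m_hint; }

      std::size_t SlotsScanned() const { return m_slotsScanned; }

   private :

      void Rescan(const std::size_t from)
      {
         m_hint = kNoTimer;

         for (std::size_t i = from; i < m_counts.size(); ++i)
         {
            ++m_slotsScanned;

            if (m_counts[i] != 0)
            {
               m_hint = i;
               break;
            }
         }
      }

      std::vector<std::size_t> m_counts;   // timers per slot
      std::size_t m_hint = kNoTimer;
      std::size_t m_slotsScanned = 0;
};
```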

<p>For now we'll avoid all of this complexity and settle for O(n) next timeout calculation. We will, however, mitigate the worst case by adding a counter of how many timers are currently set. If the counter is zero then there's no need to scan the whole array to discover that no timer is set; our worst case is now that only one timer is set and it's set to the maximum timeout value. Likewise, we can keep a hint that can be passed from one call to <code>GetNextTimeout()</code> to the next, as long as the hint is zeroed upon any timer changes.</p>
<pre class="brush: cpp gutter: false">Milliseconds CCallbackTimerWheel::GetNextTimeout()
{
   Milliseconds nextTimeout = INFINITE;
  
   // We need to work out the time difference between now and the first timer that is set. 
  
   if (!m_pFirstTimerSetHint)
   {
      m_pFirstTimerSetHint = GetFirstTimerSet(); 
   }
  
   if (m_pFirstTimerSetHint)
   {
      // A timer is set! Calculate the timeout in ms
  
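      // Measure the distance in slots forwards from now, allowing for the hint
      // having wrapped around to the start of the array.
      nextTimeout = static_cast&lt;Milliseconds&gt;((
         (m_pFirstTimerSetHint &gt;= m_pNow ? 
            (m_pFirstTimerSetHint - m_pNow) : 
            ((m_pTimersEnd - m_pNow) + (m_pFirstTimerSetHint - m_pTimersStart))) + 1) * m_timerGranularity);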
  
      const Milliseconds now = m_tickCountProvider.GetTickCount();
  
      if (now != m_currentTime)
      {
         // Time has moved on, adjust the next timeout to take into account the difference between now and 
         // the timer wheel's view of the current time...
  
         const Milliseconds timeDiff = (now &gt; m_currentTime ? now - m_currentTime : m_currentTime - now);
  
         if (timeDiff &gt; nextTimeout)
         {
            nextTimeout = 0;
         }
         else
         {
            nextTimeout -= timeDiff;
         }
      }
   }
  
   return nextTimeout;
}
  
CCallbackTimerWheel::TimerData **CCallbackTimerWheel::GetFirstTimerSet() const
{
   TimerData **pFirstTimer = 0;
  
   if (m_numTimersSet != 0)
   {
      // Scan forwards from now to the end of the array...
  
      for (TimerData **p = m_pNow; !pFirstTimer &amp;&amp; p &lt; m_pTimersEnd; ++p)
      {
         if (*p)
         {
            pFirstTimer = p;
         }
      }
  
      if (!pFirstTimer)
      {
         // We haven't yet found our first timer, now scan from the start of the array to 
         // now...
  
         for (TimerData **p = m_pTimersStart; !pFirstTimer &amp;&amp; p &lt; m_pNow; ++p)
         {
            if (*p)
            {
               pFirstTimer = p;
            }
         }
      }
  
      if (!pFirstTimer)
      {
         throw CException(_T("CCallbackTimerWheel::GetFirstTimerSet()"),
            _T("Unexpected, no timer set but count = ") +
            ToString(m_numTimersSet));
      }
   }
  
   return pFirstTimer;
}
</pre>
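<p>The scan and the slot distance calculation can be modelled with plain indices. In this sketch the function name, <code>kInfinite</code> (standing in for <code>INFINITE</code>) and the use of a <code>std::vector&lt;int&gt;</code> in place of the array of list heads are all assumptions. The distance is always measured forwards from 'now' and wraps at the end of the array, so a timer in a slot that lies before 'now' in memory is measured around the end of the array rather than backwards.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-in for INFINITE; hypothetical name.
constexpr std::uint32_t kInfinite = 0xFFFFFFFF;

// Hypothetical index-based model of GetFirstTimerSet() and GetNextTimeout():
// slots[i] is non-zero when at least one timer is set in slot i.
std::uint32_t NextTimeout(
   const std::vector<int> &slots,
   const std::size_t nowIndex,
   const std::size_t numTimersSet,
   const std::uint32_t granularity)
{
   if (numTimersSet == 0)
   {
      return kInfinite;   // the counter saves us scanning an empty wheel
   }

   // Scan forwards from 'now', wrapping at the end of the array.
   for (std::size_t distance = 0; distance < slots.size(); ++distance)
   {
      if (slots[(nowIndex + distance) % slots.size()])
      {
         // Mirrors the '+ 1' slot in the article's calculation.
         return static_cast<std::uint32_t>((distance + 1) * granularity);
      }
   }

   throw std::logic_error("no timer set but the count is non-zero");
}
```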

<p>Now that we can work out when the next timeout is due we can start to think about handling the timers when they expire. Given the diagram below, if the timer wheel believes that the current time is as indicated and the timers are then expired when the time is at 'now', we will need to process all of the timers that are set in the order shown by their index numbers.</p>

<p><img alt="TimerWheel-3.png" src="http://www.lenholgate.com/blog/images/TimerWheel-3.png" width="434" height="301" border="0" /></p>

<p>Again, if we were driving the wheel from a timer tick then things would be simpler, as we would only ever 'rotate' the wheel by one slot at a time. In the world of general purpose, multi-threaded, non-real time systems though (is that a big enough proviso?) all manner of reasons might mean that we don't actually get to process the timers until after they're due.</p>

<p>If all we need to worry about is processing the timers in sequence then we could step along the wheel, walk each chain of timers and handle them as we go. It's a little more complex than that if we want to use the <code>BeginTimeoutHandling()</code>, <code>HandleTimeouts()</code>, <code>EndTimeoutHandling()</code> methods to allow us to process timers without holding the lock on the timer system whilst the timers are dispatched (I talk about why this is a desirable design <a href="http://www.lenholgate.com/archives/000795.html">here</a>, and why needing to go through the begin, handle, end sequence multiple times to process timers is less than ideal <a href="http://www.lenholgate.com/archives/000907.html">here</a>). Ideally, for the latter situation, we'd want our 'begin' to accumulate all 6 timers into a correctly ordered list and remove them from the wheel. We would then unlock the wheel and process the 6 timers in order before locking the wheel again to update the processed timers inside <code>EndTimeoutHandling()</code>. Doing it this way would mean traversing each slot's list of timers to get to the last one so that we can link the next list onto the end of the previous lists...</p>
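<p>The 'accumulate into one ordered list' step can be sketched with a hypothetical singly linked <code>Node</code>; the real timers are doubly linked, but only the forward links matter for ordering. Each expired slot's list is detached and appended to the combined list, and walking to the end of each appended list is the extra traversal mentioned above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical invasive timer node; only the forward link matters here.
struct Node
{
   int id;
   Node *pNext;
};

// Detaches the timer lists from 'count' consecutive slots starting at
// 'first' (wrapping around the wheel) and links them into a single list
// in expiry order.
Node *CollectExpired(std::vector<Node *> &slots, const std::size_t first, const std::size_t count)
{
   Node *pHead = nullptr;
   Node **ppTail = &pHead;   // where the next slot's list gets attached

   for (std::size_t i = 0; i < count; ++i)
   {
      Node *&slot = slots[(first + i) % slots.size()];

      if (slot)
      {
         *ppTail = slot;
         slot = nullptr;   // the wheel no longer holds these timers

         while (*ppTail)
         {
            ppTail = &(*ppTail)->pNext;   // walk to the end of this slot's list
         }
      }
   }

   return pHead;
}
```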

<p>If we ignore the more complex scenario and implement the easy one we end up with code like this to deal with the 'holding a lock whilst dispatching' case.</p>
<pre class="brush: cpp gutter: false">void CCallbackTimerWheel::HandleTimeouts()
{
   const Milliseconds now = m_tickCountProvider.GetTickCount();
  
   while (TimerData *pTimers = GetTimersToProcess(now))
   {
      while (pTimers)
      {
         pTimers = pTimers-&gt;OnTimer();
  
         --m_numTimersSet;
      }
   }
}
  
CCallbackTimerWheel::TimerData *CCallbackTimerWheel::GetTimersToProcess(
   const Milliseconds now)
{
   TimerData *pTimers = 0;
  
   // Round 'now' down to the timer granularity
  
   const Milliseconds thisTime = ((now / m_timerGranularity) * m_timerGranularity);
  
   while (!pTimers &amp;&amp; m_currentTime != thisTime)
   {
      TimerData **ppTimers = GetTimerAtOffset(0);
  
      pTimers = *ppTimers;
  
      // Step along the wheel...
  
      m_pNow++;
  
      if (m_pNow &gt;= m_pTimersEnd)
      {
         m_pNow = m_pTimersStart + (m_pNow - m_pTimersEnd);
      }
  
      m_currentTime += m_timerGranularity;
   }
  
   if (pTimers)
   {
      m_pFirstTimerSetHint = 0;
   }
  
   return pTimers;
}
</pre>
<p>With this in place we're left with 20 tests that fail due to lack of implementation and 4 functions that we need to deal with properly. Three form the begin, handle, end API for unlocked timer dispatch and the fourth is the <code>SetTimer()</code> overload that doesn't require a handle. There's an interesting amount of functionality required to implement the remaining functions, as you can see from <a href="http://www.lenholgate.com/archives/000795.html">here</a> and <a href="http://www.lenholgate.com/archives/000803.html">here</a>. We'll look at this next time.</p>

The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-25.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-25.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 23 - Another new approach: timer wheels</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-23---another-new-approach-timer-wheels.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.961</id>

    <published>2010-07-19T08:01:18Z</published>
    <updated>2011-01-02T13:39:24Z</updated>

    <summary>The most recent articles in the &quot;Practical Testing&quot; series have been discussing the performance of the timer queue that we have built. As I hinted when I first brought up the performance issues, the particular use case that I have...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>The most recent articles in the <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a> series have been discussing the performance of the timer queue that we have built. As I hinted when I <a href="http://www.lenholgate.com/archives/000906.html">first brought up</a> the performance issues, the particular use case that I have which is causing problems might well be more efficiently dealt with using a different (more specialised and less general purpose) design. </p>

<p>The timer queue has adequate performance for general purpose use and can handle timers set within a range of 0ms to 49.7 days with a theoretical granularity of 1ms. It achieves this by using a balanced search tree to store the timers by absolute timeout. The performance of setting a timer is O(log n) due to the tree insertion required. Cancelling a timer is O(1) since we keep an iterator to where the timer was inserted and thus can navigate straight to it to cancel it. Timer expiry is also an O(log n) operation due to the tree lookup. Due to the use of the standard program heap the worst case contention of the queue is <b>C(tq)+(C(tn-tq+1)+C(ts-tq+1)+C(tn-tq+1))</b> (see <a href="http://www.lenholgate.com/archives/000908.html">here</a> for details of my crazy Big C notation for describing contention).</p>

The more specialist use case is for driving reliable UDP protocols. This kind of work generally requires timers per connection for retransmission and data flow pacing. The timeouts tend to be short and the timers tend to expire rather than be reset without expiring. The range of timeouts is generally quite small: 0ms to 30 seconds for the ENet system I'm working on. I'm currently looking at improving the performance of the timer system for this kind of scenario, and to do so requires that timer insertion speed be improved (so we can set timers more quickly), timer expiry speed be improved (so we can process timers faster) and contention be reduced, ideally tending towards <b>C(tq)</b>, where we have contention only between users of the timer queue rather than between any threads in the process.]]>
        <![CDATA[<p>As I have already mentioned, the use of STL containers means that I'm doing more work than is strictly necessary when manipulating the timer queue (including dynamic memory allocation and release during timer insertion and removal). One way of reducing contention is to switch to using custom STL allocators so that only the users of the queue ever access the allocators that we use for the queue. Another is to write a custom, invasive, balanced search tree that does not need to use dynamic allocation.</p>

<p>A third solution would be to use a simpler data structure. Our requirement is simply to store timers in order of timeout. Rather than using a complex tree structure we could use a simple sorted list. Unfortunately timer insertion would then rise to O(n) as we would need to traverse the list to locate the correct spot to insert our new timer. Cancellation can stay O(1) if we use our invasive <code>CNodeList</code> and timer handling becomes O(1) because we will always work from the head of the list when expiring timers. The usage pattern of the reliable retransmission means that we'll be inserting timers over the whole of our possible range, so the O(n) insertion would really bite us. </p>

<p>In a classic trade off between memory usage and performance we could use an array and have lots of wasted space in it. Setting a timer becomes O(1): you simply index directly into the array at the correct location. Cancellation and timer processing are also O(1) and there's no dynamic memory allocation required for insertion and removal, so the worst case contention is C(tq). Such a structure is called a timer wheel because the array is viewed as a circular buffer and timers are inserted with timeouts relative to a 'now' point on the wheel.</p>

<p>The amount of memory used can be reduced by coarsening the granularity at which you can set your timers. For example, a timer wheel with a range of 0 to 30 seconds and a granularity of 1ms requires 30,000 elements in the array. If you coarsen the granularity to 15ms (which is pretty much the best you can get from <code>GetTickCount()</code> anyway) then the array size is reduced to a more manageable 2,000 elements. Given that the array is an array of pointers we're looking at 8kB on x86 and 16kB on x64. Each array element points either to <code>null</code>, if no timer is set, or to the first timer in a doubly linked list of timers at that time. The list is invasive, with the links being part of the data that is stored in the list. Insertion into the list is simply a case of pushing a new node onto the front of the existing list, and cancellation is easy as the list is doubly linked and the node contains the links. Thus most timer manipulation becomes simply adjusting pointers.</p>
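<p>The sizing arithmetic is simple enough to check at compile time. This sketch uses hypothetical <code>NumSlots()</code> and <code>WheelBytes()</code> helpers and follows the figures above exactly; whether a real wheel needs one extra slot so that the maximum timeout itself is representable depends on how the implementation treats the boundary.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Slots needed to cover the timeout range at a given granularity.
constexpr std::size_t NumSlots(const std::uint32_t maxTimeoutMs, const std::uint32_t granularityMs)
{
   return maxTimeoutMs / granularityMs;
}

// Memory for the array of list head pointers.
constexpr std::size_t WheelBytes(const std::size_t numSlots, const std::size_t pointerSize)
{
   return numSlots * pointerSize;
}

static_assert(NumSlots(30000, 1) == 30000, "1ms granularity");
static_assert(NumSlots(30000, 15) == 2000, "15ms granularity");
static_assert(WheelBytes(NumSlots(30000, 15), 4) == 8000, "x86: 4 byte pointers");
static_assert(WheelBytes(NumSlots(30000, 15), 8) == 16000, "x64: 8 byte pointers");
```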

<p><img alt="TimerWheel-1.png" src="http://www.lenholgate.com/blog/images/TimerWheel-1.png" width="434" height="301" border="0" /></p>

<p>The wheel in the diagram above has a granularity of 5ms and has timers set at 30 and 50. The wheel is defined by two pointers, one to the start of it and one to one element beyond the end.</p>

<p><img alt="TimerWheel-2.png" src="http://www.lenholgate.com/blog/images/TimerWheel-2.png" width="434" height="301" border="0" /></p>

<p>This diagram clearly shows the circular nature of the array. This is just before we expire the 30ms timer. Note that the next timer is due in 20ms.</p>

<p>My implementation of a timer wheel is made easier by the fact that I have a set of tests that target the interface that I wish to conform to. To start with I'll implement a basic timer wheel that allows us to create, set and cancel timers but that doesn't deal with any of the complexity of expiring timers. I'll also leave out all of the niceties and the implied or explicit implementation details. Don't worry, once we write the tests for these pieces of functionality it'll be obvious where we're failing.</p>

<p>Creation and destruction of the timer wheel are pretty straightforward. We have an array of pointers to create, the size of which is based on the maximum timeout that we can set and the granularity of the timers that can be set. Destruction is similar to the timer queue in that we iterate any existing timers and clean them up. Timer creation is very similar to our timer queue as we dynamically allocate the timer data and insert it into a map for validation and clean up purposes. The timers themselves are, at present at least, quite simple: a link to the next timer in the list, a link to the previous timer, and the timer and user data. Setting a timer simply involves validating it, locating the correct index into the timer wheel array and then adding the timer to the list of timers at that point in the array.</p>
<pre class="brush: cpp gutter: false">bool CCallbackTimerWheel::SetTimer(
   const Handle &amp;handle,
   Timer &amp;timer,
   const Milliseconds timeout,
   const UserData userData)
{
   if (timeout &gt; m_maximumTimeout)
   {
      throw CException(
         _T("CCallbackTimerWheel::SetTimer()"), 
         _T("Timeout is too long. Max is: ") + ToString(m_maximumTimeout) + _T(" tried to set: ") + ToString(timeout));
   }
  
   TimerData &amp;data = ValidateHandle(handle);
  
   const bool wasSet = data.CancelTimer();
  
   data.UpdateData(timer, userData);
  
   InsertTimer(timeout, data);
  
   return wasSet;
}
  
void CCallbackTimerWheel::InsertTimer(
   const Milliseconds timeout,
   TimerData &amp;data)
{
   const size_t timerOffset = timeout / m_timerGranularity;
  
   TimerData **ppTimer = GetTimerAtOffset(timerOffset);
  
   data.SetTimer(ppTimer, *ppTimer);
}
  
void CCallbackTimerWheel::TimerData::SetTimer(
   TimerData **ppPrevious,
   TimerData *pNext)
{
   if (m_ppPrevious)
   {
      throw CException(
         _T("CCallbackTimerWheel::TimerData::SetTimer()"),
         _T("Internal Error: Timer is already set"));
   }
  
   m_ppPrevious = ppPrevious;
  
   m_pNext = pNext;
  
   if (m_pNext)
   {
      m_pNext-&gt;m_ppPrevious = &amp;m_pNext;
   }
  
   *ppPrevious = this;
}
</pre>

<p>I'm using a pointer to the previous pointer rather than a pointer to the previous node as it makes things slightly simpler; honest...</p>
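<p>To illustrate why a pointer to the previous pointer keeps things simple, here is a minimal, self-contained sketch. It is not the framework's code: <code>Node</code>, <code>Link()</code>, <code>Unlink()</code> and the small <code>slots</code> array used below are hypothetical stand-ins for <code>TimerData</code>, <code>SetTimer()</code>, cancellation and the timer wheel array. <code>Link()</code> mirrors the <code>TimerData::SetTimer()</code> code above. Because the head of each list is itself just a pointer stored in the array, unlinking the first node and unlinking a node in the middle of a list are the same operation; there's no special case for the list head and no need to know which slot the node lives in.</p>

```cpp
#include <cassert>

// Hypothetical stand-in for the timer wheel's TimerData. m_ppPrevious points
// at whichever pointer currently points at this node: either a slot in the
// wheel array or the previous node's m_pNext.
struct Node
{
   Node **m_ppPrevious = nullptr;
   Node *m_pNext = nullptr;

   // Push this node onto the front of the list whose head is *ppHead.
   void Link(Node **ppHead)
   {
      assert(m_ppPrevious == nullptr);   // must not already be linked

      m_ppPrevious = ppHead;
      m_pNext = *ppHead;

      if (m_pNext)
      {
         m_pNext->m_ppPrevious = &m_pNext;
      }

      *ppHead = this;
   }

   // Remove this node from whatever list it is in. The same code handles the
   // head node and interior nodes. Returns true if the node was linked.
   bool Unlink()
   {
      if (!m_ppPrevious)
      {
         return false;
      }

      *m_ppPrevious = m_pNext;

      if (m_pNext)
      {
         m_pNext->m_ppPrevious = m_ppPrevious;
      }

      m_ppPrevious = nullptr;
      m_pNext = nullptr;

      return true;
   }
};
```

<p>For example, with <code>Node *slots[4] = {};</code> and three nodes linked into <code>slots[1]</code>, unlinking the middle node and then the head node leaves the remaining node correctly pointing back at <code>slots[1]</code>.</p>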

<p>With just enough code to get the first set of tests to run I have enough to get some initial performance figures out of the new timer system. Timer creation performs about the same as with the queue, but that's expected as the code is almost identical; the contention for creation and destruction is also the same as for the queue and so could also be improved with custom allocators and private heaps. The performance tests for <code>SetTimer()</code> show a dramatic improvement. On my test machine it takes around 4ms to set a single timer 100,000 times, against 90ms for the queue, and there are similar improvements in the other two performance tests for <code>SetTimer()</code>. Better still, <code>SetTimer()</code> now has a contention of just <b>C(t-queue)</b>, as we no longer have to do any of the dynamic allocation that went on with the timer queue's STL manipulation.</p>

<p>Right now we're left with a failing test, which points the way to what we need to do next: process these timers when they time out. Before I look at that, though, I think it's about time that I took a good hard look at the duplication in the tests. We're testing an interface with three implementations, so we should have a single set of tests for the interface and then some implementation-specific tests as well if we feel we need them. Having one duplicate set of test code for the Ex version of the queue was wrong, but I could just about live with it; having another duplicate set for the timer wheel is something I'm not prepared to put up with unless it's simply impossible to remove the duplication.</p>

The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-23.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-23.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 22 - Performance: Some you win...</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-22---performance-some-you-win.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.959</id>

    <published>2010-07-15T20:29:23Z</published>
    <updated>2011-01-02T13:37:25Z</updated>

    <summary>The previous article in the &quot;Practical Testing&quot; series set things up so that we can measure the performance of the code under test with the intention of trying to improve performance for a specific set of use case scenarios. This...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>The <a href="http://www.lenholgate.com/archives/000906.html">previous article</a> in the <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a> series set things up so that we can measure the performance of the code under test with the intention of trying to improve performance for a specific set of use case scenarios. This time around I'll make a few changes which I hope will improve the performance and measure the effects of the changes with the performance tests that I added last time.</p>

One of the things that struck me about the code when I was looking at it during my profiling runs for my client is that when using the <code>CThreadedCallbackTimerQueue</code> in "do not hold your lock whilst dispatching timers" mode, the lock is acquired and released for each timer that is dispatched. This is less than optimal as the timer dispatch code is designed to dispatch all of the timers that are currently due and so it could, theoretically, extract all of them when <code>CCallbackTimerQueueBase::BeginTimeoutHandling()</code> is called and process all of them when <code>CCallbackTimerQueueBase::HandleTimeout()</code> is called. Instead it extracts them one at a time. Part of the reason for this is that a) the Begin/End style of timeout handling was a late addition to the design and b) it's always worked well enough in the past so why change it.]]>
<![CDATA[<p>At the moment I use a <code>std::multimap</code> for storing timers. This indexes the timers by absolute timeout, and using a <code>std::multimap</code> allows me to have multiple timers set for the same time. It also allows me to retain an iterator to the newly inserted timer so that timer cancellation is an O(1) operation. Insertion and timeout expiry processing are O(log n) due to the balanced binary tree lookups involved. To improve timeout processing speed I'd like to be able to do a single lookup that returns all of the timers with that timeout, rather than one lookup per timer. In theory I could adjust the interface and expose a <code>GetNextTimeout()</code> function which could use the current timer's iterator as a hint to locate the next one, avoiding subsequent lookups, but that would still leave us with the fine-grained locking issue where callers only process a single timer with each Begin/Handle/End sequence. Since <code>std::multimap</code> doesn't support removing all of the values at a given key whilst leaving them in some form of container that we can manipulate as one, I would need to switch from a <code>std::multimap</code> to a <code>std::map</code> where the values are a <code>std::set</code> or <code>std::deque</code> of the timers at that timeout value. The <code>std::deque</code> implementation is fractionally more complex: with the <code>std::set</code> we can use the iterator returned from insertion to locate our node for cancellation, whereas with a <code>std::deque</code> we need to use the offset at which we inserted the timer and then set that element to <code>null</code> when we cancel the timer. Knowing when we can delete the <code>std::deque</code> once all of the timers for that timeout have been cancelled also requires a book-keeping counter. The complexity of the <code>std::deque</code> code demanded a couple of new tests...</p>
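<p>The <code>std::deque</code> variant described above might be sketched like this. It is a hypothetical illustration rather than the code in the download: <code>DequeTimerIndex</code>, <code>Bucket</code>, <code>Handle</code>, <code>Set()</code>, <code>Cancel()</code> and <code>ExtractDue()</code> are invented names, timeouts are assumed to be absolute tick counts, and there is no locking, so every call is assumed to be made with the queue's lock held. Cancelled entries are nulled rather than erased so that other handles' offsets stay valid, and a per-bucket counter says when the bucket can go. <code>ExtractDue()</code> pulls out every timer for every expired timeout in one pass, which is what would let a caller dispatch a whole batch without re-acquiring the lock for each timer.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// Hypothetical stand-in for a user timer; the index only stores pointers.
struct Timer
{
   int id = 0;
};

// Hypothetical sketch: std::map keyed on absolute timeout, each value a
// bucket of all the timers due at that time. Callers hold the queue lock.
class DequeTimerIndex
{
   public:
      using Milliseconds = std::uint64_t;

      struct Bucket
      {
         std::deque<Timer *> timers;   // cancelled entries become nullptr
         std::size_t active = 0;       // count of non-null entries
      };

      // What the caller keeps so that it can cancel the timer later.
      struct Handle
      {
         Milliseconds timeout = 0;
         std::size_t offset = 0;
         bool set = false;
      };

      Handle Set(Timer *pTimer, const Milliseconds absoluteTimeout)
      {
         Bucket &bucket = m_buckets[absoluteTimeout];
         bucket.timers.push_back(pTimer);
         ++bucket.active;
         return Handle{absoluteTimeout, bucket.timers.size() - 1, true};
      }

      bool Cancel(Handle &handle)
      {
         if (!handle.set)
         {
            return false;
         }
         handle.set = false;

         const auto it = m_buckets.find(handle.timeout);

         // Handles aren't told when their timer expires in this sketch, so
         // the bucket may already be gone; a real implementation would clear
         // the handle on expiry rather than rely on this check.
         if (it == m_buckets.end() || handle.offset >= it->second.timers.size())
         {
            return false;
         }

         Bucket &bucket = it->second;
         bucket.timers[handle.offset] = nullptr;   // keep other offsets valid

         if (--bucket.active == 0)
         {
            m_buckets.erase(it);   // that was the last live timer here
         }
         return true;
      }

      // Removes every bucket that is due and returns all of their live
      // timers as a single batch.
      std::vector<Timer *> ExtractDue(const Milliseconds now)
      {
         std::vector<Timer *> due;
         const auto end = m_buckets.upper_bound(now);

         for (auto it = m_buckets.begin(); it != end; ++it)
         {
            for (Timer *pTimer : it->second.timers)
            {
               if (pTimer)
               {
                  due.push_back(pTimer);
               }
            }
         }
         m_buckets.erase(m_buckets.begin(), end);
         return due;
      }

      bool Empty() const
      {
         return m_buckets.empty();
      }

   private:
      std::map<Milliseconds, Bucket> m_buckets;
};
```

<p>The <code>std::set</code> variant replaces the offset in <code>Handle</code> with the iterator returned from <code>insert()</code> and can erase directly, at the cost of a node allocation per timer inside the set.</p>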

<p>The code for the <code>std::set</code> based implementation can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-22a.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-22a.zip']);">here</a>.</p>

<p>The code for the <code>std::deque</code> based implementation can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-22b.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-22b.zip']);">here</a>.</p>

<p>The same <a href="http://www.lenholgate.com/archives/000906.html">rules</a> as last time apply.</p>

<p>As expected the time taken to handle the timeouts drops but the time taken to set and cancel a timer has increased. We're doing more work when setting timers because we have to allocate space in the new structure as well as in the map that we were using before. Cancellation likewise takes longer in some circumstances due to the corresponding deallocation.</p>

<p>In some situations the improved performance when handling timeouts may well be worth the degraded performance when setting and cancelling timers. These tests don't show what effect the changes have on cross-thread lock contention, though, and these changes have made that slightly worse.</p>

<p>Let's use a new notation to discuss the contention issues; I'll call it "Big C notation" ;). For a given operation that accesses a shared lock we have a worst-case contention value of <b>C</b>. Where a single thread performs an operation involving a single lock and no other thread can ever perform that operation, the operation can be classified as C(0); no contention. If we have a single shared lock then operations involving the lock are C(n), where n is the number of threads that can perform the operation. For operations that access multiple locks, the overall contention value can be represented either as the highest value of C for each of the operations involving the shared locks, or as the sum of the contention; so an operation involving two locks which are C(2) and C(5) could be either C(5) or C(2)+C(5). I think I prefer the latter as it shows the number of times that the contention factor can bite: an operation that is C(2)+3C(5)+C(1) is obviously more contention-prone than its highest term suggests, so representing it as simply C(5) is somewhat misleading...</p>

<p>In our timer queue the queue operations themselves are C(n1) where n1 is the number of threads that access the queue. Queue operations that involve dynamic memory allocation or release are C(n2) where n2 is the number of threads that access the heap used for allocation. Thus any operation on the queue that involves dynamic memory allocation degrades from C(n1) to C(n1)+C(n2) where n2 is always assumed to be at least equal to n1 but probably higher. The latest changes to add the <code>std::set</code> or <code>std::deque</code> make this worse by converting the contention value to C(n1)+2C(n2) as we need to access the STL's allocator twice...</p>

<p>From this it's quite clear that to reduce contention we should have a private heap that is only used by a specific instance of a timer queue; this would mean that n2 is always the same as n1. Whilst providing a private heap and custom allocators for the STL types would be a worthwhile exercise I can't help thinking that, at this point, there's a better way to solve the actual problem that I'm facing with regards to the specific use case of the timer queue.</p>
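<p>A portable way to sketch the private heap idea, swapping in C++17's <code>std::pmr</code> rather than a Win32 private heap, is to give each queue instance its own <code>std::pmr::unsynchronized_pool_resource</code> and build the queue's container on top of it. <code>PrivateHeapTimerQueue</code> and its members are hypothetical and this is not how the framework does it. The pool resource is deliberately not thread-safe, so it is only ever touched with the queue's own lock held; in the notation above, that is what makes n2 the same as n1.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <memory_resource>
#include <mutex>

// Hypothetical timer queue whose container allocates from a pool owned by
// this instance instead of directly from the process-wide heap.
class PrivateHeapTimerQueue
{
   public:

      using Milliseconds = std::uint64_t;
      using UserData = std::uintptr_t;

      PrivateHeapTimerQueue()
         : m_timers(&m_pool)   // every map node comes from m_pool
      {
      }

      void SetTimer(const Milliseconds absoluteTimeout, const UserData userData)
      {
         // One lock guards the map and its pool, so allocation contention is
         // limited to the threads that use this particular queue.
         std::lock_guard<std::mutex> lock(m_lock);

         m_timers.emplace(absoluteTimeout, userData);
      }

      // Removes all timers due at or before 'now'; returns how many there were.
      std::size_t ExpireTimers(const Milliseconds now)
      {
         std::lock_guard<std::mutex> lock(m_lock);

         const auto end = m_timers.upper_bound(now);

         const auto count = static_cast<std::size_t>(std::distance(m_timers.begin(), end));

         m_timers.erase(m_timers.begin(), end);

         return count;
      }

      std::size_t Size() const
      {
         std::lock_guard<std::mutex> lock(m_lock);

         return m_timers.size();
      }

   private:

      mutable std::mutex m_lock;

      // Not thread-safe by design: only used with m_lock held. It still gets
      // large chunks from its upstream resource (the default heap), but most
      // node allocations and frees are served from the pool without it.
      std::pmr::unsynchronized_pool_resource m_pool;

      // Declared after m_pool so it is constructed after it, destroyed before it.
      std::pmr::multimap<Milliseconds, UserData> m_timers;
};
```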

<p>There really should be no need to do any dynamic memory allocation at all, apart from creating and destroying the timer handle itself, and even that could be avoided if we took a similar approach to the Win32 Critical Section API. Using the STL is convenient, but because its containers are non-invasive (that is, you don't have to change a type or have it derive from a specific type to be able to place it into a container) book-keeping data needs to be allocated outside of the type that's being stored; for example, the space for the pointers that link the nodes of a <code>std::map</code> together. I've already implemented an invasive container in the form of <code><a href="http://www.serverframework.com/ServerFramework/latest/Docs//class_jet_byte_tools_1_1_win32_1_1_c_node_list.html">CNodeList</a></code>, which is used for buffer and socket allocation tracking. The advantage of an invasive container is that the object being stored can include space for the data that is needed to store it in the container. This is especially beneficial when you're inserting and removing the same object time and again, as that data is only ever allocated and released once, as part of the object that you wish to store. I'm actually quite surprised that the STL didn't also include invasive containers, as the non-invasive containers could have been implemented in terms of the invasive ones.</p>

<p>Anyway, in an ideal world, with a custom invasive container, the timer handle could include links for the nodes of the tree that it's stored in for searching, and links to the other timers at the same timeout (the equivalent of the <code>std::set</code> or <code>std::deque</code> in the latest timer queue). Since the timer handle itself would contain space for these links there would be no need for allocation or deallocation when setting, cancelling or expiring timers. This would remove all of the potential allocation contention and leave us with operations which were C(n), where n is the number of threads that can access the timer queue... Insertion and timeout handling would still be O(log n) due to the balanced binary tree lookups required, but a) the potential contention would be lower and b) the constant factor hidden by the O would be much smaller, as in general all you're doing once you find the place in the tree is adjusting the linking pointers.</p>

This change would mean replacing the STL containers with a custom balanced search tree implementing a multi-map where you can remove all items at a specific key with a single operation and where you're left with a list of timers at that key. That's a fairly major change but the performance improvements are likely to be pretty good. However, there's a simpler data structure that might work even better for my specific use case scenario. We'll look at that next time.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 21 - Looking at Performance and finding a leak</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-21---looking-at-performance-and-finding-a-leak.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.958</id>

    <published>2010-07-15T10:43:37Z</published>
    <updated>2011-01-02T13:36:56Z</updated>

    <summary>Back in 2004, I wrote a series of articles called &quot;Practical Testing&quot; where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Back in 2004, I wrote a series of articles called <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a> where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code. Since the original articles there have been several bug fixes and redesigns, all of which have been supported by the original unit tests and many of which have led to the development of more tests.</p>

<p>The tests were written for the <a href="http://www.lenholgate.com/archives/000843.html">pretty scalable timer queue implementation</a> in my <a href="http://www.serverframework.com/ServerFramework/latest/Docs//sockettoolslicensing.html">high performance, scalable, server framework</a>. The timer queue has been present in the framework ever since I first had to associate timeouts with overlapped I/O requests back on NT 4.0. Back then the <a href="http://msdn.microsoft.com/en-us/library/ms682483(VS.85).aspx">Windows Timer Queue</a> didn't exist and so I rolled my own. As you'll see from the previous entries in this series, the class has gone through some changes over time. The queue works well for dealing with all manner of varied timeouts and is pretty general purpose in nature: I use it to set the timeouts that roll log files over to a new name in my rotating log file class, and for all manner of per-connection timeouts. It also scales well; there's no problem having 30,000 connections each with a 2 minute inactivity timeout set, etc.</p>

<p>The problem with it is that it could perform better. The general purpose nature of the queue means that it needs to be flexible; it can handle timeouts from 1ms to 49.7 days in a reasonably time- and space-efficient manner. I use STL containers to index the timers and that works pretty well. As a general purpose solution it's pretty good. The problem is that it's not good enough; or, at least, it's not fast enough for situations where I'm using the timers to implement retransmission timers for reliable UDP protocols. In these situations I have lots of timers (one per connection, and the aim is to support lots of concurrent connections) for generally short periods of time, 50ms to 30 seconds. Since the timers are used for retransmission and data flow pacing they tend to expire (rather than being reset and rarely expiring, as is the case with inactivity timers), and when the current timer expires they also tend to set a new one. I've found that the general purpose nature of the timer queue and the use of the STL mean that there's a lot of contention for the timer system, and that contention affects performance.</p>
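<p>The 49.7 day upper limit matches the range of an unsigned 32-bit count of milliseconds, which is also the period after which <code>GetTickCount()</code> wraps. Assuming that is where the figure comes from, the arithmetic is simply 2<sup>32</sup> milliseconds divided by the number of milliseconds in a day:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Number of distinct values of an unsigned 32-bit millisecond counter: 2^32.
constexpr double kMillisecondValues =
   static_cast<double>(std::numeric_limits<std::uint32_t>::max()) + 1.0;

constexpr double kMillisecondsPerDay = 24.0 * 60.0 * 60.0 * 1000.0;

// 4,294,967,296 / 86,400,000 is roughly 49.71 days.
constexpr double kMaximumDays = kMillisecondValues / kMillisecondsPerDay;
```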

<p>The <a href="http://www.lenholgate.com/archives/000381.html">thread safe version</a> of the timer queue protects the internal data structures with a lock and this lock needs to be acquired to do anything to the queue. So we lock when we create a new timer, we lock when we set a timer, we lock when the timer thread is processing an expired timer, etc. Obviously we try to hold this lock for as short a time as possible, and obviously we try to access the queue from as few threads as possible, but eventually the contention starts to bite as connections reset their timers and timers expire almost constantly to drive rate limited flow queues and retransmission, etc. Although in this particular scenario the timer expiry code is quick (we lock, remove the expired timer, the callback simply pushes the user data into a queue of work items that are processed by another thread, and then we release the lock), the code still causes the lock to be acquired and released for each timer that expires. In the reliable UDP scenario we have lots of timers set for exactly the same time and so the thread that processes the expired timers spends a lot of time acquiring and releasing the lock on the queue. Since the expiry of a timer often causes a new timer to be set, we then have other threads processing the work items and setting new timers, which also requires that we lock the queue. So all threads that access the queue are in contention for the lock; there's not much that we can do about that apart from reducing the amount of time spent holding the lock.</p>

<p>Unfortunately it's worse than that. Since we're using the STL to implement our timer queue, and since setting, expiring and cancelling a timer all result in adjustments being made to our central STL map object, the threads that use our timer queue are also in contention with all other threads in the system that use the same memory heap as our timer queue. Each of these operations results in dynamic memory allocation and/or release.</p>

My super secret game company client needs their <a href="http://enet.bespin.org/">ENet</a> implementation to run fast and to support 1000s of concurrent connections. When profiling their system under load, the timer queue is one of the hot spots and the locks that it uses show some of the highest contention in the system. Because of this I need to take a look at what we're doing and potentially move from a general purpose solution to something a little more specific.]]>
        <![CDATA[<p>My first thoughts were to add some explicit monitoring to the timer queue (<a href="http://www.lenholgate.com/archives/000903.html">performance counters</a> are your friends), and so a monitoring interface was born. The threaded callback timer queue was adjusted so that it could give an idea of the contention for its lock by using <a href="http://msdn.microsoft.com/en-us/library/ms686857(VS.85).aspx">TryEnterCriticalSection()</a> to track contention, following each such failure to acquire the lock with a plain ol' <a href="http://msdn.microsoft.com/en-us/library/ms682608(v=VS.85).aspx">EnterCriticalSection()</a>. The code looks something like this; whilst the code changes will, no doubt, change the way the contention bites, they give some indication of the contention being experienced.</p>
<pre class="brush: cpp gutter: false">void CThreadedCallbackTimerQueue::SetTimer(
   Timer &amp;timer,
   const Milliseconds timeout,
   const UserData userData)
{
#if (JETBYTE_PERF_TIMER_QUEUE_MONITORING_DISABLED == 0)
  
   ICriticalSection::PotentialOwner lock(m_criticalSection);
  
   if (!lock.TryEnter())
   {
      m_monitor.OnTimerProcessingContention(IMonitorThreadedCallbackTimerQueue::SetOneOffTimerContention);
  
      lock.Enter();
   }
  
#else 
  
   ICriticalSection::Owner lock(m_criticalSection);
  
#endif
  
   m_spTimerQueue-&gt;SetTimer(timer, timeout, userData);
  
   SignalStateChange();
}
</pre>
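<p>The same try-then-block pattern can be sketched portably with <code>std::mutex::try_lock()</code>. <code>ContentionCountingLock</code> and its counter are hypothetical stand-ins for <code>ICriticalSection::PotentialOwner</code> and the monitor callback, not part of the framework. The class provides <code>lock()</code> and <code>unlock()</code>, so it can be used with <code>std::lock_guard</code>. As with the original, the extra <code>try_lock()</code> changes the timing slightly, so the count is an indication rather than an exact measure.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical lock wrapper that counts how often the lock was contended.
class ContentionCountingLock
{
   public:

      void lock()
      {
         if (!m_mutex.try_lock())
         {
            // Another thread holds the lock: record the contention, then
            // wait for the lock in the normal, blocking, way.
            m_contentionCount.fetch_add(1, std::memory_order_relaxed);

            m_mutex.lock();
         }
      }

      void unlock()
      {
         m_mutex.unlock();
      }

      std::size_t ContentionCount() const
      {
         return m_contentionCount.load(std::memory_order_relaxed);
      }

   private:

      std::mutex m_mutex;

      std::atomic<std::size_t> m_contentionCount{0};
};
```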
<p>The next step was to add some performance tests to the timer queue's test suite: simple things such as measuring the time taken to create, set, cancel and expire timers. Obviously we repeat each operation a large number of times, repeat the whole test several times and take the average result. This gives us baseline figures for the current implementation; the performance tests show any general performance improvements in the algorithm, and the contention figures give some indication of whether those improvements actually help in the real-world usage scenario.</p>
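<p>A minimal sketch of that measuring approach, using <code>std::chrono::steady_clock</code>: run the operation a large number of times, repeat the whole run several times and report the average. <code>AverageMilliseconds()</code> and its parameters are hypothetical and not part of the test framework used in this series. Taking the operation as a template parameter, rather than a <code>std::function</code>, avoids adding an indirect call to every timed iteration.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs 'operation' 'iterations' times per run, repeats
// that 'runs' times and returns the average duration of one run in ms.
template <typename Operation>
double AverageMilliseconds(
   Operation operation,
   const std::size_t iterations,
   const std::size_t runs)
{
   using Clock = std::chrono::steady_clock;

   double totalMilliseconds = 0.0;

   for (std::size_t run = 0; run < runs; ++run)
   {
      const auto start = Clock::now();

      for (std::size_t i = 0; i < iterations; ++i)
      {
         operation();
      }

      const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;

      totalMilliseconds += elapsed.count();
   }

   return runs == 0 ? 0.0 : totalMilliseconds / static_cast<double>(runs);
}
```

<p>Timing 100,000 calls averaged over ten runs would then be <code>AverageMilliseconds([&amp;] { /* operation under test */ }, 100000, 10)</code>.</p>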

<p>Whilst writing the monitoring test I suspected that there was a leak in the timer queue for "one shot" timers that were active when the timer queue was destroyed. I added a new monitoring function to track when the internal timer data was actually deleted, as this isn't quite the same as when <code>DestroyTimer()</code> is called and, anyway, <code>DestroyTimer()</code> is never called for "one shot" timers. The resulting trace from the mock monitor proved that the leak I thought I had found didn't exist. Since the monitor, with its new deletion monitoring functionality, could prove that all timer data was cleaned up correctly, I decided to add the monitor to all tests and added a simple check that the number of calls to <code>OnTimerCreated()</code> equalled the number of calls to <code>OnTimerDeleted()</code>. This addition to the tests located a memory leak in the code that dealt with destroying a timer handle during timeout processing: I was calling <code>DeleteAfterTimeout()</code> in <code>CCallbackTimerQueueBase::DestroyTimer()</code> rather than calling <code>SetDeleteAfterTimeout()</code>. What's interesting to me is that this leak wasn't located by any of the tests, even when running them under a leak checking tool such as <a href="http://en.wikipedia.org/wiki/BoundsChecker">BoundsChecker</a>. BoundsChecker failed to report it because it couldn't handle the concept of casting dynamically allocated memory to an opaque handle and storing the handle in a map for later clean up; I guess I can't blame it really... Since it always complained that the data allocated in <code>CCallbackTimerQueueBase::CreateTimer()</code> was leaked, even though it was normally cleaned up correctly later on, I added an entry to the BoundsChecker suppression file; but although most of these complaints were invalid, this one WAS valid and the data WAS being leaked.
Once again I'm reminded that it doesn't matter how many tests you have, how much coverage you have or even what tools you use; the weak link is <i>always</i> the human being in the process... I've changed how the handle map works so that it stores the typed data rather than the opaque handle. This means that BoundsChecker CAN follow the ownership correctly; it now reports the real leak and stays quiet about the things that it thought were leaks but weren't. I then fixed the actual bug... In summary, don't suppress the warning; understand it and change the code to fix it.</p>
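<p>The created-equals-deleted check is easy to sketch. <code>MockTimerMonitor</code> below is a hypothetical, cut-down stand-in for the series' mock monitor, and <code>CountedTimerQueue</code> is an equally hypothetical toy queue; neither is the framework's code. The timer data reports its own construction and destruction, so the monitor sees when the data is actually freed rather than when <code>DestroyTimer()</code> is called, and the handle map holds typed, owning pointers rather than opaque handles, in the spirit of the change described above.</p>

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_map>

// Hypothetical cut-down mock monitor: counts creations and deletions.
class MockTimerMonitor
{
   public:
      void OnTimerCreated() { m_created.fetch_add(1); }
      void OnTimerDeleted() { m_deleted.fetch_add(1); }

      std::size_t Created() const { return m_created.load(); }
      std::size_t Deleted() const { return m_deleted.load(); }

      // The check added to every test: every timer created was deleted.
      bool AllTimersDeleted() const { return Created() == Deleted(); }

   private:
      std::atomic<std::size_t> m_created{0};
      std::atomic<std::size_t> m_deleted{0};
};

// Hypothetical toy queue whose handle map owns typed timer data.
class CountedTimerQueue
{
   public:
      using Handle = std::size_t;

      explicit CountedTimerQueue(MockTimerMonitor &monitor)
         : m_monitor(monitor)
      {
      }

      Handle CreateTimer()
      {
         const Handle handle = m_nextHandle++;
         m_timers.emplace(handle, std::make_unique<TimerData>(m_monitor));
         return handle;
      }

      bool DestroyTimer(const Handle handle)
      {
         return m_timers.erase(handle) == 1;
      }

   private:
      // Reports its own creation and deletion to the monitor.
      struct TimerData
      {
         explicit TimerData(MockTimerMonitor &monitor)
            : m_monitor(monitor)
         {
            m_monitor.OnTimerCreated();
         }

         ~TimerData()
         {
            m_monitor.OnTimerDeleted();
         }

         MockTimerMonitor &m_monitor;
      };

      MockTimerMonitor &m_monitor;
      Handle m_nextHandle = 0;
      std::unordered_map<Handle, std::unique_ptr<TimerData>> m_timers;
};
```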

The code is <a href="http://www.lenholgate.com/zips/PracticalTesting-21.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-21.zip']);">here</a> and new rules apply. A fair while has passed since the <a href="http://www.lenholgate.com/archives/000803.html">previous episode</a> in this series of articles, and my build environment and some of the support code have changed a fair bit since then. The code will build with VS.Net 2002, VS.Net 2003, VS 2005, VS 2008 and VS 2010, and it builds as x86 or x64 with VS 2005, 2008 and 2010. Win32Tools is the workspace that you want and Win32ToolsTest is the project that you should set as active. The code will build with either the standard STL that comes with Visual Studio or with a version of STL Port. The code uses precompiled headers <a href="http://www.lenholgate.com/archives/000345.html">the right way</a> so that you can build with precompiled headers for speed or build without them to ensure minimal code coupling. The various options are all controlled from the "Admin" project; edit <code>Config.h</code> and <code>TargetWindowsVersion.h</code> to change things... By default the project is set to build for Windows 7; this means that the code WILL NOT RUN on operating systems earlier than Windows Vista, as it will try to use <code>GetTickCount64()</code>, which isn't available there. To fix this you need to edit the <code>Admin\TargetWindowsVersion.h</code> file and change the values used; see <code>Admin\TargetWindowsVersion_WIN2K.h</code> and <code>Admin\TargetWindowsVersion_WINXP.h</code> for details. Since I'm looking at the performance of the code I've adjusted the VS2008 solutions to build without checked iterators in release mode. By default the STL in VS2008 builds with checked iterators enabled in release mode as well as in debug mode, and this adversely affects performance; see <a href="http://msdn.microsoft.com/en-us/library/aa985965.aspx">here</a> for more details. 
Although the code builds with earlier versions of Visual Studio I'm not actively using these versions and so there may be standard VS STL related performance tweaks that could be applied. I'm only focussing on VS2008 and VS2010 with my testing. This code is in the Public Domain.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 20 - Mind the gap</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2008/08/practical-testing-20---mind-the-gap.html" />
    <id>tag:www.socketframework.com,2008:/blog//12.856</id>

    <published>2008-08-12T11:44:19Z</published>
    <updated>2010-12-29T09:56:43Z</updated>

    <summary>Back in 2004, I wrote a series of articles called &quot;Practical Testing&quot; where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Back in 2004, I wrote a series of articles called "<a href="http://www.lenholgate.com/archives/000306.html">Practical Testing</a>" where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code. Since the original articles there have been several bug fixes and redesigns, all of which have been supported by the original unit tests and many of which have led to the development of more tests.</p>

<p>But... you need to be aware of the tests that you haven't written... You may think you're testing the code completely, but often you're not testing as well as you might think. Code coverage metrics don't really help either; no matter how many tests you write and how complete your coverage is, there's no way to be sure that you're actually covering everything. 100% code coverage might give you a warm feeling, but unless you also have 100% coverage of the possible permutations of concurrently executing code, that warm feeling is misplaced.</p>

<p>In <a href="http://www.lenholgate.com/archives/000795.html">part 18</a> I redesigned the timer queue to allow for lock-free timer dispatch; that is we no longer hold an internal lock when calling back into user code. This removed the potential for deadlocking user code due to unexpected lock inversions but complicated the timer queue implementation due to the fact that the internals of the queue needed to remain sane when timer manipulation functions interrupted the unlocked portion of the timer dispatch code.</p>

I added some tests to prove that these changes worked; unfortunately there's a race condition in the code and the tests fail to cause the race condition to occur... Were we to examine the tests we'd see that we were covering all of the code but not all of the permutations.]]>
        <![CDATA[<p>In the internal timer data's <code>HandleTimeout()</code> call we do this:</p>
<pre class="brush: cpp gutter: false">void CCallbackTimerQueueBase::TimerData::HandleTimeout()
{
   OnTimer(m_timedout);
  
   m_timedout.Clear();
}
</pre>
<p>This is where we call into user code, so this happens whilst the queue is unlocked and it must be safe to allow other timer manipulation calls to occur on the timer queue whilst this code executes. Unfortunately <code>DestroyTimer()</code> uses the validity of the <code>m_timedout</code> data to decide whether to actually delete the timer data during the call to <code>DestroyTimer()</code> or simply to mark the timer data to be deleted when the timer handling is complete. Since we unset the data that <code>DestroyTimer()</code> uses to make this decision whilst we're unlocked, there's a race condition. If we unset <code>m_timedout</code> before a call to <code>DestroyTimer()</code> checks the data for this timer, then <code>DestroyTimer()</code> will think that the timer is not currently being timed out and will delete the timer data. When the timeout handling completes we may then delete the data a second time or, worse, try to access data that has already been deleted.</p>

<p>Our existing tests for this kind of interruption check that it's OK to call the various timer manipulation functions during timer dispatch; they cause the interruption to occur between two calls, one of which is locked, and so atomic, and the other of which is not. The bug shows up if we interrupt the timer dispatch after the actual dispatch to user code is complete but before we've re-entered the locked code in the timer queue to finish the dispatch. Adding some more tests that interrupt the timer dispatch at this point shows the fault.</p>

<p>Of course the unlocked timer dispatch could be interrupted at any point in its execution; luckily for us there's only one point where it would cause us problems, and that's the point where we update the value of <code>m_timedout.pTimer</code> from a valid pointer to 0. The window of opportunity for the race condition runs from after we've updated the pointer until we re-acquire the lock. Our new tests therefore test the right thing...</p>

<p>And now on to fixing the problem...</p>

<p>The problem occurs because we're using <code>m_timedout.pTimer</code> for two purposes. One of these requires that it be updated immediately after the user's timer code has been executed, so that we prevent <code>OnTimer()</code> being called multiple times. The other requires that it only be updated whilst the lock is held, as we also use the value to determine whether the timer is currently being processed. The solution is to add a flag to the timer data which indicates that the timer is currently being processed. This flag is only ever updated when the lock is held, which removes the race condition. The unlocked update of <code>m_timedout.pTimer</code> is now safe as <code>DestroyTimer()</code> uses the new flag to determine whether it should delete the timer data...</p>
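<p>The fix described above can be sketched as follows, assuming a single <code>std::mutex</code> for the queue. <code>TimerDataSketch</code>, <code>TimerQueueSketch</code> and their members are hypothetical simplifications of the real <code>TimerData</code>, <code>HandleTimeout()</code> and <code>DestroyTimer()</code>. The point is only that the "is this timer being processed?" decision is made from a flag that is read and written with the lock held, while the callback pointer that stops <code>OnTimer()</code> being called twice can still be cleared whilst unlocked.</p>

```cpp
#include <cassert>
#include <mutex>

// Hypothetical, heavily simplified timer data.
struct TimerDataSketch
{
   void (*pOnTimer)(int) = nullptr;   // cleared without the lock once dispatched
   int userData = 0;
   bool processing = false;           // only read or written with the lock held
   bool deleteAfterTimeout = false;   // only read or written with the lock held
};

class TimerQueueSketch
{
   public:

      // Locked: mark the timer as being processed before dispatching it.
      void BeginTimeoutHandling(TimerDataSketch &data)
      {
         std::lock_guard<std::mutex> lock(m_lock);

         data.processing = true;
      }

      // Unlocked: call into user code, then clear the callback so that it
      // isn't called twice. DestroyTimer() never looks at pOnTimer, so this
      // unlocked write no longer affects its delete-now-or-later decision.
      static void HandleTimeout(TimerDataSketch &data)
      {
         if (data.pOnTimer)
         {
            data.pOnTimer(data.userData);

            data.pOnTimer = nullptr;
         }
      }

      // Locked: finish processing and carry out any deferred delete.
      // Returns true if the timer data was deleted here.
      bool EndTimeoutHandling(TimerDataSketch *pData)
      {
         std::lock_guard<std::mutex> lock(m_lock);

         pData->processing = false;

         if (pData->deleteAfterTimeout)
         {
            delete pData;

            return true;
         }

         return false;
      }

      // Locked: delete now unless the timer is being processed, in which case
      // defer the delete. Returns true if the data was deleted here.
      bool DestroyTimer(TimerDataSketch *pData)
      {
         std::lock_guard<std::mutex> lock(m_lock);

         if (pData->processing)
         {
            pData->deleteAfterTimeout = true;

            return false;
         }

         delete pData;

         return true;
      }

   private:

      std::mutex m_lock;
};
```

<p>Calling <code>DestroyTimer()</code> in the problematic window, after <code>HandleTimeout()</code> has returned but before <code>EndTimeoutHandling()</code>, now defers the delete, and the data is deleted exactly once when processing completes.</p>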

<p>There's still a potential race condition that would allow <code>OnTimer()</code> to be called multiple times if multiple threads were able to call <code>HandleTimeout()</code> concurrently for the same timer but since you'd have to jump through quite a few hoops to make that occur I think we'll leave that for now.</p>

Code is <a href="http://www.lenholgate.com/zips/PracticalTesting-20.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-20.zip']);">here</a> and the new rules from <a href="http://www.lenholgate.com/archives/000795.html">last time</a> apply. Note that this fix <i><b>is</b></i> included in <a href="http://www.lenholgate.com/archives/000797.html">release 5.2.3</a> of <a href="http://www.serverframework.com/">The&nbsp;Server&nbsp;Framework</a>.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 19 - Removing the duplicate code</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2008/08/practical-testing-19---removing-the-duplicate-code.html" />
    <id>tag:www.socketframework.com,2008:/blog//12.851</id>

    <published>2008-08-03T20:28:14Z</published>
    <updated>2010-12-29T09:52:11Z</updated>

    <summary>The code in the last two articles in the &quot;Practical Testing&quot; series has contained a considerable amount of duplication. This came about for a couple of reasons. Firstly, part 17 was a bit rushed and secondly it was useful to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>The code in the last two articles in the "<a href="http://www.lenholgate.com/archives/000306.html">Practical Testing</a>" series has contained a considerable amount of duplication. This came about for a couple of reasons. Firstly, part 17 was a bit rushed and secondly it was useful to compare the <code>CCallbackTimerQueue</code> implementation with the <code>CCallbackTimerQueueEx</code> implementation. I'm also a firm believer that in this kind of situation it's better to get both sets of code working independently and then refactor to remove any duplication rather than attempting to design a duplicate-free solution from the start.</p>

Anyway, this time around we'll remove the duplication by creating a base class that does 99% of the work and then have our two timer queue implementations inherit from it.]]>
        <![CDATA[<p>It's actually easier to think about the differences between <code>CCallbackTimerQueue</code> and <code>CCallbackTimerQueueEx</code> rather than the duplicate code since they share so much code! The main differences are that <code>CCallbackTimerQueueEx</code> uses <code>GetTickCount64()</code> directly whilst <code>CCallbackTimerQueue</code> fabricates a 64-bit tick count from <code>GetTickCount()</code>. By hoisting most of the code from <code>CCallbackTimerQueueEx</code> into our new base class, <code>CCallbackTimerQueueBase</code>, we can have <code>CCallbackTimerQueueEx</code> provide an implementation of the abstract method, <code>GetTickCount64()</code>, which simply returns the tick count. Then we can remove the shared code from <code>CCallbackTimerQueue</code> and have it use the slightly more complex implementation of <code>GetTickCount64()</code> that it requires. </p>
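<p>A minimal sketch of the shape of this refactoring, assuming standard C++ rather than the Windows headers: the class names are hypothetical, <code>DWORD</code> and <code>ULONGLONG</code> are typedef'd locally, and each derived class reads an injected tick value rather than calling the real APIs, much as the tests inject a tick count provider. The shared logic in the base class only ever sees 64-bit ticks.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;

typedef uint32_t DWORD;        // local stand-ins for the Windows types
typedef uint64_t ULONGLONG;

// The shared queue logic lives here and works purely in 64-bit ticks.
class CallbackTimerQueueBase
{
   public :

      virtual ~CallbackTimerQueueBase() {}

      // Greatly simplified "shared" logic: the absolute expiry of a timer.
      ULONGLONG CalculateExpiry(DWORD timeout)
      {
         return GetTickCount64() + timeout;
      }

   protected :

      virtual ULONGLONG GetTickCount64() = 0;
};

// Like CCallbackTimerQueueEx: a 64-bit tick count is available directly.
class TimerQueueEx : public CallbackTimerQueueBase
{
   public :

      explicit TimerQueueEx(const ULONGLONG &amp;ticks) : m_ticks(ticks) {}

   protected :

      ULONGLONG GetTickCount64() override { return m_ticks; }

   private :

      const ULONGLONG &amp;m_ticks;      // stands in for the real GetTickCount64()
};

// Like CCallbackTimerQueue: fabricates 64 bits from a 32-bit count.
class TimerQueueHybrid : public CallbackTimerQueueBase
{
   public :

      explicit TimerQueueHybrid(const DWORD &amp;ticks)
         :  m_ticks(ticks), m_lastTicks(ticks), m_highPart(0) {}

   protected :

      ULONGLONG GetTickCount64() override
      {
         const DWORD now = m_ticks;    // stands in for GetTickCount()

         if (now &lt; m_lastTicks)
         {
            ++m_highPart;              // the 32-bit count has wrapped
         }

         m_lastTicks = now;

         return (static_cast&lt;ULONGLONG&gt;(m_highPart) &lt;&lt; 32) | now;
      }

   private :

      const DWORD &amp;m_ticks;
      DWORD m_lastTicks;
      DWORD m_highPart;
};
</pre>
<p>The point of the design is that <code>CalculateExpiry()</code>, standing in for all of the shared code, is written once; each derived class only decides where its 64-bit tick count comes from.</p>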

<p>Finally there's the whole concept of the 'maintenance timer' that <code>CCallbackTimerQueue</code> uses. This can also be hoisted into the base class, made slightly more generic and <code>CCallbackTimerQueue</code> can then set and manipulate its maintenance timer by calling the appropriate methods on its base class.</p>

<p>Once again we can be sure that we haven't broken anything because our tests still run. There's still some duplication in the tests and, perhaps, we'll address this next time...</p>

Code is <a href="http://www.lenholgate.com/zips/PracticalTesting-19.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-19.zip']);">here</a> and the new rules from <a href="http://www.lenholgate.com/archives/000795.html">last time</a> apply.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 18 - Removing the potential to deadlock</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2008/08/practical-testing-18---removing-the-potential-to-deadlock.html" />
    <id>tag:www.socketframework.com,2008:/blog//12.848</id>

    <published>2008-08-02T11:15:23Z</published>
    <updated>2010-12-29T09:49:05Z</updated>

    <summary>Back in 2004, I wrote a series of articles called &quot;Practical Testing&quot; where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Back in 2004, I wrote a series of articles called "<a href="http://www.lenholgate.com/archives/000306.html">Practical Testing</a>" where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code. Since then there have been various <a href="http://www.lenholgate.com/archives/000566.html">changes</a> and <a href="http://www.lenholgate.com/archives/000771.html">fixes</a> and <a href="http://www.lenholgate.com/archives/000773.html">redesigns</a> all of which were made considerably easier due to the original tests.</p>

<p>There has always been a potential for deadlock when using the timer queue, which is unfortunate but something that you can avoid if you know the rules. The problem occurs if you're using the <code>CThreadedCallbackTimerQueue</code> and you have a lock in your code which you hold when calling into the timer queue and which you also try to acquire when the timer queue calls into you in <code>OnTimer()</code>. The threaded timer queue has its own lock which it acquires when you call into the queue and which it holds when it calls into you via <code>OnTimer()</code>, so each thread can end up holding one lock whilst waiting for the other, and you will deadlock. As I pointed out in <a href="http://www.lenholgate.com/archives/000381.html">Part 12</a>, the threading is orthogonal, so it's only the <code>CThreadedCallbackTimerQueue</code> that has this problem, but this is probably the class you're most likely to use!</p>

So, this time around we'll adjust the <code>CThreadedCallbackTimerQueue</code> so that it doesn't have to hold its lock when it calls back into you and thus we'll remove the potential for deadlock and make the code much easier to use correctly. In addition we'll make it easier to use either the new <code>CCallbackTimerQueueEx</code> or the original <code>CCallbackTimerQueue</code> with <code>CThreadedCallbackTimerQueue</code>.]]>
        <![CDATA[<p>The first problem that we'll address is being able to use either implementation of the timer queue with the threaded timer queue. Right now the problem is that we're tied to the concrete class <code>CCallbackTimerQueue</code>. We can break that tie in the usual way by slipping in an interface. I don't want to extend <code>IQueueTimers</code> as that's the interface that clients of the timer queue use and it's not appropriate for a client of the queue to be able to do the things that we want the <code>CThreadedCallbackTimerQueue</code> to be able to do. So, <code>IManageTimerQueue</code> will extend <code>IQueueTimers</code> and add the support that we need to be able to compose timer queues from other queues. Initially that support is for <code>GetNextTimeout()</code> and <code>HandleTimeouts()</code>. Once we add this interface, adjust the two existing queues to implement it and make the threaded queue work in terms of it, it's easy to add some configuration flags to the threaded queue's constructor so that we can select the queue that we want to use when we create the threaded queue.</p>

<p>The lock holding problem is slightly harder to address. The problem exists because of this code in <code>Run()</code> in the threaded timer queue:</p>
<pre class="brush: cpp gutter: false">while (!m_shutdown)
{
   const Milliseconds timeout = GetNextTimeout();
  
   if (timeout == 0)
   {
      CCriticalSection::Owner lock(m_criticalSection);
   
      m_timerQueue.HandleTimeouts();
   }
   else 
   {
      m_stateChangeEvent.Wait(timeout);
   }
}
</pre>
<p>To maintain the thread safety of the data structures inside the queue we lock our critical section when we call <code>HandleTimeouts()</code>. We also lock around all other calls into the queue implementation. Inside the implementation, in <code>HandleTimeouts()</code>, we do this:</p>
<pre class="brush: cpp gutter: false">void CCallbackTimerQueue::HandleTimeouts()
{
   while (0 == GetNextTimeout())
   {
      TimerQueue::iterator it = m_queue.begin();
      
      TimerData *pData = it-&gt;second;
  
      m_queue.erase(it);
  
      Handle handle = reinterpret_cast&lt;Handle&gt;(pData);
  
      MarkHandleUnset(handle);
  
      pData-&gt;OnTimer();
  
      if (pData-&gt;IsOneShotTimer())
      {
         DestroyTimer(handle);
      }
   }
}
</pre>
<p>which means that our lock is held when <code>OnTimer()</code> is called.</p>

<p>The first step in solving this problem is to break the timer processing into several functions and separate the pieces that must be called with the lock held to maintain thread safety from the pieces that don't need the lock. Logically we need the queue to expose something like this: <code>BeginTimeoutHandling()</code>, <code>HandleTimeout()</code> and <code>EndTimeoutHandling()</code>, where <code>HandleTimeout()</code> doesn't need a lock to be held to maintain thread safety.</p>

<p>Our new interface looks a bit like this:</p>
<pre class="brush: cpp gutter: false">class IManageTimerQueue : public IQueueTimers
{
   public :
  
      virtual Milliseconds GetNextTimeout() = 0;
  
      virtual void HandleTimeouts() = 0;
  
      typedef ULONG_PTR * TimeoutHandle;
  
      static TimeoutHandle InvalidTimeoutHandleValue;
  
      virtual TimeoutHandle BeginTimeoutHandling() = 0;
  
      virtual void HandleTimeout(
         TimeoutHandle &amp;handle) = 0;
  
      virtual void EndTimeoutHandling(
         TimeoutHandle &amp;handle) = 0;
  
      virtual ~IManageTimerQueue() {}
};
</pre>
<p><code>TimeoutHandle</code> is a new type of handle to a timeout which is only used for processing timeouts that have expired.</p>
<p>With the new interface we can write the timeout handling in our threaded queue like this:</p>
<pre class="brush: cpp gutter: false">while (!m_shutdown)
{
   const Milliseconds timeout = GetNextTimeout();
   
   if (timeout == 0)
   {
      IManageTimerQueue::TimeoutHandle handle = IManageTimerQueue::InvalidTimeoutHandleValue;
 
      {
         CCriticalSection::Owner lock(m_criticalSection);
      
         handle = m_pTimerQueue-&gt;BeginTimeoutHandling();
      }
 
      if (handle != IManageTimerQueue::InvalidTimeoutHandleValue)
      {
         m_pTimerQueue-&gt;HandleTimeout(handle);
  
         CCriticalSection::Owner lock(m_criticalSection);
 
         m_pTimerQueue-&gt;EndTimeoutHandling(handle);
      }
   }
   else 
   {
      m_stateChangeEvent.Wait(timeout);
   }
}
</pre>
<p>We hold our lock to obtain a handle to a timeout that needs to be dispatched, unlock, dispatch the timeout and then acquire our lock again to complete the dispatch. Now we just have to make the implementations operate correctly in terms of the new interface.</p>
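<p>Here is a minimal sketch of that locking pattern using standard C++ rather than the framework's classes. <code>DispatchOneTimeout()</code>, <code>ITimeoutDispatch</code>, <code>TrackedLock</code> and <code>LockProbe</code> are hypothetical names; <code>std::lock_guard</code> plays the part of <code>CCriticalSection::Owner</code>, and a null handle stands in for <code>InvalidTimeoutHandleValue</code>. The probe records whether the lock was held during each of the three calls, which is one way to check that user code never runs under the queue's lock.</p>
<pre class="brush: cpp gutter: false">#include &lt;atomic&gt;
#include &lt;cassert&gt;
#include &lt;mutex&gt;

// Hypothetical cut-down interface: just the three dispatch calls.
class ITimeoutDispatch
{
   public :

      typedef void *TimeoutHandle;     // assumption: null means "no timeout"

      virtual TimeoutHandle BeginTimeoutHandling() = 0;
      virtual void HandleTimeout(TimeoutHandle &amp;handle) = 0;
      virtual void EndTimeoutHandling(TimeoutHandle &amp;handle) = 0;

      virtual ~ITimeoutDispatch() {}
};

// One pass through the "timeout == 0" branch of Run(). The lock is held
// for Begin and End but released whilst the timeout is dispatched.
template &lt;typename Lock&gt;
void DispatchOneTimeout(Lock &amp;lock, ITimeoutDispatch &amp;queue)
{
   ITimeoutDispatch::TimeoutHandle handle = nullptr;

   {
      std::lock_guard&lt;Lock&gt; owner(lock);

      handle = queue.BeginTimeoutHandling();
   }

   if (handle != nullptr)
   {
      queue.HandleTimeout(handle);     // user code runs here, unlocked

      std::lock_guard&lt;Lock&gt; owner(lock);

      queue.EndTimeoutHandling(handle);
   }
}

// Test double: a lock that remembers whether it's currently held.
class TrackedLock
{
   public :

      void lock() { m_mutex.lock(); m_held = true; }
      void unlock() { m_held = false; m_mutex.unlock(); }
      bool IsHeld() const { return m_held; }

   private :

      std::mutex m_mutex;
      std::atomic&lt;bool&gt; m_held{false};
};

// Test double: records whether the lock was held during each call.
class LockProbe : public ITimeoutDispatch
{
   public :

      explicit LockProbe(const TrackedLock &amp;lock) : m_lock(lock) {}

      bool heldInBegin = false;
      bool heldInHandle = false;
      bool heldInEnd = false;

      TimeoutHandle BeginTimeoutHandling() override
      {
         heldInBegin = m_lock.IsHeld();
         return this;                  // any non-null handle will do
      }

      void HandleTimeout(TimeoutHandle &amp;) override { heldInHandle = m_lock.IsHeld(); }
      void EndTimeoutHandling(TimeoutHandle &amp;) override { heldInEnd = m_lock.IsHeld(); }

   private :

      const TrackedLock &amp;m_lock;
};
</pre>
<p>Running <code>DispatchOneTimeout()</code> once against the probe should show the lock held in <code>BeginTimeoutHandling()</code> and <code>EndTimeoutHandling()</code> but not in <code>HandleTimeout()</code>, which is exactly the property that removes the potential for deadlock.</p>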

<p>Note that the queue must be able to operate correctly when any of the timer queue manipulation functions are called whilst a timer is being dispatched. So, for example, if we have called <code>BeginTimeoutHandling()</code> and then call <code>SetTimer()</code> or <code>CancelTimer()</code> for the timer that is currently being dispatched, before <code>EndTimeoutHandling()</code> has been called, then things should work as expected. Luckily, because we have separated the threading from the logic of the queue, we can write tests for these situations pretty easily. With the tests in place, and failing due to lack of implementation, we can begin to implement the changes.</p>

<p>At first glance, <code>HandleTimeouts()</code> just needs to be broken into three pieces. <code>BeginTimeoutHandling()</code> needs to do this:</p>
<pre class="brush: cpp gutter: false">if (0 == GetNextTimeout())
{
   TimerQueue::iterator it = m_queue.begin();
 
   TimerData *pData = it-&gt;second;
 
   m_queue.erase(it);
 
   Handle handle = reinterpret_cast&lt;Handle&gt;(pData);
  
   MarkHandleUnset(handle);
</pre>
<p><code>HandleTimeout()</code> needs to do this:</p>
<pre class="brush: cpp gutter: false">   pData-&gt;OnTimer();</pre>
<p>and <code>EndTimeoutHandling()</code> needs to do this:</p>
<pre class="brush: cpp gutter: false">   if (pData-&gt;IsOneShotTimer())
   {
      DestroyTimer(handle);
   }
</pre>
<p>Unfortunately things aren't quite that simple... The tests that we wrote earlier to make sure that timer dispatch can be safely interrupted by other calls show that <code>SetTimer()</code> could cause the timer to be updated whilst it is being dispatched and, if the user data or timer callback is changed in the call to <code>SetTimer()</code>, then the dispatch might use the wrong callback or user data! Even worse, <code>DestroyTimer()</code> may destroy the timer that we're in the middle of dispatching!</p>

<p>To work around these issues we need to adjust the data that we store for each timer. By separating the data that is used during dispatch from the data that is used when the timer is set and queued we can allow a call to <code>SetTimer()</code> to update a timer and requeue it whilst it's being dispatched. We then need to check whether a timer is being dispatched when <code>DestroyTimer()</code> is called for it and, if so, simply note that the timer has been deleted and delete it once the dispatch is complete. See the code for the detail.</p>
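<p>A minimal, single-threaded sketch of both ideas, assuming a much simpler queue than the real one: the names are hypothetical, expiry times are plain 64-bit numbers, <code>std::function</code> stands in for the timer callback and locking is left out. <code>SetTimer()</code> only writes the "active" data, <code>BeginTimeoutHandling()</code> copies it into the "dispatch" data that <code>HandleTimeout()</code> reads, and <code>DestroyTimer()</code> defers the delete whilst the timer is being processed.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;functional&gt;
#include &lt;map&gt;

typedef std::function&lt;void(int)&gt; Callback;   // hypothetical timer callback

struct TimerData
{
   Callback activeCallback;      // written by SetTimer()
   int activeUserData = 0;
   Callback dispatchCallback;    // snapshot taken when dispatch begins
   int dispatchUserData = 0;
   bool processing = false;
   bool deleteAfterDispatch = false;
};

class TimerQueue
{
   public :

      typedef TimerData *Handle;
      typedef unsigned long long Ticks;

      Handle CreateTimer() { return new TimerData(); }

      void SetTimer(Handle handle, Ticks expiry, Callback callback, int userData)
      {
         CancelTimer(handle);
         handle-&gt;activeCallback = callback;
         handle-&gt;activeUserData = userData;
         m_queue.emplace(expiry, handle);
      }

      void CancelTimer(Handle handle)
      {
         for (auto it = m_queue.begin(); it != m_queue.end(); ++it)
         {
            if (it-&gt;second == handle)
            {
               m_queue.erase(it);
               return;
            }
         }
      }

      void DestroyTimer(Handle handle)
      {
         CancelTimer(handle);

         if (handle-&gt;processing)
         {
            handle-&gt;deleteAfterDispatch = true;   // delete when dispatch ends
         }
         else
         {
            delete handle;
         }
      }

      Handle BeginTimeoutHandling(Ticks now)      // would hold the lock
      {
         if (m_queue.empty() || m_queue.begin()-&gt;first &gt; now)
         {
            return nullptr;
         }

         Handle handle = m_queue.begin()-&gt;second;
         m_queue.erase(m_queue.begin());

         handle-&gt;dispatchCallback = handle-&gt;activeCallback;
         handle-&gt;dispatchUserData = handle-&gt;activeUserData;
         handle-&gt;processing = true;
         return handle;
      }

      void HandleTimeout(Handle handle)           // would run unlocked
      {
         handle-&gt;dispatchCallback(handle-&gt;dispatchUserData);
      }

      void EndTimeoutHandling(Handle handle)      // would hold the lock
      {
         handle-&gt;processing = false;

         if (handle-&gt;deleteAfterDispatch)
         {
            delete handle;
         }
      }

   private :

      std::multimap&lt;Ticks, Handle&gt; m_queue;
};
</pre>
<p>Resetting a timer whilst it's being dispatched changes only its active data, so the dispatch in progress still uses the callback and user data it started with; destroying it whilst it's being dispatched just sets <code>deleteAfterDispatch</code>, and <code>EndTimeoutHandling()</code> does the actual delete.</p>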

<p>Since our new <code>CThreadedCallbackTimerQueue</code> works in terms of <code>IManageTimerQueue</code> we can add a constructor to allow you to plug in your own implementation of <code>IManageTimerQueue</code> if you like, and once we've done that we can test the <code>CThreadedCallbackTimerQueue</code> more easily because we can give it a mock <code>IManageTimerQueue</code>.</p>
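<p>A minimal sketch of the kind of test this enables, assuming a heavily cut-down version of the interface: the <code>Milliseconds</code> and <code>TimeoutHandle</code> typedefs here, <code>ThreadedTimerQueue</code>, <code>RunOnce()</code> and the mock are all hypothetical, and the locking and waiting are left out. The threaded queue is handed an <code>IManageTimerQueue</code> reference at construction, and a mock that logs its calls lets a test check the order in which they're made.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Heavily cut-down, hypothetical version of the interface.
class IManageTimerQueue
{
   public :

      typedef unsigned long Milliseconds;
      typedef void *TimeoutHandle;     // assumption: null means "invalid"

      virtual Milliseconds GetNextTimeout() = 0;
      virtual TimeoutHandle BeginTimeoutHandling() = 0;
      virtual void HandleTimeout(TimeoutHandle &amp;handle) = 0;
      virtual void EndTimeoutHandling(TimeoutHandle &amp;handle) = 0;

      virtual ~IManageTimerQueue() {}
};

// Works in terms of whatever implementation it's given.
class ThreadedTimerQueue
{
   public :

      explicit ThreadedTimerQueue(IManageTimerQueue &amp;queue) : m_queue(queue) {}

      // One pass of the worker loop, minus the locking and the wait.
      IManageTimerQueue::Milliseconds RunOnce()
      {
         const IManageTimerQueue::Milliseconds timeout = m_queue.GetNextTimeout();

         if (timeout == 0)
         {
            IManageTimerQueue::TimeoutHandle handle = m_queue.BeginTimeoutHandling();

            if (handle != nullptr)
            {
               m_queue.HandleTimeout(handle);
               m_queue.EndTimeoutHandling(handle);
            }
         }

         return timeout;
      }

   private :

      IManageTimerQueue &amp;m_queue;
};

// Mock that logs each call so a test can check the sequence.
class MockTimerQueue : public IManageTimerQueue
{
   public :

      std::vector&lt;std::string&gt; log;
      Milliseconds nextTimeout = 0;

      Milliseconds GetNextTimeout() override { log.push_back("GetNextTimeout"); return nextTimeout; }
      TimeoutHandle BeginTimeoutHandling() override { log.push_back("Begin"); return this; }
      void HandleTimeout(TimeoutHandle &amp;) override { log.push_back("Handle"); }
      void EndTimeoutHandling(TimeoutHandle &amp;) override { log.push_back("End"); }
};
</pre>
<p>With <code>nextTimeout</code> at 0 a single <code>RunOnce()</code> should log <code>GetNextTimeout</code>, <code>Begin</code>, <code>Handle</code> and <code>End</code> in that order; with a non-zero value it should only ask for the next timeout.</p>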

<p>The duplication in the code still bothers me, so I expect the next instalment will deal with that!</p>

Code is <a href="http://www.lenholgate.com/zips/PracticalTesting-18.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-18.zip']);">here</a> and the new <a href="http://www.lenholgate.com/archives/000771.html">rules</a> apply. I've changed the way the test harness operates recently; it now runs all tests and logs the failures and skipped tests, rather than stopping as soon as there's a single failure. By default the project is set to build for Windows Vista; this means that the code WILL NOT RUN on earlier operating systems as it will try to use <code>GetTickCount64()</code>, which isn't available there. To fix this you need to edit the <code>Admin\TargetWindowsVersion.h</code> file and change the values used; see <code>Admin\TargetWindowsVersion_WIN2K.h</code> and <code>Admin\TargetWindowsVersion_WINXP.h</code> for details.]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 17 - A whole new approach</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2008/04/practical-testing-17---a-whole-new-approach.html" />
    <id>tag:www.socketframework.com,2008:/blog//12.826</id>

    <published>2008-04-09T17:54:51Z</published>
    <updated>2010-12-29T09:17:14Z</updated>

    <summary>The comments to my last practical testing entry got me thinking. The commenter who had located the bug in part 15, which was fixed in part 16, suggested a new approach to the problem and I&apos;ve been investigating it. The...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>The comments to my <a href="http://www.lenholgate.com/archives/000771.html">last practical testing</a> entry got me thinking. The commenter who had located the bug in <a href="http://www.lenholgate.com/archives/000566.html">part 15</a>, which was fixed in <a href="http://www.lenholgate.com/archives/000771.html">part 16</a>, suggested a new approach to the problem and I've been investigating it.</p>

<p>The suggestion is, essentially, to use a timer with a longer range before roll-over rather than <code><a href="http://msdn2.microsoft.com/en-us/library/ms724408(VS.85).aspx">GetTickCount()</a></code> with its <a href="http://en.wikipedia.org/wiki/GetTickCount">49.7 day roll-over</a>. In Vista and later we could just use <code><a href="http://msdn2.microsoft.com/en-us/library/ms724411(VS.85).aspx">GetTickCount64()</a></code> but on earlier platforms that's not available to us. My commenter's solution was to build a <code>GetTickCount64()</code> on top of <code>GetTickCount()</code> and use that. Given that adjusting the code for Vista support via the real <code>GetTickCount64()</code> was on my list of things to do, I decided to also take a look at the potential of the hybrid approach suggested by my commenter.</p>

<p>Switching to using a greater range means that we can remove much of the complexity which was there to protect us from the rollover as this will now only occur after around 584942417.4 years of machine up-time rather than after 49.7 days... </p>
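<p>The roll-over figures are easy to check. Assuming 365-day years, which appears to be what the 584942417.4 figure uses, a quick compile-time calculation gives roughly 49.7 days for a 32-bit millisecond counter and roughly 584942417.4 years for a 64-bit one:</p>
<pre class="brush: cpp gutter: false">#include &lt;cstdint&gt;
#include &lt;limits&gt;

constexpr double MillisecondsPerDay = 1000.0 * 60 * 60 * 24;
constexpr double MillisecondsPerYear = MillisecondsPerDay * 365;   // 365-day years

// Time until a 32-bit millisecond counter, like GetTickCount(), wraps.
constexpr double DaysUntilRollover32 =
   static_cast&lt;double&gt;(std::numeric_limits&lt;uint32_t&gt;::max()) / MillisecondsPerDay;

// Time until a 64-bit millisecond counter, like GetTickCount64(), wraps.
constexpr double YearsUntilRollover64 =
   static_cast&lt;double&gt;(std::numeric_limits&lt;uint64_t&gt;::max()) / MillisecondsPerYear;

static_assert(DaysUntilRollover32 &gt; 49.7 &amp;&amp; DaysUntilRollover32 &lt; 49.8, "about 49.7 days");
static_assert(YearsUntilRollover64 &gt; 584942417.3 &amp;&amp; YearsUntilRollover64 &lt; 584942417.4, "about 584942417.4 years");
</pre>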

<p>In the zip file that accompanies this article there are two timer queues under test. The first, <code>CCallbackTimerQueue</code>, uses a hybrid <code>GetTickCount64()</code> implementation that will work on any platform as it uses <code>GetTickCount()</code> to do the work and the timer queue manages the upper 32 bits itself. The second, <code>CCallbackTimerQueueEx</code>, uses the real <code>GetTickCount64()</code> call and will only run on Windows Vista or later platforms. You can build for pre-Vista systems by editing the <code>Admin\TargetWindowsVersion.h</code> header file and adjusting the values for <code>NTDDI_VERSION</code> and <code>_WIN32_WINNT</code>.</p>

<p>The native Vista version of the code is the simplest so I'll discuss that first. There are several additional issues that need to be dealt with if we are building our own <code>GetTickCount64()</code> and these get in the way of the simpler code...</p>

<p>The first thing, of course, is that I had the tests that were written for the previous versions of the code to make it easier for me to make these changes to the internals of the code. I did this before in <a href="http://www.lenholgate.com/archives/000566.html">part 15</a> when I fiddled around with the internals to make the code more scalable. The presence of the tests makes this kind of change quite fun; I can concentrate on hacking away at the old design and know that if I change some functionality that is covered by my tests then I should find out as soon as I run them.</p>

<p>Looking at the header for <code>CCallbackTimerQueueEx</code> the first thing that you'll notice is that I've removed a couple of constructors; there's now no need to allow the user to tune the maximum timeout allowed. Next you'll see that the data structures used for the queue have been simplified; we only need one queue now rather than two and the timers are keyed by <code>ULONGLONG</code> rather than <code>Milliseconds</code> (<code>DWORD</code>). There are fewer helper functions and we use an instance of <code>IProvideTickCount64</code> rather than <code>IProvideTickCount</code>. Looking at the code itself, I've hardcoded the maximum timeout to one less than <code>INFINITE</code>, which gives us the whole usable range of a <code>DWORD</code> for timeouts. I don't see any advantage in allowing timeouts to be set as <code>ULONGLONG</code>s, as 49.7 days should be long enough for anyone ;) and, if it isn't, the user can set another timer when that one expires and build a longer timeout using the current implementation.</p>

<p>Since all of the multiple queue stuff can go, setting timers is now simpler and we can go back to the functionality from <a href="http://www.lenholgate.com/archives/000566.html">part 15</a> where calling <code>SetTimer()</code> does <b><i>NOT</i></b> cause timed out timers to be handled automatically (I was never really comfortable with that change anyway!). <code>InsertTimer()</code> is simpler as we're only ever dealing with a single timer queue, and rather than all the complexity that we had before for dealing with a timer that spans a rollover we can now simply disallow timers that do that; I don't feel too bad about doing this as I think it's reasonable to specify that the code doesn't support setting timers that cross a 584942417.4 year rollover point. <code>GetNextTimeout()</code> is now massively simplified as all it needs to do is look at the timeout value and compare it with now to see if it has expired. And that's it.</p>
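<p>Here's a minimal sketch of the single-queue idea, assuming standard C++ rather than the framework's types. The class is hypothetical, the caller passes in the current 64-bit tick count (as the tests would via a tick count provider), <code>Infinite</code> mirrors the value of <code>INFINITE</code>, and a timer that would cross the 64-bit roll-over is simply rejected.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;limits&gt;
#include &lt;map&gt;
#include &lt;stdexcept&gt;
#include &lt;vector&gt;

typedef uint32_t Milliseconds;   // local stand-in for the framework's typedef

class SingleQueueTimers
{
   public :

      static constexpr Milliseconds Infinite = 0xFFFFFFFF;     // same value as INFINITE
      static constexpr Milliseconds MaxTimeout = Infinite - 1;

      void SetTimer(int id, Milliseconds timeout, uint64_t now)
      {
         if (timeout &gt; MaxTimeout)
         {
            throw std::invalid_argument("timeout too long");
         }

         if (now &gt; std::numeric_limits&lt;uint64_t&gt;::max() - timeout)
         {
            throw std::overflow_error("timer would span the 64-bit roll-over");
         }

         m_queue.emplace(now + timeout, id);   // keyed by absolute expiry
      }

      // Assumes that "now" never goes backwards between calls.
      Milliseconds GetNextTimeout(uint64_t now) const
      {
         if (m_queue.empty())
         {
            return Infinite;
         }

         const uint64_t expiry = m_queue.begin()-&gt;first;

         return expiry &lt;= now ? 0 : static_cast&lt;Milliseconds&gt;(expiry - now);
      }

      std::vector&lt;int&gt; HandleTimeouts(uint64_t now)
      {
         std::vector&lt;int&gt; expired;

         while (!m_queue.empty() &amp;&amp; m_queue.begin()-&gt;first &lt;= now)
         {
            expired.push_back(m_queue.begin()-&gt;second);
            m_queue.erase(m_queue.begin());
         }

         return expired;
      }

   private :

      std::multimap&lt;uint64_t, int&gt; m_queue;
};
</pre>
<p>Because every expiry is an absolute 64-bit value, a single ordered container is enough and <code>GetNextTimeout()</code> is a single comparison; there's no "wrapped" queue to keep in step.</p>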

<p><code>CCallbackTimerQueue</code> is more complex, but not massively so. The complexity arises from how I maintain the high 32 bits of the 64-bit counter. Since the code works in terms of the 32-bit counter value returned by <code>GetTickCount()</code> and we know that this wraps every 49.7 days, I figure we can spot the wrap (now is less than the last time we checked) and use that event to increment the high 32-bit counter. The only potential risk is that we don't spot the wrap; that is, we don't call <code>GetTickCount()</code> for 49.7 days or more, so the counter wraps and then climbs past the value it had the last time we called <code>GetTickCount()</code>. To prevent this unlikely situation, the timer queue sets its own internal maintenance timer for the 32-bit counter roll-over point. All this timer does is go off and reset itself but, I think, this is enough to ensure that <code>GetTickCount()</code> is called often enough to prevent any problems...</p>
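<p>A minimal sketch of the wrap detection, assuming the 32-bit value is passed in (standing in for <code>GetTickCount()</code>) so that it can be controlled; the class name is hypothetical. It spots a wrap by seeing a value that is less than the previous one.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Builds a 64-bit tick count from successive 32-bit readings.
class TickCount64Builder
{
   public :

      explicit TickCount64Builder(uint32_t initialTicks)
         :  m_lastTicks(initialTicks), m_highPart(0) {}

      uint64_t GetTickCount64(uint32_t now)
      {
         if (now &lt; m_lastTicks)
         {
            ++m_highPart;       // "now" is less than last time: we've wrapped
         }

         m_lastTicks = now;

         return (static_cast&lt;uint64_t&gt;(m_highPart) &lt;&lt; 32) | now;
      }

   private :

      uint32_t m_lastTicks;
      uint32_t m_highPart;
};
</pre>
<p>Reading 0xFFFFFF00 and then 0x00000100 gives 0x100000100, as expected. If, however, the next reading were 0x00000300 but taken more than 49.7 days later, the wrap wouldn't be spotted and the result would be one full wrap period too small; a maintenance timer that fires at least once per wrap period ensures that the builder is consulted often enough to avoid this.</p>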

<p>The tests need to change a little due to the way that <code>SetTimer()</code> no longer implies <code>HandleTimeouts()</code> and because of the internal maintenance timer that is set upon construction. </p>

<p>The duplication in the code bothers me, so I expect the next instalment will deal with that, and any bugs that people report!</p>

<p>Code is <a href="http://www.lenholgate.com/zips/PracticalTesting-17.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-17.zip']);">here</a> and the new <a href="http://www.lenholgate.com/archives/000771.html">rules</a> apply.</p>

<i><b>Note:</b></i> This release has been rushed, I haven't had a chance to check any of the builds except the VS 2008 and VS 2005 ones. I'll check the rest and fix any problems when I get back from <a href="http://www.zermatt.ch/index.e.html">Zermatt</a>.]]>
        
    </content>
</entry>

<entry>
    <title>Practical Testing: 16 - Fixing a timeout bug</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2008/04/practical-testing-16---fixing-a-timeout-bug.html" />
    <id>tag:www.socketframework.com,2008:/blog//12.824</id>

    <published>2008-04-04T09:28:39Z</published>
    <updated>2010-12-28T15:29:10Z</updated>

    <summary>Back in 2004, I wrote a series of articles called &quot;Practical Testing&quot; where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Back in 2004, I wrote a series of articles called "<a href="http://www.lenholgate.com/archives/000306.html">Practical Testing</a>" where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code. Then, in 2005, I <a href="http://www.lenholgate.com/archives/000566.html">adjusted the code</a> to be more scalable and I showed how the tests that had originally been written helped when code needed to be changed for performance purposes. Finally I uploaded a test utility program that I'd been working on, <a href="http://www.lenholgate.com/archives/000647.html">TickShifter</a>, that allowed you to run a program and take control of how the <code>GetTickCount()</code> API operated within the program. The idea was that you could control time from outside the program to enable you to easily test edge conditions.</p>

<p>Time passed...</p>

<p>Recently I've had a bug reported against the timer queue code that was developed in the testing articles. You can find the bug report comment <a href="http://www.lenholgate.com/archives/000337.html">here</a>. I'd like to thank the commenter for taking the time to report the bug in such a thorough manner; it made it much easier for me to validate the problem and craft a test that proved its existence.</p>

<p>The bug is as follows: if the tick count has wrapped but no timers have fired since it wrapped, and you then add a new timer, the two queues that are used to manage wrapped timers have not been adjusted to allow for the fact that the tick count has wrapped. This means that the new timer is added to the wrong queue, the queues are then not swapped, and timers expire in the wrong order.</p>

<p>It's a nice edge case bug and it's one that was not tested for in the original test harness. The first thing that I did was set out to write a test to reproduce the bug. The commenter had used <a href="http://www.lenholgate.com/archives/000647.html">TickShifter</a> to force the situation, which is what it's for, but for development and regression purposes it's better to have tests in the test harness that make sure the bug stays fixed.</p>

<p>The first new test, <code>TestTickCountWrap2()</code>, sets up the environment exactly as the bug report stated. Take a look at the code for full details. First we set the tick count to be 1000ms before roll-over. Next we set a timer to expire in 2000ms, i.e. 1000ms after the tick count rolls over to 0. Next we set the tick count to 0, i.e. the point at which it rolls over. We then set another timer to expire in 10000ms. At this point we should have two timers set; the first will expire in 1000ms and the second in 10000ms. This is the point where the original code had a bug. Next we set the tick count to 1000. At this time the first timer should go off and we check that it does. We also verify that there are 9000ms until the next timeout. Finally we set the tick count to 10000 and verify that the second timer expires correctly.</p>

<p>Of course, when run with the code from <a href="http://www.lenholgate.com/archives/000566.html">part 15</a> this test fails for exactly the reason that the bug report stated. The problem was that we used two queues for the timers, one for timers before the "wrap point" and one for timers after. The only point that we ever switched the queues over was when a timer expired. At that point if we knew we had a timer that had expired and the current queue didn't contain any timers then we knew that the tick count had wrapped and that the timer we wanted was in the other queue. Of course, looking at this now, written down like that, it's quite obvious to see the flaws. It doesn't matter how many tests you write, if you don't write the correct tests then your code can still have bugs!</p>

<p>A solution to this bug is to keep track of the tick count when we set a 'wrapped' timer. Then, when checking for timeouts, if the current tick count is less than the tick count when we last set a wrapped timer we know that the count has wrapped and we can adjust the queues accordingly. The problem then is that we need to make sure the queues are correct when we set new timers as well as when existing timers expire. The simplest fix seems to be to cause a call to <code>SetTimer()</code> to first check for the expiry of existing timers. This means that the code that is used to check for a wrap when a timer expires is also executed before adding new timers. Hopefully, this means that the queues will always remain correct.</p>
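<p>A minimal sketch of the two-queue approach with this kind of fix, using standard C++ and hypothetical names. It makes two simplifying assumptions of its own: it remembers the last tick count seen by any call, rather than only the count when a wrapped timer was set, and it assumes it's called at least once per 49.7 day wrap period. The test values replay the <code>TestTickCountWrap2()</code> scenario described above.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;map&gt;
#include &lt;vector&gt;

// Timers are keyed by 32-bit expiry ticks. m_current holds timers that
// expire before the tick count next wraps; m_wrapped holds the rest.
class TwoQueueTimers
{
   public :

      explicit TwoQueueTimers(uint32_t now) : m_now(now), m_lastSeen(now) {}

      void SetTickCount(uint32_t now) { m_now = now; }   // controls "time"

      void SetTimer(int id, uint32_t timeout)
      {
         AdjustQueuesForWrap();     // the fix: check for a wrap first

         const uint32_t expiry = m_now + timeout;   // wraps modulo 2^32

         if (expiry &lt; m_now)
         {
            m_wrapped.emplace(expiry, id);
         }
         else
         {
            m_current.emplace(expiry, id);
         }
      }

      // Returns the ids of the timers that have expired, in expiry order.
      std::vector&lt;int&gt; HandleTimeouts()
      {
         AdjustQueuesForWrap();

         std::vector&lt;int&gt; expired;

         while (!m_current.empty() &amp;&amp; m_current.begin()-&gt;first &lt;= m_now)
         {
            expired.push_back(m_current.begin()-&gt;second);
            m_current.erase(m_current.begin());
         }

         return expired;
      }

   private :

      typedef std::multimap&lt;uint32_t, int&gt; Queue;

      void AdjustQueuesForWrap()
      {
         if (m_now &lt; m_lastSeen)    // the tick count has wrapped
         {
            Queue overdue;

            overdue.swap(m_current);     // timers from before the wrap
            m_current.swap(m_wrapped);   // wrapped timers are now current

            for (const auto &amp;entry : overdue)
            {
               m_current.emplace(0, entry.second);   // already due
            }
         }

         m_lastSeen = m_now;
      }

      uint32_t m_now;
      uint32_t m_lastSeen;
      Queue m_current;
      Queue m_wrapped;
};
</pre>
<p>Because <code>SetTimer()</code> adjusts the queues before inserting, the second timer in the scenario lands in a queue that already holds the first one, so the first timer fires at a tick count of 1000 and the second at 10000.</p>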

<p>The second new test expands on the first to ensure that a timer that has expired when <code>SetTimer()</code> is called is correctly processed.</p>

Code is <a href="http://www.lenholgate.com/zips/PracticalTesting-16.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-16.zip']);">here</a> but new <a href="http://www.lenholgate.com/archives/000412.html">rules</a> apply: The code will build with VC6, VS.Net 2002, VS.Net 2003, VS 2005 and VS 2008. The code builds as x86 or x64 with VS 2005 and 2008. The code will build with either the standard STL that comes with Visual Studio or with a version of STL Port. The code uses precompiled headers the right way so that you can build with precompiled headers for speed or build without them to ensure minimal code coupling. The code can also compile on VC6 with or without the platform SDK being installed. The various options are all controlled from the "Admin" project; edit Admin.h to change things...]]>
        
    </content>
</entry>

</feed>



