<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Rambling Comments</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/" />
    <link rel="self" type="application/atom+xml" href="http://www.lenholgate.com/blog/atom.xml" />
    <id>tag:www.lenholgate.com,2010-12-10:/blog//12</id>
    <updated>2011-01-02T13:39:24Z</updated>
    
    <generator uri="http://www.sixapart.com/movabletype/">Movable Type Pro 5.12</generator>

<entry>
    <title>Practical Testing: 23 - Another new approach: timer wheels</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-23---another-new-approach-timer-wheels.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.961</id>

    <published>2010-07-19T08:01:18Z</published>
    <updated>2011-01-02T13:39:24Z</updated>

    <summary>The most recent articles in the &quot;Practical Testing&quot; series have been discussing the performance of the timer queue that we have built. As I hinted when I first brought up the performance issues, the particular use case that I have...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>The most recent articles in the <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a> series have been discussing the performance of the timer queue that we have built. As I hinted when I <a href="http://www.lenholgate.com/archives/000906.html">first brought up</a> the performance issues, the particular use case that is causing me problems might well be dealt with more efficiently by a different (more specialised and less general purpose) design.</p>

<p>The timer queue has adequate performance for general purpose use and can handle timers set within a range of 0ms to 49.7 days with a theoretical granularity of 1ms. It achieves this by using a balanced search tree to store the timers by absolute timeout. The performance of setting a timer is O(log n) due to the tree insertion required. Cancelling a timer is O(1) since we keep an iterator to where the timer was inserted and thus can navigate straight to it to cancel it. Timer expiry is also an O(log n) operation due to the tree lookup. Due to the use of the standard program heap the worst case contention of the queue is <b>C(tq)+(C(tn-tq+1)+C(ts-tq+1)+C(tn-tq+1))</b> (see <a href="http://www.lenholgate.com/archives/000908.html">here</a> for details of my crazy Big C notation for describing contention).</p>
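<p>To make those costs concrete, here is a minimal sketch of a tree based timer queue. It is not the framework's code: it uses a <code>std::multimap</code> keyed by absolute timeout, and the <code>SketchTimerQueue</code> class, its <code>Handle</code> and its <code>Set()</code>, <code>Cancel()</code> and <code>ExpireUpTo()</code> members are all hypothetical. Each handle remembers the iterator returned by insertion, which is what lets cancellation skip the search.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;map&gt;
#include &lt;vector&gt;

// Hypothetical sketch of a tree based timer queue; not the framework's code.
class SketchTimerQueue
{
   public:

      using Timeout = std::uint64_t;   // absolute timeout in ms
      using UserData = int;            // stand-in for the real user data type

      struct Handle
      {
         UserData userData = 0;
         bool isSet = false;
         std::multimap&lt;Timeout, Handle *&gt;::iterator it;
      };

      // O(log n): the tree has to find the insertion point (and allocates a node).
      void Set(Handle &amp;handle, const Timeout absoluteTimeout, const UserData userData)
      {
         Cancel(handle);
         handle.userData = userData;
         handle.it = m_timers.emplace(absoluteTimeout, &amp;handle);
         handle.isSet = true;
      }

      // Amortized O(1): we already hold the iterator, so there's no search.
      bool Cancel(Handle &amp;handle)
      {
         if (!handle.isSet)
         {
            return false;
         }

         assert(handle.it-&gt;second == &amp;handle);   // the node must belong to this handle

         m_timers.erase(handle.it);
         handle.isSet = false;

         return true;
      }

      // Removes every timer due at or before 'now' and returns its user data.
      std::vector&lt;UserData&gt; ExpireUpTo(const Timeout now)
      {
         std::vector&lt;UserData&gt; expired;

         const auto end = m_timers.upper_bound(now);   // O(log n) lookup

         for (auto it = m_timers.begin(); it != end; ++it)
         {
            it-&gt;second-&gt;isSet = false;   // the handle no longer refers to a node
            expired.push_back(it-&gt;second-&gt;userData);
         }

         m_timers.erase(m_timers.begin(), end);

         return expired;
      }

   private:

      std::multimap&lt;Timeout, Handle *&gt; m_timers;
};
</pre>
<p>Note that every <code>emplace()</code> allocates a tree node and every <code>erase()</code> frees one; with the default allocator that means a trip to the general purpose heap that the rest of the process shares, which is where the wider contention comes from.</p>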

<p>The more specialist use case is driving reliable UDP protocols. This kind of work generally requires per-connection timers for retransmission and data flow pacing. The timeouts tend to be short and the timers tend to expire rather than be reset without expiring. The range of timeouts is generally quite small: 0ms to 30 seconds for the ENet system I'm working on. I'm currently looking at improving the performance of the timer system for this kind of scenario, and to do so requires that timer insertion speed be improved (so we can set timers more quickly), that timer expiry speed be improved (so we can process timers faster) and that contention be reduced, ideally tending towards <b>C(tq)</b>, where the only contention is between users of the timer queue rather than with every thread in the process.</p>]]>
        <![CDATA[<p>As I have already mentioned, the use of STL containers means that I'm doing more work than is strictly necessary when manipulating the timer queue (including dynamic memory allocation and release during timer insertion and removal). One way of reducing contention is to switch to custom STL allocators so that only the users of the queue ever access the allocators that the queue uses. Another is to write a custom, invasive, balanced search tree that does not need to use dynamic allocation.</p>

<p>A third solution would be to use a simpler data structure. Our requirement is simply to store timers in order of timeout, so rather than using a complex tree structure we could use a simple sorted list. Unfortunately timer insertion would then rise to O(n), as we would need to traverse the list to locate the correct spot to insert our new timer. Cancellation can stay O(1) if we use our invasive <code>CNodeList</code>, and timer handling becomes O(1) because we always work from the head of the list when expiring timers. However, the usage pattern of reliable retransmission means that we'll be inserting timers across the whole of our possible range, so the O(n) insertion would really bite us.</p>
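<p>The sorted list option can be sketched in a few lines. This uses <code>std::list</code> for brevity rather than the invasive <code>CNodeList</code> (so, unlike <code>CNodeList</code>, it still allocates on insertion), and the <code>SortedTimer</code>, <code>InsertSorted()</code> and <code>ExpireSorted()</code> names are hypothetical. Insertion walks the list to find the first timer with a later timeout, while expiry only ever looks at the head.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;
#include &lt;list&gt;
#include &lt;vector&gt;

// Hypothetical sketch: timers kept in a list sorted by absolute timeout.
struct SortedTimer
{
   std::uint64_t timeout;   // absolute timeout in ms
   int userData;            // stand-in for the real user data
};

using SortedTimerList = std::list&lt;SortedTimer&gt;;

// O(n): walk the list until we find the first timer that expires later.
SortedTimerList::iterator InsertSorted(SortedTimerList &amp;timers, const SortedTimer &amp;timer)
{
   auto it = timers.begin();

   while (it != timers.end() &amp;&amp; it-&gt;timeout &lt;= timer.timeout)
   {
      ++it;
   }

   // The returned iterator stays valid, so cancellation can be O(1) via erase().
   return timers.insert(it, timer);
}

// O(1) per expired timer: anything that is due is always at the head.
std::vector&lt;int&gt; ExpireSorted(SortedTimerList &amp;timers, const std::uint64_t now)
{
   std::vector&lt;int&gt; expired;

   while (!timers.empty() &amp;&amp; timers.front().timeout &lt;= now)
   {
      expired.push_back(timers.front().userData);
      timers.pop_front();
   }

   return expired;
}
</pre>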

<p>In a classic trade-off between memory usage and performance we could use an array and have lots of wasted space in it. Setting a timer becomes O(1); you simply index directly into the array at the correct location. Cancellation and timer processing are also O(1), and there's no dynamic memory allocation required for insertion and removal, so the worst case contention is <b>C(tq)</b>. Such a structure is called a timer wheel because the array is viewed as a circular buffer and timers are inserted with timeouts relative to a 'now' point on the wheel.</p>

<p>The amount of memory used can be reduced by making the granularity at which you can set your timers coarser. For example, a timer wheel with a range of 0 to 30 seconds and a granularity of 1ms requires 30,000 elements in the array. If you coarsen the granularity to 15ms (which is pretty much the best you can get from <code>GetTickCount()</code> anyway), the array shrinks to a more manageable 2,000 elements. Given that the array is an array of pointers, we're looking at 8kB on x86 and 16kB on x64. Each array element is either <code>null</code>, if no timer is set for that time, or points to the first timer in a doubly linked list of timers set for that time. The list is invasive, with the links being part of the data that is stored in the list. Insertion is simply a case of pushing a new node onto the front of the existing list, and cancellation is easy as the list is doubly linked and the node contains the links. Thus most timer manipulation becomes simply adjusting pointers.</p>
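<p>The sizing arithmetic is easy to check at compile time. A minimal sketch; <code>SlotsFor()</code> and <code>SlotArrayBytes()</code> are hypothetical helpers, and the sizes cover only the array of list heads, not the timer data itself.</p>
<pre class="brush: cpp gutter: false">#include &lt;cstddef&gt;

// Hypothetical helper: slots needed to cover 'rangeMs' at 'granularityMs'.
constexpr std::size_t SlotsFor(const std::size_t rangeMs, const std::size_t granularityMs)
{
   return rangeMs / granularityMs;
}

// Hypothetical helper: size of the array of list heads, one pointer per slot.
constexpr std::size_t SlotArrayBytes(const std::size_t slots, const std::size_t pointerSize)
{
   return slots * pointerSize;
}

static_assert(SlotsFor(30000, 1) == 30000, "1ms granularity: 30,000 slots");
static_assert(SlotsFor(30000, 15) == 2000, "15ms granularity: 2,000 slots");
static_assert(SlotArrayBytes(2000, 4) == 8000, "x86: 4 byte pointers, roughly 8kB");
static_assert(SlotArrayBytes(2000, 8) == 16000, "x64: 8 byte pointers, roughly 16kB");
</pre>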

<p><img alt="TimerWheel-1.png" src="http://www.lenholgate.com/blog/images/TimerWheel-1.png" width="434" height="301" border="0" /></p>

<p>The wheel in the diagram above has a granularity of 5ms and has timers set at 30ms and 50ms. The wheel is defined by two pointers: one to the start of the array and one to one element beyond the end.</p>

<p><img alt="TimerWheel-2.png" src="http://www.lenholgate.com/blog/images/TimerWheel-2.png" width="434" height="301" border="0" /></p>

<p>This diagram clearly shows the circular nature of the array. This is just before we expire the 30ms timer. Note that the next timer is due in 20ms.</p>
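<p>Finding the right slot is just modular arithmetic relative to the 'now' slot. The helpers below are a hypothetical sketch rather than the framework's <code>GetTimerAtOffset()</code>, whose implementation isn't shown, and they assume that the wheel has more slots than the largest offset that can be set.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstddef&gt;

// Hypothetical: the slot that a timer with 'timeoutMs' lands in, where
// 'nowIndex' is the slot that represents the current time.
std::size_t SlotForTimeout(
   const std::size_t nowIndex,
   const std::size_t numSlots,
   const std::size_t granularityMs,
   const std::size_t timeoutMs)
{
   const std::size_t offset = timeoutMs / granularityMs;   // whole slots from now

   assert(offset &lt; numSlots);   // the wheel must cover the whole timeout range

   return (nowIndex + offset) % numSlots;                  // wrap around the end of the array
}

// Hypothetical: how many slots 'now' must advance before it reaches 'slot'.
std::size_t SlotsUntil(
   const std::size_t nowIndex,
   const std::size_t slot,
   const std::size_t numSlots)
{
   return (slot + numSlots - nowIndex) % numSlots;
}
</pre>
<p>With an arbitrary 12 slot wheel, the 5ms granularity from the diagrams and 'now' at slot 0, timers of 30ms and 50ms land in slots 6 and 10; once 'now' has advanced to slot 6, <code>SlotsUntil(6, 10, 12)</code> is 4, so the next timer is due in 20ms.</p>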

<p>My implementation of a timer wheel is made easier by the fact that I have a set of tests that target the interface that I wish to conform to. To start with I'll implement a basic timer wheel that allows us to create, set and cancel timers but that doesn't deal with any of the complexity of expiring timers. I'll also leave out all of the niceties and the implied or explicit implementation details. Don't worry, once we write the tests for these pieces of functionality it'll be obvious where we're failing.</p>

<p>Creation and destruction of the timer wheel are pretty straightforward. We have an array of pointers to create, the size of which is based on the maximum timeout that we can set and the granularity of the timers. Destruction is similar to the timer queue in that we iterate any existing timers and clean them up. Timer creation is very similar to our timer queue, as we dynamically allocate the timer data and insert it into a map for validation and clean up purposes. The timers themselves are, at present at least, quite simple: a link to the next timer in the list, a link to the previous timer, and the timer and user data. Setting a timer simply involves validating it, locating the correct index into the timer wheel array and then adding the timer to the list of timers at that point in the array.</p>
<pre class="brush: cpp gutter: false">bool CCallbackTimerWheel::SetTimer(
   const Handle &amp;handle,
   Timer &amp;timer,
   const Milliseconds timeout,
   const UserData userData)
{
   if (timeout &gt; m_maximumTimeout)
   {
      throw CException(
         _T("CCallbackTimerWheel::SetTimer()"), 
         _T("Timeout is too long. Max is: ") + ToString(m_maximumTimeout) + _T(" tried to set: ") + ToString(timeout));
   }
  
   TimerData &amp;data = ValidateHandle(handle);
  
   const bool wasSet = data.CancelTimer();
  
   data.UpdateData(timer, userData);
  
   InsertTimer(timeout, data);
  
   return wasSet;
}
  
void CCallbackTimerWheel::InsertTimer(
   const Milliseconds timeout,
   TimerData &amp;data)
{
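   // Convert the timeout into a number of slots from the current position.
   const size_t timerOffset = timeout / m_timerGranularity;
  
   // The address of the head pointer for that slot in the wheel array.
   TimerData **ppTimer = GetTimerAtOffset(timerOffset);
  
   // Push the timer onto the front of that slot's list.
   data.SetTimer(ppTimer, *ppTimer);
}
  
void CCallbackTimerWheel::TimerData::SetTimer(
   TimerData **ppPrevious,
   TimerData *pNext)
{
   if (m_ppPrevious)
   {
      throw CException(
         _T("CCallbackTimerWheel::TimerData::SetTimer()"),
         _T("Internal Error: Timer is already set"));
   }
  
   // Remember the address of the pointer that points at us (here, the slot).
   m_ppPrevious = ppPrevious;
  
   m_pNext = pNext;
  
   // The old head of the list is now pointed at by our m_pNext.
   if (m_pNext)
   {
      m_pNext-&gt;m_ppPrevious = &amp;m_pNext;
   }
  
   // Finally, make the slot point at us; we're the new head of the list.
   *ppPrevious = this;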
}
</pre>

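<p>I'm using a pointer to the previous pointer rather than a pointer to the previous node as it makes things slightly simpler, honest... The slot in the wheel array and each node's <code>m_pNext</code> are both just <code>TimerData *</code> locations, so a node only needs to know the address of whichever one points at it, and unlinking it never needs a special case for the head of the list.</p>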
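<p>Here is a stand-alone sketch of that linking scheme, separate from the framework's code. The <code>Node</code> type and the <code>Link()</code> and <code>Unlink()</code> functions are hypothetical; <code>Link()</code> follows the same steps as <code>TimerData::SetTimer()</code> above, and <code>Unlink()</code> shows the payoff: the same code removes a node whether it's at the head of a slot's list, in the middle or at the end.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;

// Hypothetical stand-alone version of the invasive list hanging off each slot.
struct Node
{
   Node **ppPrevious = nullptr;   // address of the pointer that points at us
   Node *pNext = nullptr;
   int userData = 0;
};

// Push 'node' onto the front of the list whose head pointer lives at 'ppHead'.
void Link(Node **ppHead, Node &amp;node)
{
   assert(!node.ppPrevious);   // mirrors the "already set" check in SetTimer()

   node.ppPrevious = ppHead;
   node.pNext = *ppHead;

   if (node.pNext)
   {
      node.pNext-&gt;ppPrevious = &amp;node.pNext;   // old head is now pointed at by our pNext
   }

   *ppHead = &amp;node;
}

// Remove 'node' from whatever list it's in; no special case for the head.
bool Unlink(Node &amp;node)
{
   if (!node.ppPrevious)
   {
      return false;   // not currently linked
   }

   *node.ppPrevious = node.pNext;   // whoever pointed at us now points past us

   if (node.pNext)
   {
      node.pNext-&gt;ppPrevious = node.ppPrevious;
   }

   node.ppPrevious = nullptr;
   node.pNext = nullptr;

   return true;
}
</pre>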

<p>With just enough code to get the first set of tests to run, I have enough to get some initial performance figures out of the new timer system. Timer creation performance is about the same as with the queue, but that's expected as the code is almost identical; the contention for creation and destruction is also the same as for the queue and so could also be improved with custom allocators and private heaps. The performance tests for <code>SetTimer()</code> show a dramatic improvement. On my test machine I get figures of around 4ms to set a single timer 100,000 times, against 90ms for the queue, and similar improvements in the other two performance tests for <code>SetTimer()</code>. What's even better is that <code>SetTimer()</code> would have a contention of <b>C(tq)</b>, as we no longer have to do any of the dynamic allocation that was going on with the timer queue's STL manipulation.</p>

<p>Right now we're left with a failing test, which points the way to what we need to do next: deal with processing these timers when they time out. Before I look at that, though, I think it's about time that I took a good hard look at the duplication in the tests. We're testing an interface with three implementations, and we should have a single set of tests which does that, plus some implementation specific tests if we feel we need them. Having one set of duplicate test code for the Ex version of the queue was wrong, but I could just about live with it; having another duplicate set for the timer wheel is something I'm not prepared to put up with unless it's simply not possible to remove the duplication.</p>

<p>The code can be found <a href="http://www.lenholgate.com/zips/PracticalTesting-23.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-23.zip']);">here</a> and the <a href="http://www.lenholgate.com/archives/000906.html">previous rules</a> apply.</p>]]>
    </content>
</entry>

<entry>
    <title>Practical Testing: 21 - Looking at Performance and finding a leak</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2010/07/practical-testing-21---looking-at-performance-and-finding-a-leak.html" />
    <id>tag:www.socketframework.com,2010:/blog//12.958</id>

    <published>2010-07-15T10:43:37Z</published>
    <updated>2011-01-02T13:36:56Z</updated>

    <summary>Back in 2004, I wrote a series of articles called &quot;Practical Testing&quot; where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Source Code" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Testing" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>Back in 2004, I wrote a series of articles called <a href="http://www.lenholgate.com/archives/000306.html">"Practical Testing"</a> where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuilt the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code. Since the original articles there have been several bug fixes and redesigns, all of which have been supported by the original unit tests and many of which have led to the development of more tests.</p>

<p>The tests were written for the <a href="http://www.lenholgate.com/archives/000843.html">pretty scalable timer queue implementation</a> in my <a href="http://www.serverframework.com/ServerFramework/latest/Docs//sockettoolslicensing.html">high performance, scalable, server framework</a>. The timer queue has been present in the framework ever since I first had to associate timeouts with overlapped I/O requests back on NT 4.0. Back then the <a href="http://msdn.microsoft.com/en-us/library/ms682483(VS.85).aspx">Windows Timer Queue</a> didn't exist and so I rolled my own. As you'll see from the previous entries in this series, the class has gone through some changes over time. The queue works well for dealing with all manner of varied timeouts and is pretty general purpose in nature: I use it for setting timeouts to roll log files to a new name in my rotating log file class, I use it for all manner of per-connection timeouts, and it scales well; there's no problem having 30,000 connections all with a 2 minute inactivity timeout set, etc.</p>

<p>The problem with it is that it could perform better. The general purpose nature of the queue means that it needs to be flexible; it can handle timeouts from 1ms to 49.7 days in a reasonably time- and space-efficient manner. I use STL containers to index the timers and that works pretty well. As a general purpose solution it's pretty good. The problem is that it's not good enough; or, at least, it's not fast enough for situations where I'm using the timers to implement retransmission timers for reliable UDP protocols. In these situations I have lots of timers (one per connection, and the aim is to support lots of concurrent connections) set for generally short periods of time, 50ms to 30 seconds. Since the timers are used for retransmission and data flow pacing they tend to expire (rather than being reset and rarely expiring, as is the case with inactivity timers), and they also tend to set a new timer when the current one expires. I've found that the general purpose nature of the timer queue and the use of STL mean that there's a lot of contention for the timer system, and that this contention affects performance.</p>
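<p>The 49.7 day figure is exactly the range of a 32 bit count of milliseconds, which is also why the value returned by <code>GetTickCount()</code> wraps after 49.7 days. Assuming, as this small sketch does, that the limit comes from timeouts being held as 32 bit millisecond values, the arithmetic checks out at compile time:</p>
<pre class="brush: cpp gutter: false">#include &lt;cstdint&gt;

// 2^32 - 1 milliseconds is the largest value a 32 bit millisecond count can hold.
constexpr std::uint64_t kMaxMilliseconds = 0xFFFFFFFFull;             // 4,294,967,295
constexpr std::uint64_t kMillisecondsPerDay = 24ull * 60 * 60 * 1000;  // 86,400,000

constexpr double kMaxDays =
   static_cast&lt;double&gt;(kMaxMilliseconds) / static_cast&lt;double&gt;(kMillisecondsPerDay);

static_assert(kMaxDays &gt; 49.71 &amp;&amp; kMaxDays &lt; 49.72, "roughly 49.7 days");
</pre>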

<p>The <a href="http://www.lenholgate.com/archives/000381.html">thread safe version</a> of the timer queue protects the internal data structures with a lock, and this lock needs to be acquired to do anything to the queue. So we lock when we create a new timer, we lock when we set a timer, we lock when the timer thread is processing an expired timer, etc. Obviously we try to hold this lock for as short a time as possible, and obviously we try to access the queue from as few threads as possible, but eventually the contention starts to bite as connections reset their timers and timers expire almost constantly to drive rate limited flow queues, retransmission, etc. Although in this particular scenario the timer expiry code is quick (we lock, remove the expired timer, the callback simply pushes the user data into a queue of work items that are processed by another thread, and we then release the lock), the lock is still acquired and released for each timer that expires. In the reliable UDP scenario we have lots of timers set for exactly the same time, and so the thread that processes the expired timers spends a lot of time acquiring and releasing the lock on the queue. Since the expiry of a timer often causes a new timer to be set, we also have other threads processing the work items and setting new timers, which also requires that we lock the queue. So all threads that access the queue are in contention for the lock; there's not much that we can do about that apart from reducing the amount of time spent holding the lock.</p>

<p>Unfortunately it's worse than that. Since we're using the STL to implement our timer queue, and since setting, expiring and cancelling a timer all result in adjustments being made to our central STL map object, the threads that use our timer queue are also in contention with all other threads in the system that use the same memory heap as our timer queue. Each operation results in dynamic memory allocation and/or release.</p>

<p>My super secret game company client needs their <a href="http://enet.bespin.org/">ENet</a> implementation to run fast and to support thousands of concurrent connections. When profiling their system under load, the timer queue is one of the hot spots and its locks show some of the highest contention in the system. Because of this I need to take a look at what we're doing and potentially move from a general purpose solution to something a little more specific.</p>]]>
        <![CDATA[<p>My first thought was to add some explicit monitoring to the timer queue (<a href="http://www.lenholgate.com/archives/000903.html">performance counters</a> are your friends), and so a monitoring interface was born. The threaded callback timer queue was adjusted so that it could give an idea of the contention for its lock by using <a href="http://msdn.microsoft.com/en-us/library/ms686857(VS.85).aspx">TryEnterCriticalSection()</a> and, when that fails to acquire the lock, following it with a plain ol' <a href="http://msdn.microsoft.com/en-us/library/ms682608(v=VS.85).aspx">EnterCriticalSection()</a>. The code looks something like this and, whilst the monitoring code will no doubt change the way the contention bites, it gives some indication of the contention being experienced.</p>
<pre class="brush: cpp gutter: false">void CThreadedCallbackTimerQueue::SetTimer(
   Timer &amp;timer,
   const Milliseconds timeout,
   const UserData userData)
{
#if (JETBYTE_PERF_TIMER_QUEUE_MONITORING_DISABLED == 0)
  
   ICriticalSection::PotentialOwner lock(m_criticalSection);
  
   if (!lock.TryEnter())
   {
      m_monitor.OnTimerProcessingContention(IMonitorThreadedCallbackTimerQueue::SetOneOffTimerContention);
  
      lock.Enter();
   }
  
#else 
  
   ICriticalSection::Owner lock(m_criticalSection);
  
#endif
  
   m_spTimerQueue-&gt;SetTimer(timer, timeout, userData);
  
   SignalStateChange();
}
</pre>
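<p>The same try-then-block pattern can be sketched with standard C++ for anyone without the framework's <code>ICriticalSection</code> wrapper. The <code>ContentionCountingLock</code> class below is hypothetical, and it counts into a plain atomic where the code above calls the monitor. Note that <code>std::mutex::try_lock()</code> is allowed to fail spuriously, so, as with the original, the count is an indication of contention rather than an exact measure.</p>
<pre class="brush: cpp gutter: false">#include &lt;atomic&gt;
#include &lt;cassert&gt;
#include &lt;mutex&gt;

// Hypothetical: a scoped lock that records when it couldn't acquire immediately.
class ContentionCountingLock
{
   public:

      ContentionCountingLock(std::mutex &amp;mutex, std::atomic&lt;long&gt; &amp;contentionCount)
         : m_mutex(mutex)
      {
         if (!m_mutex.try_lock())
         {
            ++contentionCount;   // someone else held it (or try_lock failed spuriously)
            m_mutex.lock();      // now block, as EnterCriticalSection() would
         }
      }

      ~ContentionCountingLock()
      {
         m_mutex.unlock();
      }

      ContentionCountingLock(const ContentionCountingLock &amp;) = delete;
      ContentionCountingLock &amp;operator=(const ContentionCountingLock &amp;) = delete;

   private:

      std::mutex &amp;m_mutex;
};
</pre>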
<p>The next step was to add some performance tests to the timer queue's test suite: simple things such as measuring the time taken to create timers, set them, cancel them and expire them. Obviously we repeat each operation a large number of times, repeat the whole test several times and take the average result. The performance tests give us baseline figures for the current implementation and a way to measure general improvements in the algorithm, and the contention figures give some indication of whether those improvements actually help in the real-world usage scenario.</p>
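<p>A minimal sketch of that measurement scheme, using <code>std::chrono::steady_clock</code>; <code>AverageRunMs()</code> is a hypothetical helper and the framework's own performance tests may well be structured differently. Calling through <code>std::function</code> adds a little overhead to every iteration, which matters less when comparing implementations that are all measured the same way.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;chrono&gt;
#include &lt;cstddef&gt;
#include &lt;functional&gt;
#include &lt;ratio&gt;

// Hypothetical: time 'iterations' calls of 'operation', repeat the whole
// thing 'runs' times and return the average duration of a run in ms.
double AverageRunMs(
   const std::function&lt;void()&gt; &amp;operation,
   const std::size_t iterations,
   const std::size_t runs)
{
   assert(runs &gt; 0);

   using Clock = std::chrono::steady_clock;

   double totalMs = 0.0;

   for (std::size_t run = 0; run &lt; runs; ++run)
   {
      const auto start = Clock::now();

      for (std::size_t i = 0; i &lt; iterations; ++i)
      {
         operation();
      }

      const std::chrono::duration&lt;double, std::milli&gt; elapsed = Clock::now() - start;

      totalMs += elapsed.count();
   }

   return totalMs / static_cast&lt;double&gt;(runs);
}
</pre>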

<p>Whilst writing the monitoring test I became convinced that there was a leak in the timer queue for "one shot" timers that were still active when the timer queue was destroyed. I added a new monitoring function to track when the internal timer data was actually deleted, as this isn't quite the same as when <code>DestroyTimer()</code> is called, and in any case <code>DestroyTimer()</code> is never called for "one shot" timers. The resulting trace from the mock monitor proved that the leak I thought I had found didn't exist. Since the monitor, with its new deletion tracking, could prove that all timer data was cleaned up correctly, I decided to add the monitor to all tests along with a simple check that the number of calls to <code>OnTimerCreated()</code> equalled the number of calls to <code>OnTimerDeleted()</code>. This addition located a real memory leak in the code that dealt with destroying a timer handle during timeout processing: I was calling <code>DeleteAfterTimeout()</code> in <code>CCallbackTimerQueueBase::DestroyTimer()</code> rather than calling <code>SetDeleteAfterTimeout()</code>.</p>
<p>What's interesting to me is that this leak wasn't located by any of the tests, even when running them under a leak checking tool such as <a href="http://en.wikipedia.org/wiki/BoundsChecker">BoundsChecker</a>. BoundsChecker failed to report it because it couldn't handle the concept of casting dynamically allocated memory to an opaque handle and storing the handle in a map for later clean up; I guess I can't blame it really... Since it always complained that the data allocated in <code>CCallbackTimerQueueBase::CreateTimer()</code> was leaked, even though it was cleaned up correctly later on, I had added an entry to the BoundsChecker suppression file. Although most of these complaints were invalid, this one WAS valid and the data WAS being leaked. Once again I'm reminded that it doesn't matter how many tests you have, how much coverage you have or even which tools you use; the weak link is <i>always</i> the human being in the process... I've changed how the handle map works so that it stores the typed data rather than the opaque handle. This means that BoundsChecker CAN follow the ownership correctly, so it now reports the real leak and stays quiet about the things that it thought were leaks but weren't. I then fixed the actual bug... In summary, don't suppress the warning; understand it and change the code to fix it.</p>
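<p>The created-equals-deleted check needs very little code. Here is a minimal sketch of such a mock monitor; the <code>OnTimerCreated()</code> and <code>OnTimerDeleted()</code> names come from the text, but the class name, the <code>AllTimersDeleted()</code> helper and the use of atomics (so that the timer thread and the test thread can both call it) are assumptions.</p>
<pre class="brush: cpp gutter: false">#include &lt;atomic&gt;
#include &lt;cassert&gt;

// Hypothetical mock monitor: counts creation and deletion of timer data so
// that a test can check that everything that was created was cleaned up.
class MockTimerQueueMonitor
{
   public:

      void OnTimerCreated()
      {
         ++m_created;
      }

      void OnTimerDeleted()
      {
         const long deleted = ++m_deleted;

         assert(deleted &lt;= m_created.load());   // deleting more than we created is a bug
         (void)deleted;                          // unused when asserts are compiled out
      }

      bool AllTimersDeleted() const
      {
         return m_created.load() == m_deleted.load();
      }

   private:

      std::atomic&lt;long&gt; m_created{0};
      std::atomic&lt;long&gt; m_deleted{0};
};
</pre>
<p>Each test then finishes by asserting that <code>AllTimersDeleted()</code> returns true once the queue has been destroyed.</p>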

<p>The code is <a href="http://www.lenholgate.com/zips/PracticalTesting-21.zip" onclick="_gaq.push(['_trackEvent', 'Downloads', 'PracticalTesting-21.zip']);">here</a> and new rules apply. A fair while has passed since the <a href="http://www.lenholgate.com/archives/000803.html">previous episode</a> in this series of articles, and my build environment, and some of the support code, has changed a fair bit since then. The code will build with VS.Net 2002, VS.Net 2003, VS 2005, VS 2008 and VS 2010, and it builds as x86 or x64 with VS 2005, 2008 and 2010. Win32Tools is the workspace that you want and Win32ToolsTest is the project that you should set as active. The code will build with either the standard STL that comes with Visual Studio or with a version of STL Port. The code uses precompiled headers <a href="http://www.lenholgate.com/archives/000345.html">the right way</a> so that you can build with precompiled headers for speed or build without them to ensure minimal code coupling.</p>
<p>The various options are all controlled from the "Admin" project; edit <code>Config.h</code> and <code>TargetWindowsVersion.h</code> to change things... By default the project is set to build for Windows 7; this means that the code WILL NOT RUN on operating systems earlier than Windows Vista, as it will try to use <code>GetTickCount64()</code>, which isn't available. To fix this you need to edit the <code>Admin\TargetWindowsVersion.h</code> file and change the values used; see <code>Admin\TargetWindowsVersion_WIN2K.h</code> and <code>Admin\TargetWindowsVersion_WINXP.h</code> for details.</p>
<p>Since I'm looking at the performance of the code I've adjusted the VS2008 solutions to build without checked iterators in release mode. By default the STL in VS2008 builds with checked iterators enabled in release mode as well as in debug mode, and this adversely affects performance; see <a href="http://msdn.microsoft.com/en-us/library/aa985965.aspx">here</a> for more details. Although the code builds with earlier versions of Visual Studio, I'm not actively using these versions and so there may be standard VS STL related performance tweaks that could be applied; I'm only focussing on VS2008 and VS2010 with my testing. This code is in the Public Domain.</p>]]>
    </content>
</entry>

<entry>
    <title>Bugs</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2009/11/bugs.html" />
    <id>tag:www.socketframework.com,2009:/blog//12.932</id>

    <published>2009-11-05T19:29:20Z</published>
    <updated>2011-01-02T12:02:08Z</updated>

    <summary>It&apos;s been a bit of a week for bugs. First I had a bug report for the really rather old CodeProject socket server COM object. This allows people using VB to create a socket server that uses the free version...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Socket Servers" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>It's been a bit of a week for bugs. First I had a bug report for the really rather old CodeProject socket server COM object. This allows people using VB to create a socket server that uses <a href="http://www.serverframework.com/products---the-free-framework.html">the free version</a> of <a href="http://www.serverframework.com/">The&nbsp;Server&nbsp;Framework</a>. It works well, for what it is, and has been the basis for several custom objects that various clients have needed over the years. The bug involves the 'read string' functionality; either the ATL headers (specifically the narrow to wide character conversion macros) have changed since the code was originally written in VC 6.0 in 2002, or it never worked. Anyway, I've posted a fix <a href="http://www.codeproject.com/KB/IP/comsocketserver.aspx?msg=3260120#xx3260120xx">here</a> and there's a compiled version of the COM component available <a href="http://www.lenholgate.com/zips/COMSocketServer.dll" onclick="_gaq.push(['_trackEvent', 'Downloads', 'COMSocketServer.dll']);">here</a>.</p>

<p>Next up was a strange (and currently unsolved) problem with my <a href="http://www.serverframework.com/ServerFramework/latest/Docs/class_jet_byte_tools_1_1_i_o_1_1_c_rotating_async_file_log.html">rotating async file log</a>. This is a log that uses async file writes to decouple the writing of log messages to disk from the thread that's logging the messages. One of its features is that it can be set up to change the log file every X period (hour, day, week, etc). It can also be told about time changes on the box, so that it doesn't get confused when daylight saving time changes or when someone adjusts a clock. Anyway, whilst looking for an issue in a client's log file I noticed that the log file name ended in -16010101.log... This was a little unusual as the name should have ended in 20091103.log (or thereabouts). It looks like the file name is being created with an invalid system time, or one that's set to zero. I spent a few hours writing some tests to exercise the areas of code that I expected might cause this problem but couldn't reliably reproduce it. If anyone else sees this happen please let me know. (The 'new file' creation code is somewhat over-engineered, as the log uses a timer to create a new file 30 seconds or so before it needs it and then swaps the file in use for the new file when the first log message is written after the time when the new file should become active.)</p>
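<p>The 16010101 part is a strong clue. A Windows <code>FILETIME</code> counts 100-nanosecond intervals since January 1, 1601 (UTC), so a zero <code>FILETIME</code> converts to 1601-01-01. The portable sketch below shows that a zero value produces exactly that name suffix; it doesn't use the Windows API (on Windows, <code>FileTimeToSystemTime()</code> does this conversion), the function names are hypothetical, and the yyyymmdd layout is simply inferred from the file names above. The date conversion is the well-known days-to-civil-date algorithm for the proleptic Gregorian calendar.</p>
<pre class="brush: cpp gutter: false">#include &lt;cassert&gt;
#include &lt;cstdint&gt;

// Hypothetical: converts a count of days since 1970-01-01 into a
// (proleptic Gregorian) year, month and day.
void CivilFromDays(long long z, long long &amp;year, unsigned &amp;month, unsigned &amp;day)
{
   z += 719468;   // shift the epoch to 0000-03-01
   const long long era = (z &gt;= 0 ? z : z - 146096) / 146097;
   const unsigned doe = static_cast&lt;unsigned&gt;(z - era * 146097);                // day of era
   const unsigned yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;  // year of era
   const unsigned doy = doe - (365 * yoe + yoe / 4 - yoe / 100);                // day of year, from March 1
   const unsigned mp = (5 * doy + 2) / 153;                                     // month, March == 0
   day = doy - (153 * mp + 2) / 5 + 1;
   month = mp &lt; 10 ? mp + 3 : mp - 9;
   year = static_cast&lt;long long&gt;(yoe) + era * 400 + (month &lt;= 2 ? 1 : 0);
}

// Hypothetical: formats a FILETIME-style value (100ns ticks since
// 1601-01-01 UTC) as the yyyymmdd number used in the log file names.
std::uint64_t YyyymmddFromFileTime(const std::uint64_t ticks)
{
   const std::uint64_t ticksPerDay = 864000000000ull;   // 24 * 60 * 60 * 10,000,000
   const long long daysSince1601 = static_cast&lt;long long&gt;(ticks / ticksPerDay);
   const long long daysFrom1601To1970 = 134774;

   long long year = 0;
   unsigned month = 0;
   unsigned day = 0;

   CivilFromDays(daysSince1601 - daysFrom1601To1970, year, month, day);

   return static_cast&lt;std::uint64_t&gt;(year) * 10000 + month * 100 + day;
}
</pre>
<p>Whether the log's code really ends up with a zero <code>FILETIME</code> is still unknown; this only shows that one would produce exactly the name that turned up.</p>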

<p>Finally, I'm looking into various implementations of <a href="http://stackoverflow.com/questions/107668/what-do-you-use-when-you-need-reliable-udp">'reliable' UDP</a> for inclusion in <a href="http://www.serverframework.com/">The&nbsp;Server&nbsp;Framework</a>; lots of people ask me about it (mostly games people). Although I've built an async version of the <a href="http://enet.bespin.org/">ENet protocol</a> and integrated it with the framework, it was one of the very few pieces of development where the client didn't want to trade cost for IP rights, so I can't reuse it for other clients, and the couple of clients who were willing to pay for an implementation from scratch have gone quiet on me. So I'm looking at various reliable UDP alternatives such as <a href="http://udt.sourceforge.net/">UDT</a>, <a href="http://en.wikipedia.org/wiki/Reliable_User_Datagram_Protocol">RUDP</a>, and, on a related note, <a href="http://www.rfc-editor.org/rfc/rfc4340.txt">DCCP</a>, and possibly even something as simple as the <a href="http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/Quake3Networking">"Carmack unreliable delta transfer" system</a>. I decided that it might be a good idea to first extend the standard UDP server example and its test client so that I'd have a baseline of unreliable data flow against which to measure any improvements, only to discover that the EchoServerUDPTest program has been broken for several releases without me knowing. Basically it was never ported to the <a href="http://www.lenholgate.com/archives/000692.html">'remove inappropriate use of pointers' release</a> (which, I think, was pre 5.2). It just goes to show what happens when code isn't automatically tested as part of the build and release process.
All along, the UDP side of things hasn't been automatically tested as well as the TCP side simply because of the unreliable nature of UDP; it's harder to run even a simple black box echo test during a build if some of the packets might not get echoed...</p>

<p><strong>Update:</strong> the games client that I did the original ENet work for decided to sell me the IP rights to the asynchronous, IOCP-based ENet implementation that I had done for them. I'm currently rewriting it from the ground up for even more performance, and this means that there WILL be a version available as an optional extra for <a href="http://www.serverframework.com/">The&nbsp;Server&nbsp;Framework</a> in early 2011.</p>

<p>So, I guess this week's theme is 'must try harder'.</p>]]>
        
    </content>
</entry>

<entry>
    <title>.Net 4.0 Hosting</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2009/10/net-40-hosting.html" />
    <id>tag:www.socketframework.com,2009:/blog//12.930</id>

    <published>2009-10-23T10:38:07Z</published>
    <updated>2011-06-23T10:52:38Z</updated>

    <summary>I&apos;ve been playing with Visual Studio 2010 Beta 2 and .Net 4.0, building code, running tests, playing with the IDE, etc. The first issue that I&apos;ve come across with my existing codebase is that the .Net 2.0 hosting APIs (such...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="CLR Hosting" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Socket Servers" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>I've been playing with Visual Studio 2010 Beta 2 and .Net 4.0, building code, running tests, playing with the IDE, etc. The first issue that I've come across with my existing codebase is that the .Net 2.0 hosting APIs (such as <a href="http://msdn.microsoft.com/en-us/library/99sz37yh(VS.100).aspx">CorBindToRuntimeEx</a>) are now deprecated and there's a whole new way of <a href="http://msdn.microsoft.com/en-us/library/dd380850(VS.100).aspx">hosting the CLR</a>.</p>

We've been quite successful in hosting the CLR from <a href="http://www.serverframework.com/products---the-clr-hosting-option.html">within our C++ servers</a>, either to provide servers that support a mix of managed and unmanaged plugins, as a pluggable, high-performance Windows application server, or to provide network protocol support in C++ (such as ENet) with the 'business logic' written in managed code. The .Net 2.0 hosting API works OK but is not without <a href="http://www.lenholgate.com/archives/000675.html">some annoyances</a>. Over the next couple of weeks I hope to take a look at the new hosting interface and report my findings here. With any luck <a href="http://www.serverframework.com/">The&nbsp;Server&nbsp;Framework</a> will support the new hosting interfaces in release 6.2, which currently has no scheduled release date but which I expect will appear early in 2010.]]>
        
    </content>
</entry>

<entry>
    <title>Using Wireshark to debug UDP communication issues</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2009/08/using-wireshark-to-debug-udp-communication-issues.html" />
    <id>tag:www.socketframework.com,2009:/blog//12.918</id>

    <published>2009-08-19T10:52:25Z</published>
    <updated>2010-12-29T14:44:26Z</updated>

    <summary>A customer of mine has been having some problems with communication between a UDP server and their load test client. The UDP server implements the ENet protocol which provides for reliable data transfer over UDP. Their problem was manifesting as...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Socket Servers" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>A customer of mine has been having some problems with communication between a UDP server and their load test client. The UDP server implements the <a href="http://enet.bespin.org/">ENet protocol</a>, which provides reliable data transfer over UDP. The problem manifested as the client not receiving ENet-level ACKs for some of its reliable data. The <a href="http://www.wireshark.org/">Wireshark</a> log from the client machine showed the client resending the data when the ENet retransmission timeout expired, and also showed that the ACKs for these packets never arrived. Communication continued normally until the client disconnected because of a final timeout on the missing ACK.</p>

<p>A quick look at the server source showed that this situation should never be able to occur. The test harness for the ENet protocol code also had plenty of tests in place for the correct generation of ACKs, and all of these passed. The Wireshark log from the client machine showed that the server had obviously processed the packet for which the ACK was missing, as the application-level response had been sent and we could see it in the client's log. Application-level responses had also been sent for later packets, and our ENet protocol implementation wouldn't have allowed that to happen if the server really hadn't received the missing packet: all of the sequenced packets that were due for delivery after the un-ACKed packet would have been queued until it arrived. So it looked like the server had received the packet in question, and my review of the server code showed that the server MUST have generated an ACK for it. It seemed as if the datagram carrying that ACK was just being reliably lost...</p>
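<p>The reasoning about queued packets can be illustrated with a minimal C++ sketch of in-order delivery for reliable, sequenced packets. This is not the ENet implementation or our version of it; it assumes that sequence numbers start at 1, it ignores 16-bit sequence number wraparound, and the class and callback names are hypothetical. Each packet is ACKed as soon as it arrives, but it's only delivered to the application once every earlier sequence number has been delivered, so an application-level response to a later packet implies that the earlier one was received.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

class SequencedReceiver
{
public:
   using Deliver = std::function<void(std::uint16_t, const std::string &)>;
   using SendAck = std::function<void(std::uint16_t)>;

   SequencedReceiver(Deliver deliver, SendAck sendAck)
      : m_deliver(std::move(deliver)), m_sendAck(std::move(sendAck)) {}

   void OnPacket(std::uint16_t seq, const std::string &payload)
   {
      // Every reliable packet that arrives is ACKed, even if it can't be
      // delivered yet.
      m_sendAck(seq);

      if (seq < m_next)
      {
         return; // already delivered, so this is a duplicate (no wraparound)
      }

      m_pending.emplace(seq, payload);

      // Deliver everything that is now contiguous with what went before;
      // anything after a gap stays queued.
      auto it = m_pending.find(m_next);

      while (it != m_pending.end())
      {
         m_deliver(it->first, it->second);
         m_pending.erase(it);
         ++m_next;
         it = m_pending.find(m_next);
      }
   }

   std::size_t Queued() const { return m_pending.size(); }

private:
   Deliver m_deliver;
   SendAck m_sendAck;
   std::uint16_t m_next = 1; // next sequence number due for delivery
   std::map<std::uint16_t, std::string> m_pending;
};
```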

<p>The first thing to realise when you're debugging network traffic with Wireshark is that your log only contains what the machine that you're logging on is seeing. To really understand what's going on you may need to log in multiple places; usually a log from each end of the connection should be adequate, but on more complex network topologies it's nice to be able to have a log generated on each network segment that you have. Without all of these logs you're only getting part of the picture. The fact that the client machine doesn't receive a particular datagram doesn't necessarily mean that the server never generated it. </p>
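<p>A minimal C++ sketch of the comparison that two logs make possible, assuming that the ACK sequence numbers have already been extracted from the server-side and client-side captures by some other means (and ignoring ENet channels for simplicity). The <code>Classify</code> function and <code>Verdict</code> values are hypothetical. A log from the client alone can only tell you that an ACK never arrived; it takes the server-side log to tell you whether it was ever sent.</p>

```cpp
#include <cstdint>
#include <set>

// Hypothetical verdict for an ACK that the client expected to receive.
enum class Verdict
{
   NeverSent,      // not in the server capture: look at the server code
   LostInTransit,  // in the server capture only: look at the network
   Received        // in both captures
};

// serverSent: ACK sequence numbers seen leaving the server machine.
// clientSeen: ACK sequence numbers seen arriving at the client machine.
Verdict Classify(
   std::uint16_t ack,
   const std::set<std::uint16_t> &serverSent,
   const std::set<std::uint16_t> &clientSeen)
{
   if (serverSent.count(ack) == 0)
   {
      return Verdict::NeverSent;
   }

   if (clientSeen.count(ack) == 0)
   {
      return Verdict::LostInTransit;
   }

   return Verdict::Received;
}
```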

Once my customer started taking a Wireshark log from the server machine as well as from the client machine, it quickly became clear that the problem wasn't in our code. The server log showed the server generating and sending the ACKs that the client log showed as missing, so the problem lay neither in the client nor in the server but somewhere in the networking infrastructure between them.]]>
        
    </content>
</entry>

<entry>
    <title>Reliable UDP</title>
    <link rel="alternate" type="text/html" href="http://www.lenholgate.com/blog/2007/06/reliable-udp.html" />
    <id>tag:www.socketframework.com,2007:/blog//12.756</id>

    <published>2007-06-27T21:02:08Z</published>
    <updated>2010-12-27T14:15:30Z</updated>

    <summary>I&apos;ve been doing some work for a client on their reliable UDP implementation. It&apos;s been interesting stuff. They had picked out a &apos;best of breed&apos; open source, reliable UDP protocol implementation (ENet) which was in &apos;C&apos; and integrated it into...</summary>
    <author>
        <name>Len</name>
        
    </author>
    
        <category term="ENet" scheme="http://www.sixapart.com/ns/types#category" />
    
        <category term="Socket Servers" scheme="http://www.sixapart.com/ns/types#category" />
    
    
    <content type="html" xml:lang="en-us" xml:base="http://www.lenholgate.com/blog/">
        <![CDATA[<p>I've been doing some work for a client on their reliable UDP implementation. It's been interesting stuff. They had picked out a 'best of breed' open source, reliable UDP protocol implementation (<a href="http://enet.bespin.org/">ENet</a>), which was written in 'C', and integrated it into their C++ server, which was built with my framework. Unfortunately the 'C' API assumed a synchronous 'pull' model for the communications, whereas <a href="http://www.serverframework.com/">The&nbsp;Server&nbsp;Framework</a> gave them an asynchronous 'push' model. They called me in to look at the system and improve the performance.</p>

<p>The work has been interesting and the conversion from the 'C', synchronous API to C++ async has been challenging, but things are nearly complete now and the system seems to run much better than it did before, so I'm happy and they seem happy too. I've done several of these "sync to async" conversions now and I'm getting the hang of it. Async systems are, essentially, just big state machines. You push data in, they churn around inside, and eventually they do stuff and maybe push data out. The <a href="http://www.lenholgate.com/archives/000456.html">OpenSSL asynchronous connector</a> that I wrote was a good starting point, though in that case the library was already quite flexible. In this case we pretty much rewrote the existing library, but the resulting code is now much more efficient.</p>
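<p>To illustrate the 'push' style, here is a minimal C++ sketch of a small state machine that is fed whatever bytes have arrived and pushes complete messages out through a callback, rather than pulling data with blocking reads. It isn't taken from the ENet conversion or the OpenSSL connector; the one-byte length prefix framing and the <code>FramedMessageReader</code> name are assumptions made purely for illustration.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Framing (assumed for illustration): a 1 byte length, then that many bytes.
class FramedMessageReader
{
public:
   using OnMessage = std::function<void(const std::string &)>;

   explicit FramedMessageReader(OnMessage onMessage)
      : m_onMessage(std::move(onMessage)) {}

   // Push in whatever has arrived; the data may be split at any point.
   void OnData(const char *data, std::size_t length)
   {
      for (std::size_t i = 0; i < length; ++i)
      {
         const char c = data[i];

         switch (m_state)
         {
            case State::ReadingLength:
               m_expected = static_cast<unsigned char>(c);
               m_body.clear();

               if (m_expected == 0)
               {
                  m_onMessage(m_body); // empty message; stay in this state
               }
               else
               {
                  m_state = State::ReadingBody;
               }
               break;

            case State::ReadingBody:
               m_body.push_back(c);

               if (m_body.size() == m_expected)
               {
                  m_onMessage(m_body); // push a complete message out
                  m_state = State::ReadingLength;
               }
               break;
         }
      }
   }

private:
   enum class State { ReadingLength, ReadingBody };

   OnMessage m_onMessage;
   State m_state = State::ReadingLength;
   std::size_t m_expected = 0;
   std::string m_body;
};
```

<p>The object simply remembers where it is between calls, which is exactly the state that a synchronous 'pull' API keeps hidden inside its blocking reads. Processing a byte at a time keeps the sketch simple; real code would copy runs of bytes at once.</p>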

Two things I've realised from all of this are: a) I should investigate a few more reliable UDP implementations, as it would be useful to have some options for this built into the framework, and b) 'C' sucks... I'm sorry, but all that 'declare all the variables at the top of the function' stuff is just <b>SO</b> much of a pain when you're trying to write nice clean code...]]>
        
    </content>
</entry>

</feed>



