WebRTC, TLS hardening and scalable game servers

This year is proving to be yet another busy one for us. We've continued to work with Eonic Gaming on their servers for the Turf Battles Triumphus 3D MMORPG, and we have done quite a bit of work with various clients on hardening their TLS servers. The main focus, though, has been digging into WebRTC data channels.

The WebRTC work is nice, though fairly complex. It's based on lots of RFCs and the initial learning curve was pretty steep. WebRTC data channels, in themselves, are pretty simple, but they're built on a huge stack of other technologies: SCTP, DTLS, ICE, STUN, etc. Getting all of that working to the point where a simple "hello world" message gets from a browser to the server and back has taken some time. There's lots of open source code out there, but much of it takes a lot of work to understand fully, as documentation, comments and naming often leave a lot to be desired.

We're looking at producing a highly scalable implementation of WebRTC data channels from the ground up for a client who wants to add this kind of connectivity to their application servers. WebRTC is often used for peer-to-peer connections where both peers are browsers; our system has the server as one of the peers. Right now we're doing a custom implementation specifically for this client, but eventually we expect to produce an Option Pack with a slightly more general-purpose WebRTC implementation.

As always, we've also been working on new releases of The Server Framework: a 6.8.1 maintenance release, and continuing work on what could become a massively modernised 7.0 release.

How to determine if a non-IFS LSP is installed


Enabling FILE_SKIP_COMPLETION_PORT_ON_SUCCESS on a handle associated with an I/O completion port can improve performance and reduce context switching: when a call that could complete asynchronously actually completes synchronously, no completion packet is queued to the IOCP and the thread that made the call handles the completion "inline". This is especially useful for TCP reads when there's already data in the network stack's buffers, or writes when there is space in the buffers. Whilst there are design issues that must be taken into consideration before simply enabling this flag (beware recursion!), there's a little-known issue where code outside of your control can prevent the IOCP from operating correctly when this flag is enabled.

If non-IFS Winsock Base Service Providers (BSPs) or Layered Service Providers (LSPs) are installed then you may not receive completions at all for handles with the flag set.

The Microsoft Knowledge Base article that describes this has been around for quite a few years. It's important, and possibly becoming more so as more and more poorly written LSPs get installed by adware and other rubbish.

It's all very well knowing that you can't use FILE_SKIP_COMPLETION_PORT_ON_SUCCESS when you have a non-IFS LSP installed but how can you tell at runtime that this is the case?

We've had this code in The Server Framework since 2011 or so: it iterates the Winsock catalog and lets you know if any non-IFS LSPs are installed. You can then use the results of this to determine if it's safe to enable FILE_SKIP_COMPLETION_PORT_ON_SUCCESS or not.

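#include <winsock2.h>
#include <ws2spi.h>      // WSCEnumProtocols()

#include <stdexcept>
#include <string>
#include <vector>

#pragma comment(lib, "ws2_32.lib")

// Zero-terminated list of the protocols whose providers we check.
static const int s_protocols[] = { IPPROTO_TCP, IPPROTO_UDP, 0 };

bool CanEnableSkipCompletionPortOnSuccess()
{
   // At some point we MAY want to check for UDP and TCP transports separately,
   // if we do that then we need to change this.

   // The Server Framework uses its own TExpandableBuffer<BYTE> and
   // CException/CWin32Exception classes; standard types are used here.

   std::vector<BYTE> buffer;

   LPWSAPROTOCOL_INFOW pProtocolInfo = nullptr;

   DWORD bufferLength = 0;

   int error = 0;

   // With no buffer the call should fail with WSAENOBUFS and report the
   // required size in bufferLength.

   int numEntries = ::WSCEnumProtocols(
      const_cast<int *>(&s_protocols[0]),
      nullptr,
      &bufferLength,
      &error);

   if (SOCKET_ERROR != numEntries)
   {
      throw std::runtime_error("Expected first call to fail and return buffer size!");
   }

   int attempts = 0;

   bool done = false;

   while (!done)
   {
      if (error != WSAENOBUFS)
      {
         throw std::runtime_error("WSCEnumProtocols() failed: " + std::to_string(error));
      }

      if (attempts++ > 3)
      {
         // so the amount of memory required is always changing??

         throw std::runtime_error(
            "Cannot allocate appropriate buffer: " + std::to_string(bufferLength));
      }

      buffer.resize(bufferLength);

      pProtocolInfo = reinterpret_cast<LPWSAPROTOCOL_INFOW>(buffer.data());

      numEntries = ::WSCEnumProtocols(
         const_cast<int *>(&s_protocols[0]),
         pProtocolInfo,
         &bufferLength,
         &error);

      done = (SOCKET_ERROR != numEntries);
   }

   // It's only safe if every provider for these protocols is an IFS provider.

   bool ok = true;

   for (int i = 0; ok && i < numEntries; ++i)
   {
      ok = ((pProtocolInfo[i].dwServiceFlags1 & XP1_IFS_HANDLES) == XP1_IFS_HANDLES);
   }

   return ok;
}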
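On the "beware recursion!" point: once inline completions are possible, a completion handler that issues the next read can find that read completing synchronously too, and a naive design then recurses once per buffer of data. The sketch below is portable C++ rather than real Winsock code; `FakeSocket`, `TryRead()` and `ReadResult` are hypothetical stand-ins for issuing an overlapped `WSARecv()` and checking whether it completed immediately or is pending. It shows one common way to avoid the recursion: handle synchronous completions in a loop and return only once a read is pending.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical model of a socket with FILE_SKIP_COMPLETION_PORT_ON_SUCCESS
// enabled: a read either completes inline (data already buffered) or is
// pending, in which case its completion arrives later via the IOCP.
enum class ReadResult { CompletedInline, Pending };

class FakeSocket
{
public:
   void Buffer(const std::string &data) { m_buffered.push_back(data); }

   ReadResult TryRead(std::string &out)
   {
      if (m_buffered.empty())
      {
         return ReadResult::Pending;
      }
      out = m_buffered.front();
      m_buffered.pop_front();
      return ReadResult::CompletedInline;
   }

private:
   std::deque<std::string> m_buffered;
};

// Issues reads until one is pending, handling each inline completion in a
// loop rather than by recursing from the completion handler. Returns the
// number of completions handled inline.
std::size_t ReadUntilPending(FakeSocket &socket, std::string &received)
{
   std::size_t handledInline = 0;
   std::string chunk;

   while (socket.TryRead(chunk) == ReadResult::CompletedInline)
   {
      received += chunk;     // "handle" the completion on this thread
      ++handledInline;
   }

   assert(socket.TryRead(chunk) == ReadResult::Pending);

   // A read is now pending; an I/O thread will handle its completion when
   // the packet is dequeued from the completion port.
   return handledInline;
}
```

With 100,000 chunks already buffered, the loop handles them all in a single stack frame, where a handler that re-issued the read recursively would need 100,000 nested calls.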

Latest release of The Server Framework: 6.8

Version 6.8 of The Server Framework was released today.

This release includes important bug fixes, including the fix for the multi-buffer write bug in Release 6.7. It also includes lots of code changes due to the removal of support for Visual Studio 2010, the addition of support for Visual Studio 2017 and the results of lots of static analysis.

This release is essential for users of Release 6.7.

Bug in multi-buffer writes in 6.7


A bug has been discovered in Release 6.7 in the code that deals with TCP socket writes that involve more than a single buffer. These 'multi-buffer writes' are writes that involve either a buffer chain or a block of data, passed as a pointer and a length, where the length exceeds the buffer size of the buffer allocator that the connection is using.

The bug prevents the 'multi-buffer write' from being executed as a single atomic write at the network layer and so can corrupt a TCP data stream if multiple threads are writing to the same connection concurrently.

The bug is due to the removal, in 6.7, of the code required to support Windows XP. On Windows XP, outstanding I/O is cancelled when the thread that issued it exits, so we always marshalled all I/O operations from the calling thread to an I/O thread, and write operations had to include sequence numbers so that they stayed in order. This write sequencing code had the side effect of also protecting 'multi-buffer writes' from being interrupted by other writes.

The fix does not require the reintroduction of write sequencing but, instead, issues a single scatter/gather style write for the entire buffer chain. This is both efficient and correct.
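A portable model of why a single scatter/gather write fixes the problem. This is not The Server Framework's code: `NetworkLayer`, `WriteChainGathered()`, `WriteChainPerBuffer()` and `RunTwoWriters()` are hypothetical, and each call to `NetworkLayer::Send()` stands in for one write operation at the network layer (on Windows, one `WSASend()` call, which accepts an array of `WSABUF` structures). Handing the whole chain over in one operation means another thread's write can never land between its buffers; one operation per buffer gives no such guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical network layer: each Send() is one operation whose buffers
// are appended to the stream contiguously, like a single gather write.
class NetworkLayer
{
public:
   void Send(const std::vector<std::string> &buffers)
   {
      std::lock_guard<std::mutex> lock(m_mutex);
      for (const auto &buffer : buffers)
      {
         m_stream += buffer;
      }
   }

   std::string Stream() const
   {
      std::lock_guard<std::mutex> lock(m_mutex);
      return m_stream;
   }

private:
   mutable std::mutex m_mutex;
   std::string m_stream;
};

// Correct: the whole chain is issued as one operation.
void WriteChainGathered(NetworkLayer &network, const std::vector<std::string> &chain)
{
   network.Send(chain);
}

// Broken: one operation per buffer, so a concurrent writer can interleave.
void WriteChainPerBuffer(NetworkLayer &network, const std::vector<std::string> &chain)
{
   for (const auto &buffer : chain)
   {
      network.Send({ buffer });
   }
}

// Two threads each write 'iterations' chains of 'buffersPerChain' 8-byte
// buffers filled with 'A' or 'B'; returns the resulting stream.
std::string RunTwoWriters(
   void (*writeChain)(NetworkLayer &, const std::vector<std::string> &),
   int iterations,
   std::size_t buffersPerChain)
{
   NetworkLayer network;

   auto writer = [&](char ch)
   {
      const std::vector<std::string> chain(buffersPerChain, std::string(8, ch));
      for (int i = 0; i < iterations; ++i)
      {
         writeChain(network, chain);
      }
   };

   std::thread a(writer, 'A');
   std::thread b(writer, 'B');
   a.join();
   b.join();

   return network.Stream();
}

// True if the stream consists only of whole, uninterrupted chains.
bool ChainsAreContiguous(const std::string &stream, std::size_t chainLength)
{
   assert(chainLength > 0);
   if (stream.size() % chainLength != 0)
   {
      return false;
   }
   for (std::size_t start = 0; start < stream.size(); start += chainLength)
   {
      if (stream.find_first_not_of(stream[start], start) < start + chainLength)
      {
         return false;
      }
   }
   return true;
}
```

`RunTwoWriters(WriteChainGathered, 1000, 4)` always produces whole chains. The same run with `WriteChainPerBuffer` may or may not, depending on thread scheduling, which is exactly what makes this kind of corruption intermittent.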

A related bug also affects the atomicity of 'multi-buffer writes' into filter layers, such as the SSL code. Similar fixes have been applied there.

The bug is fixed in Release 6.8 which will be released later today.

C++ Tools - JetBrains ReSharper C++ - purchased...


I've been looking at ReSharper C++ by JetBrains for a while now and the trial period has finally run out. I immediately bought a license, which shows how much my feelings about the product changed during the trial.

C++ Tools - CppDepend


I've been trying various static analysis tools on the C++ code of The Server Framework. So far I'm using ReSharper C++ and the Gimpel PC-Lint Plus Beta on a regular basis, and I've now added CppDepend into the loop.

Full disclosure: I have been given a CppDepend license.

As I've said before, whilst CppDepend is easy to get hold of, easy to install and "just works", I don't find it that useful. I can certainly remember large enterprise clients where this kind of tool would have been invaluable for management-level analysis of large codebases, but for a small development team of competent people it's less immediately useful. That said, I've found several of the warnings it produces helpful, and so I've been running it alongside the other tools as it fills some gaps.

Since I'm in the process of dropping support for several old compilers, I can finally begin to move the codebase forward to slightly more modern C++. All of the tools help with this, and CppDepend has some 'modernise C++' checks that I'm finding useful.

I like the fact that I can run CppDepend as a stand-alone GUI. I prefer this to using fully integrated Visual Studio extensions.

I like the idea of the regression reports but haven't actually set up a baseline report and run them...

I'm not sure that I would purchase a license for this tool but I know clients that could benefit from using it.

Previously on "Practical Testing"... Having just resurrected this series of blog posts about testing a non-trivial piece of real-world C++ code I've fixed a few bugs and done a bit of refactoring. There's one more step required to bring the code in this article right up to date.

The timer queue that is the focus of these blog posts is part of The Server Framework. This body of code has been around since 2001 and has evolved to support new platforms and compilers. One of the things that I do from time to time is remove support for old platforms and compilers. This allows me to start using exciting new C++ features (around 10 years after they were first 'new') and it means that code that is present just to work around issues in old platforms can be removed. The timer queue has quite a bit of code that needn't be there now that Windows XP has passed away.

In 2008, in episode 17, I added support for GetTickCount64(), which removed some of the complexity from timer management. We kept the old version around as you needed Windows Vista or later to use that API call. Now that I no longer support Windows XP, every supported platform has GetTickCount64() and there's no reason to keep the XP version. Removing it removes quite a bit of complexity from both the internals and the public API; you no longer have to decide which version to use as there is only one!
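Some background on the complexity that GetTickCount64() removes: GetTickCount() returns milliseconds since boot in a 32-bit DWORD, so it wraps to zero every 2^32 ms, roughly 49.7 days. An absolute expiry time computed as "now + timeout" can wrap past zero and appear to be in the past. This sketch uses plain fixed-width integers rather than the Windows API, and `HasExpired32()`, `HasElapsed32()` and `HasExpired64()` are hypothetical helpers, not The Server Framework's code.

```cpp
#include <cassert>
#include <cstdint>

// Naive 32-bit check, as if both values came from GetTickCount().
// Broken near the wrap: the expiry wraps past zero, looks smaller
// than 'now', and the timer appears to have expired immediately.
bool HasExpired32(std::uint32_t now, std::uint32_t expiry)
{
   return now >= expiry;
}

// Usual 32-bit workaround: measure elapsed time with unsigned (modulo 2^32)
// arithmetic. This survives a single wrap, provided the timeout is shorter
// than ~49.7 days, but it only works relative to a known start time.
bool HasElapsed32(std::uint32_t now, std::uint32_t start, std::uint32_t timeout)
{
   assert(timeout < UINT32_MAX);
   return static_cast<std::uint32_t>(now - start) >= timeout;
}

// With a 64-bit tick count, as from GetTickCount64(), the counter won't
// wrap in any realistic uptime, so a plain comparison of absolute times works.
bool HasExpired64(std::uint64_t now, std::uint64_t expiry)
{
   return now >= expiry;
}
```

Starting 256 ms before the wrap with a 1000 ms timeout gives a 32-bit expiry of 744, which the naive check reports as already expired. The modulo trick fixes one interval, but a timer queue keeps many absolute expiry times in sorted order, and that ordering does not survive a wrap without extra bookkeeping; a 64-bit tick count makes that bookkeeping unnecessary.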

As the recent bug fixes have shown, there are also two versions of timer dispatch: the original version, which holds a lock whilst calling into user code during timer dispatch, and the improved version, which doesn't. Holding locks whilst calling back into user code via callback interfaces is a surefire way to create lock inversions and deadlocks. The old dispatch method was originally kept because I was unsure of the performance characteristics of the new one. The improved version was added in 2008, in episode 18, and every piece of development that I've done since has used the new dispatch method. The performance difference, if present, is not relevant, and removing the ability for code to take part in lock inversions is important for library code. So the original method of dispatch should be removed.
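A minimal sketch of the improved dispatch style, assuming a simple mutex-protected multimap rather than the actual internals of CCallbackTimerQueueEx; `TinyTimerQueue` and its members are hypothetical. Expired timers are collected while the lock is held, the lock is released, and only then is user code called, so a callback that takes its own locks, or calls back into the queue, cannot form a lock inversion with the queue's lock.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <mutex>
#include <utility>
#include <vector>

class TinyTimerQueue
{
public:
   using Callback = std::function<void()>;

   void SetTimer(std::uint64_t expiry, Callback callback)
   {
      std::lock_guard<std::mutex> lock(m_mutex);
      m_timers.emplace(expiry, std::move(callback));
   }

   // Dispatches every timer that has expired by 'now' and returns how many
   // were dispatched. The queue's lock is NOT held while callbacks run.
   std::size_t HandleTimeouts(std::uint64_t now)
   {
      std::vector<Callback> expired;

      {
         std::lock_guard<std::mutex> lock(m_mutex);
         const auto end = m_timers.upper_bound(now);
         for (auto it = m_timers.begin(); it != end; ++it)
         {
            expired.push_back(std::move(it->second));
         }
         m_timers.erase(m_timers.begin(), end);
      }  // lock released here, before any user code runs

      for (auto &callback : expired)
      {
         callback();
      }

      return expired.size();
   }

   std::size_t Size() const
   {
      std::lock_guard<std::mutex> lock(m_mutex);
      return m_timers.size();
   }

private:
   mutable std::mutex m_mutex;
   std::multimap<std::uint64_t, Callback> m_timers;
};
```

A callback that calls SetTimer() on the same queue works here. A version that held the lock during dispatch would try to re-acquire a std::mutex it already owns, which is undefined behaviour and in practice usually a deadlock. One consequence of this design is that a timer cancelled by another thread after it has been collected may still fire; this sketch doesn't deal with that.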

The removal of this functionality massively simplifies the code and the tests. Sixty tests can be removed, which is almost a third of them. Most of the code that was moved into the common base class for the two versions of the queues can be moved into the 'Ex' version with GetTickCount64() support. I expect that I should, eventually, rename the CCallbackTimerQueueEx class to CallbackTimerQueue, but not yet. There are some systems that use this class explicitly rather than via CThreadedCallbackTimerQueue and the enum that we're removing in this release. Leaving the rename for a future release, as a separate breaking change, makes fixing the breaking changes due to this release slightly easier and more obvious, and when I do change the name the fix will be a simple rename in the code that uses the old class...

The code is on GitHub and new rules apply.

I've been looking at ReSharper C++ by JetBrains for a while now and I expect I'm nearing the end of the trial period. Initially I found that it got in my way, but I think it's slowly training me to ignore the niggles, and I'm finding the functionality quite compelling.

Practical Testing: 36 - Timeout handle wrap


Previously on "Practical Testing"... I've just fixed a new bug in the timer queue and in doing so I updated the code used in the blog posts to the latest version that ships with The Server Framework. This code included fixes for two bugs that had been fixed some time ago but which I hadn't documented here. They also lacked unit tests... Last time, I wrote tests for, and fixed, the first bug. This time I fix the final bug.

This bug is in the "lock-free" timeout handling code and it would cause the threaded version of the timer queue to do a small amount of unnecessary work but otherwise work correctly. The problem is that we use a ULONG_PTR value as an opaque handle when processing timeouts using the "lock-free" BeginTimeoutHandling(), HandleTimeout(), EndTimeoutHandling() sequence and this opaque handle has a sentinel value of 0 that is used to indicate that there are no timeouts to handle. The code that we use to generate the timeout handle in BeginTimeoutHandling() is a simple ::InterlockedIncrement() and, if enough timeout handles are generated, this will wrap to 0 and return the sentinel value when it should be returning a non-sentinel value.
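The wrap is easy to reproduce outside the timer queue. This is a sketch, not the fix that ships in The Server Framework: it uses std::atomic in place of ::InterlockedIncrement(), a 32-bit counter, and hypothetical names (`TimeoutHandleGenerator`, `NoTimeouts`, `NextNaive()`, `Next()`). One simple fix is to increment again whenever the increment lands on the sentinel.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

class TimeoutHandleGenerator
{
public:
   // Sentinel meaning "there are no timeouts to handle".
   static constexpr std::uint32_t NoTimeouts = 0;

   explicit TimeoutHandleGenerator(std::uint32_t initial = 0)
      : m_counter(initial)
   {
   }

   // Naive version: like a bare ::InterlockedIncrement(), this returns the
   // sentinel once every 2^32 calls, when the counter wraps.
   std::uint32_t NextNaive()
   {
      return ++m_counter;
   }

   // Fixed version: if the increment produces the sentinel, increment again.
   // If another thread increments in between, a value is simply skipped.
   std::uint32_t Next()
   {
      std::uint32_t handle = ++m_counter;
      while (handle == NoTimeouts)
      {
         handle = ++m_counter;
      }
      assert(handle != NoTimeouts);
      return handle;
   }

private:
   std::atomic<std::uint32_t> m_counter;
};
```

Constructing the generator with a value just below the wrap point turns a bug that takes billions of calls to appear into one that a unit test can trigger on its first call.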

Practical Testing: 35 - Heap corruption


Previously on "Practical Testing"... I've just fixed a new bug in the timer queue and in doing so I updated the code used in the blog posts to the latest version that ships with The Server Framework. This code included fixes for two bugs that had been fixed some time ago but which I hadn't documented here. They also lacked unit tests... In this episode I find and fix the first of these issues by writing a unit test that triggers the issue.

This bug is in another edge case, one that most of the code that uses the timer queue rarely exercises. The queue supports timers that can be set and reset a number of times via an interface that allows you to create a timer, set/cancel it and then destroy the timer. A less used interface allows you to create 'fire and forget' timers that can only be set once and that clean themselves up. Hardly any code that I write uses this interface, but it's there for backwards compatibility. The code required to support it is limited to the call that sets the initial timer, as the cleanup is done by code that's shared with timers that are deleted during their own timeout processing.

This bug also only affects the Timer Wheel implementation, which has a considerably smaller set of users and a considerably narrower use case. There's test coverage for "one shot" timers, but only for timers that are processed whilst the timer queue's internal lock is held. There is no test for a "one shot" timer in queues that dispatch without holding a lock. The code for lock-free dispatch is significantly different to the code for dispatch whilst holding a lock, and that's where the bug is.

The bug was originally found because it causes heap corruption due to a double delete. The first thing I'll do is add a test for lock-free timer dispatch of "one shot" timers. This clearly demonstrates the bug when run in release mode on Windows 10 as the heap alerts us to the problem.

About this Blog

I usually write about C++ development on Windows platforms, but I often ramble on about other less technical stuff...


I have other blogs...

The Server Framework - high performance server development
Lock Explorer - deadlock detection and multi-threaded performance tools
l'Hexapod - embedded electronics and robotics
MegèveSki - skiing