That Slim Reader/Writer lock issue is a kernel bug

It looks like the Slim Reader/Writer lock issue that I wrote about back in September is a kernel bug after all.

Stefan Boberg has just tweeted to let me know that Microsoft has confirmed it's a bug and that a hot fix is in testing.

WSARecv, WSASend and thread safety

Last week I learnt something new, which is always good. Unfortunately it was that for over 15 years I'd been working under a misconception about how an API worked.

TL;DR: When using WSARecv() and WSASend() from multiple threads on a single socket you must ensure that only one thread calls into the API at a given time. Failure to do so can cause buffer corruption.

It all began with this question on StackOverflow and I dived in and gave my usual response. Unfortunately my usual response was wrong, and after spending some time talking to the person who had asked the question and running their test code I realised that I'd been mistaken about this for a long time.

My view has always been that if you issue multiple WSARecv() calls on a single socket then you need to be aware that the completions may be handled out of sequence by the threads that are servicing your I/O completion port. This is purely due to thread scheduling; the actual calls to WSARecv() are thread safe in themselves. I wrote about this here back in 2002. My belief was that you could safely call WSARecv() from multiple threads on the same socket at the same time and that the only problem would be resequencing the reads once you processed them. Unfortunately this is incorrect, as the example code attached to the question shows.

The example code is somewhat contrived for a TCP socket in that it doesn't care about the sequencing of the read completions and is happy to process the TCP stream out of order. It issues multiple WSARecv() calls from multiple threads, and the data being sent is simply a series of bytes where each byte is one greater than the one before, wrapping back to zero after a 'max value' is reached.

Such a stream with a max value of 7 would look like this: 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03...

Validating such a TCP stream is as simple as taking any read that completes, in any order, and ensuring that the bytes in the returned buffer follow the expected pattern. Starting from the first byte, whatever value it is, each subsequent byte must be one greater than the last until the 'max value' is reached, at which point the values wrap to zero and continue to increment. If my long-held beliefs were true then it wouldn't matter how many threads were issuing WSARecv() calls on the socket at the same time; the resulting slices of the TCP stream should all be valid.
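
A minimal sketch of that validation in C++ might look something like this; the function name and the maxValue parameter are mine, not anything taken from the test program attached to the question:

```cpp
#include <cstddef>
#include <cstdint>

// Check that a slice of the stream follows the pattern: starting from the
// first byte, whatever its value, each subsequent byte must be one greater
// than the previous one, wrapping back to zero after maxValue.
bool IsValidStreamSlice(
   const std::uint8_t *pData,
   const std::size_t length,
   const std::uint8_t maxValue)
{
   for (std::size_t i = 1; i < length; ++i)
   {
      const std::uint8_t expected =
         (pData[i - 1] == maxValue)
            ? 0
            : static_cast<std::uint8_t>(pData[i - 1] + 1);

      if (pData[i] != expected)
      {
         return false;     // the pattern is broken - the buffer is corrupt
      }
   }

   return true;
}
```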

Unfortunately the test program fails to prove this and instead proves that, without synchronisation around the WSARecv() call, the returned stream slices can be corrupt. That is, the data in the buffers returned from the read completions can include bytes that do not follow the expected pattern.

Of course the way that the test program uses the API is of limited use, as I can't think of a TCP stream where it would be useful to be able to process the stream in randomly sized chunks with no need to have processed the data that comes before the current data. One of the reasons that I believed I understood the requirements of the API is that I never used it this way. In systems where multiple reads on TCP streams were allowed I would always increment a sequence number, put the sequence number in the read buffer's metadata and then issue the WSARecv(), all as an atomic step with locking around it. This made it possible to correctly sequence the read completions and also had the side effect of preventing multiple threads from calling WSARecv() on a single socket at the same time. However, with UDP sockets the broken usage pattern is much more likely and I think that I may have seen the corruption on a client's system in the past - we're testing with fixed code now.
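
The sketch below shows roughly what that looks like; the structures and names here are hypothetical, purely to illustrate the 'take the lock, allocate a sequence number, issue the WSARecv()' step, and, as I mention below, the same per-socket lock can also be taken around WSASend():

```cpp
#include <winsock2.h>

// Hypothetical per-read and per-socket context structures; nothing here is
// taken from The Server Framework, it's just an illustration of the pattern.
struct PerReadContext
{
   WSAOVERLAPPED overlapped;
   WSABUF buffer;
   DWORD sequenceNumber;            // used to resequence the completions later
};

struct PerSocketContext
{
   SOCKET socket;
   CRITICAL_SECTION lock;           // serialises WSARecv() (and WSASend()) calls
   DWORD nextSequenceNumber;
};

// Allocate a sequence number and issue the read as one atomic step.
bool IssueRead(PerSocketContext &socketContext, PerReadContext &readContext)
{
   ::EnterCriticalSection(&socketContext.lock);

   readContext.sequenceNumber = socketContext.nextSequenceNumber++;

   DWORD flags = 0;

   const int result = ::WSARecv(
      socketContext.socket,
      &readContext.buffer,
      1,                            // one buffer
      nullptr,                      // bytes received; not needed for overlapped I/O
      &flags,
      &readContext.overlapped,
      nullptr);                     // no completion routine; completions go to the IOCP

   ::LeaveCriticalSection(&socketContext.lock);

   return (result == 0 || ::WSAGetLastError() == WSA_IO_PENDING);
}
```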

I've yet to fully investigate the WSASend() side of the problem but I'm assuming that the issue is the same. The person who asked the question on StackOverflow has seen data corruption when the receive side is protected by synchronisation and the send side isn't, and I've no reason to doubt his analysis. I would like to think that calling WSASend() on one thread and WSARecv() on another, on the same socket, at the same time, is OK, but for now I'm assuming it's not and simply using a single lock per socket around both calls.

The current documentation for WSARecv() has this to say about thread safety: "If you are using I/O completion ports, be aware that the order of calls made to WSARecv is also the order in which the buffers are populated. WSARecv should not be called on the same socket simultaneously from different threads, because it can result in an unpredictable buffer order." Which, IMHO, doesn't actually address this issue. It mentions the unpredictable buffer order, which is expected, but not the buffer corruption. Also the documentation for WSASend() has an identical note with "WSARecv" replaced by "WSASend", which, when you think about it, doesn't actually make sense at all. I don't remember seeing these notes when I was first looking at the documentation, but who knows (yes, I'd like a fully preserved set of all revisions to the MSDN docs so that I can diff back to what the docs said when I wrote the code ;) ). Network Programming for Microsoft Windows, Second Edition, doesn't mention any thread safety issues at all in its coverage of WSASend() and WSARecv(), as far as I can see from a brief scan.

Apart from the one known case of UDP datagram corruption that may be caused by my misunderstanding, I think most of our systems are pretty safe just by their design. However, there will be a fix included in the 6.6.3 release and 7.0 won't be susceptible to this problem at all due to its Activatable Object usage at the socket level.

It's always good to learn something new, more so when it challenges long held beliefs and especially when it proves those beliefs to have been incorrect.
Visual Lint and Gimpel Lint

I've been a big fan of Gimpel Lint for years. It's a great static analysis tool for C++ and it can locate all kinds of issues, or potential issues, in the code. My problem with it has always been that it's a bit of a pig to configure and run, more so if you're used to working inside an IDE all the time.

Several years back I had some custom Visual Studio menu items that I'd crufted up to run Gimpel Lint on a file or a project, and some more cruft that converted the output into something clickable in the IDE's output pane. This worked reasonably well but was complicated to maintain and easy to forget about. There was nothing to encourage you to run Gimpel Lint regularly, and if you don't do that then the code starts to decay.

Visual Lint, from RiverBlade, is, at heart, simply a Visual Studio plugin that integrates other static analysis tools into the Visual Studio IDE. However, the simplicity of the idea belies the value that it provides. Visual Lint makes Gimpel Lint a usable tool within the Visual Studio IDE. For me it has taken a very complex and hardly ever used tool, one which I always meant to use but never did, and turned it into something that I use regularly and that is easy to run.

Gimpel Lint is still complicated to configure, and it can be depressing when you first run it on a large codebase, but once it's integrated with Visual Lint and run regularly it becomes a usable and powerful addition to your toolbox.

Now that I have Visual Lint running in my IDE and most of my code abiding by my in-house Gimpel Lint rules I suppose I should look at RiverBlade's LintProject tool and get my build servers to report on Lint breaks...

Two quite different approaches to multi-threading


I've been working on some code for a client recently that needs to run in a multi-threaded environment. Unfortunately it was never really written with that requirement in mind, or, more likely, the people who wrote the code knew that it needed to be accessed from multiple threads but didn't really understand quite what that meant. As such I'm doing some fairly hairy refactoring for them. It's high risk, as there are no unit tests and the budget doesn't really extend to completing the work, let alone "spending extra time" writing unit tests... The code is written in quite a 'C' style: most things are simple structs, most data is public, and member functions fall wherever they are most conveniently written. Adding correct locking is complicated by lots of mutable global data, a lack of encapsulation and the fact that, at its heart, it's multi-user server software where one user gets to poke at all the others.


Step zero was to realise that no locking at all is bad but that correct locking was a long way away, so I slipped in a single "global lock" to make the system work. It's stable and not too slow, but it doesn't really ever use more than one thread at a time as almost everything requires that the executing thread holds the "global lock".


Step one is to add some encapsulation and move things around so that we can one day dream of adding more fine-grained locking - unfortunately this step is a big one.


I'm finding it interesting comparing this work to the work I was doing immediately before. I've been working on the next big release of The Server Framework, and this release will put an Activatable Object at the heart of each connection. This removes the need for locking from the socket object and from lots of the code that's built on top of it. It means that masses of complex code can be pulled out and thrown away. It's not a new idea: the socket object is now simply driven by a message queue. Locks are needed to put work items into the queue, but during processing we can run without locking, safe in the knowledge that only one thread can ever be processing at a time.
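
A minimal sketch of the general idea, using std::mutex and std::deque rather than anything from The Server Framework, might look like this; the point is that the lock is only held while queueing or dequeuing work items, and the work items themselves always run on exactly one thread at a time:

```cpp
#include <deque>
#include <functional>
#include <mutex>

class ActivatableObject
{
public:
   // Called from any thread: the lock is only held while queueing the work item.
   void Enqueue(std::function<void()> workItem)
   {
      bool needToActivate = false;

      {
         std::lock_guard<std::mutex> lock(m_queueLock);

         m_queue.push_back(std::move(workItem));

         // Only the thread that transitions us from 'idle' to 'active'
         // gets to process; everyone else just queues and returns.
         if (!m_active)
         {
            m_active = true;
            needToActivate = true;
         }
      }

      if (needToActivate)
      {
         Process();
      }
   }

private:
   // Runs on exactly one thread at a time, so the work items themselves
   // need no locking.
   void Process()
   {
      for (;;)
      {
         std::function<void()> workItem;

         {
            std::lock_guard<std::mutex> lock(m_queueLock);

            if (m_queue.empty())
            {
               m_active = false;
               return;
            }

            workItem = std::move(m_queue.front());
            m_queue.pop_front();
         }

         workItem();               // no lock held while processing
      }
   }

   std::mutex m_queueLock;
   std::deque<std::function<void()>> m_queue;
   bool m_active = false;
};
```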


The problem is that the kind of change I'm making right now is a massive one. It has led to a complete rewrite of the entire TCP side of The Server Framework. The general pattern is removing locking, because it's no longer needed, and simplifying code, because the ways in which it can be called are simpler. The unit tests that I had for the previous version of the code have helped to guide me; yes, new tests have been written, but the old ones told me what I needed to test, even if they needed to be changed a lot to work with the new code.


With The Server Framework the work was scary, but the various layers of tests supported it and, in general, the code was being made simpler and easier to understand. With the client work there are no tests, huge amounts of the application simply didn't compile until the changes were complete and, whilst some things are being simplified, others are being made more complex to make the code thread safe. I expect that with enough iterations the client's code could shift to using Activatable Objects and some of the locking would go away again, but I'm not sure we'll have the luxury of those extra iterations.


How to build a GCC Cross-Compiler

This article over on Preshing on Programming looks useful. It gives a step-by-step guide to building GCC cross-compilers; I expect it will save me lots of time at some point in the future.

Dropping support for Visual Studio 2005 and 2008

The 7.0 release of The Server Framework, which is likely to be released early next year, will no longer support Visual Studio 2005 or Visual Studio 2008.

The 6.6.x releases will be the last to support these compilers.

Please get in touch immediately if this will be a problem for you.

Dropping support for Windows XP and Server 2003

The 7.0 release of The Server Framework, which is likely to be released early next year, will no longer support Windows XP or Windows Server 2003.

The 6.6.x releases will be the last to support these operating systems. Release 6.6.3 is due shortly and is a minor bug-fixing release. We may release subsequent 6.6.x bug fix releases but no new development will occur on the 6.6 branch.

Removal of support for these operating systems allows us to clean up the code considerably and to remove lots of code that's required purely to work around 'interesting' twists in various Windows APIs pre-Vista.

New option pack: Streaming Media

We have a new Option Pack, The Streaming Media Option Pack. This allows you to easily add streaming of H.264 and MPEG audio and video to your clients and servers using RTSP, RTP and RTCP.

With more and more Internet of Things devices supporting rich media streaming for remote monitoring, it's becoming essential to be able to manage these media streams within your device management servers and clients. Whether it's recording device streams for later analysis or arbitrating between multiple clients and devices, manipulating streaming media is increasingly important.

As always, this Option Pack integrates seamlessly with The Server Framework's Core Framework and other options and allows you to quickly and easily add rich media support.

Surprising Slim Reader/Writer Lock thread exit issues


I've been noticing a strange thing for a while on Windows 8/8.1 and the equivalent server versions. The issue occurs when I'm using a Slim Reader/Writer Lock (SRWL) exclusively in exclusive mode (as a replacement for critical sections). What happens is that when a thread exits cleanly immediately after unlocking an SRWL, sometimes the threads that are waiting on the lock do not get woken and none of them acquire it.
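
The shape of the problem, stripped right down, is something like the sketch below; this is just an illustration of the usage pattern, assuming the standard SRW lock API used purely in exclusive mode, not a reliable reproduction:

```cpp
#include <windows.h>

static SRWLOCK g_lock = SRWLOCK_INIT;

// A thread that exits cleanly immediately after releasing the lock.
static DWORD WINAPI ShortLivedThread(LPVOID /*param*/)
{
   ::AcquireSRWLockExclusive(&g_lock);

   // ... some brief work under the lock ...

   ::ReleaseSRWLockExclusive(&g_lock);

   return 0;   // exits straight after the release; this is the suspect pattern
}

// A thread that waits for the lock; on an affected system this acquire
// sometimes never completes even though the lock has been released.
static DWORD WINAPI WaitingThread(LPVOID /*param*/)
{
   ::AcquireSRWLockExclusive(&g_lock);
   ::ReleaseSRWLockExclusive(&g_lock);

   return 0;
}

int main()
{
   HANDLE threads[2];

   threads[0] = ::CreateThread(nullptr, 0, ShortLivedThread, nullptr, 0, nullptr);
   threads[1] = ::CreateThread(nullptr, 0, WaitingThread, nullptr, 0, nullptr);

   // The shutdown code waits for the threads to exit cleanly; when the
   // problem occurs WaitingThread is still blocked on an unlocked SRWL
   // and this wait never completes.
   ::WaitForMultipleObjects(2, threads, TRUE, INFINITE);

   ::CloseHandle(threads[0]);
   ::CloseHandle(threads[1]);

   return 0;
}
```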

At first I spent ages thinking that this was some kind of subtle bug in my LockExplorer tool as initially the problem only manifested itself during test runs that were using LockExplorer to detect potential deadlocks. Recently, however, I've been seeing the identical problem in normal programs being run with no clever lock instrumentation going on.

I still think it's more likely my bug than the operating system's, but I recently found this Knowledge Base article, #2582203, "A process that is being terminated stops responding in Windows 7 or in Windows Server 2008 R2", which says "This issue occurs because the main thread that terminates the process tries to reactivate another thread that was terminated when the thread released an SRW lock. This race condition causes the process to stop responding." This sounds suspiciously like the problem that I'm seeing, though not exactly.

The problem I see is that other threads waiting on the SRWL don't get notified that the lock is now unlocked. This locks my program up during shutdown purely because I have code that waits for these threads to exit cleanly and they can't, as they're waiting on an unlocked SRWL. It's not a deadlocked attempt at recursive SRWL acquisition, as I have debug code in my wrapper class which detects such behaviour. It's not an orphaned locked SRWL, as breaking the hung process into the debugger and immediately continuing it causes one of the threads that's blocking on the lock to immediately acquire it and continue.

I haven't applied the hotfix yet, partly because my systems are Win 8 rather than Win 7 and partly because I'm not yet convinced that this is the issue.

As a workaround I've added a call to Sleep() at the point where my thread class allows a thread to exit. This seems to reduce the instances of the problem, which leads me to believe it's a race condition of some kind.
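
In a hypothetical thread-wrapper class the workaround looks something like this; the names here are illustrative rather than being code from my actual thread class:

```cpp
#include <windows.h>

class Thread
{
protected:
   virtual ~Thread() = default;

   virtual unsigned int Run() = 0;     // the thread's real work, which may end
                                       // with an SRWL being released

private:
   static DWORD WINAPI InternalThreadFunction(LPVOID param)
   {
      Thread *pThread = static_cast<Thread *>(param);

      const unsigned int result = pThread->Run();

      // Work around the unwoken-waiter issue: don't let the thread exit
      // immediately after Run() returns.
      ::Sleep(1);

      return result;
   }
};
```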

New client profile: Eonic Gaming - Turf Battles

We have a new client profile available here for a client who is using The Server Framework to power their MMORPG game server.

About this Blog

I usually write about C++ development on Windows platforms, but I often ramble on about other less technical stuff...


I have other blogs...

The Server Framework - high performance server development
Lock Explorer - deadlock detection and multi-threaded performance tools
l'Hexapod - embedded electronics and robotics
MegèveSki - skiing