Bug hunting


I've just spent a day tracking down a bug in a pending release of The Server Framework. It was an interesting, and actually quite enjoyable, journey but one that I shouldn't have had to make. The bug was due to a Windows API call being inserted between the previous API call and the call to GetLastError() to retrieve the error code on failure. The new API call overwrote the previous error value and this confused the error handling code for the previous API call. This was a bit of a "school boy error" and it was made worse by the fact that the code in question had a comment which clearly explained why nothing should be placed between the API call and the call to GetLastError().

Back in September I mentioned that I had found a problem with my usage of Slim reader/writer locks. I expected this to be something that I was doing wrong but it turned out that it was a kernel bug.

This morning Johan Torp tweeted that a hotfix for this issue is now available.

The note on the hotfix states: "Note These issues are not obvious and are difficult to debug. This update is intended for applicable systems to apply proactively in order to increase the reliability of these systems.", so go install it even if you don't know that you're having these problems right now...

Video streaming from IoT devices


As I mentioned back in February I've been working on a custom video streaming solution for one of my clients. This has been a lot of fun. It was a nicely defined spec using industry standard protocols developed on a fixed price basis with lots of TDD, what's not to like.

The system allows a controlling application to set up RTSP streams for broadcasting to multiple clients. The data for the stream is being transmitted live from one of thousands of smart (IoT) devices and the server buffers this and broadcasts it using RTSP to publish the RTP streams. The controller can also get the server to record these live streams to disk and play them back via an RTSP stream at a later date.

All of the streams are dynamic and the controlling app is responsible for making sure that the IoT devices know where to send their raw video feeds and also for communicating the RTSP urls that can be used to access the streams.

Although this system was written using our new Streaming Media Option Pack there was a lot of scope for new code to be written. Up until this point the option pack revolved around playing pre-recorded data from files. Broadcasting and recording live video required a bit more thought.

This system allows for thousands of concurrent streams and allows multiple users to view what otherwise would be a single raw H264 stream from the IoT device. The device can stay reasonably simple and the server system can add the complexity required for broadcasting and conforming with the required specs.

To enable development without needing access to the custom hardware that's under development I put together a simple app which chops up a video file and sends it in the format that the IoT devices will use, this massively simplified testing. Given the amount of TDD that was being done on the actual code the integration testing with the mock IoT device was pretty painless but it made it possible to test the server under load and also allowed for a simple demo system to be put together.

I'd forgotten quite how much I enjoy these 'perfect' development projects where the customer knows what they want and can communicate it clearly, the schedule is sensible, the challenge is taxing but not impossible and the price is right. Throw in the comfort of being able to 'waste' as much time as I like with TDD and the end result is a project which actually ends up being quite a calming and meditative experience.

And now back to the real world...

New IoT fixed-price development project

We're pleased to be working with one of our secretive security company clients again to provide a custom video streaming server for them. The server will enable them to record and stream live video from their network of "internet of things" connected devices.

As with all M2M and IoT projects this system needs to be able to scale to many thousands of concurrently connected devices and we're pleased to be able to use The Server Framework's new Streaming Media Option pack to be able to provide a complete solution quickly and cost effectively for our client.

That Slim Reader/Writer lock issue is a kernel bug

It looks like the Slim Reader/Writer issue that I wrote about back in September is a kernel bug after all.

Stefan Boberg has just tweeted to let me know that Microsoft has confirmed it's a bug and that a hot fix is in testing.

WSARecv, WSASend and thread safety

Last week I learnt something new, which is always good. Unfortunately it was that for over 15 years I'd been working under a misconception about how an API worked.

Tl;dr When using WSARecv() and WSASend() from multiple threads on a single socket you must ensure that only one thread calls into the API at a given time. Failure to do so can cause buffer corruption.

It all began with this question on StackOverflow and I dived in and gave my usual response. Unfortunately my usual response was wrong and after spending some time talking to the person who had asked the question and running their test code I realised that I've been mistaken about this for a long time.

My view has always been that if you issue multiple WSARecv() calls on a single socket then you need to be aware that the completions may be handled out of sequence by the threads that you have servicing your I/O completion port. This is purely due to thread scheduling and the actual calls to WSARecv() are thread safe in themselves. I wrote about this here back in 2002. My belief was that you could safely call WSARecv() from multiple threads on the same socket at the same time and the only problem would be resequencing the reads once you processed them. Unfortunately this is incorrect as the example code attached to the question shows.

The example code is somewhat contrived for a TCP socket in that it doesn't care about the sequencing of the read completions and it doesn't care about processing the TCP stream out of sequence. It issues multiple WSARecv() calls from multiple threads and the data being sent is simply a series of bytes where the next byte is the value of the current byte plus one and where we wrap back to zero after a 'max value' is reached.

Such a stream with a max value of 7 would look like this: 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03...

Validating such a TCP stream is as simple as taking any read that completes in any order and simply ensuring that the bytes that are contained in the buffer that has been returned follow the expected pattern. Starting from the first byte, whatever value it is, subsequent bytes must be one greater until the 'max value' is reached at which point the wrap to zero and continue to increment. Assuming my long held beliefs were true then it didn't matter how many threads were issuing WSARecv() calls on the socket at the same time, the resulting slices of the TCP stream should all be valid.

Unfortunately the test program fails to prove this and instead proves that without synchronisation around the WSARecv() call the returned stream slices can be corrupt. That is the data in the buffers returned from the read completion can include bytes that do not follow the expected pattern.

Of course the way that the test program uses the API is of limited use as I can't think of a TCP stream where it would be useful to be able to process the stream in randomly sized chunks with no need to have processed the data that comes before the current data. One of the reasons that I believed that I understood the requirements of the API was that I never used it this way. In systems where multiple reads on TCP streams were allowed I would always increment a sequence number, put the sequence number in the read buffer's meta data and then issue the WSARecv() all as an atomic step with locking around it. This made it possible to correctly sequence the read completions and also had the side effect of preventing multiple threads calling WSARecv() on a single socket at the same time. However, with UDP sockets the broken usage pattern is much more likely and I think that I may have seen the corruption on a client's system in the past - we're testing with fixed code now.

I've yet to fully investigate the WSASend() side of the problem but I'm assuming that the issue is the same. The person who asked the question on StackOverflow has seen data corruption when the receive side is protected by synchronisation and the send side isn't and I've no reason to doubt his analysis. I would like to think that calling WSASend() on one thread and WSARecv(), on another on the same socket, at the same time, is OK, but for now I'm assuming it's not and simply using a single lock per socket around both calls.

The current documentation for WSARecv() has this to say about thread safety: If you are using I/O completion ports, be aware that the order of calls made to WSARecv is also the order in which the buffers are populated. WSARecv should not be called on the same socket simultaneously from different threads, because it can result in an unpredictable buffer order. Which, IMHO, doesn't actually address this issue. It mentions the unpredictable buffer order, which is expected, but not the buffer corruption. Also the documentation for WSASend() has an identical note with "WSARecv" replaced by "WSASend", which, when you think about it, doesn't actually make sense at all. I don't remember seeing these notes when I was first looking at the documentation, but who knows (yes, I'd like a fully preserved set of all revisions to the MSDN docs so that I can diff back to what the docs said when I wrote the code ;) ). Network Programming for Microsoft Windows, Second Edition, doesn't mention any thread safety issues at all in its coverage of WSASend() and WSARecv() as far as I can see from a brief scan.

Apart from the one known case of UDP datagram corruption that may be caused by my misunderstanding I think most of our systems are pretty safe just by their design. However, there will be a fix included in the 6.6.3 release and 7.0 wont be susceptible to this problem at all due to its Activatable Object usage at the socket level.

It's always good to learn something new, more so when it challenges long held beliefs and especially when it proves those beliefs to have been incorrect.
I've been a big fan of Gimpel Lint for years. It's a great static analysis tool for C++ and it can locate all kinds of issues or potential issues in the code. My problem with it has always been that it's a bit of a pig to configure and run, more so if you're used to working inside an IDE all the time. Several years back I had some custom Visual Studio menu items that I'd crufted up that ran Gimpel Lint on a file or a project and some more cruft that converted the output to something clickable in the IDE's output pane. This worked reasonably well but was complicated to maintain and easy to forget about. There was nothing to encourage you to run Gimpel Lint regularly and if you don't do that then the code starts to decay.

Visual Lint, from RiverBlade is, at heart, simply a Visual Studio plugin that integrates other static analysis tools into the Visual Studio IDE. However, the simplicity of the idea belies the value that it provides. Visual Lint makes Gimpel Lint a usable tool within the Visual Studio IDE. For me it has taken a very complex and hardly ever used tool which I always meant to use but never did and turned it into something that I use regularly and that is easy to run.

Gimpel Lint is still complicated to configure and it can be depressing when you first run it on a large codebase but once integrated with Visual Lint and run regularly it becomes a usable and powerful addition to your toolbox.

Now that I have Visual Lint running in my IDE and most of my code abiding by my in-house Gimpel Lint rules I suppose I should look at RiverBlade's LintProject tool and get my build servers to report on Lint breaks...

Two quite different approaches to multi-threading


I've been working on some code for a client recently that needs to run in a multi-threaded environment. Unfortunately it was never really written with that requirement in mind, or, more likely, the people who wrote the code knew that it needed to be accessed from multiple threads but didn't really understand quite what that meant. As such I'm doing some fairly hairy refactoring for them. It's high risk as there are no unit tests and the budget doesn't really extend to completing the work, let alone "spending extra time" writing unit tests... The code is written in quite a 'C' style, most things are simple structs and most data is public and as such member functions fall where they are most conveniently written. Adding correct locking is complicated due to lots of mutable global data, lack of encapsulation and the fact that, at its heart, it's multi-user server software where one user gets to poke at all the others.

Step zero was to realise that no locking is bad but correct locking was a long way away so I slipped in a single "global lock" to make the system work. It's stable and not too slow but it doesn't really ever use more than one thread at a time as almost everything requires that the executing thread holds the "global lock".

Step one is to add some encapsulation and move things around so that we can one day dream of adding in more fine grained locking - unfortunately this step is a big one.

I'm finding it interesting comparing this work to the work I was doing immediately before. I've been working on the next big release of The Server Framework and this release will put an Activatable Object at the heart of each connection. This removes the need for locking from the socket object and lots of stuff that's built on top of it. It means that masses of complex code can be pulled out and thrown away. It's not a new idea, the socket object is now simply driven by a message queue. Locks are needed to put work items into the queue but during processing we can run without locking, safe in the knowledge that only one thread can ever be processing at a time.

The problem is that making the kind of change that I'm making right now is a massive change. It has led to a complete rewrite of the entire TCP side of The Server Framework. The general pattern is removing locking, because it's no longer needed and simplifying code because the ways in which it can be called are simpler. The unit tests that I had for the previous version of the code have helped to guide me; yes new tests have been written, but the old ones told me what I needed to test even if they needed to be changed a lot to work with the new code.

With The Server Framework work it was scary but the various layers of tests supported the work and, in general, the code was being made simpler and easier to understand. With the client work there are no tests, huge amounts of the application simply didn't compile until the changes were complete and, whilst some things are being simplified, others are being made more complex to accommodate changes to make the code thread safe. I expect that with enough iterations the client's code could shift to using Activatable objects and some of the locking would go away again but I'm not sure we'll have the luxury of those extra iterations.

How to build a GCC Cross-Compiler

This article over on Preshing on Programming looks useful. It gives a step by step guide for building GCC cross-compilers, I expect it will save me lots of time at some point in the future.

Dropping support for Visual Studio 2005 and 2008

The 7.0 release of The Server Framework, which is likely to be released early next year, will no longer support Visual Studio 2005 or Visual Studio 2008.

The 6.6.x releases will be the last to support these compilers.

Please get in touch immediately if this will be a problem for you.

About this Blog

I usually write about C++ development on Windows platforms, but I often ramble on about other less technical stuff...

This page contains recent content. Look in the archives to find all content.

I have other blogs...

Subscribe to feed The Server Framework - high performance server development
Subscribe to feed Lock Explorer - deadlock detection and multi-threaded performance tools
Subscribe to feed l'Hexapod - embedded electronics and robotics
Subscribe to feed MegèveSki - skiing