As usual I have used VMs to play with the CTP releases for the new version of Visual Studio and fixed up new warnings and build failures in my source so that I'm not too surprised when the RTM comes along.

Unfortunately as soon as I put VS2015 RTM on my build server and started running it as part of my CI system I started noticing strange things...

The first is that it seems to be possible to get VS2015 into a state where it will refuse to load C++ projects. I've no idea how I've managed this, but it's now happened twice. Rebooting the machine seems to fix things. The log file that the failure message points to shows that it's trying to load the "Visual Studio C++ Project System Package" and getting an E_POINTER result...

The second issue seems to be a race condition in the shutdown of VSHub.exe, the new process that's used to help various parts of VS communicate with each other. My CI system will fire up a Visual Studio instance to build a project using /build and once that completes another project is likely to get built; sometimes two VS2015 builds will follow one another, other times there are other versions of Visual Studio building in between. VS2015 appears to start an instance of VSHub.exe but only if one isn't already running (or, more likely, it always starts one but the new instance shuts down if one is already running). When the last instance of VS quits, VSHub.exe hangs around for a moment and then shuts down... It seems to crash if it's shutting down just as a new instance of VS2015 starts (and, presumably, that new instance decides that the VSHub.exe that is shutting down is the one to use). A fix for this issue is simply to keep VSHub.exe alive for the entire time that any VS2015 builds need to run, and I can do this by having an instance of VS2015 open on the build machine whilst the CI system is running...

Thanks to Abe10, in this thread, for the workaround for the VSHub.exe issue.

6 out of 10 for this VS release Microsoft, try harder. I hope the Windows 10 team have better QA.

I now have a real solution to the problem that I outlined on Saturday night and this solution, unlike the one developed during my "Macallan driven development" session on Saturday evening, actually works.

The problem is that when my code runs a VS2010 build by passing a project to devenv.com I get the following error message:

C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\CodeAnalysis\Microsoft.CodeAnalysis.targets(214,5): error MSB4175: The task factory "CodeTaskFactory" could not be loaded from the assembly "C:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.Build.Tasks.v12.0.dll". Could not load file or assembly 'file:///C:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.Build.Tasks.v12.0.dll' or one of its dependencies. The system cannot find the file specified.

The project builds correctly from the command window using the exact same command line as I use in my calls to CreateProcess().

I was on the right track on Saturday evening. It is an environment issue, just not the one that I thought it was. I also now understand why I thought I'd solved the problem on Saturday night only to discover that the fix didn't work when I checked it again on Sunday morning.

The problem is caused by the presence of "VisualStudioVersion" in the environment. This variable is set to 12.0 when my task runner program is run from within VS2013, either when debugging or when running without the debugger attached but launching from within VS2013. The presence of this variable in the task runner's environment somehow causes VS2010 to use the latest version of MSBuild, the one that ships with VS2013. When doing this MSBuild gets confused about which .Net framework directory to load assemblies from and fails to locate its dll. Removing "VisualStudioVersion" from the inherited environment before launching VS2010 programmatically fixes the problem.
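A minimal sketch of the idea, assuming the child is launched with CreateProcessW(); the helper names are mine and error handling is elided. The inherited environment block is copied, any "VisualStudio*" variables are dropped, and the filtered block is passed to CreateProcessW():

```cpp
#include <windows.h>
#include <string>

// Build a copy of this process's environment block without any
// "VisualStudio*" variables (e.g. VisualStudioVersion).
std::wstring BuildFilteredEnvironment()
{
   std::wstring filtered;

   LPWCH block = ::GetEnvironmentStringsW();

   for (LPCWSTR var = block; *var; var += wcslen(var) + 1)
   {
      if (_wcsnicmp(var, L"VisualStudio", 12) != 0)
      {
         filtered.append(var, wcslen(var) + 1);    // keep the embedded null
      }
   }

   filtered.push_back(L'\0');                       // double-null terminate the block

   ::FreeEnvironmentStringsW(block);

   return filtered;
}

bool RunBuild(std::wstring commandLine)              // e.g. L"devenv.com foo.sln /build Release"
{
   std::wstring environment = BuildFilteredEnvironment();

   STARTUPINFOW si = { sizeof(si) };
   PROCESS_INFORMATION pi = {};

   if (!::CreateProcessW(
          nullptr, &commandLine[0], nullptr, nullptr, FALSE,
          CREATE_UNICODE_ENVIRONMENT,                // required when passing a wide block
          &environment[0], nullptr, &si, &pi))
   {
      return false;
   }

   ::WaitForSingleObject(pi.hProcess, INFINITE);

   ::CloseHandle(pi.hThread);
   ::CloseHandle(pi.hProcess);

   return true;
}
```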

Running my task runner from a normal command prompt, rather than from inside of VS2013, allows it to work correctly without needing to remove the variable (as the variable is only added by VS2013 itself). I expect that I tested like this on Saturday night and that's what made me think I'd solved the problem then.

VS2013 actually adds several "VisualStudio" variables to the environment and I now remove all of them before running any task from my task runner.

Updated: 12 July 2015 - It seems that this is all incorrect... Upon running my tests again this morning I find that the x64 task runner also fails to run VS2010 correctly and so the environment differences are unlikely to be the cause...

So, having decided that my continuous integration system could be better, and having looked at JetBrains' TeamCity and decided that it doesn't quite fit, I'm looking back at some code that I wrote back in 2008 when I last thought about this kind of thing...

I have some code which works with trees of dependent tasks and triggers sub tasks once the tasks they depend on complete successfully. The code uses CreateProcess() and Win32 Jobs to control the tasks and it can run builds and tests using the various flavours of Visual Studio installed on the box in question.
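For illustration, a minimal sketch of that CreateProcess()/Job combination (the function name is mine, error handling is elided): the child is started suspended, assigned to the job and only then resumed, so it and anything it spawns can be controlled, and killed, as a unit:

```cpp
#include <windows.h>

bool RunTaskInJob(wchar_t *commandLine)
{
   HANDLE job = ::CreateJobObjectW(nullptr, nullptr);

   // Make sure everything in the job dies if the job handle is closed.
   JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
   limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_KILL_ON_JOB_CLOSE;
   ::SetInformationJobObject(job, JobObjectExtendedLimitInformation, &limits, sizeof(limits));

   STARTUPINFOW si = { sizeof(si) };
   PROCESS_INFORMATION pi = {};

   // Start suspended so the process can be placed in the job before it runs.
   if (!::CreateProcessW(nullptr, commandLine, nullptr, nullptr, FALSE,
                         CREATE_SUSPENDED, nullptr, nullptr, &si, &pi))
   {
      ::CloseHandle(job);
      return false;
   }

   ::AssignProcessToJobObject(job, pi.hProcess);
   ::ResumeThread(pi.hThread);

   ::WaitForSingleObject(pi.hProcess, INFINITE);

   ::CloseHandle(pi.hThread);
   ::CloseHandle(pi.hProcess);
   ::CloseHandle(job);

   return true;
}
```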

So far, so good as a starting point for an agent that can run multiple builds on a machine...

My problem, well, one of them, is that when building a project with VS2010 by passing it to devenv.com I get the following error message:

C:\Program Files (x86)\MSBuild\Microsoft\VisualStudio\v12.0\CodeAnalysis\Microsoft.CodeAnalysis.targets(214,5): error MSB4175: The task factory "CodeTaskFactory" could not be loaded from the assembly "C:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.Build.Tasks.v12.0.dll". Could not load file or assembly 'file:///C:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.Build.Tasks.v12.0.dll' or one of its dependencies. The system cannot find the file specified.

This is annoying, to say the least; VS2005, 2008, 2012 and 2013 all work fine, it's just 2010 that fails. I googled the issue and the general consensus is that once you've installed VS2013 a different version of MSBuild is used. Of course that doesn't answer the question of why it 'works on my machine' except when my task runner is running the command...

Eventually I got the task runner to dump its environment and did the same from the command prompt where the command worked. The difference was that the command prompt that worked had an 'x64' environment and the task runner had a different, 'x86/win32', environment... This was because the task runner was a Win32 process. Running the task runner as x64 fixed the issue. So there's probably a bug in the VS2013 installation code that fails to set some environment variables for the Win32 command prompt that are required once it has "screwed with" the existing MSBuild installation on the box...
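For reference, a minimal sketch of the kind of environment dump I mean (the helper name is mine); the output can be redirected to a file and diffed against the output of running set in the command prompt that works:

```cpp
#include <windows.h>
#include <iostream>

// Dump this process's environment block, one NAME=value pair per line.
void DumpEnvironment()
{
   LPWCH block = ::GetEnvironmentStringsW();

   for (LPCWSTR var = block; *var; var += wcslen(var) + 1)
   {
      std::wcout << var << L"\n";
   }

   ::FreeEnvironmentStringsW(block);
}
```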

I'm expecting that manually adding the missing bits to the 'x86' environment will mean that the task runner will work as a Win32 build AND an x64 build...

Bug hunting


I've just spent a day tracking down a bug in a pending release of The Server Framework. It was an interesting, and actually quite enjoyable, journey but one that I shouldn't have had to make. The bug was due to a Windows API call being inserted between the previous API call and the call to GetLastError() to retrieve the error code on failure. The new API call overwrote the previous error value and this confused the error handling code for the previous API call. This was a bit of a "schoolboy error" and it was made worse by the fact that the code in question had a comment which clearly explained why nothing should be placed between the API call and the call to GetLastError().
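To illustrate the class of bug (this is a contrived sketch, not the actual framework code): GetLastError() returns the calling thread's last-error value as set by the most recent API call, so anything slipped in between the failing call and GetLastError() can destroy the value you wanted:

```cpp
#include <windows.h>

void Broken(HANDLE file, void *buffer, DWORD size)
{
   DWORD bytesRead = 0;

   if (!::ReadFile(file, buffer, size, &bytesRead, nullptr))
   {
      // Any intervening API call may overwrite the thread's last-error value...
      ::OutputDebugStringA("ReadFile failed\n");

      const DWORD lastError = ::GetLastError();    // ...so this may no longer be ReadFile's error
      (void)lastError;
   }
}

void Fixed(HANDLE file, void *buffer, DWORD size)
{
   DWORD bytesRead = 0;

   if (!::ReadFile(file, buffer, size, &bytesRead, nullptr))
   {
      const DWORD lastError = ::GetLastError();    // capture first, before ANY other API call

      ::OutputDebugStringA("ReadFile failed\n");

      // ... handle lastError ...
      (void)lastError;
   }
}
```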

Back in September I mentioned that I had found a problem with my usage of Slim reader/writer locks. I expected this to be something that I was doing wrong but it turned out that it was a kernel bug.

This morning Johan Torp tweeted that a hotfix for this issue is now available.

The note on the hotfix states: "Note These issues are not obvious and are difficult to debug. This update is intended for applicable systems to apply proactively in order to increase the reliability of these systems.", so go install it even if you don't know that you're having these problems right now...

Video streaming from IoT devices


As I mentioned back in February, I've been working on a custom video streaming solution for one of my clients. This has been a lot of fun. It was a nicely defined spec, using industry standard protocols, developed on a fixed price basis with lots of TDD; what's not to like?

The system allows a controlling application to set up RTSP streams for broadcasting to multiple clients. The data for the stream is being transmitted live from one of thousands of smart (IoT) devices and the server buffers this and broadcasts it using RTSP to publish the RTP streams. The controller can also get the server to record these live streams to disk and play them back via an RTSP stream at a later date.

All of the streams are dynamic and the controlling app is responsible for making sure that the IoT devices know where to send their raw video feeds and also for communicating the RTSP URLs that can be used to access the streams.

Although this system was written using our new Streaming Media Option Pack there was a lot of scope for new code to be written. Up until this point the option pack revolved around playing pre-recorded data from files. Broadcasting and recording live video required a bit more thought.

This system allows for thousands of concurrent streams and allows multiple users to view what otherwise would be a single raw H264 stream from the IoT device. The device can stay reasonably simple and the server system can add the complexity required for broadcasting and conforming with the required specs.

To enable development without needing access to the custom hardware that's under development I put together a simple app which chops up a video file and sends it in the format that the IoT devices will use; this massively simplified testing. Given the amount of TDD that was being done on the actual code, the integration testing with the mock IoT device was pretty painless, and the mock made it possible to test the server under load and also allowed for a simple demo system to be put together.

I'd forgotten quite how much I enjoy these 'perfect' development projects where the customer knows what they want and can communicate it clearly, the schedule is sensible, the challenge is taxing but not impossible and the price is right. Throw in the comfort of being able to 'waste' as much time as I like with TDD and the end result is a project which actually ends up being quite a calming and meditative experience.

And now back to the real world...

New IoT fixed-price development project

We're pleased to be working with one of our secretive security company clients again to provide a custom video streaming server for them. The server will enable them to record and stream live video from their network of "internet of things" connected devices.

As with all M2M and IoT projects, this system needs to be able to scale to many thousands of concurrently connected devices, and we're pleased to be able to use The Server Framework's new Streaming Media Option Pack to provide a complete solution quickly and cost effectively for our client.

That Slim Reader/Writer lock issue is a kernel bug

It looks like the Slim Reader/Writer issue that I wrote about back in September is a kernel bug after all.

Stefan Boberg has just tweeted to let me know that Microsoft has confirmed it's a bug and that a hot fix is in testing.

WSARecv, WSASend and thread safety

Last week I learnt something new, which is always good. Unfortunately it was that for over 15 years I'd been working under a misconception about how an API worked.

Tl;dr When using WSARecv() and WSASend() from multiple threads on a single socket you must ensure that only one thread calls into the API at a given time. Failure to do so can cause buffer corruption.

It all began with this question on StackOverflow and I dived in and gave my usual response. Unfortunately my usual response was wrong and after spending some time talking to the person who had asked the question and running their test code I realised that I've been mistaken about this for a long time.

My view has always been that if you issue multiple WSARecv() calls on a single socket then you need to be aware that the completions may be handled out of sequence by the threads that you have servicing your I/O completion port. This is purely due to thread scheduling and the actual calls to WSARecv() are thread safe in themselves. I wrote about this here back in 2002. My belief was that you could safely call WSARecv() from multiple threads on the same socket at the same time and the only problem would be resequencing the reads once you processed them. Unfortunately this is incorrect as the example code attached to the question shows.

The example code is somewhat contrived for a TCP socket in that it doesn't care about the sequencing of the read completions and it doesn't care about processing the TCP stream out of sequence. It issues multiple WSARecv() calls from multiple threads and the data being sent is simply a series of bytes where the next byte is the value of the current byte plus one and where we wrap back to zero after a 'max value' is reached.

Such a stream with a max value of 7 would look like this: 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x00, 0x01, 0x02, 0x03...

Validating such a TCP stream is as simple as taking any read that completes, in any order, and ensuring that the bytes contained in the returned buffer follow the expected pattern. Starting from the first byte, whatever value it is, each subsequent byte must be one greater, until the 'max value' is reached, at which point the values wrap to zero and continue to increment. Assuming my long held beliefs were true, it didn't matter how many threads were issuing WSARecv() calls on the socket at the same time; the resulting slices of the TCP stream should all be valid.
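A minimal sketch of that validation (my names, not the test program's actual code): given any slice of the stream, every byte must be the previous byte plus one, wrapping to zero after the max value; 7 matches the example stream shown above:

```cpp
#include <cstddef>

bool IsValidSlice(const unsigned char *data, size_t length, unsigned char maxValue = 7)
{
   for (size_t i = 1; i < length; ++i)
   {
      // The expected next byte is one greater, wrapping to zero after maxValue.
      const unsigned char expected =
         (data[i - 1] == maxValue) ? 0 : static_cast<unsigned char>(data[i - 1] + 1);

      if (data[i] != expected)
      {
         return false;   // corrupt slice: bytes don't follow the pattern
      }
   }

   return true;
}
```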

Unfortunately the test program fails to prove this and instead proves that without synchronisation around the WSARecv() call the returned stream slices can be corrupt. That is, the data in the buffers returned from the read completion can include bytes that do not follow the expected pattern.

Of course the way that the test program uses the API is of limited use as I can't think of a TCP stream where it would be useful to be able to process the stream in randomly sized chunks with no need to have processed the data that comes before the current data. One of the reasons that I believed that I understood the requirements of the API was that I never used it this way. In systems where multiple reads on TCP streams were allowed I would always increment a sequence number, put the sequence number in the read buffer's meta data and then issue the WSARecv() all as an atomic step with locking around it. This made it possible to correctly sequence the read completions and also had the side effect of preventing multiple threads calling WSARecv() on a single socket at the same time. However, with UDP sockets the broken usage pattern is much more likely and I think that I may have seen the corruption on a client's system in the past - we're testing with fixed code now.
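A minimal sketch of that pattern (the names are illustrative, not The Server Framework's actual API): the per-socket lock is held whilst the sequence number is assigned and WSARecv() is issued, so completions can be re-sequenced later and no two threads are ever inside WSARecv() on the socket at once:

```cpp
#include <winsock2.h>
#include <mutex>

#pragma comment(lib, "ws2_32.lib")

struct PerReadData                  // hypothetical per-operation metadata
{
   WSAOVERLAPPED overlapped;
   ULONGLONG sequenceNumber;        // used to re-sequence out of order completions
};

class SocketState                   // hypothetical per-socket state
{
public:
   explicit SocketState(SOCKET s) : m_socket(s), m_nextSequence(0) {}

   int IssueRead(WSABUF *buffer, PerReadData *readData)
   {
      // One thread inside WSARecv() on this socket at a time; the sequence
      // number is assigned under the same lock, as an atomic step.
      std::lock_guard<std::mutex> lock(m_lock);

      readData->sequenceNumber = m_nextSequence++;

      DWORD flags = 0;

      return ::WSARecv(m_socket, buffer, 1, nullptr, &flags,
                       &readData->overlapped, nullptr);
   }

private:
   SOCKET m_socket;
   ULONGLONG m_nextSequence;
   std::mutex m_lock;               // also taken around WSASend(), to be safe
};
```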

I've yet to fully investigate the WSASend() side of the problem but I'm assuming that the issue is the same. The person who asked the question on StackOverflow has seen data corruption when the receive side is protected by synchronisation and the send side isn't, and I've no reason to doubt his analysis. I would like to think that calling WSASend() on one thread and WSARecv() on another, on the same socket, at the same time, is OK, but for now I'm assuming it's not and simply using a single lock per socket around both calls.

The current documentation for WSARecv() has this to say about thread safety: "If you are using I/O completion ports, be aware that the order of calls made to WSARecv is also the order in which the buffers are populated. WSARecv should not be called on the same socket simultaneously from different threads, because it can result in an unpredictable buffer order." Which, IMHO, doesn't actually address this issue. It mentions the unpredictable buffer order, which is expected, but not the buffer corruption. Also the documentation for WSASend() has an identical note with "WSARecv" replaced by "WSASend", which, when you think about it, doesn't actually make sense at all. I don't remember seeing these notes when I was first looking at the documentation, but who knows (yes, I'd like a fully preserved set of all revisions to the MSDN docs so that I can diff back to what the docs said when I wrote the code ;) ). Network Programming for Microsoft Windows, Second Edition, doesn't mention any thread safety issues at all in its coverage of WSASend() and WSARecv(), as far as I can see from a brief scan.

Apart from the one known case of UDP datagram corruption that may be caused by my misunderstanding, I think most of our systems are pretty safe just by their design. However, there will be a fix included in the 6.6.3 release and 7.0 won't be susceptible to this problem at all due to its Activatable Object usage at the socket level.

It's always good to learn something new, more so when it challenges long held beliefs and especially when it proves those beliefs to have been incorrect.

I've been a big fan of Gimpel Lint for years. It's a great static analysis tool for C++ and it can locate all kinds of issues or potential issues in the code. My problem with it has always been that it's a bit of a pig to configure and run, more so if you're used to working inside an IDE all the time.

Several years back I had some custom Visual Studio menu items that I'd crufted up that ran Gimpel Lint on a file or a project and some more cruft that converted the output to something clickable in the IDE's output pane. This worked reasonably well but was complicated to maintain and easy to forget about. There was nothing to encourage you to run Gimpel Lint regularly and if you don't do that then the code starts to decay.

Visual Lint, from RiverBlade, is, at heart, simply a Visual Studio plugin that integrates other static analysis tools into the Visual Studio IDE. However, the simplicity of the idea belies the value that it provides. Visual Lint makes Gimpel Lint a usable tool within the Visual Studio IDE. For me it has taken a very complex and hardly ever used tool, which I always meant to use but never did, and turned it into something that I use regularly and that is easy to run.

Gimpel Lint is still complicated to configure and it can be depressing when you first run it on a large codebase but once integrated with Visual Lint and run regularly it becomes a usable and powerful addition to your toolbox.

Now that I have Visual Lint running in my IDE and most of my code abiding by my in-house Gimpel Lint rules I suppose I should look at RiverBlade's LintProject tool and get my build servers to report on Lint breaks...
