If you call WSASend() from multiple threads on a single socket you must ensure that only one thread calls into the API at a given time. Failure to do so can cause buffer corruption.
If you have multiple pending WSARecv() calls on a single socket then you need to be aware that the completions may be handled out of sequence by the threads that you have servicing your I/O completion port. This is purely due to thread scheduling; the actual calls to WSARecv() are thread safe in themselves. I wrote about this here back in 2002. My belief was that you could safely call WSARecv() from multiple threads on the same socket at the same time and that the only problem would be resequencing the reads once you'd processed them. Unfortunately this is incorrect, as the example code attached to the question shows.
The example code issues multiple WSARecv() calls from multiple threads, and the data being sent is simply a series of bytes where the next byte is the value of the current byte plus one and where we wrap back to zero after a 'max value' is reached.
You would expect that even with multiple pending WSARecv() calls on the socket at the same time, the resulting slices of the TCP stream should all be valid.
However, when multiple threads issue the WSARecv() calls the returned stream slices can be corrupt. That is, the data in the buffers returned from the read completions can include bytes that do not follow the expected pattern.
My own code treats the sequencing and the issuing of each WSARecv() call as an atomic step, with locking around it. This made it possible to correctly sequence the read completions and also had the side effect of preventing multiple threads calling WSARecv() on a single socket at the same time. However, with UDP sockets the broken usage pattern is much more likely, and I think that I may have seen the corruption on a client's system in the past - we're testing with fixed code now.
I haven't looked at the WSASend() side of the problem but I'm assuming that the issue is the same. The person who asked the question on StackOverflow has seen data corruption when the receive side is protected by synchronisation and the send side isn't, and I've no reason to doubt his analysis. I would like to think that calling WSASend() on one thread and WSARecv() on another, on the same socket, at the same time, is OK, but for now I'm assuming it's not and simply using a single lock per socket around both calls.
The MSDN documentation for WSARecv() has this to say about thread safety: "If you are using I/O completion ports, be aware that the order of calls made to WSARecv is also the order in which the buffers are populated. WSARecv should not be called on the same socket simultaneously from different threads, because it can result in an unpredictable buffer order." Which, IMHO, doesn't actually address this issue: it mentions the unpredictable buffer order, which is expected, but not the buffer corruption. Also, the documentation for WSASend() has an identical note with "WSARecv" replaced by "WSASend", which, when you think about it, doesn't actually make sense at all. I don't remember seeing these notes when I was first looking at the documentation, but who knows (yes, I'd like a fully preserved set of all revisions to the MSDN docs so that I can diff back to what the docs said when I wrote the code ;) ). Network Programming for Microsoft Windows, Second Edition, doesn't mention any thread safety issues at all in its coverage of WSARecv(), as far as I can see from a brief scan.
I've been working on some code for a client recently that needs to run in a multi-threaded environment. Unfortunately it was never really written with that requirement in mind, or, more likely, the people who wrote the code knew that it needed to be accessed from multiple threads but didn't really understand quite what that meant. As such I'm doing some fairly hairy refactoring for them. It's high risk as there are no unit tests and the budget doesn't really extend to completing the work, let alone "spending extra time" writing unit tests... The code is written in quite a 'C' style, most things are simple structs and most data is public and as such member functions fall where they are most conveniently written. Adding correct locking is complicated due to lots of mutable global data, lack of encapsulation and the fact that, at its heart, it's multi-user server software where one user gets to poke at all the others.
Step zero was to realise that no locking is bad but correct locking was a long way away so I slipped in a single "global lock" to make the system work. It's stable and not too slow but it doesn't really ever use more than one thread at a time as almost everything requires that the executing thread holds the "global lock".
Step one is to add some encapsulation and move things around so that we can one day dream of adding more fine-grained locking - unfortunately this step is a big one.
I'm finding it interesting comparing this work to the work I was doing immediately before. I've been working on the next big release of The Server Framework and this release will put an Activatable Object at the heart of each connection. This removes the need for locking from the socket object and lots of stuff that's built on top of it. It means that masses of complex code can be pulled out and thrown away. It's not a new idea, the socket object is now simply driven by a message queue. Locks are needed to put work items into the queue but during processing we can run without locking, safe in the knowledge that only one thread can ever be processing at a time.
The problem is that making the kind of change that I'm making right now is a massive change. It has led to a complete rewrite of the entire TCP side of The Server Framework. The general pattern is removing locking, because it's no longer needed and simplifying code because the ways in which it can be called are simpler. The unit tests that I had for the previous version of the code have helped to guide me; yes new tests have been written, but the old ones told me what I needed to test even if they needed to be changed a lot to work with the new code.
With The Server Framework work it was scary but the various layers of tests supported the work and, in general, the code was being made simpler and easier to understand. With the client work there are no tests, huge amounts of the application simply didn't compile until the changes were complete and, whilst some things are being simplified, others are being made more complex to accommodate changes to make the code thread safe. I expect that with enough iterations the client's code could shift to using Activatable objects and some of the locking would go away again but I'm not sure we'll have the luxury of those extra iterations.
I've been noticing a strange thing for a while on Windows 8/8.1 and the equivalent server versions. The issue occurs when I'm using a Slim Reader/Writer Lock (SRWL) exclusively in exclusive mode (as a replacement for critical sections). What happens is that when a thread exits cleanly immediately after unlocking a SRWL, sometimes the threads that are waiting on the lock do not get woken and none of them acquire the lock.
At first I spent ages thinking that this was some kind of subtle bug in my LockExplorer tool as initially the problem only manifested itself during test runs that were using LockExplorer to detect potential deadlocks. Recently, however, I've been seeing the identical problem in normal programs being run with no clever lock instrumentation going on.
I still think it's more likely my bug than the operating system's but I recently found this Knowledge Base article, #2582203, "A process that is being terminated stops responding in Windows 7 or in Windows Server 2008 R2" which says "This issue occurs because the main thread that terminates the process tries to reactivate another thread that was terminated when the thread released an SRW lock. This race condition causes the process to stop responding.". This sounds suspiciously like the problem that I'm seeing, though not exactly.
The problem I see is that other threads waiting on the SRWL don't get notified that the lock is now unlocked. This locks my program up during shutdown, purely because I have code that waits for these threads to exit cleanly and they can't, as they're waiting on an unlocked SRWL. It's not a deadlocked attempt at recursive SRWL acquisition, as I have debug code in my wrapper class which detects such behaviour. It's not an orphaned locked SRWL, as breaking the hung process into the debugger and immediately continuing it causes one of the threads that's blocking on the lock to immediately acquire it and continue.
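The kind of debug wrapper mentioned above can be sketched portably: record the owning thread's id on acquisition so that an attempted recursive acquisition trips an assertion rather than silently deadlocking. Here std::mutex stands in for the SRWL and the class is my own illustration, not the wrapper from my code.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>

// A lock wrapper that detects recursive acquisition by the same thread.
class CheckedLock
{
public:
    void Lock()
    {
        // A recursive acquisition would see our own id here before blocking.
        assert(m_owner.load() != std::this_thread::get_id() &&
               "recursive lock acquisition detected");

        m_lock.lock();
        m_owner.store(std::this_thread::get_id());
    }

    void Unlock()
    {
        m_owner.store(std::thread::id());   // default id means 'no owner'
        m_lock.unlock();
    }

    bool IsLockedByThisThread() const
    {
        return m_owner.load() == std::this_thread::get_id();
    }

private:
    std::mutex m_lock;                      // stands in for the SRWL
    std::atomic<std::thread::id> m_owner{}; // who currently holds the lock
};
```

The owner id is atomic because it's read by threads that don't hold the lock; the check is advisory debug instrumentation, which is all it needs to be to rule out recursive acquisition as the cause of a hang.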
I haven't applied the hotfix yet, partly because my systems are Win 8 rather than Win 7 and partly because I'm not yet convinced that this is the issue.
As a workaround I've added a call to Sleep() at the point where my thread class allows a thread to exit. This seems to reduce the incidence of the problem, which leads me to believe it's a race condition of some kind.