As I mentioned a while ago, it seems there's a strangely fatal bug in the Windows networking stack at present. This manifests as massive non-paged pool memory usage if a process creates a UDP socket, binds it to an address and fails to read from it as fast as other people are writing to it. The issue appears to be present on all current Windows operating systems. It is NOT present on Windows Server 2012 R2 if recent patches have NOT been applied, but IS present as soon as the box is patched up to date... My test box was patched up until March 2020 and ran fine; as soon as it was patched to June 2020 it started to behave badly. Rolling back the updates fixed the problem.

The issue appears to affect inbound AND outbound UDP data flow (which makes it harder to run the DDOS clients from a fully patched Windows box, as that box can itself run out of non-paged pool when sending massive numbers of datagrams using overlapped I/O). It's possibly a firewall issue, more probably an NDIS intermediate driver that is buffering when it shouldn't.

In summary, if you create a UDP socket and bind it to an address, then someone can kill your box just by sending lots of datagrams at it... You might not notice this straight away, as 'lots' is related to how fast you process the datagrams. A simple program that creates a socket, binds it to an address and then never reads from it will demonstrate the problem. All you have to do is watch the "\Memory\Pool Nonpaged Bytes" performance counter until your box dies. You can see this in action if you write a program that sends datagrams to the listening program in a tight loop. A real network connection doesn't appear to be required.
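The whole repro fits in a page of code. Here's a minimal sketch using BSD-style socket calls (the Winsock equivalents are near-identical once WSAStartup has been called); the port number and datagram size here are arbitrary:

```cpp
#include <cassert>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a UDP socket, bind it to a loopback port and then never read
// from it. On an affected Windows build, datagrams sent to this port
// queue up in non-paged pool with no apparent limit.
int createBlackHoleListener(unsigned short port)
{
    const int s = socket(AF_INET, SOCK_DGRAM, 0);
    assert(s != -1);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    const int result = bind(s, reinterpret_cast<sockaddr *>(&addr), sizeof(addr));
    assert(result == 0);

    return s;   // deliberately never call recvfrom() on this socket
}

// Send datagrams at the listener in a tight loop; while this runs you
// would watch the "\Memory\Pool Nonpaged Bytes" performance counter.
void sendDatagrams(unsigned short port, int count)
{
    const int s = socket(AF_INET, SOCK_DGRAM, 0);
    assert(s != -1);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    const char buffer[512] = {};

    for (int i = 0; i < count; ++i)
    {
        sendto(s, buffer, sizeof(buffer), 0,
               reinterpret_cast<const sockaddr *>(&addr), sizeof(addr));
    }

    close(s);
}
```

On an unaffected system the excess datagrams are simply dropped once the socket's receive buffer is full, which is exactly the "someone needs to throw data away" behaviour you'd expect.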

I'm more than happy for someone to prove me an idiot by showing me that this problem is my fault and not a networking stack or driver bug.

Strangely fatal UDP issue on Windows...


One of my clients runs game servers on the cloud. They have an AWFUL lot of them and have run them for a long time. Every so often they have problems with DDOS attacks on their servers. They have upstream DDOS protection from their hosting providers but these take a while to recognise the attacks and so there's usually a period when the servers are vulnerable. Recently we've seen a DDOS on UDP that caused them problems. Initially we thought that it was a recent code release that had made the servers less stable; it was, but only in the sense that some tracing code had made the UDP recv path fractionally less efficient and this had exposed a problem that had been in the server all along. After spending a while in triage with the updated code we narrowed it down to two small code changes and removed them. Since the removal of these changes made no sense in terms of the problem we were seeing I suggested adding a small, harmless, delay in the udp recv path. Immediately the problem was back.

This morning I managed to reduce the problem to some code that didn't include their server at all. The simplest synchronous UDP receiver and a simple UDP traffic generator running on a machine using either a real IP or localhost can demonstrate the problem. If the receiver isn't running and the load generator is pushing datagrams into a black hole then there's no problem. If the receiver is running fast enough then no problem. If the receiver has a delay in the receive loop then non-paged pool memory starts to grow at a rather fantastic rate.
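The delayed receiver is just as simple. A sketch, again with BSD-style calls (Winsock is near-identical) and an arbitrary delay value:

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Bind a UDP socket to the given loopback port and return it.
int bindUdpSocket(unsigned short port)
{
    const int s = socket(AF_INET, SOCK_DGRAM, 0);
    assert(s != -1);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    assert(bind(s, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == 0);
    return s;
}

// The simplest synchronous receive loop, with an artificial per-datagram
// delay. If the sender outpaces the delay the datagram queue grows; on an
// affected Windows build that queue is charged to non-paged pool.
int receiveWithDelay(int s, int maxDatagrams, std::chrono::milliseconds delay)
{
    char buffer[65536];
    int received = 0;

    while (received < maxDatagrams)
    {
        if (recvfrom(s, buffer, sizeof(buffer), 0, nullptr, nullptr) < 0) break;
        ++received;
        std::this_thread::sleep_for(delay);  // the 'small, harmless' delay
    }
    return received;
}
```

Pair this with a tight-loop sender and the delay value directly controls how fast the queued-datagram backlog, and with it non-paged pool on an affected box, grows.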

More worrying is that if you bind a UDP socket to a port, do NOT issue a recv call, and a sender sends a stream of datagrams to that port, then non-paged pool usage grows. Once you start to issue recv calls the non-paged pool usage will go down as you chew through the queued datagrams, and as long as you can read the datagrams faster than the sender can send them there's no problem... I really should run a test to destruction with this, but there's no sign of it slowing down and I really don't want to make my development box blue screen this morning.

I expect that this is some "new-fangled nonsense" in the Windows network stack; probably a service that's doing "something useful" and doesn't throw datagrams away when its source is providing them faster than its sink can consume them. This seems to be a common problem; there's a real reluctance for people to throw network data away when the consumer can't keep up. I've seen it before with network drivers that consume massive amounts of non-paged pool when failing to process outbound offloaded CRC calculations fast enough, and with network drivers that have 'flow control' options enabled, which means they try to slow down data flow when the peer can't keep up with them. At some point a decision needs to be made to throw data away, but nobody ever seems to want to be the one to make that decision; instead non-paged pool becomes exhausted and boxes blue screen. At least with these NIC related issues you have a chance to exercise some control in your sending code: you can generally spot that the NIC is slowing down, even if you're using asynchronous sends, and you can decide to reduce the amount you're sending. I'm pretty sure that, for once, it's not network drivers this time; the problem is clearly demonstrated using localhost with no network needing to be present. That's a shame, because network driver issues are easier to work around, usually being send related, where we can make code changes, or being in a component that is easy to swap out for another brand...

So, for now, we have a cunning plan to fix the DDOS, a potential support call to Microsoft and some long term grunt work disabling various network related services to see if we can work out where the problem is. Hopefully it's something obvious that I'm missing or just me being an idiot.

Previously on "Practical Testing"... The last time I updated the code was in 2016; things have changed quite a lot since then, with several new compilers and several compilers that I no longer support. There are very few actual code changes, but the code now builds with Visual Studio 2019 (16.6 preview 3.0) and Visual Studio 2017. I've removed support for anything before Visual Studio 2015.

The code is here on GitHub and new rules apply.

Latest release of The Server Framework: 6.9.4

Version 6.9.4 of The Server Framework was released today.

This release includes changes to support Visual Studio 2019 (16.4), some new functionality and a bug fix to our OpenSSL ALPN handling code.

Happy New Year!

2020. Wow.

Things are going from strength to strength here at JetByte. As ever we have lots of games companies using The Server Framework and they tend to push us more than our finance clients ever did. Our secretive Online Gaming Company now has more than 400 million players per month on their cloud hosted server system and we're still developing the native C++ side of this for them. It's matured into a stable and flexible system and they just keep on pushing it in new directions. As part of this we've been working on the Linux and MacOS ports of The Server Framework and these are now working well and provide our clients with more flexibility. We're also doing more work for Eonic Games, though they have taken over most of the development work on the server that we originally built for them. They have new things coming out soon and we're helping them get there!

Our Industrial Control Client is still keeping us busy, spelunking around in their legacy code and making things better and more flexible. They're still very hush, hush, so if we told you any more we'd have to kill you...

The large American postal company that will remain nameless is still running its extended pilot phase. We found a bug (so far just the one). We fixed a bug. We all decided to add a test mode to the system so that they could flush out protocol and connectivity issues with their mail sorting hardware providers. Lots of interesting work here.

This year will bring new releases of The Server Framework, including options for Linux and MacOS. We're always interested in getting involved in new projects, so do get in touch if you have something that you think we could help with. In the meantime, we'll be here, doing what we love to do; writing high performance server code in C++!

Latest release of The Server Framework: 6.9.3

Version 6.9.3 of The Server Framework was released today.

This release includes changes to support Visual Studio 2019 (16.3), lots of code changes to migrate towards "modern C++" idioms, issues raised by Resharper++ and changes in include path separators and file name case to support compilation on Linux. We have also removed some code that was previously deprecated and dropped support for Visual Studio 2013.

There are no bug fixes or intentional functionality changes in this release, but a LOT of files have been touched; we decided to put this release out so that future functionality and bug fix changes can be more easily seen going forwards.

I don't do roadmaps, but...

I'm in the process of putting together a series of releases for The Server Framework. It's a little more complex than usual so I thought I'd explain why that is.

For the past year we've been working on a Linux/Mac version of The Server Framework with several clients. This has involved adjusting a lot of the code and moving some stuff around; for example, there was code in our "Win32Tools" library that isn't Win32 or even Windows specific, and so it now lives in a new tools library, "CoreTools", that contains code that can build on all platforms. Switching to building code with multiple compilers tends to expose some new warnings and errors, and building on Unix platforms requires that we change all of the path separators in the include statements (thankfully Windows supports 'both' path separators). We've had a "Linux spike" version of the code for some time and now we're pulling these changes into the mainstream releases in preparation for releasing the UNIX versions of the code to a wider audience.

In addition to the UNIX changes we have a major design change for the "SocketTools" library. This is stuff that has been in progress since around 2015 and it makes it easier to unit test the code and makes it possible to support other operating systems. The main thrust of these changes is that the "internals" of the networking code are now easy to replace, either with mocks for testing or with different implementations for cross platform support; so we can support IOCP, EPoll, KQueue, etc. The "new sockets" code lets 90% of your server stay the same no matter which platform it's running on and 'simply' switches out the code that does the networking. There's an optional "compatibility" layer that helps with migration but, in general, the best approach is to change to using the new callback interfaces and the whole "new sockets" design. The new code massively simplifies things, especially in terms of "pluggable filters" for things like TLS and flow control (in summary, pluggable filters are no more: we were the only people to ever develop them, they adversely affected performance and they were complex to write; TLS, TCP flow control, etc. are all now "included" in the code using the "internals" concept).
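The "internals" idea can be sketched as a simple interface seam. The names below are purely illustrative, not the framework's actual API: server code depends on an abstract engine, and the platform-specific (or mock) implementation is plugged in behind it.

```cpp
#include <cassert>
#include <string>

// Hypothetical seam: server-facing code talks to an abstract socket
// engine; concrete implementations would wrap IOCP, epoll, kqueue,
// or a mock for unit tests. Connection management, read/write issuing
// and so on are elided here.
class ISocketEngine
{
public:
    virtual ~ISocketEngine() = default;

    virtual std::string name() const = 0;
};

// A test double: swapping this in lets the rest of the server be
// unit tested without any real networking.
class MockSocketEngine : public ISocketEngine
{
public:
    std::string name() const override { return "mock"; }
};

// The bulk of the server depends only on the interface, so it stays
// the same no matter which engine is plugged in.
std::string describeEngine(const ISocketEngine &engine)
{
    return "running on: " + engine.name();
}
```

The design choice here is the classic one: the seam costs a virtual call at the boundary but buys testability and per-platform implementations without touching the 90% of server code above it.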

So, to that end, we have the following releases in the pipeline:

  • 6.9.3 - This is a release of the 6.9.2 code with the textual changes for include paths and cross platform compilation and changes to move the code towards "modern C++". Since we're now reaching the point where some of the older Microsoft compilers are out of support we can drop support for them and start fully utilising some slightly more modern C++ features where appropriate. There are no bug fixes in this release but there's a LOT of change.
  • 6.9.4 - This is a new features and bug fix release. Right now we're still accumulating issues for this release, it currently contains: a design change that makes dealing with read completion errors slightly easier for some clients; some debug output when an async file log becomes synchronous due to the amount of log lines queued (this seriously affects performance and often doesn't get noticed, it will now); some optional debug for helping with TLS handshake issues.
  • 7.0 - This is a release that takes 6.9.4 and moves the code around so that we can build cross platform; note that it doesn't actually include any code that lets you build cross platform. Like 6.9.3 there's a lot of change to the files but, hopefully, no functionality changes. Things like JetByteTools::Win32::_tstring are now JetByteTools::Core::_tstring, for example, and include paths have changed.
  • 7.1 - This is intended to be a "normal" 7.x release with new features and bug fixes.
  • 7.2 - This will include support for our "new sockets" code.
  • 8.x - It's intended that the UNIX support will be released in release 8.0.

The 6.9.3 release is due soon. The 6.9.4, 7.0 and 7.1 releases are expected to be released in quick succession, if not simultaneously. Upgrading your code to 7.0 will take some work. We have a simple 'find and replace' program that can be run on 6.9.x code and that does "most of the work", but it's not perfect so you should allocate more time than normal for dealing with a framework upgrade and, ideally, let us know in advance when you intend to do the work so that we can ensure we have space in our schedule to help you. Upgrading from 7.1 to 7.2 will be painless unless you decide to "opt in" and switch to "new sockets". You only NEED to switch if you intend to move to cross platform support in 8.0, though "old sockets" will eventually be deprecated.
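For illustration only (this is not our actual upgrade tooling, and std::string here is just a stand-in for the real type), one way to ease a rename like JetByteTools::Win32::_tstring to JetByteTools::Core::_tstring is a temporary compatibility alias, so that old-style names keep compiling while the find-and-replace pass works through your code:

```cpp
#include <cassert>
#include <string>

// New 7.0-style home for the type (illustrative; std::string is a
// stand-in for the real _tstring type).
namespace JetByteTools { namespace Core {
    typedef std::string _tstring;
} }

// Temporary compatibility shim: old Win32-qualified names resolve to
// the new Core types until migration is complete, at which point the
// shim can be deleted.
namespace JetByteTools { namespace Win32 {
    using Core::_tstring;
} }
```

With a shim like this in place an upgrade can be done incrementally: both spellings refer to the same type, so partially migrated code still links and behaves identically.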

Moving forward there will be a Server Framework Option Pack for UNIX. I expect this will include both the Linux and MacOS changes and I expect that it will be priced the same as the majority of the other Option Packs. If you'd like to get involved with the UNIX beta code then get in touch.

The 6.9.x and 7.0/7.1 releases are currently moving through our build and test system; 7.2 and 8.0 are further off.

Latest release of The Server Framework: 6.9.2

Version 6.9.2 of The Server Framework was released today.

This release includes changes to support Visual Studio 2017 (15.9), Visual Studio 2019 (16.2), design changes to the PerfMon tools library to improve performance and some bug fixes.

The Linux port of The Server Framework is going really well and we have now investigated both libuv and epoll back ends. There's still a lot of work to do before this will be something that we're releasing generally but the clients that are working with us on this are really excited by how well it's going.

The massively modernised, and far in the future, 7.0 release of The Server Framework will include the Linux changes, and our 6.9.2 maintenance release is due for release in Q2. It's been a while since we last had a release, but there have been no bugs reported and so there's not really much to release; most of the effort is going into the stuff that will eventually become the 7.0 release.

Our work with the large American postal company that will remain nameless is about to go into an extended pilot phase and we're working on last minute adjustments so that the new system plays nicely with the existing system that it will eventually be replacing.

The work with our Industrial Control Client in Germany is going really well. Phase 1 is complete and we've replaced the networking layer in one of their key pieces of software and refactored away the accumulated cruft from 30 years of maintenance. We have a shiny new testing system that we've written to help us compare the old message flow to the new message flow to ensure that we haven't changed anything that we shouldn't. This intercepts the network traffic, deblocks the messages for given test scenarios and then compares them to the output from the previous version of the code. Great for hassle-free regression testing.

In summary, we're very busy doing what we love to do!

Busy, busy, busy...

We're going to be really busy for the rest of the year as we've just won a large contract with our Industrial Control Client in Germany. We'll be working on the systems that we've worked on for them before, adding new functionality and integrating The Server Framework into some applications that we haven't worked on before.

The Linux port of The Server Framework is going really well and we now have a server and client system running on Linux using our custom reliable UDP network layer. This integrates with the .Net Core integration we've been doing for the same client. The Linux work will definitely make its way into the main framework at some point; do get in touch if you're interested in this.

About this Blog

I usually write about C++ development on Windows platforms, but I often ramble on about other less technical stuff...


I have other blogs...

The Server Framework - high performance server development
Lock Explorer - deadlock detection and multi-threaded performance tools
l'Hexapod - embedded electronics and robotics