The design implications of FILE_SKIP_COMPLETION_PORT_ON_SUCCESS and the Vista overlapped I/O change, reprise

Back in April I talked about how being able to enable FILE_SKIP_COMPLETION_PORT_ON_SUCCESS in 6.2 meant that some server designs might start to experience recursion that they hadn’t experienced before. With the flag set, an operation that completes immediately doesn’t queue a completion packet to the IOCP, so the completion is handled inline on the thread that issued it; if that handler issues another operation which also completes immediately, the calls nest. During the testing for the imminent release of 6.3 I managed to hit on just the right combination of build machine, load and test server to actually produce this under test. As I mentioned back in April, this is only a problem (in 6.2) for servers that do all of their work on their I/O threads (an unusual design amongst my clients). Anyway, with a test client that pumps new data at the server at a rate unrelated to the rate at which the server processes it, and a server that does all of its work on its I/O threads, you can suffer stack overflows due to the recursive completion calls.

To prevent this I’ve added a recursion monitor to the I/O operation issuing code. If a configurable number of recursive completions occur, we break the recursion by pushing the next I/O request through the IOCP even if it wouldn’t normally need to be marshalled. I also now have unit tests that can reproduce this situation on demand, something that I should really have put together for 6.2.

In 6.3, with the new options that remove the I/O operation marshalling I was talking about back in April, this recursion limiter is even more valuable, as the number of server designs that could be affected by the previously unbounded recursion is greatly increased.

The recursion limiter is currently configured at compile time and, by default, allows 10 recursive calls before breaking the cycle.
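
The limiter described above can be sketched in portable C++ without touching the real Windows API. This is only an illustration under stated assumptions, not the framework's actual code: `Simulation`, `IssueRead`, `OnReadCompleted`, `Run` and `MaxRecursiveCompletions` are all hypothetical names, every simulated read is assumed to complete immediately, and the IOCP is modelled as a simple counter of pending completions drained by a single "I/O thread" loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time limit; 10 matches the default mentioned in the post.
constexpr std::size_t MaxRecursiveCompletions = 10;

// State for a single-threaded simulation. In a real server the depth
// counter would need to be tracked per I/O thread.
struct Simulation {
    explicit Simulation(std::size_t ops) : remaining(ops) {}
    std::size_t remaining;       // reads still to be issued
    std::size_t depth = 0;       // current nesting of inline completions
    std::size_t maxDepth = 0;    // deepest nesting observed
    std::size_t queued = 0;      // completions pending on the simulated IOCP
    std::size_t marshalled = 0;  // reads forced through the simulated IOCP
};

void IssueRead(Simulation &s);

// The completion handler does all of its work on the I/O thread and then
// issues the next read, which is what makes the recursion possible.
void OnReadCompleted(Simulation &s) {
    IssueRead(s);
}

void IssueRead(Simulation &s) {
    if (s.remaining == 0) {
        return;
    }
    --s.remaining;
    if (s.depth >= MaxRecursiveCompletions) {
        // Limit reached: break the cycle by sending this completion through
        // the (simulated) IOCP instead of handling it inline.
        ++s.queued;
        ++s.marshalled;
        return;
    }
    // The read "completed immediately", so handle the completion inline.
    ++s.depth;
    s.maxDepth = std::max(s.maxDepth, s.depth);
    OnReadCompleted(s);
    --s.depth;
}

// Issues the first read and then plays the part of the I/O thread,
// dequeuing marshalled completions on a fresh (unnested) stack.
Simulation Run(std::size_t ops) {
    Simulation s(ops);
    IssueRead(s);
    while (s.queued > 0) {
        --s.queued;
        assert(s.depth == 0);
        OnReadCompleted(s);
    }
    return s;
}
```

Calling `Run(100)` in this sketch issues all 100 reads while never nesting more than `MaxRecursiveCompletions` completions deep; without the depth check the same chain would nest 100 deep, and an unbounded stream of data would eventually overflow the stack.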

This change will be present in 6.3, which will be released shortly.

Updated: 04/04/2013 - In 6.6 we do this properly: rather than allowing limited recursion, we queue the operations so that they are processed in a manner which does not result in recursion.
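
One way to picture the queue-based approach is the sketch below. It is an assumption about the general technique (a per-thread queue drained by a single loop), not the 6.6 implementation; `OperationQueue`, `Issue`, `RunChain` and `ChainResult` are hypothetical names, and the sketch is single-threaded and ignores exceptions thrown by handlers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Completions are queued rather than called directly. Only the outermost
// call to Issue() drains the queue, so handlers never nest.
class OperationQueue {
public:
    void Issue(std::function<void()> completion) {
        pending_.push_back(std::move(completion));
        if (processing_) {
            // We are already inside a handler on this thread; the loop
            // further up the stack will run this completion later.
            return;
        }
        processing_ = true;
        while (!pending_.empty()) {
            std::function<void()> next = std::move(pending_.front());
            pending_.pop_front();
            ++depth_;
            maxDepth_ = std::max(maxDepth_, depth_);
            next();  // may call Issue() again, which only enqueues
            --depth_;
        }
        processing_ = false;
    }

    std::size_t MaxDepth() const { return maxDepth_; }

private:
    std::deque<std::function<void()>> pending_;
    bool processing_ = false;
    std::size_t depth_ = 0;
    std::size_t maxDepth_ = 0;
};

struct ChainResult {
    std::size_t completed;
    std::size_t maxDepth;
};

// Runs a chain of handlers (ops >= 1) in which each handler issues the
// next operation, as a server doing all of its work on its I/O threads would.
ChainResult RunChain(std::size_t ops) {
    OperationQueue queue;
    std::size_t completed = 0;
    std::function<void()> handler = [&]() {
        ++completed;
        if (completed < ops) {
            queue.Issue(handler);
        }
    };
    queue.Issue(handler);
    return ChainResult{completed, queue.MaxDepth()};
}
```

In this sketch the handler depth stays at one however long the chain is, so there is no recursion limit to tune and no need to force operations through the IOCP just to unwind the stack.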