One of the good things about the server performance testing that I've been doing recently is that I've been able to witness some rather rare failure conditions in action and see how the server framework handles them.
When using IO Completion Ports for asynchronous IO you get notifications when reads and writes complete. Read completions are obviously an important event; it's where you receive incoming data. Write completions, on the other hand, are often viewed as less important. All that a write completion tells you is that the TCP/IP stack has taken ownership of your data. It doesn't guarantee that the data has been sent or that it has been received by the other party; you need to rely on your own protocol to transmit that kind of information. The current framework design provides virtual functions to deal with read and write completions. All servers need to provide an implementation for ReadCompleted() but most servers can pretty much ignore WriteCompleted(). The default implementation of WriteCompleted() used to look like this:
void CStreamSocketConnectionManagerCallbacks::WriteCompleted(
   IStreamSocket *pSocket,
   IBuffer *pBuffer)
{
   // Derived class overrides this to deal with write completions
   // The check below has never failed in production code.
   if (pBuffer->GetUsed() != pBuffer->GetWSABUF()->len)
   {
      OnError(pSocket,
         _T("CStreamSocketConnectionManager::WriteCompleted")
         _T(" - Socket write where not all data was written - expected: ") +
         ToString(pBuffer->GetWSABUF()->len) +
         _T(" sent:") + ToString(pBuffer->GetUsed()));
   }
}
// The check below has never failed in production code
// but can do so if you run out of non-paged pool at the right point...
// If you only have a single write pending you can resync and resend based
// on what you actually managed to send...
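The comments above mention resyncing and resending when only a single write is pending. A minimal sketch of that bookkeeping follows, written in portable C++ rather than against the framework; SingleWriteQueue and its methods are hypothetical names, and the bytesTransferred value is assumed to be the byte count reported with the write completion. It tracks how much of the pending data the stack accepted and hands back the unsent tail so that it can be reissued.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical bookkeeping for a connection that allows only one write to be
// pending at a time, so a short write can be detected and the tail resent.
class SingleWriteQueue
{
public:
    // Start a write; returns the bytes to hand to the send call.
    std::string_view Begin(std::string data)
    {
        assert(pending_.empty());    // only one write may be outstanding
        pending_ = std::move(data);
        offset_ = 0;
        return Remaining();
    }

    // Call from the write completion with the byte count it reported.
    // Returns the unsent tail to reissue; empty when everything was sent.
    std::string_view OnWriteCompleted(std::size_t bytesTransferred)
    {
        assert(bytesTransferred <= pending_.size() - offset_);
        offset_ += bytesTransferred;
        if (offset_ == pending_.size())
        {
            pending_.clear();        // write fully accepted by the stack
            offset_ = 0;
        }
        return Remaining();
    }

    // Bytes not yet accepted by the stack.
    std::string_view Remaining() const
    {
        return std::string_view(pending_).substr(offset_);
    }

private:
    std::string pending_;
    std::size_t offset_ = 0;
};
```

The returned string_view refers to the queue's internal buffer, so it is only valid until the next call on the queue.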
The other failure that I hadn't seen before was due to exceeding the "locked pages limit". The design of the server framework means that all IO operations are performed on the IO pool threads. That is, the IO threads issue the actual read and write calls as well as handle the completions. I do this because it simplifies the use of the framework (as I mentioned in the original article, by doing this we avoid having to worry about thread termination affecting outstanding IO requests). The downsides of this design are a slight performance hit, due to the extra trip through the IOCP (although we optimise this away if we know we're already on an IO thread), and the fact that if the actual calls to WSASend() or WSARecv() fail then we have to report that failure in a slightly convoluted manner. Up until this recent testing exercise these calls hadn't failed. The actual code looks something like this:
if (SOCKET_ERROR == ::WSASend(
      pSocket->GetSocket(),
      pBuffer->GetWSABUF(),
      1,
      &dwSendNumBytes,
      dwFlags,
      static_cast<OVERLAPPED*>(pBuffer),
      NULL))
{
   DWORD lastError = ::WSAGetLastError();
   if (ERROR_IO_PENDING != lastError)
   {
      pSocket->OnConnectionError(WriteError, pBuffer, lastError);
      pSocket->WriteCompleted();  // this pending write will never complete...
      pSocket->Release();
      pBuffer->Release();
   }
}
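To make the IO-thread design described before this code more concrete, here is a rough sketch in standard C++ of marshalling IO requests onto an IO thread, including the shortcut for callers that are already on it. It is not the framework's code: a mutex-protected queue stands in for posting to the IOCP, a single std::thread stands in for the IO thread pool, and IoThread and Issue() are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Sketch only: a mutex-protected queue stands in for the IOCP and one
// std::thread stands in for the IO thread pool.
class IoThread
{
public:
    IoThread() : worker_([this] { Run(); })
    {
        ioThreadId_ = worker_.get_id();
    }

    ~IoThread()
    {
        Post(nullptr);          // an empty function tells Run() to exit
        worker_.join();         // queued operations run before shutdown
    }

    // Issue an IO operation so that it always executes on the IO thread.
    void Issue(std::function<void()> op)
    {
        assert(op);             // empty functions are reserved for shutdown
        if (std::this_thread::get_id() == ioThreadId_)
        {
            op();               // already on the IO thread: skip the extra trip
            return;
        }
        Post(std::move(op));    // otherwise hand it over to the IO thread
    }

private:
    void Post(std::function<void()> op)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push_back(std::move(op));
        }
        cv_.notify_one();
    }

    void Run()
    {
        for (;;)
        {
            std::function<void()> op;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                op = std::move(queue_.front());
                queue_.pop_front();
            }
            if (!op)
                return;         // shutdown request
            op();
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    std::thread::id ioThreadId_;
    std::thread worker_;        // declared last: starts after the queue exists
};
```

Because every operation ends up running on the worker thread, the thread that requested it can exit without affecting the operation, which is the property the design is after.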
If the call fails with anything other than ERROR_IO_PENDING, the failure is reported via OnConnectionError(), which gets routed back through to the derived server or connection manager object where you can handle it. At present both read errors and write errors are routed to the same handler, with an enum used to differentiate them. I don't currently pass the dwSendNumBytes parameter through, as I assume that a failure is always total (that's now on my list of things to check). Again, what you can do to recover depends on your server design and the protocol you're using. One thing to be aware of is that if a read fails in this way and your design means that you only have a single read pending on each socket at any one time, then you now have no read pending on this socket. If, as is quite usual in servers that I design, your connection is held open purely because the pending read keeps the socket's reference count above zero, then your connection will close.
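The reference-counting consequence described above can be modelled in a few lines. This is a sketch under assumptions, not the framework's API: Connection, PostRead() and the issueRead callback (a stand-in for WSARecv() that returns 0 for "pending" or an error code for an immediate failure) are hypothetical, and the only reference keeping the connection alive is the one owned by its single outstanding read.

```cpp
#include <cassert>
#include <functional>
#include <utility>

enum class ConnectionErrorType { ReadError, WriteError };

// Hypothetical model of a connection whose lifetime is governed purely by a
// reference count; nothing else holds a reference while it waits for data.
class Connection
{
public:
    using ErrorHandler = std::function<void(ConnectionErrorType, int)>;

    explicit Connection(ErrorHandler onError) : onError_(std::move(onError)) {}

    void AddRef() { ++refs_; }

    void Release()
    {
        assert(refs_ > 0);
        if (--refs_ == 0)
            closed_ = true;       // last reference gone: the connection closes
    }

    // issueRead stands in for WSARecv(): it returns 0 if the read is now
    // pending (a completion will arrive later) or an error code if it failed
    // immediately.
    void PostRead(const std::function<int()>& issueRead)
    {
        AddRef();                 // this reference belongs to the pending read
        const int error = issueRead();
        if (error != 0)
        {
            // No completion will ever arrive for this read, so report the
            // failure and drop the reference the completion would have released.
            onError_(ConnectionErrorType::ReadError, error);
            Release();
        }
    }

    bool IsClosed() const { return closed_; }
    int RefCount() const { return refs_; }

private:
    ErrorHandler onError_;
    int refs_ = 0;
    bool closed_ = false;
};
```

If the error handler wants to keep the connection alive it has to take its own reference (or issue a new read) before PostRead() releases the read's reference.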
Len - some interesting articles/information here! Thanks for that.
I'm wondering, when you hit the NP pool limits in your testing, was that visible in an accurate fashion in Task Manager (i.e. if you use Task Manager -> View -> Select Columns, NP Pool, and watch the NP Pool for the server process)? In other words, was the NP Pool usage for the process in Task Manager hitting 256 MB, or at least a hundred megs?
(Note - I'm not using your code here - just using your articles to learn, since I'm building my own IOCP framework for personal use in apps.)
Not sure if you'll read this comment on a 4-month-old post, but then you did end up replying to tons of comments on your earlier IOCP code posts, so it's worth a try :)
The numbers were fairly accurate when I hit the NP limits, but I found that I tended to hit the locked pages limit more often, and there are no numbers that you can use to detect that.
Posted by: Len at February 18, 2006 09:53 AM
Right, and WSAENOBUFS is the error common to both, so you can't tell which is which.
I'm now beginning to think I'm running into locked page limit errors.
Interesting stuff :)
Posted by: blue at February 18, 2006 08:59 PM
Indeed.
You probably are. There's no way to tell the difference, and there's no real way to determine if you're about to hit the locked page limit, because it's system-wide rather than per-process and I can't think of a reasonable way to work out how the system stands in regard to locked pages...
My solution was to allow the server to limit the number of connections programmatically; then, if you own the box where the server runs, you can configure it to be safe. I then allowed each 'server' (i.e. listening port) within a process to share this limit so that you can control multi-host and/or multi-port servers... It seems to work, but it's all a bit theoretical as my clients tend not to run into these limits on their production boxes anyway.
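A shared connection limit of the kind described could look something like the following sketch. ConnectionLimiter and Listener are hypothetical names, and actually refusing the connection when OnAccept() returns false is left to the caller. An atomic counter is used so that accepts on different listening ports, and on different threads, draw from the same budget.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

class ConnectionLimiter
{
public:
    explicit ConnectionLimiter(std::size_t maxConnections)
        : max_(maxConnections) {}

    // Reserve a slot for a new connection; false means the limit is reached
    // and the caller should refuse (close) the connection.
    bool TryAcquire()
    {
        std::size_t current = active_.load();
        while (current < max_)
        {
            // On failure compare_exchange_weak reloads 'current', so the
            // limit is re-checked before retrying.
            if (active_.compare_exchange_weak(current, current + 1))
                return true;
        }
        return false;
    }

    void Release()
    {
        const std::size_t previous = active_.fetch_sub(1);
        assert(previous > 0);
        (void)previous;
    }

    std::size_t Active() const { return active_.load(); }

private:
    const std::size_t max_;
    std::atomic<std::size_t> active_{0};
};

// Each listening port holds a reference to the same limiter, so the cap
// applies to the total across all of them.
class Listener
{
public:
    explicit Listener(ConnectionLimiter& limiter) : limiter_(limiter) {}
    bool OnAccept() { return limiter_.TryAcquire(); }
    void OnClose() { limiter_.Release(); }

private:
    ConnectionLimiter& limiter_;
};
```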
Posted by: Len at February 18, 2006 09:16 PM
Wondering how best to minimize one's chances of hitting the locked page limit: is there anything else one can do apart from issuing zero-byte WSARecv calls (and then fetching the data via non-overlapped WSARecv calls when the completion notification for the zero-byte read arrives)?
As far as debugging goes, though, one can try to test specifically for NP Pool limits: if operations fail with WSAENOBUFS, spawn a thread that just calls WSASocket 500 times or so... That should cause allocation of 1 MB of NPP; if all those calls succeed, then you probably have a locked page issue.
Posted by: blue at February 19, 2006 11:33 AM