Auction Server Performance

I’ve just about finished the auction server performance tuning. Our thrash test, in which 200 concurrent clients each respond to every bid with a counter bid, has gone from averaging 40 incoming bids per second and 3700 outgoing bids per second to 180 incoming and 18000 outgoing. The peak rates were nearer to 1600 incoming and 52000 outgoing… I’m pretty pleased with the improvements and eventually decided to put the thoughts of lock-free list traversal on hold; we don’t need it.

There were three main changes that we made to the code. The first was a change to how we deal with buffers that need to be broadcast to multiple clients; it removes a large amount of memory allocation and buffer copying.
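
One way to get that effect, and what the sketch below assumes, is to build the outgoing data once into a reference counted buffer and hand that same buffer to every connection rather than copying it per client; `Buffer`, `IConnection` and `Broadcast` are illustrative names, not the real classes.

```cpp
// A reference counted buffer that every connection shares during a broadcast,
// so the data is allocated and formatted exactly once. Illustrative only.

#include <windows.h>
#include <list>

class Buffer
{
   public:

      explicit Buffer(DWORD size)
         :  m_ref(1),
            m_size(size),
            m_pData(new BYTE[size])
      {
      }

      BYTE *GetData()         { return m_pData; }
      DWORD GetSize() const   { return m_size; }

      void AddRef()           { ::InterlockedIncrement(&m_ref); }

      void Release()
      {
         if (0 == ::InterlockedDecrement(&m_ref))
         {
            delete this;
         }
      }

   private:

      ~Buffer()               { delete [] m_pData; }

      volatile LONG m_ref;
      const DWORD m_size;
      BYTE * const m_pData;
};

// A connection's async send takes its own reference on the buffer and
// releases it from its completion handler.
struct IConnection
{
   virtual void AsyncSend(Buffer *pBuffer) = 0;

   virtual ~IConnection() {}
};

// The broadcast hands the SAME buffer to every connection rather than copying
// it into a fresh buffer for each client.
void Broadcast(std::list<IConnection *> &connections, Buffer *pBuffer)
{
   for (std::list<IConnection *>::iterator it = connections.begin();
        it != connections.end();
        ++it)
   {
      pBuffer->AddRef();            // one reference per in-flight send

      (*it)->AsyncSend(pBuffer);    // the send completion calls Release()
   }

   pBuffer->Release();              // drop the caller's reference
}
```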

The next change was to switch from using synchronous file writes for our log file to using async writes and IO completion ports. This works well, and there’s still scope for improving the performance further by switching on FILE_FLAG_NO_BUFFERING and allowing our buffers to be written directly to disk without any additional memory copying. I thought about switching to fixed size log records to help us achieve this (unbuffered writes must be sized and aligned to sector boundaries), but, right now, the performance is more than good enough.
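
To make the shape of this concrete, here’s a minimal sketch of overlapped log writes completed via an I/O completion port; the file name, record structure and single completion loop are illustrative, not the server’s actual logging code.

```cpp
// Async log writes using FILE_FLAG_OVERLAPPED and an I/O completion port.

#include <windows.h>
#include <stdio.h>
#include <string.h>

struct LogWrite
{
   OVERLAPPED overlapped;
   char data[512];
};

int main()
{
   HANDLE file = ::CreateFileA(
      "server.log",
      GENERIC_WRITE,
      FILE_SHARE_READ,
      0,
      OPEN_ALWAYS,
      FILE_FLAG_OVERLAPPED,   // async writes; FILE_FLAG_NO_BUFFERING would add unbuffered, sector-aligned IO
      0);

   if (file == INVALID_HANDLE_VALUE)
   {
      return 1;
   }

   HANDLE iocp = ::CreateIoCompletionPort(file, 0, 0, 0);

   LogWrite *pWrite = new LogWrite();

   ::memset(&pWrite->overlapped, 0, sizeof(OVERLAPPED));

   // Writing at offset 0xFFFFFFFF/0xFFFFFFFF appends to the end of the file.
   pWrite->overlapped.Offset = 0xFFFFFFFF;
   pWrite->overlapped.OffsetHigh = 0xFFFFFFFF;

   ::strcpy(pWrite->data, "something worth logging\r\n");

   // The call usually returns FALSE with ERROR_IO_PENDING; the completion is
   // posted to the IOCP either way.
   ::WriteFile(file, pWrite->data, (DWORD)::strlen(pWrite->data), 0, &pWrite->overlapped);

   // A dedicated thread would normally loop here, dequeuing completions and
   // releasing the buffers.
   DWORD bytes = 0;
   ULONG_PTR key = 0;
   LPOVERLAPPED pOverlapped = 0;

   if (::GetQueuedCompletionStatus(iocp, &bytes, &key, &pOverlapped, INFINITE))
   {
      LogWrite *pDone = CONTAINING_RECORD(pOverlapped, LogWrite, overlapped);

      ::printf("wrote %lu bytes\n", bytes);

      delete pDone;
   }

   ::CloseHandle(iocp);
   ::CloseHandle(file);

   return 0;
}
```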

The final change was to eliminate contention for the lock around the collection of connected clients. Originally, in our proof of concept release, we allowed the IO threads to interact with the connection collection directly. Unfortunately, one of those interactions was traversing the list during a broadcast, and since we were using a very simple locking technique we had a lot of contention for the collection. This was where we started to think about using multiple-reader, single-writer locks and, perhaps, going lock-free… The IO threads were never going to be responsible for blocking business logic in the production release, so we slipped in a second thread pool and moved the lock contention off to another group of threads. This helped a little: it meant that the IO could happen without blocking, but it still left us with contention, and the second thread pool introduced further context switching…
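
For context, the proof of concept broadcast path looked roughly like the sketch below; the simple lock is shown as a CRITICAL_SECTION, and the names and the blocking send are assumptions rather than the real code.

```cpp
// Roughly what the proof of concept did: every IO thread took the same simple
// lock and held it for the whole broadcast traversal, which is where the
// contention came from.

#include <winsock2.h>
#include <list>

struct Connection
{
   SOCKET socket;
};

static CRITICAL_SECTION s_lock;
static std::list<Connection *> s_connections;

static struct LockInit
{
   LockInit() { ::InitializeCriticalSection(&s_lock); }
} s_lockInit;

// Called directly from the IO threads in the proof of concept.
void Broadcast(const char *pData, int length)
{
   ::EnterCriticalSection(&s_lock);             // every broadcasting thread serialises here...

   for (std::list<Connection *>::iterator it = s_connections.begin();
        it != s_connections.end();
        ++it)
   {
      ::send((*it)->socket, pData, length, 0);  // ...and holds the lock for the whole traversal
   }

   ::LeaveCriticalSection(&s_lock);
}
```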

Lock-free, or wait-free, collections are interesting, but we have code to deliver and I wasn’t about to go off into research mode, so we shelved that idea and looked at the problem again.

Since all of the work done in the business layer needs to hold the connection collection lock for most of the time, we decided to reduce the number of threads in that pool to one. The contention went away and, not surprisingly, so did much of the context switching. The IO threads post complete messages to an IO completion port that feeds the business logic thread, which is now the only thread that needs to deal with the connection collection. We instrumented the IOCP “queue” between the IO pool and the business logic thread so that we could see how well the business logic thread kept up - it does just fine.
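
A minimal sketch of that arrangement, with illustrative names: the IO threads post to a second completion port and the single business logic thread drains it, so nothing else ever touches the connection collection; the queue-depth counter stands in for the instrumentation mentioned above.

```cpp
// A second completion port as the queue between the IO threads and the single
// business logic thread.

#include <windows.h>
#include <stdio.h>
#include <list>

struct Message
{
   DWORD connectionId;
   // the decoded bid would live here
};

static HANDLE s_businessQueue = ::CreateIoCompletionPort(INVALID_HANDLE_VALUE, 0, 0, 0);

static volatile LONG s_queueDepth = 0;    // instrumentation: how far behind is the business logic thread?

// Called from the IO threads once a complete message has been read.
void PostToBusinessThread(Message *pMessage)
{
   ::InterlockedIncrement(&s_queueDepth);

   ::PostQueuedCompletionStatus(s_businessQueue, 0, 0, reinterpret_cast<LPOVERLAPPED>(pMessage));
}

// The single business logic thread; no lock is needed on the collection
// because nothing else ever touches it.
DWORD WINAPI BusinessThread(LPVOID /*param*/)
{
   std::list<DWORD> connections;          // stands in for the real connection collection

   for (;;)
   {
      DWORD bytes = 0;
      ULONG_PTR key = 0;
      LPOVERLAPPED pOverlapped = 0;

      if (::GetQueuedCompletionStatus(s_businessQueue, &bytes, &key, &pOverlapped, INFINITE))
      {
         const LONG depth = ::InterlockedDecrement(&s_queueDepth);

         Message *pMessage = reinterpret_cast<Message *>(pOverlapped);

         // process the bid, traverse 'connections' and broadcast the result...

         ::printf("message from connection %lu, queue depth %ld\n", pMessage->connectionId, depth);

         delete pMessage;
      }
   }
}
```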

The final tweak was to how we timestamp the log messages. Since we were getting many messages per second and the granularity of the log timestamps didn’t really need to be finer than a second, we decided to stop timestamping the log messages themselves and, instead, write a timestamp line to the log every second. This reduced the number of times we needed to generate the string representation of the timestamp and improved the speed at which the business logic thread could process messages. A quick tweak reduced the noise in the log for an idle server (we now only produce a timestamp if there’s been a log message written since we last produced one), and a further tweak forces a timestamp to be written before a new log message if we’ve been skipping timestamps due to inactivity. Everything worked as expected.
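
The timestamping logic boils down to something like the sketch below (names illustrative, thread-safety and the real log sink elided): a once-per-second timer writes a timestamp only if messages have been logged since the last one, and a message written after a quiet period forces a timestamp out first.

```cpp
// Per-second log timestamps that are skipped while the server is idle.

#include <stdio.h>
#include <time.h>

class Log
{
   public:

      Log()
         :  m_timestampPending(true),
            m_messagesSinceTimestamp(false)
      {
      }

      // Called for every log message.
      void WriteMessage(const char *pMessage)
      {
         if (m_timestampPending)
         {
            WriteTimestamp();             // we'd been idle, so emit a timestamp first

            m_timestampPending = false;
         }

         ::printf("%s\n", pMessage);

         m_messagesSinceTimestamp = true;
      }

      // Called once a second by a timer.
      void OnTimer()
      {
         if (m_messagesSinceTimestamp)
         {
            WriteTimestamp();             // busy, so keep the per-second timestamps flowing

            m_messagesSinceTimestamp = false;
         }
         else
         {
            m_timestampPending = true;    // idle, so skip timestamps until the next message
         }
      }

   private:

      void WriteTimestamp()
      {
         char buffer[32];

         const time_t now = ::time(0);

         ::strftime(buffer, sizeof(buffer), "%H:%M:%S", ::localtime(&now));

         ::printf("-- %s --\n", buffer);
      }

      bool m_timestampPending;
      bool m_messagesSinceTimestamp;
};
```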

There’s scope for improving things further, but not before we ship. The performance is well within the client’s requirements and the other improvements (switching off buffering on the async file writes and reducing the number of ‘housekeeping’ threads) are just nice-to-have, academic issues at this point.