Spolsky's law strikes again...

I’m finalising the testing of the x64 version of The Server Framework and, since it’s a fairly major release, I figured that it warranted an email to existing customers to see who wanted the update shipped to them straight away. (Although all existing customers are entitled to all upgrades to The Server Framework free of charge, I usually rely on them monitoring this RSS feed and asking for new releases rather than just sending them out; generally only critical bug fixes are worthy of an email notification…) Anyway, there’s lots of interest, which is good; I’m always happier when a new release of the code is out with lots of people as it shakes out any remaining issues quickly…

One client has recently been making some changes to the design of their SSL server and had come across a problem with the framework, and, since I’d just contacted him, he fired back an email asking about his problem… It’s one of those cases where Joel’s law of leaky abstractions rears its ugly head. The client had decided that one of the example servers would serve as a good base for his server design and switched out the base server for the SSL enabled one to add SSL support. One of the big selling points of our SSL code is that it’s ‘plug compatible’ with the non-SSL code: you can add SSL without needing to do anything to the business logic you’ve already developed for a non-SSL enabled server; at least, that’s the idea…

The server in question works with variable length messages and accumulates the inbound data in a single data buffer. If a read completes with some data but there isn’t yet a complete message available, the code simply reissues the read with the partially filled data buffer that it was given in the read completion, so the next lot of data is appended to what’s already there. Eventually there’s a complete message in the buffer and it can be processed. This design is a win from a thread contention point of view, as no memory needs to be allocated and the buffer pool doesn’t need to be touched while a message is incomplete. It works nicely in the non-SSL server, but once the server is SSL enabled it stops working.
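Something like the following illustrates the pattern; it’s just a minimal sketch, and the names (MessageAccumulator, OnReadCompleted, IssueRead) and the 4-byte length-prefix framing are mine rather than anything from The Server Framework’s actual API:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class MessageAccumulator
{
public:
   explicit MessageAccumulator(const std::size_t bufferSize)
      : m_buffer(bufferSize), m_used(0)
   {
   }

   // Called when an async read completes, having appended bytesRead
   // bytes to the buffer that the read was issued with.
   void OnReadCompleted(const std::size_t bytesRead)
   {
      m_used += bytesRead;

      // Process every complete message currently in the buffer; this
      // sketch assumes each message starts with a 4-byte length prefix.
      while (m_used >= sizeof(std::uint32_t))
      {
         std::uint32_t messageLength;

         std::memcpy(&messageLength, m_buffer.data(), sizeof messageLength);

         const std::size_t totalLength = sizeof messageLength + messageLength;

         if (m_used < totalLength)
         {
            break;   // only a partial message so far; keep accumulating
         }

         ProcessMessage(m_buffer.data() + sizeof messageLength, messageLength);

         // Move any bytes of the next message down to the front.
         std::memmove(m_buffer.data(), m_buffer.data() + totalLength, m_used - totalLength);

         m_used -= totalLength;
      }

      // Reissue the read with the SAME, partially filled buffer; no
      // allocation and no buffer pool access for an incomplete message.
      IssueRead(m_buffer.data() + m_used, m_buffer.size() - m_used);
   }

private:
   void ProcessMessage(const std::uint8_t * /*pData*/, const std::size_t /*length*/)
   {
      // Hand the complete message to the business logic; stubbed here.
   }

   void IssueRead(std::uint8_t * /*pWritePos*/, const std::size_t /*space*/)
   {
      // Post the next overlapped read into the remaining space; stubbed here.
   }

   std::vector<std::uint8_t> m_buffer;

   std::size_t m_used;
};
```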

The problem is that the SSL server is abstracting away the fact that your data is now only part of the data that is flowing over the TCP connection. The SSL encryption and handshaking result in TCP data flow that has nothing to do with your application level data. Because of this, the server takes control of managing outstanding reads and pretty much ignores your requests to read data. It’s a cunning abstraction which works nicely, except when it doesn’t. You can write the same business layer code and it will work with the non-SSL server (which honours your read requests) and the SSL server, which effectively says “I know what I’m doing, I have a read pending already, I’ll just quietly throw your read request in the bin”. This is fine if you’re letting the server allocate read buffers as and when it needs them, but fails if you’re expecting the server to put the next lot of data that arrives into the buffer that you have given it…
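In sketch form the difference in behaviour looks something like this; again, the names are illustrative rather than the framework’s real API:

```cpp
#include <memory>

struct Buffer { };   // stand-in for a pooled I/O buffer

struct PlainConnection
{
   // The non-SSL connection honours the request: the async read is
   // posted directly into the caller's buffer, so new data lands
   // exactly where the business logic expects it.
   void Read(const std::shared_ptr<Buffer> &pBuffer)
   {
      PostOverlappedRead(pBuffer);
   }

   void PostOverlappedRead(const std::shared_ptr<Buffer> & /*pBuffer*/)
   {
      // Issue the actual network read; stubbed here.
   }
};

struct SslConnection
{
   // The SSL connection already has its own reads pending to drive the
   // handshake and cipher traffic, so the caller's request, and the
   // buffer that came with it, is quietly dropped.
   void Read(const std::shared_ptr<Buffer> & /*pBuffer*/)
   {
      // Nothing to do; internal reads are already managing the connection.
   }
};
```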

I was faced with two options: either have the SSL layer complain loudly if it was given an existing buffer to read into, or fix the layer so that things just worked in the way that you’d expect if it wasn’t there… Luckily it was relatively easy to do the latter…

The SSL connector now collects any read buffers that it’s given and keeps them around for when it needs a buffer to put user data into and pass up to your business layer. If it hasn’t got a buffer from you then it goes back to how it used to work and allocates one of its own. The buffers that you supply aren’t actually used for the physical network reads, they’re used for the logical data reads from the SSL connector. The leaky abstraction is patched, at least for now…
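In sketch form the fix looks something like this; as before the names are hypothetical, and a real implementation would also need synchronisation around the queue:

```cpp
#include <deque>
#include <memory>

struct Buffer { };   // stand-in for a pooled I/O buffer

class SslConnector
{
public:
   // Read requests no longer drive the network; the supplied buffer is
   // collected for the next logical data delivery instead.
   void Read(std::shared_ptr<Buffer> pBuffer)
   {
      m_suppliedBuffers.push_back(std::move(pBuffer));
   }

private:
   // Called internally once decrypted application data is ready to be
   // passed up to the business layer.
   std::shared_ptr<Buffer> GetDeliveryBuffer()
   {
      if (!m_suppliedBuffers.empty())
      {
         // Use the caller's buffer, so accumulation designs like the
         // one above keep working...
         std::shared_ptr<Buffer> pBuffer = std::move(m_suppliedBuffers.front());

         m_suppliedBuffers.pop_front();

         return pBuffer;
      }

      // ...otherwise fall back to the old behaviour and allocate one.
      return std::make_shared<Buffer>();   // stub; really drawn from the buffer pool
   }

   std::deque<std::shared_ptr<Buffer>> m_suppliedBuffers;
};
```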