Dealing with the simplest things

The POP3 client is now complete. It can download messages from POP3 servers and store them in a message store. I’ve implemented a file system based message store that is compatible with the POP3 server code’s file system message store. We can download messages from servers and make them available via our own server.

As expected, the test-first approach had driven the design in a simple and decoupled direction; eventually some of the simple decisions proved inappropriate, so I added some tests and refactored…

I knew that using a std::string to represent a mail message was a broken idea when I first wrote the code. It would be slow to build and manipulate the messages and we’d be limited in size to the number of characters that would fit in a string. But it was the simplest thing that would make the test pass, so I went with it. Eventually I had most of the functionality working nicely. I could download messages and serve them up again from my own server; I, or someone else, could have started work on the next phase with enough code working to provide messages to test with. But I wanted to test the client with some large messages, so I sent a 5MB zip file to the test account and watched as the client crawled painfully through processing the message and eventually died.

So I wrote a test. The test pushed a 1MB message through the client. The test failed, so I started to refactor the problem away. The problem stemmed from a design in which a message was pulled into the client in one lump; to do that we had to read the message from the source in chunks and assemble it in memory before returning the whole message to the client. This was bad for several reasons: firstly, it meant that we had to be able to hold the whole message in memory for the client to work with it; secondly, we were doing that by appending message chunks to a string representation which simply “realloc’d” each time it filled up, so a lot of memory management and byte copying was going on. Since the client just issues commands to a server, reads the stuff that the server sends it and pushes that stuff into a mailbox, there didn’t seem to be much need for it to actually play around with the whole message. It would be equally happy to pull a chunk of message from the server and push that chunk into the mailbox.

It seemed that it wasn’t so much that I was wrong to represent the message as a string; it was more that I’d designed the interfaces that the client was using inappropriately, and they’d encouraged me to use a string where I shouldn’t. So I set about redesigning the interfaces involved. The RETR function was the problem. It used the ReadLine function on the IInputStream interface to pull lines of message from the input stream, processed them and concatenated the lines before returning a string representation of the message to the caller. The caller then pushed the message into the AddMessage function of the IWriteableMailbox interface. The file system mailbox that I was using at the time then wrote the message to disk. All in all pretty broken, but it passed all of the tests except the new one…
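The shape of the original code was roughly this (a reconstruction from the description above; the exact signatures are approximations, not the real declarations):

```cpp
#include <string>

// Approximate "before" shape: names come from the prose, signatures are guesses.
struct IInputStream
{
   // Pull one line of the message from the server.
   virtual std::string ReadLine() = 0;

   virtual ~IInputStream() {}
};

struct IWriteableMailbox
{
   // The whole message had to be assembled in memory before this call.
   virtual void AddMessage(const std::string &message) = 0;

   virtual ~IWriteableMailbox() {}
};

// RETR read line after line, glued them together into one string, and the
// caller then pushed the result into AddMessage() in a single call.
```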

It seemed fairly obvious to me that message manipulation should work in chunks. So we’d need an interface that took some bytes and a count to say how many there were. Easy enough; change the mailbox to break up the creation, population and completion of a message into separate function calls. You call the create function once, it gives you a token that represents the message you’re creating and you then pass that to the AddData function along with each chunk of data. When you’re done you complete or abort the message, again by passing back the token. Seemed like a good idea. I wrote some tests and adjusted the mailbox interface. Lots of things broke; the mailbox interface was used by test code as well as production code (I have mock mailboxes and null mailboxes and whatever), so there was more code to change than I expected. I fixed all of that up and adjusted the calling site to take the string that it was being given and do the create, add, commit dance that was now required. Run the tests, all was good.
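Something along these lines; the token type and the exact method names are assumptions drawn from the description, not the actual declarations:

```cpp
#include <cstddef>

typedef unsigned long MessageToken;    // opaque token for an in-progress message

// Hedged sketch of the reworked mailbox interface.
struct IWriteableMailbox
{
   // Call once to start a message; the token identifies it from then on.
   virtual MessageToken CreateMessage() = 0;

   // Call once per chunk of message data.
   virtual void AddData(MessageToken token, const char *pData, size_t length) = 0;

   // Call exactly one of these when you're done with the message.
   virtual void Commit(MessageToken token) = 0;
   virtual void Abort(MessageToken token) = 0;

   virtual ~IWriteableMailbox() {}
};
```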

Next I stepped back a level: the code that used the mailbox now had to manage a cookie and make sure it called either commit or abort when it was done; that sounded like I needed a smart wrapper for the cookie. One class and a few tests later I had such a beast; now the new mailbox could be used safely in the presence of exceptions. We simply create our smart message, which obtains a cookie by calling CreateMessage; it then exposes the AddData function that allows the user to do what they want and the Commit function that allows the user to say “yes, this message is good”. If you don’t commit then the smart message wrapper calls Abort in its destructor. Coolio.
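The wrapper could look something like this; CSmartMessage is my name for it and it assumes the IWriteableMailbox sketched above:

```cpp
#include <cstddef>

class CSmartMessage
{
   public :

      explicit CSmartMessage(IWriteableMailbox &mailbox)
         :  m_mailbox(mailbox),
            m_token(mailbox.CreateMessage()),
            m_committed(false)
      {
      }

      void AddData(const char *pData, size_t length)
      {
         m_mailbox.AddData(m_token, pData, length);
      }

      // "Yes, this message is good."
      void Commit()
      {
         m_mailbox.Commit(m_token);

         m_committed = true;
      }

      ~CSmartMessage()
      {
         if (!m_committed)
         {
            m_mailbox.Abort(m_token);     // never committed, so throw the message away
         }
      }

   private :

      IWriteableMailbox &m_mailbox;

      const MessageToken m_token;

      bool m_committed;
};
```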

So the client function, RETR, is still a problem because it returns a message as a whole rather than pushing chunks into the new mailbox interface. It would be nice not to tie the POP3 client to my concept of mailboxes too tightly as I might want to use it for something that doesn’t “write the message into a mailbox”, so the IMessageDataSink interface is born. It has a single method, AddData, but without the cookie that the mailbox needs; in fact, it’s the same interface that the smart message exposes; that’s ‘andy ‘arry…
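In its first incarnation the sink is as simple as it sounds; the exact signature below is an assumption, and since the smart message already exposes the same shape it could simply implement this interface:

```cpp
#include <cstddef>

struct IMessageDataSink
{
   // One chunk of message data; no cookie required.
   virtual void AddData(const char *pData, size_t length) = 0;

   virtual ~IMessageDataSink() {}
};
```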

So now our call to RETR takes a message data sink and can push the message, in chunks, into the sink. Adjust the tests, move the string-to-chunking-interface shim back up a level and watch the tests pass again.
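A hypothetical call site after the change might look like this, assuming the client exposes something like Retr(messageNumber, sink) and that CSmartMessage implements IMessageDataSink; all of the names here are illustrative rather than the real ones:

```cpp
#include <cstddef>

struct CPop3Client
{
   // Invented signature: retrieve one message, pushing chunks into the sink.
   bool Retr(size_t messageNumber, IMessageDataSink &sink);
};

void RetrieveMessage(CPop3Client &client, IWriteableMailbox &mailbox, size_t messageNumber)
{
   CSmartMessage message(mailbox);        // obtains a token; the destructor aborts if we bail out

   client.Retr(messageNumber, message);   // chunks flow straight from the client into the sink
                                          // (assumes CSmartMessage implements IMessageDataSink)

   message.Commit();                      // only now does the mailbox keep the message
}
```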

Now we just need to let the input stream know that it can work in chunks rather than lines and we’re sorted. I adjust IInputStream so that it has a Read method that takes an IMessageDataSink interface. Read simply keeps reading at whatever chunk size it likes until AddData returns false. Since the sink has no control over the size of the chunks in use, AddData now needs to be able to let its callers know how many bytes of each chunk it has consumed. The code is adjusted, the tests break, mock objects are re-mocked and the tests pass.
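One plausible reading of the adjusted interfaces follows; the “bytes consumed” reporting could just as easily be done through the return value alone, so treat these signatures as guesses rather than the real thing:

```cpp
#include <cstddef>

// The sink, revised: it can stop the read and report partial consumption.
struct IMessageDataSink
{
   // Returns true if the sink wants more data; bytesUsed reports how much of
   // this chunk was actually consumed (the message may end mid-chunk).
   virtual bool AddData(const char *pData, size_t length, size_t &bytesUsed) = 0;

   virtual ~IMessageDataSink() {}
};

// The input stream, revised: it reads in whatever chunk size it likes and
// pushes each chunk into the sink until AddData() returns false.
struct IInputStream
{
   virtual void Read(IMessageDataSink &sink) = 0;

   virtual ~IInputStream() {}
};
```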

The 1MB message test passes. There are now no std::string representations of message fragments anywhere in the client’s message retrieval code path. Along the way I needed to adjust the POP3 client’s message content processor; it has to scan the message as it comes in because it needs to remove byte stuffing and the like…

It took around 20 minutes to come up with the basic design change that was required and make the actual changes to the ‘production’ code. It probably took around 1.5 hours in total to do this in small steps and adjust the tests as I went. I flip-flopped between annoyance that the ‘simple design that was so obviously broken’ had caused this ‘rework’ and the joy of having the tests pass and feeling safe…

Once the client could pass the new test and download my 5MB test message I needed to adjust the corresponding message path in the server code. Interestingly, the tangled testing that I was using made this possible without adding new tests. I admit I never added a failing test first; I simply refactored the server code path to use the same interfaces as the client code path.

Once that was done I could retrieve and serve my test message, and what’s more the zip file still worked once it had passed through my mangle. After that I refactored the code that had been touched but not made better by the previous changes. The tests made me bold; I refactored in big steps because I knew that I could just revert to the last working copy from CVS and that my tests would tell me if I slipped up. It was fun. It was fast.

I ran a coverage tool on my test harness. I have 87% coverage. I’m surprised and impressed. This is coverage of all the code compiled into the test harness; there are functions in support libraries that I never call, but most of the actual production code of the library I’m working on gets hit. I quite like this testing lark.