Just enough RFC822

Second on my list of email filters was a filter that splits a ‘domain mailbox’ into several different mailboxes depending on the username that the email is addressed to. This is basically just an intelligent version of the mailbox writing filter. The problem was, it needed to understand RFC822 addressing…

I have several POP3 mailboxes that work at a domain level. For example, I can receive pretty.much.anything@lenholgate.com. The guys that manage the domain supply me with a single POP3 mailbox and I’d like to split it locally using a mail filter.

The filter will work in a similar way to the mailbox writing filter except that it will have a map of email addresses to mailboxes. When it filters a mail message it will work out who the message is for and deliver it to the appropriate mailbox. A message might be for several mailboxes or there may be no mailboxes that accept the message.
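To make the address-to-mailbox map concrete, here is a minimal sketch of the lookup. The class name `DomainMailboxMap` and its methods are hypothetical and are not taken from the real filter code; the sketch also assumes exact, case-sensitive address matching, which the real filter may well do differently.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: maps a recipient address to a local mailbox name.
class DomainMailboxMap
{
public:
    // Register the mailbox that accepts mail for the given address.
    void Add(const std::string &address, const std::string &mailbox)
    {
        m_map[address] = mailbox;
    }

    // Returns each mailbox that accepts at least one of the recipients.
    // The result is empty if no mailbox accepts the message, and a mailbox
    // is listed only once even if several recipients map to it.
    std::vector<std::string> MailboxesFor(
        const std::vector<std::string> &recipients) const
    {
        std::vector<std::string> result;

        for (const std::string &recipient : recipients)
        {
            // Assumption: exact, case-sensitive match on the whole address.
            const auto it = m_map.find(recipient);

            if (it != m_map.end() &&
                std::find(result.begin(), result.end(), it->second) == result.end())
            {
                result.push_back(it->second);
            }
        }

        return result;
    }

private:
    std::map<std::string, std::string> m_map;
};
```

A message addressed to two mapped addresses would be delivered to both mailboxes, and a message whose recipients are all unknown produces an empty list, which the filter can treat as "nobody accepts this".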

I can write tests for the new filter and get as far as having it process a simple email address, do@dah.com, and then I realise that parsing a string of email addresses from a mail header and getting a list of the actual addresses in mailbox@domain format is non-trivial. After all, this is a valid address: "Muhammed Ali" <Muhammed.(I am the greatest,) Ali @(the)Vegas.WBA>, where the mailbox is Muhammed.Ali and the domain is Vegas.WBA. A quick flick through RFC822 leaves my head spinning at the thought of building a parser for the augmented BNF grammar that’s presented there.

I waste an hour or so scribbling simple state machines on my scratch pad and searching the web for someone else’s solution (why is it always in Perl?).

Then I write a simple test for the CRFC822AddressParser class, using "do@dah.com, dah@dah.com" as the input, and I build the state machine from there. Moving from simple to more complex addresses, I use the BNF grammar to guide my tests and then adjust the state machine to parse each new test case. At each stage the tests for the simpler cases support the new changes. In less time than I wasted fretting over the complexity of the problem and scouring the web, I manage to implement the code. A little refactoring and we’re done. The parser works in two steps: step one splits the header’s string of addresses into a list of individual addresses, and step two takes each individual address and extracts the mailbox and domain portions. I use some of the more fiendish test addresses from the Perl implementation that I found during my web search and they all pass :)
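The two steps can be sketched roughly as follows. This is not the CRFC822AddressParser code; the function names `SplitAddressList` and `ExtractMailboxAndDomain` are hypothetical, and the sketch only handles quoted strings, nested comments and angle brackets. It assumes there are no group or route addresses, both of which RFC822 also allows.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Step one: split a header value on commas that are not inside a quoted
// string, a (possibly nested) comment or an angle-bracketed address.
std::vector<std::string> SplitAddressList(const std::string &header)
{
    std::vector<std::string> result;
    std::string current;
    bool inQuotes = false;
    bool inAngle = false;
    int commentDepth = 0;

    for (std::size_t i = 0; i < header.size(); ++i)
    {
        const char c = header[i];

        if (c == '\\' && (inQuotes || commentDepth > 0) && i + 1 < header.size())
        {
            current += c;               // keep the quoted pair intact
            current += header[++i];
            continue;
        }

        if (inQuotes) { if (c == '"') inQuotes = false; }
        else if (commentDepth > 0)
        {
            if (c == '(') ++commentDepth;
            else if (c == ')') --commentDepth;
        }
        else if (c == '"') inQuotes = true;
        else if (c == '(') commentDepth = 1;
        else if (c == '<') inAngle = true;
        else if (c == '>') inAngle = false;
        else if (c == ',' && !inAngle)
        {
            if (!current.empty()) result.push_back(current);
            current.clear();
            continue;                   // the separator itself is dropped
        }

        current += c;
    }

    if (!current.empty()) result.push_back(current);
    return result;
}

// Step two: reduce one address to its (mailbox, domain) pair by dropping
// the display name, comments and unquoted whitespace.
std::pair<std::string, std::string> ExtractMailboxAndDomain(const std::string &address)
{
    std::string spec;                   // becomes mailbox@domain
    std::size_t at = std::string::npos; // position of '@' within spec
    bool inQuotes = false;
    int commentDepth = 0;

    for (std::size_t i = 0; i < address.size(); ++i)
    {
        const char c = address[i];

        if (c == '\\' && (inQuotes || commentDepth > 0) && i + 1 < address.size())
        {
            if (inQuotes) { spec += c; spec += address[i + 1]; }
            ++i;                        // skip the escaped character
            continue;
        }

        if (commentDepth > 0)
        {
            if (c == '(') ++commentDepth;
            else if (c == ')') --commentDepth;
            continue;                   // comment text is discarded
        }

        if (inQuotes) { spec += c; if (c == '"') inQuotes = false; continue; }
        if (c == '"') { inQuotes = true; spec += c; continue; }
        if (c == '(') { commentDepth = 1; continue; }
        if (c == '<') { spec.clear(); at = std::string::npos; continue; } // drop display name
        if (c == '>') break;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') continue;
        if (c == '@') at = spec.size();

        spec += c;
    }

    if (at == std::string::npos) return std::make_pair(spec, std::string());
    return std::make_pair(spec.substr(0, at), spec.substr(at + 1));
}
```

Feeding the Muhammed Ali address from earlier through `ExtractMailboxAndDomain` gives a mailbox of Muhammed.Ali and a domain of Vegas.WBA, and the comma inside its comment does not fool `SplitAddressList` into splitting the address in two.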

Once that’s done I can construct more convoluted tests for the domain mailbox filter…