Filtering mail

2003-11-20

Now that I can retrieve mail and serve it up again via POP3, I want to do stuff to it in between retrieving it and serving it.

The idea was to have a series of filters that each get passed every message, Do Stuff ™, and either allow the message to be passed on to the next filter or end the filtering process.

Most of that works now; here’s how I got there.

The mail client pulls mail from servers and deposits it in a mailbox in a file-based message store; each mailbox is a directory and each email is a file. The filter process should look at these incoming mailboxes every so often, take all the mail in each mailbox and push it through the filter chain. When a message gets to the end of the filter chain it’s deleted. The filter process manages one filter chain for each mailbox. A filter chain consists of 0 or more filters. A filter chain with 0 filters simply deletes all mail in a mailbox…

Filters can write mail messages to other mailboxes; we could have a filter that duplicates a mailbox (making it available in two outgoing mailboxes) or a filter that splits a domain account “*@jetbyte.com” into several named mailboxes, one per user. A filter could remove or process all MIME attachments, or provide blacklist, whitelist or spam filtering. Filters can be added to the chain in any order, and each filter decides whether the message will be processed by any subsequent filters or simply deleted after processing.
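
The chain behaviour described above can be sketched roughly as follows. This is only an illustration under simplifying assumptions, not the real code: `SimpleMessage`, `Mailboxes`, `ISimpleFilter`, `CopyToMailbox`, `RejectIfBodyContains`, `FilterChain` and `RunChain` are all hypothetical names, and the message store is an in-memory map rather than directories of files. Each filter returns true to pass the message on or false to end the chain; either way the message is gone from the incoming mailbox afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a stored message; the real store uses files.
struct SimpleMessage
{
   std::string recipient;
   std::string body;
};

// Hypothetical in-memory message store: mailbox name -> messages.
typedef std::map<std::string, std::vector<SimpleMessage> > Mailboxes;

// Simplified filter: return true to pass the message to the next filter,
// false to end filtering for this message.
class ISimpleFilter
{
   public :
      virtual bool FilterMessage(const SimpleMessage &message, Mailboxes &store) = 0;
      virtual ~ISimpleFilter() {}
};

// Copies each message into a destination mailbox and lets the chain continue.
class CopyToMailbox : public ISimpleFilter
{
   public :
      explicit CopyToMailbox(const std::string &destination)
         : m_destination(destination) {}

      bool FilterMessage(const SimpleMessage &message, Mailboxes &store) override
      {
         store[m_destination].push_back(message);
         return true;
      }

   private :
      const std::string m_destination;
};

// Crude blacklist: ends the chain for any message whose body contains a word.
class RejectIfBodyContains : public ISimpleFilter
{
   public :
      explicit RejectIfBodyContains(const std::string &word) : m_word(word) {}

      bool FilterMessage(const SimpleMessage &message, Mailboxes &) override
      {
         return message.body.find(m_word) == std::string::npos;
      }

   private :
      const std::string m_word;
};

typedef std::vector<std::unique_ptr<ISimpleFilter> > FilterChain;

// Pushes every message in `incoming` through the chain, then deletes it.
// Returns the number of messages processed.
std::size_t RunChain(const std::string &incoming, const FilterChain &chain, Mailboxes &store)
{
   std::vector<SimpleMessage> pending;
   pending.swap(store[incoming]);   // the incoming mailbox is now empty

   for (const SimpleMessage &message : pending)
   {
      for (const std::unique_ptr<ISimpleFilter> &filter : chain)
      {
         assert(filter != nullptr);
         if (!filter->FilterMessage(message, store))
         {
            break;   // this filter ended the chain for this message
         }
      }
   }
   return pending.size();
}
```

With an empty `FilterChain`, `RunChain` just empties the incoming mailbox. With a chain holding only a `CopyToMailbox`, it moves every message from the incoming mailbox to the destination.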

Since a filter chain with no filters deletes messages, the simplest filter that would give us a system that actually does something useful is one that writes a message to another mailbox. With a filter that did this we could create a filter chain containing just this filter and effectively move messages from an input mailbox to an output mailbox.

Sounds pretty complex. I wrote a test, well, actually I wrote an interface first and then I wrote a test…

We know we’re writing multiple types of filter, so we’ll start with the interface: IMessageFilter. A filter needs to process N messages for a mailbox, so we’ll need a function something like this: virtual bool FilterMessage(const IMessage &message). I don’t know what IMessage will look like, but I know I’ll need it (bad test driven developer, bad!)… Since the filter may need to write the message to other mailboxes we’ll initialise it with an instance of IWriteableMessageStore. Since the filter may need mailbox specific configuration data before it can operate we’ll also initialise it with an instance of IManageFilterData; I’ve no idea what IManageFilterData does just yet, I just know I’ll need it. The resulting interface looks like this:

class IMessageFilter 
{
   public :

      virtual void StartFilteringMailbox(
         IWriteableMessageStore &messageStore,
         IManageFilterData &filterDataManager) = 0;

      virtual bool FilterMessage(
         const IMessage &message) = 0;

      virtual void StopFilteringMailbox() = 0;

      virtual ~IMessageFilter() {}
};

With the interface in place we can start writing the test. A “message writer filter” takes the supplied message, writes it to a destination mailbox and allows other filters to process the message. I mock up a message and a filter data manager. I plug it all together and call the functions in the correct order: start, filter, stop. I write code to check that the contents of the mock message have ended up in the mailbox that I expected. I compile and it fails because I haven’t written the filter object yet. I write it and it compiles and fails because I haven’t fleshed out the IMessage interface yet… I guess I’m back in test-first land…

Filters process messages. They may or may not need to have the whole message in memory at once. Right now the general design for the client and server code allows manipulation of messages in chunks; we’ll stick with that approach here and define the IMessage interface like this:

class IMessage
{
   public :

      virtual void WriteTo(
         IMessageDataSink &sink) const = 0;

      virtual ~IMessage() {}
};

When a filter wants to access message data it calls WriteTo and passes it a sink; the message data is then pushed through the filter, via the sink, until the filter says stop…
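
IMessageDataSink itself isn’t shown here, so the following is only a guess at how the push model could work. It assumes a single hypothetical `OnData` callback that returns false when the sink has seen enough. `CChunkedMessage` and `CHeaderSink` are likewise made up for the example: the message pushes its data in fixed-size chunks, and the sink collects data up to the blank line that ends the headers and then says stop.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Assumed shape of the sink: return false to ask the message to stop pushing.
class IMessageDataSink
{
   public :
      virtual bool OnData(const char *pData, std::size_t length) = 0;
      virtual ~IMessageDataSink() {}
};

class IMessage
{
   public :
      virtual void WriteTo(IMessageDataSink &sink) const = 0;
      virtual ~IMessage() {}
};

// Hypothetical message that pushes its data to the sink in fixed-size chunks.
class CChunkedMessage : public IMessage
{
   public :
      CChunkedMessage(const std::string &data, std::size_t chunkSize)
         : m_data(data), m_chunkSize(chunkSize == 0 ? 1 : chunkSize) {}

      void WriteTo(IMessageDataSink &sink) const override
      {
         for (std::size_t offset = 0; offset < m_data.length(); offset += m_chunkSize)
         {
            const std::size_t remaining = m_data.length() - offset;
            const std::size_t length = std::min(m_chunkSize, remaining);

            if (!sink.OnData(m_data.data() + offset, length))
            {
               break;   // the sink has seen enough
            }
         }
      }

   private :
      const std::string m_data;
      const std::size_t m_chunkSize;
};

// Hypothetical sink that only wants the headers: it buffers data until it
// finds the blank line that separates headers from body, then says stop.
class CHeaderSink : public IMessageDataSink
{
   public :
      CHeaderSink() : m_chunksSeen(0) {}

      bool OnData(const char *pData, std::size_t length) override
      {
         ++m_chunksSeen;
         m_buffer.append(pData, length);

         const std::string::size_type end = m_buffer.find("\r\n\r\n");
         if (end == std::string::npos)
         {
            return true;   // no separator yet, keep the data coming
         }
         m_buffer.erase(end);   // keep the headers only
         return false;
      }

      std::string m_buffer;
      std::size_t m_chunksSeen;
};
```

A filter that only cares about headers never has to hold the whole message in memory; the cost is that the sink has to cope with data arriving split at arbitrary points, which is why the sketch searches the accumulated buffer rather than each chunk.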

The message writing filter’s FilterMessage implementation is then as simple as this:

bool CMessageWriter::FilterMessage(
   const IMessage &message)
{
   CWriteableMessage output(*m_pMailbox);

   message.WriteTo(output);

   output.Commit();

   return true;
}
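
For illustration, the test described earlier might look something like the sketch below. Only IMessage and IMessageFilter are taken from the code above; the shapes given to IMessageDataSink, IWriteableMessageStore and IManageFilterData are assumptions, as are `CMockMessage`, `CMockMessageStore`, `CBufferSink`, `CSketchMessageWriter` (a simplified stand-in for CMessageWriter) and `TestMessageWriter`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Assumed shapes: these three interfaces are named above but never shown.
class IMessageDataSink
{
   public :
      virtual bool OnData(const char *pData, std::size_t length) = 0;
      virtual ~IMessageDataSink() {}
};
class IWriteableMessageStore
{
   public :
      virtual void StoreMessage(const std::string &mailbox, const std::string &data) = 0;
      virtual ~IWriteableMessageStore() {}
};
class IManageFilterData { public : virtual ~IManageFilterData() {} };

// IMessage and IMessageFilter as given above.
class IMessage
{
   public :
      virtual void WriteTo(IMessageDataSink &sink) const = 0;
      virtual ~IMessage() {}
};
class IMessageFilter
{
   public :
      virtual void StartFilteringMailbox(IWriteableMessageStore &store, IManageFilterData &data) = 0;
      virtual bool FilterMessage(const IMessage &message) = 0;
      virtual void StopFilteringMailbox() = 0;
      virtual ~IMessageFilter() {}
};

// Mock message: pushes its whole content at the sink in one go.
class CMockMessage : public IMessage
{
   public :
      explicit CMockMessage(const std::string &data) : m_data(data) {}
      void WriteTo(IMessageDataSink &sink) const override { sink.OnData(m_data.data(), m_data.length()); }
      const std::string m_data;
};

// Mock store: remembers what was written to each mailbox.
class CMockMessageStore : public IWriteableMessageStore
{
   public :
      void StoreMessage(const std::string &mailbox, const std::string &data) override { m_mailboxes[mailbox] += data; }
      std::map<std::string, std::string> m_mailboxes;
};

// Sink that collects everything pushed at it.
class CBufferSink : public IMessageDataSink
{
   public :
      bool OnData(const char *pData, std::size_t length) override { m_data.append(pData, length); return true; }
      std::string m_data;
};

// Stand-in for the message writer filter, with a hard-coded destination.
class CSketchMessageWriter : public IMessageFilter
{
   public :
      void StartFilteringMailbox(IWriteableMessageStore &store, IManageFilterData &) override
      {
         m_pStore = &store;
         m_destination = "output";   // hard-coded for now
      }
      bool FilterMessage(const IMessage &message) override
      {
         assert(m_pStore != nullptr);
         CBufferSink output;
         message.WriteTo(output);
         m_pStore->StoreMessage(m_destination, output.m_data);
         return true;   // allow later filters to process the message
      }
      void StopFilteringMailbox() override { m_pStore = nullptr; }
   private :
      IWriteableMessageStore *m_pStore = nullptr;
      std::string m_destination;
};

// The test: start, filter, stop, then check where the data ended up.
bool TestMessageWriter()
{
   CMockMessageStore store;
   IManageFilterData filterData;
   const CMockMessage message("Subject: test\r\n\r\nHello");
   CSketchMessageWriter filter;
   filter.StartFilteringMailbox(store, filterData);
   const bool passOn = filter.FilterMessage(message);
   filter.StopFilteringMailbox();
   return passOn && store.m_mailboxes["output"] == message.m_data;
}
```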

The test now passes. Write another test for a different destination mailbox and that test fails; we have some hard-coded stuff in the filter’s start function. Time to work on the filter data manager and configuration data…