Practical Testing: 32 - Intrusive containers.

| 0 Comments

Back in 2004, I wrote a series of articles called "Practical Testing" where I took a piece of complicated multi-threaded code and wrote tests for it. I then rebuild the code from scratch in a test driven development style to show how writing your tests before your code changes how you design your code. Since the original articles there have been several bug fixes and redesigns all of which have been supported by the original unit tests and many of which have led to the development of more tests.

In 2010 I needed to improve the performance of the code for a specific use case. The resulting changes saw the creation of a Timer Wheel version of the code which has the same interface and uses the same tests. I also ruminated on contention, invented a crazy notation to help me talk about contention and experimented with custom STL allocators and private heaps in an attempt to reduce it. In the end I decided that the best approach was probably to use custom containers rather than the STL. The custom containers would be intrusive (or invasive as I called it at the time) and would avoid the need for memory allocation on insertion and removal of items.

I didn't have a pressing need to improve the performance further and so the intrusive container idea slipped down my list of things to do. Recently I've had a need for an intrusive map for another part of The Server Framework and have implemented some custom containers, set, map, multi-map, based on an intrusive red-black tree that I've written. It then seemed appropriate to integrate them into the timer queue and see what difference they make. That's what I'm doing in this instalment. As usual the existing tests support these, quite major, internal changes to the code.

Intrusive containers

Both the timer queue and the timer wheel use STL containers. The wheel just uses a std::set to store currently active timer handles. The timer queue uses a set for the same reason and also includes a custom multi-map implementation which is built with a std::map and a std::deque. The reason for the custom multi-map is that it's useful in the code to be able to remove all entries for a given key in one go and this seemed to be the best way to achieve that.

My new intrusive containers are all based on a custom red-black tree which does no memory allocation during the insertion or removal of items. STL containers are simple to use, in part, because they create the necessary book-keeping data dynamically and then store your data and key inside of it, this leads to memory allocation and release during insertion and removal. The alternative is to add the book-keeping data to the data item that you wish to store. This is then created once, when the data item is created, rather than each time the data item is inserted into a collection. One of the downsides of this approach is that you need one set of book-keeping data per collection that you intend to store your data in. If you want to store the data in two maps at the same time then you need two map nodes in the data item...

An associative container such as a map is easily built on top of a balanced search tree. The tree simply needs to store the data to be stored along with the key in the nodes that make up the tree. A set is simply a map in which the key and the data are one and the same. Since I have specific design requirements for my multi-map which the STL doesn't support I'll start with showing how I replaced the std::set instances with my new intrusive set.

One of my complaints about the boost intrusive containers was that they weren't "drop in" replacements for the standard STL equivalents. I've managed to get pretty close to a drop in replacement with my intrusive set, at least in the simplest usage scenario. This means that in the Timer Wheel code I can simply replace this:

      typedef std::set<TimerData *> Handles;

      Handles m_handles;

With this:

      typedef TIntrusiveSet<TimerData> ActiveHandles;

      ActiveHandles m_activeHandles;

I decided on the name change as it more accurately represents what the collection is used for.

The replacement set supports all of the operations that I was using from std::set and uses the same names. If I didn't insist on matching my usual coding style then the change above would be all that would be required for use of the new set. As it is, you also have to change begin() to Begin() and erase() to Erase(), etc.

Changes to stored data

Of course, due to the intrusive nature of the container we have to change the data that is stored in the container to contain the book-keeping data that the container needs. In this case that's easily done by simply having TimerData inherit from CIntrusiveSetNode. However, if you're going to store the object in multiple containers then you need to have multiple sets of book-keeping data. This complicates the design of the container itself somewhat as you can't simply require that the data inherits from the book-keeping data or whatever. The solution to this is to include the ability to templatise the container on a function object that can be used to obtain the book-keeping data from the data to be stored.

The complete template definition of the intrusive set actually looks like this:

template <
   class T,
   class K = ULONG_PTR,
   class TtoK = TIntrusiveSetNodeKeyAccessorKeyIsAddress<T>,
   class Pr = std::less<K>,
   class TtoN = TIntrusiveRedBlackTreeNodeIsBaseClass<T> >
class TIntrusiveSet : public TIntrusiveRedBlackTree<T, K, TtoK, Pr, TtoN>
{
   public :

      typedef TtoK key_accessor;

      typedef TIntrusiveRedBlackTree<T, K, TtoK, Pr, TtoN> Base;

      Iterator Find(
         const T *pData) const;
};

The key point for this discussion being the TtoN template parameter which defaults to the following (and yes,the node for the set is in fact just a typedef of the node for the red black tree...).

template <class T>
class TIntrusiveRedBlackTreeNodeIsBaseClass
{
   public :

      static CIntrusiveRedBlackTreeNode * GetNodeFromT(
         const T *pT)
      {
         CIntrusiveRedBlackTreeNode *pNode = const_cast<CIntrusiveRedBlackTreeNode *>(static_cast<const CIntrusiveRedBlackTreeNode * >(pT));

         return pNode;
      }

      static T *GetTFromNode(
         const CIntrusiveRedBlackTreeNode *pNode)
      {
         return const_cast<T*>(static_cast<const T*>(pNode));
      }
};

This provides two conversion functions which allow the collection to convert from a T to a node and back again. The code above works when the node is a base class of T but it's easy to replace this default functionality if you decide to store your node as a member variable, for example:

template <class T, CIntrusiveRedBlackTreeNode T::*memberPointer>
class TIntrusiveRedBlackTreeNodeIsEmbeddedMember
{
   public :

      static CIntrusiveRedBlackTreeNode * GetNodeFromT(
         const T *pT)
      {
         CIntrusiveRedBlackTreeNode *pNode = 0;

         if (pT)
         {
            pNode = &(const_cast<T*>(pT)->*memberPointer);
         }

         return pNode;
      }

      static T *GetTFromNode(
         const CIntrusiveRedBlackTreeNode *pNode)
      {
         T *pT = 0;

         if (pT)
         {
            CIntrusiveRedBlackTreeNode *pNodeOffset = &(reinterpret_cast<T*>(0)->*memberPointer);

            ULONG_PTR offset = reinterpret_cast<ULONG_PTR>(pNodeOffset);

            pT = const_cast<T *>(reinterpret_cast<const T *>(reinterpret_cast<const char *>(pNode) - offset));
         }

         return pT;
      }
};

T can make these classes friends if you want to use a private base for the node or have it as a private member. I'm sure all of this can be improved upon, but right now it works and does what I need. Note that you can easily write your own adapter classes which can work in any way you want as long as they can do the conversions required. I realise that I could be more flexible by having multiple separate function objects all of which expose an operator() as you could then use normal functions as well as function objects but that just seems unnecessary to me.

As you can see from the intrusive set's template definition there's another function object that can be supplied. This one TtoK takes care of obtaining the key from the T. Once again this can do whatever you want, but for the set it simply returns the address of the object being placed into the set.

Default implementations for TtoN and TtoK are easy to provide for the intrusive set. TtoK tends to be a little more T specific when we're dealing with a map or a tree.

Performance changes

Whilst the changes here are simple and remove the need for the timer wheel to allocate memory when timers created and destroyed they don't actually improve performance that much. Timer creation happens far less than setting, cancelling and handling timers and whilst the change reduces the contention of the operations from C(tq)+(C(ts-tq+1)) to C(tq) (see here for what this means) the actual improvement is likely to be pretty minor except in cases where the allocator used by the STL is under high levels of contention and the (C(ts-tq+1)) part of the original code really bites.

Changing the timer queue

Next time we'll look at replacing the STL containers in the timer queue. The map that is used in the timer queue is inserted into during timer setting and erased from during timer cancellation and timer handling. Due to the custom 'multi-map' design that's currently used this requires std::map and std::deque inserts. The replacement multi-map is specifically designed for this usage pattern and so we will avoid all memory allocation during timer setting, cancellation and handling. This is likely to have a much more obvious performance improvement.

Note that whilst there are some tests for the intrusive containers much of the testing is being done by the tests from classes that use the containers, this is less than ideal and will likely be rectified in the future. I've always been a proponent of Just In Time Testing and so the fact that the containers have decent test coverage by way of other classes' unit tests is enough for me right now.

Source code

The code is here and new rules apply. A fair while has passed since the previous episode in this series of articles. My build environment, and some of the support code has changed a fair bit since then. The code will build with Visual Studio 2008, 2010, 2012 and 2013. Win32Tools is the solution that you want and Win32ToolsTest is the project that you should set as active. The code will build with either the standard STL that comes with Visual Studio or with a version of STL Port. The code uses precompiled headers the right way so that you can build with precompiled headers for speed or build without them to ensure minimal code coupling. The various options are all controlled from the "Admin" project; edit Config.h and TargetWindowsVersion.h to change things... By default the project is set to build for Windows 7; this will mean that the code WILL NOT RUN on operating systems earlier than Windows Vista as it will try and use GetTickCount64() which isn't available. To fix this you need to edit the Admin\TargetWindowsVersion.h file and change the values used; see Admin\TargetWindowsVersion_WIN2K.h and Admin\TargetWindowsVersion_WINXP.h for details. Since I'm looking at the performance of the code I've adjusted the VS2008 solutions to build without checked iterators in release mode, by default the STL in VS2008 builds with checked iterators enabled in release mode as well as in debug mode and this adversely affects performance; see here for more details. This code is in the Public Domain.

Leave a comment