C++ Tips: 4 - Learn to work in terms of abstractions, no matter how small

2006-01-06

In the fight to make C++ code easier to reason about and understand, never underestimate the value of a name. Giving something a decent name is the first step in thinking about the concept at a slightly more abstract level. Abstraction is all about selective forgetfulness: by grouping together several related program elements, defining a concept and giving that concept a name, you can, from then on, choose to work at the level of the name rather than the detail. This has a marvellous effect on the amount of information that you can hold in your head at the same time, and it is a very powerful aid to communication, both between programmers and between the code and the programmer.

Probably the simplest form of abstraction available in C++ is the typedef. It’s just another name for something, but I’m often amazed at how some programmers fail to capitalise on the power of good naming. I often see code like this:

std::list<std::pair<std::string,std::string> > tokens;
std::pair<std::string,std::string> token;

rather than the preferable:

typedef std::pair<std::string, std::string> Token;
typedef std::list<Token> Tokens;
  
Tokens tokens;
Token token;

Even in the small code snippet above, the use of a tiny abstraction, a name, makes the code easier to reason about. A Token is a pair of strings and Tokens is a list of Token data. The tiny abstraction allows us to be explicit about the relationship between the std::pair<std::string, std::string> used in the declaration of tokens and the corresponding use in the declaration of token. It’s the same usage, the same abstraction; we call it a “Token”. In real code, you’d no doubt manipulate this data a little more. Without the simple abstraction of the typedef, the code required to iterate through that list of tokens would look something like this:

for(std::list<std::pair<std::string, std::string> >::const_iterator it = tokens.begin() ...

What should clearly communicate that you want to iterate over a series of tokens is, instead, full of implementation details that force you to fully understand, and reason about, what constitutes a token. The simplest abstraction, a name, moves the detail into “need to know” territory and reduces the amount of reasoning that you must do when working with the concept. Now, as I’ve said before, typedefs aren’t perfect, but as a step towards abstraction they’re valuable.

I have a rule of thumb that has served me quite well. As soon as you find yourself worrying about the C++ parsing problem that requires a space between the closing angle brackets of a nested template argument (std::list<foo<bar> >), you’re missing a name for the inner template… (C++11 no longer requires the space, but the rule of thumb still holds.)
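As an illustration of the rule, consider a map from a user name to a list of scores. The names Scores and ScoresByUser, and the helper TotalFor, are hypothetical and chosen only to show the idea: once the inner std::vector<int> has a name, the nested angle brackets disappear from the declaration, and the outer name builds on the inner one.

```cpp
#include <map>
#include <string>
#include <vector>

// Without names this would be std::map<std::string, std::vector<int> >,
// with the inner template buried inside the outer one.
typedef std::vector<int> Scores;                    // name the inner template first
typedef std::map<std::string, Scores> ScoresByUser; // then build on that name

// Hypothetical helper: sums the scores recorded for one user,
// returning 0 if the user has none.
int TotalFor(const ScoresByUser &scores, const std::string &user)
{
   ScoresByUser::const_iterator found = scores.find(user);
   if (found == scores.end())
   {
      return 0;
   }
   int total = 0;
   for (Scores::const_iterator it = found->second.begin(); it != found->second.end(); ++it)
   {
      total += *it;
   }
   return total;
}
```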

The token in the example above might be better defined as a specific structure, as that would allow us to convey more meaning and to work at the level of the abstraction rather than at the level of the implementation. Using the token shown above, we’d end up with code that operates in terms of the first and second elements of the std::pair template, like this:

DoThing(token.first);
  
DoSomethingElse(token.second);

The code above uses the generic names of the pair’s two elements. Whilst pair is useful, it’s often worth the time and effort to declare your own structure that’s better suited to your particular abstraction, even if it only has two elements. Once again, the power of a name should not be underestimated. For the time taken to declare something like this:

struct Token 
{
   std::string name;
   std::string value;
};

You end up with code that reads better, communicates with the programmer and stays at the level of the abstraction rather than jarringly exposing implementation details. Although it’s said that good programmers are lazy, failing to apply a little abstraction where it’s needed is the wrong kind of laziness. Amazingly, I’ve seen code that has built up some quite complex structures using nested pairs. Not surprisingly, a line that reads DoStuff(first.second->first.first->second); communicates very little to the programmer… The problem is that it’s less obvious that the communication provided by a single pair is often just as poor.
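A minimal sketch of the Token structure in use, assuming a hypothetical lookup function, FindValue, that is not part of the original text. Compare it.name and it.value with it->first and it->second: the member names now say what the data means.

```cpp
#include <list>
#include <string>

// The structure from the text: two strings, but with meaningful names.
struct Token
{
   std::string name;
   std::string value;
};

typedef std::list<Token> Tokens;

// Hypothetical helper: returns the value of the first token with the given
// name, or an empty string if there is no such token.
std::string FindValue(const Tokens &tokens, const std::string &name)
{
   for (Tokens::const_iterator it = tokens.begin(); it != tokens.end(); ++it)
   {
      if (it->name == name)   // reads as "the token's name", not "first"
      {
         return it->value;
      }
   }
   return std::string();
}
```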

As we’ve seen, naming data and groups of data can help build simple abstractions. Another way to create an abstraction is to move small pieces of code into simple functions. In my opinion, even if a function is only called from one place, it’s often worth having simply because the detail is hidden behind a name. Once again, by naming something you can work at the level of the name, the concept, rather than at the level of the detail; you can drill down when you need to.
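For example, a condition that is only evaluated in one place can still be worth a name. In this sketch IsCommentLine, IsBlankLine and CountCodeLines are hypothetical, and the rule that a comment is a line whose first non-blank character is '#' is an assumption made purely for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical rule: a line is a comment if its first non-blank
// character is '#'. The detail lives here, behind a name.
bool IsCommentLine(const std::string &line)
{
   const std::string::size_type first = line.find_first_not_of(" \t");
   return first != std::string::npos && line[first] == '#';
}

// Hypothetical rule: a blank line contains nothing but spaces and tabs.
bool IsBlankLine(const std::string &line)
{
   return line.find_first_not_of(" \t") == std::string::npos;
}

// The caller reads at the level of the concept rather than the detail.
int CountCodeLines(const std::vector<std::string> &lines)
{
   int count = 0;
   for (std::vector<std::string>::const_iterator it = lines.begin(); it != lines.end(); ++it)
   {
      if (!IsBlankLine(*it) && !IsCommentLine(*it))
      {
         ++count;
      }
   }
   return count;
}
```

Each function is tiny, but the loop in CountCodeLines no longer needs a comment to explain what find_first_not_of is doing there.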

Working in terms of abstractions usually means thinking in terms of the problem that you’re solving rather than in terms of the way that you’re solving it. When you do this, you’ll often find that you rarely use ‘raw’ types. The abstraction that is a std::map lives in the realm of the solution, whereas a “WidgetCollection” lives in the realm of the problem. What’s more, and I’ve said this before, the abstraction of the std::map, or indeed most ‘standard’ abstractions, is too general to provide maximum value when working in the problem domain. A more precise, more focused abstraction allows you to work at a higher level. Just as “vehicle” is a valuable concept, the concept of a “bus” is more precise, and the concept of a Greyhound bus or a Routemaster even more so. If you are forced to communicate in terms of “vehicle” the whole time when referring to a Routemaster, then you need to continually remind your audience that passengers board through the opening at the rear and that it can’t fly and can’t travel on water… The more specialised the concept that you’re working with, the more precise your understanding of the operations that you can perform on it and the more clearly it communicates its purpose.
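A minimal sketch of what a “WidgetCollection” might look like, assuming a hypothetical Widget type keyed by name. The class, its member functions and the choice of std::map as the underlying container are illustrative assumptions; the point is that callers ask problem-level questions (Add, Contains, TotalSize) and the container stays out of sight.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical domain type, for illustration only.
struct Widget
{
   std::string name;
   int size;
};

// WidgetCollection speaks the language of the problem. The std::map is
// an implementation detail that callers never see.
class WidgetCollection
{
public:
   // Returns false, and leaves the collection unchanged, if a widget
   // with the same name is already present.
   bool Add(const Widget &widget)
   {
      return widgets.insert(Map::value_type(widget.name, widget)).second;
   }

   bool Contains(const std::string &name) const
   {
      return widgets.find(name) != widgets.end();
   }

   // A problem-level question that a raw std::map can't answer by itself.
   int TotalSize() const
   {
      int total = 0;
      for (Map::const_iterator it = widgets.begin(); it != widgets.end(); ++it)
      {
         total += it->second.size;
      }
      return total;
   }

   std::size_t Count() const
   {
      return widgets.size();
   }

private:
   typedef std::map<std::string, Widget> Map;
   Map widgets;
};
```

Because the map is private, it could later be swapped for a different container without changing any calling code.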

Of course, as Joel Spolsky pointed out in “The Law of Leaky Abstractions”, abstractions tend to be “leaky”, and you’ll often find that you need to work in terms of their component parts to program effectively. But, and it’s a very big but, that doesn’t remove the value of the abstraction, and small, simple abstractions can often be as valuable and powerful as larger ones.

In my opinion, being able to work at different levels of abstraction is one of the most important skills of a good programmer. Being able to build these abstractions from simpler abstractions is as important as being able to drill down through them. In fact, I think it’s more important. Programmers who can drill deeply through abstractions often seem to focus on this at the expense of building their own. I agree that being able to cut through the names and concepts, breaking them apart into their component parts and drilling further and further until you end up at the assembler level, is amazingly useful when you have a nasty bug to chase, or when an API doesn’t function quite how you, or the documentation, expects. It’s not always necessary to go all the way down to the “metal”; it’s just often useful if you can. Joel alludes to this in his “The Perils of Java Schools” piece: the more layers you understand, the more you can reason about your abstractions on multiple levels at the same time, switching effortlessly between using the concept and focusing on the detail of the concepts that make up the abstraction you’re working with, and those below it. However, if you focus on this at the expense of learning how to build your own abstractions, then you’re missing out, and your code will only communicate at a very detailed and complicated level.

Never underestimate the power of a good name. Simple abstractions are as important as the big and complex abstractions. Once you start building simple abstractions you’ll find that they begin to build upon themselves. Your code will begin to communicate at multiple levels at once and you will be able to decide which level of detail is appropriate for each situation.