C++ Tips: 1 - Avoid unnecessary optionality

| 7 Comments

One of my main aims when writing code in C++ is to have the code clearly communicate its purpose. I find it useful to be able to look at a single line in isolation and have a pretty good idea of what its effects are on the code that it cooperates with. Unfortunately C++ code can often be written in an imprecise way that makes reasoning about what it actually does harder than it needs to be. By increasing the precision of your code writing you can limit what the code could potentially do to just what you want it to do; by doing this it's then easy to reason about what the code is doing when you read it as it's clearly and precisely doing only one thing.

One of the most important, and simplest, things that you can do to reduce code complexity and make your code easier to reason about is to avoid unnecessary optionality.

At its simplest, avoiding unnecessary optionality is simply replacing the use of a pointer with that of a reference. A pointer can either point to something, or not, whereas a reference must always refer to something. If you see a pointer in a piece of code you often have to look around a bit to find out if the pointer is valid or if it could be null. If you see a reference you know straight away that it is valid. That's pretty much all there is to it. If the code you are writing uses an object that is optional then you might choose to use a pointer to represent that optionality, if the object is not optional and must always be present then you should use a reference.

I personally find that reasoning about references when examining code is less complex that reasoning about pointers. A reference can do less that a pointer, it's more precise because it can only refer to a valid object whereas a pointer can refer to a valid object or null. Each time you see a pointer you have to work out if it is likely to be valid from the context in which you see it and from where you obtained it from, when you see a reference you don't have to do that. Whilst reading, or working through, complex code that you're not that familiar with, pointers require that you remember more about a variable than references.

Of course, about now someone will pipe up with a comment that points out that you can subvert the C++ reference mechanism by deliberately creating a reference to a null pointer. That's true, but if you have programmers that are doing that then you have more fundamental problems that this tip might help you address.

As always, the devil is in the detail. What does it mean to remove unnecessary optionality in practice?

Assume we have a container that maps from string names to widgets. Widget objects are pretty large so we decide to store pointers to widgets in the container and have the container manage the lifetime of the widgets that it contains. This could be a simple stl::map but I tend to prefer a more precise interface so I would tend to write a wrapper class that exposes the interface that I actually need.

Suppose that the usage pattern for the collection of widgets is that they're loaded at program start up and the names are presented to the user so that they can select an appropriate widget to manipulate. What should the code that retrieves a widget from the collection look like?

Often you'll see code in the collection like this:

CWidget *CWidgets::Find(
   const std::string &name)
{
   CWidget *pWidget = 0;
   
   Widgets::iterator it = m_widgets.find(name);
   
   if (it != m_widgets.end())
   {
      pWidget = it->second;
   }
   
   return pWidget;
}

and code at the point of use like this:

CWidget *pWidget = m_widgets.Find(name);

assert(pWidget != 0);

DoThisWithWidget(pWidget);
DoThatWithWidget(pWidget);
// more code that manipulates the widget

One of the problems with this kind of code is that it doesn't clearly express what's going on. The code says the widgets collection may contain a widget with this name whereas the requirement is that the widgets collection always contains a widget with the supplied name. The collection is slightly more general purpose than it needs to be and because of this we inject a small amount of uncertainty into the code that uses it. Although the code following the assert may be protected from bugs that cause the collection not to contain the expected value the programmer still needs to reason in terms of a pointer, a potentially optional widget, rather than in terms of a reference to a widget that must exist. The slight addition in complexity is negligible at this point since you can easily see that the assert that states the pointer must always be valid, however, once we trace the code into DoThisWithWidget() or DoThatWithWidget(), or even as we move further away from the assertion code, things may, or may not, be more complex and harder to reason about.

My preferred solution to this accidental complexity is to remove the unnecessary optionality. Instead of using CWidgets::Find() as shown above we would use something like this:

CWidget &CWidgets::Get(
   const std::string &name)
{
   Widgets::iterator it = m_widgets.find(name);
   
   if (it == m_widgets.end())
   {
      throw CException("Widget: \"" + name + "\" not found");
   }
   
   return *(it->second);
}

This method on the widget collection will return a widget of the specified name if it exists and throw an exception if it doesn't. Obviously this is inappropriate if you need to find out if a widget exists in the collection, but it is appropriate if the widget must exist. What is done with the exception is up to the user of the widget collection, or, perhaps, the user of the user of the widget collection.

The code at point of use becomes something like this:

CWidget &widget = m_widgets.Get(name);

DoThisWithWidget(widget);
DoThatWithWidget(widget);
// more code that manipulates the widget

It's now quite clear that if the code ever returns from the call to Get() then you have a valid widget. Likewise, code following this point is clearly written with the intention of working with a widget that is always present. This allows our removal of optionality to "trickle down" to the functions that follow; DoThisWithWidget() and DoThatWithWidget() can now be written to take a reference and any assertions or checking for valid pointers inside them can be removed. Code such as this:

void DoThisWithWidget(
   CWidget *pWidget)
{
   assert(pWidget != 0);

   // do stuff with widget
or this:
void DoThisWithWidget(
   CWidget *pWidget)
{
   if (!pWidget)
   {
      // handle unexpected error, widget can't be null!
   }
   
   // do stuff with widget

need never exist. Instead we just have code that clearly communicates its expectations in a way that the compiler can confirm and that we, as programmers, can forget.

void DoThisWithWidget(
   CWidget &widget)
{
   // do stuff with widget

Often people will complain that they must use pointers rather than references for non-optional objects for some reason. Most of the time they're wrong ;) Sure, you may be passed a pointer that can never be null from an API that you're using but you don't need to pass this design pollution on to the rest of your code. Likewise you may use an API that requires a pointer but that doesn't mean you need to pass the object as a pointer within your own code.

Even when an object is optional you may still gain some clarity by removing the optionality. Of course if the object really is optional then you need to fabricate something to use when the optional object isn't present, for this you can often use the Null Object Pattern. This replaces an optional object which may be nothing with an object that will always be there but may do nothing. Using a Null Object often allows you to move switching based on the presence of the optional object and consolidate the "doing nothing" within the null object rather than spreading it across all the users of the object in question.

Code that clearly communicates what it is doing is easy to understand. Unnecessary optionality adds a small amount of confusion to code by having the code say that it could do one of two things when in fact it only actually does a single thing. Removing unnecessary optionality can make code communicate more clearly and can reduce uncertainty when reading unfamiliar code.

7 Comments

Hi!
What is your strategy if you want to make CWidget thread safe ? Whether you make internal map thread safe or use critical sections before m_widget ?

The problem you solved above using the wrapper class "CWidget" is something programmers frequently encounter. In the example above you have CWidget which is a repository of widget objects. I am interested to know about your strategy in case when you require multiple such repositories in a single program. I mean apart from widget repository you need Car repository and some more such repositories in a single program.
As such will you make a single base class or what you do ???

Regarding the thread safety issue of the container, it depends on what is required. If you need to be able to access the container from multiple threads at the same time then I'd first add a CriticalSection to the collection class and then lock and release it in each access method. You may then find that you need to adjust the access methods because you require slightly larger atomic operations (you may want an atomic "add or create" rather than two methods, one to see if the collection already contains a widget and one to add a new one), etc.

Ifran

It depends on how similar the collections need to be. The whole point of the CWidgets class is to provide an interface that is tailored (and reduced) to just what is required by the collection. If there were multiple collections all with the same requirements then I would probably eventually harvest them into a templated base class. However, given that you begin to lose the point of the custom class as soon as you start to make changes to facilitate requirements from one collection that are not requirements for one of the others (the idea is that the interface is narrow and specific and precise), you need to be careful and weigh the costs of apparant duplication with the costs of diluting the individual collection concepts.

Regarding the insistence of using pointers (instead of references) for composition: If your object can be reassigned or swapped, references are out, because they cannot be re-seated. In fact, I oftentimes find myself forcing a class API to use references (to forbid nulls), while internally using pointers to allow reseatability. I don't have to do "!= 0" checks because the interface guarantees it.

Also, I believe your assertion of simplicity is debatable. Essentially, you've replaced a null-pointer-check with a possible exception-throwing. Particularly to experienced C developers, that doesn't sound like a simplification, because the repeated habit of replacing error codes or return values with exceptions means that exception-safety is now a requirement throughout the code.

Mind you, I agree with it, since I use the STL (and therefore already think about exception safety), but many old-school C++ shops may go in the opposite direction - replacing exceptions with enumerations, return-by-output-argument, or pointers -- all in the name of simplicity, because beauty is in the eye of the beholder(s).

P.S. your comment filter doesn't like the word "mem" "ber" (obfuscated to get it past the filter). I guess it wouldn't be too happy about some of the rather comical STL-related error messages I've seen over time:

"Invalid use of mem ber in std::back_inserter()"

Tom, I agree re the need to use pointers if the object can be swapped, however, by protecting the pointer by using references in the API you're doing what I would suggest. There are places where you must use pointers but these places are far fewer than most people realise.

As for the simplicity debate, well, I wouldn't often choose to work in environments with C++ where you can't use exceptions and I don't tend to adjust my C++ designs to accomodate users of other languages so your points re exceptions not being that 'simple' are correct in your circumstances but don't apply to me ;) I spent years working in the nightmare that is code where 'everything returns an error code' and 'nobody bothers to check error codes' and I wouldn't want to go back to it.

I'll take a look at the spam filter, but personally I don't want dodgy stl errors in my comments ;)

Leave a comment