Is Raymond Chen's use of Assert valid?

2005-09-28

I’m sure you’re all getting bored on my views on using Assert in modern C++ code by now, I’ll try and make this the last one. Last time, I asked if anyone had what they’d consider to be an example of the use of assert in C++ where the usage was valid given my objections to assert. Mark updated this recent posting on the subject to include a link to Raymond Chen’s blog where Raymond is discussing COM object destruction and uses an assert… Given that everyone knows that Raymond is a seriously good programmer doesn’t his use of assert validate the position of those who disagree with me.

No. But then I would say that, wouldn’t I…

The code in question is this:

ULONG MyObject::AddRef()
{
   assert(m_cRef != 0);
   return InterlockedIncrement(&m_cRef);
}

The assert is added with the comment “Therefore, at a minimum, you should assert in your AddRef() method that you aren’t incrementing the reference count from zero.”.

COM is a complicated environment. You generally can’t throw exceptions across COM boundaries because the COM runtime doesn’t deal with C++ exceptions. Lots of people use this as an excuse not to use C++ exceptions inside COM objects at all, but in reality you just need to make sure that you don’t pass an exception out of a COM interface. Most COM interfaces return an error code in the form of an HRESULT which tells the caller if everything was OK (by returning S_OK) or if there were problems (by returning anything else). There are lots of documented COM error codes and you can generate your own on a per COM object basis if you need to be more descriptive.

Unfortunately, AddRef() and Release() are different. Way back before DCOM was a twinkle in the eye, in the time when Inside OLE was the COM bible, it was decided that AddRef() and Release() would return the updated reference count. It probably seemed like a good idea at the time… Once proxies and DCOM came on the scene it was no longer possible for the COM runtime to guarantee that the return value from AddRef() or Release() was in any way related to the actual reference count of the object; proxys might optimise the passing on of AddRef() and Release() calls to the stub and so the object could never really know how many client references were held and neither could the proxys. The end result was that AddRef() and Release() are amongst the few COM interface methods that don’t return an HRESULT and the value that they do return is, generally, useless.

The history lesson is just so that we can discount the usual method of indicating failure from a COM object. We can’t simply return E_FAIL or some other failure code to tell the caller that there’s a problem. So what can we do?

As usual, it depends. From the snippet of code and commentary in Raymond’s posting we can’t tell exactly how this COM object is likely to be used. The reason that he’s “at least” adding an assert is that the object in question has a bug that means that the reference count of the object is being reduced to 0, at which point the object destroys itself and during the destruction process the object uses an interface on itself which causes AddRef() to be called and the subsequent call to Release() means that the object is destroyed twice… The assert is there to catch the bug rather than waiting for it to show up as memory heap corruption or invalid pointer use.

The reasons that I have stated for disliking the use of assert in C++ still apply to this piece of code:

The check will, likely, go away in a release build.
The assert is covering for poor design.
The assert makes it very hard to write a unit test for the failure condition.
If the object is running without a user interface then the assert’s dialog box may cause issues.
and finally: The assert is really just an “exploding comment”.

The situation is somewhat more constrained than “normal” C++ code. We can’t change the design, by having AddRef() return an HRESULT, perhaps, as we don’t own the interface. We’re on a code boundary between C++ and the COM runtime; if we throw an exception then we’re breaking some rules, the exception will cause problems if the object is remoted, and defending that position is harder than if we weren’t at a COM border. The actual problem is happening from within a C++ destructor; this is another boundary where it’s not wise to allow a C++ exception to cross. So, what do we do?

Once again, it depends. In this particular situation I’d probably just fix the bug in the destructor; the simplest fix is for the destructor to increase the reference count to one before doing anything that may result in AddRef() or Release() being called; in the absence of other reference counting bugs the last external reference being released will decrement the reference count to 0 and initiate destruction, once inside the destructor any additional calls to Release() will only ever bring the count back down to 1 and so avoid double destruction. This will prevent the double destruction problem that the code currently exhibits. I’ll ignore the question of whether it’s wise to be doing all of that work in the destructor.

Of course that doesn’t change the fact that AddRef() and Release() have some unchecked preconditions. Shouldn’t we use assert for them?

Not if I can help it, no. Often, when you’re using ATL for example, you don’t own the AddRef() and Release() code (and ATL uses asserts anyway!) and, to be honest, I’ve not often been in the position where COM reference counting bugs were such a problem that I’ve needed to deal with these preconditions; usually you can simply deal with the obvious issues in the code that uses the COM object… But, if I did own the code, and it was a problem, or I felt I needed to prevent it ever being a problem, then I would try not to have to use the traditional assert for two of the reasons stated above; debug only checking and difficulty in regression testing.

First I’d investigate exceptions. If the object in question is only ever going to be used in process and never across apartment (and we can should these things impossible if that were the case) and the problem is something that we’re having a lot of regressions on and therefore we need a reliable test for it in future then I’d use an exception, and have the tests ensure that the object never throws. If the object can legitimately be remoted then things are harder. Given that the standard assert either dumps you into a debugger or terminates the program there and then I don’t really see that that’s any less severe than breaking the COM rules, but we’re on a slippery slope. We could do some investigation work; if we were to validate the preconditions with a test that throws an exception then the COM runtime would likely deal with it in the same way that it would if we were to touch memory that we shouldn’t have touched. The stub is likely wrapping its call to our object in a Windows structured exception handling block and would return an RPC_xxx error to the proxy (remote interface AddRef() and Release() are dealt with by the proxy and stub using IRemUnknown and the RemAddRef() and RemRelease() methods do return HRESULT error codes). Of course I’d investigate a little more and test this theory before relying on it. If it works then out of process objects could then be tested because failure would mean the failure would be reported by the COM runtime. In process objects could be tested because a test of an in process COM object, in C++, can test to ensure that the object doesn’t break the rules of COM by throwing exceptions across interface boundaries. If this worked then the code could stay in both production and development builds… There are a lot of assumptions there and we’re breaking rules, I’m not convinced that the result is worth it, but, as always, it depends.

If this didn’t work, or if I was unable to justify the design to my peers, and both are equally likely, then we could investigate a non standard form of assert. This would need to remain in production builds and report its failure in such a way that it could be monitored from a test harness thereby making assertion failure use a programatic error reporting channel rather than the human channel that it usually uses. If the problem wasn’t considered important enough to require the extended development but it was serious enough for us to want to prevent it then I’d probably suggest an more standard assertion that remained in the production build. If we profiled the system and found that wasn’t acceptable from a performance point of view then I’d argue my point and hope that someone else could come up with an acceptable solution. If they didn’t, I expect I’d use a standard assert. The key point is that it should always be the last resort, and as such you should always try and solve the real problems first.

Thanks Mark, it was an interesting journey.

[Updated 3rd October]

See the following follow up posting:

Asserts are evil, except when you have no other choice