More thoughts on Big C


I'm finding that the thread contention notation that I made up the other day to help me talk about the performance implications of the changes I was making is pretty useful. The definition needs adjusting slightly though...

For a given lock the worst case contention is C. For an operation involving a single lock where t threads exist during the lifetime of the process and n threads access the lock the contention for the lock can be expressed as being C(n). This is simple enough and means that C(1) is no contention on a single lock as there's only one thread that can access it throughout the life of the process. C(0) implies no lock at all.

This gets a little more useful when you have multiple locks and you want to know the contention of a larger piece of code. Take this snippet, for example:

CCallbackTimerQueueBase::Handle CCallbackTimerQueueBase::CreateTimer()
   TimerData *pData = new TimerData();
   return reinterpret_cast<handle>(pData);
void CCallbackTimerQueueBase::MarkTimerUnset(
   TimerData *pData)
   m_handleMap[pData] = TimerLocation(m_queue.end(), 0);
A call to CreateTimer() has a contention factor of C(t-new)+C(t-stl)+C(t-new) and this representation clearly indicates that there are two locks involved and one is entered twice(the lock in new which is also used when the map calls its allocator, and the lock in this version of the STL for its debug iterator support). Since the locks are entered sequentially the contention factor can't be reduced in any way as all thread locks are locked and unlocked in sequence. And this is without the queue having any lock of its own. If we were to look at the thread-safe version then we have a contention factor of C(t-queue)+C(t-new)+C(t-stl)+C(t-new).
CThreadedCallbackTimerQueue::Handle CThreadedCallbackTimerQueue::CreateTimer()
   ICriticalSection::PotentialOwner lock(m_criticalSection);
   if (!lock.TryEnter())
   ICriticalSection::Owner lock(m_criticalSection);
   return m_spTimerQueue->CreateTimer();

However, since we hold the queue's lock for the entire period of the call to CreateTimer() it would be more accurate, I think, to represent the contention as C(t-queue)+(C(t-new)+C(t-stl)+C(t-new)), note the brackets. Also, once we have acquired the queue's lock we have t-queue - 1 fewer threads in the sets of threads that are t-new and t-stl, thus it should probably be represented as C(tq)+(C(tn-tq+1)+C(ts-tq+1)+C(tn-tq+1)). If we were to use a custom allocator and a custom operator new which both worked on heaps that were private to each instance of the queue then we'd end up with C(tq)+(C(1)+C(ts-tq+1)+C(1)) and we could remove the C(1)s as these represent a lock with no contention, giving us C(tq)+(C(ts-tq+1)) if debug iterator support is enabled and C(tq) if it isn't.

Obviously the hard part about all of this is actually seeing the locks, but hopefully that's where my deadlock detection tool can help out. It already knows about all the locks and all the threads that access them and the location of the lock acquisitions and releases so it could calculate C for a given function and provide drill down where necessary...


Looks like you have the start of a PhD thesis there.

:) I don't know about that... I do know that I'm finding that often multi-threaded performance issues come from the strangest and most unexpected places and these places usually involve contention. Often normal profilers have a hard job of showing this as the code that is the hot spot when under heavy load isn't a hot spot with fewer threads or less loading... Being able to visualise the contention of code and drill down through it to see exactly where your code could block other threads would be rather useful IMHO...

Leave a comment