MiniDumpWriteDump now mostly useless for in-process use

I’ve been using the MiniDumpWriteDump() API from DbgHelp.dll for 20 years or so. It’s a heavyweight debugging tool, but it has proved useful over the years and I use it in all manner of places, including many where others might simply use an assert(). Rather than just throwing an exception because things that shouldn’t happen have happened, I often also generate a dump file so that I can get far more data than I could ever log or report in any other way.

The API call allows you to generate a dump for any process that you have the correct access rights for, but I have traditionally used it for in-process dump generation. The documentation has warned for a while that the call can deadlock on the loader lock if used in this way, but I haven’t seen that in practice. Recently, however, I’ve been seeing more and more deadlocks during dump generation that are, it seems, due to MiniDumpWriteDump() using the heap to allocate memory after it has begun its work and, significantly, after it has suspended all other threads in the process. It seems that, with the current MiniDumpWriteDump() code, if any other thread in the process is inside the heap when the dump is triggered, then the process will deadlock on a lock inside the heap. The thread in the heap has been suspended and will never release the lock, and the thread calling MiniDumpWriteDump() blocks as soon as it tries to access the heap. For me, this manifests as a hung process with a zero-byte dump file.
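For reference, the in-process pattern I'm describing looks roughly like the sketch below. This is a minimal illustration rather than my production code: MakeDumpFileName() and WriteDumpOfThisProcess() are hypothetical names, MiniDumpNormal is just a placeholder dump type, and the Windows-specific part is guarded so that the helper compiles anywhere. Note that the file is created before MiniDumpWriteDump() is called, which would be consistent with a hang inside the call leaving a zero-byte file behind.

```cpp
#include <string>

// Hypothetical helper: builds a dump file name such as "MyApp-1234.dmp".
std::string MakeDumpFileName(const std::string &prefix, unsigned long processId)
{
    return prefix + "-" + std::to_string(processId) + ".dmp";
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Writes a dump of the calling process from inside that process.
// This is the usage that can hang if another thread is suspended
// while it holds a heap lock.
bool WriteDumpOfThisProcess(const std::string &prefix)
{
    const std::string fileName =
        MakeDumpFileName(prefix, ::GetCurrentProcessId());

    // The (empty) file exists from this point onwards.
    HANDLE hFile = ::CreateFileA(
        fileName.c_str(), GENERIC_WRITE, 0, nullptr,
        CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);

    if (hFile == INVALID_HANDLE_VALUE)
    {
        return false;
    }

    // No exception information, user streams or callback in this sketch.
    const BOOL ok = ::MiniDumpWriteDump(
        ::GetCurrentProcess(), ::GetCurrentProcessId(), hFile,
        MiniDumpNormal, nullptr, nullptr, nullptr);

    ::CloseHandle(hFile);

    return ok != FALSE;
}
#endif
```

Calling WriteDumpOfThisProcess("MyApp") from a "this should never happen" code path is all it takes, which is exactly why the in-process approach has been so convenient.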

The problem appears to have got worse in recent years. Of course, the documentation isn’t versioned, so it’s difficult to know when the warnings against in-process use began, but I’m pretty sure that it was not originally an issue. Certainly the amazingly detailed documentation from 2005 that I remember using when I was first writing my dump generation code doesn’t mention it. It may be that this has always been a potential problem, and that it’s just a race condition that is more likely to occur on modern hardware, but it feels to me as if the API wasn’t always this fragile. I’m pretty sure that the mini dump generation code used to use _alloca(), so perhaps the code has been changed to use _malloca(), since this is viewed as being more secure. The main issue with such a change is that _alloca() always allocates on the stack, and a failure to do so results in an SEH stack overflow exception, whereas _malloca() allocates from the heap whenever the request is larger than _ALLOCA_S_THRESHOLD… However, this is purely speculation and doesn’t get me anywhere…

Anyway, there’s no point complaining about things you have no control over. The solution to this issue is to do what the current documentation suggests and always generate the dump of a process from a different process. This isn’t especially difficult to do; you need the process id of the process that you want to generate a dump for and the correct access rights. The simplest approach might be to spawn ProcDump from Sysinternals and have that do the work for you, but I expect I’ll craft my own external dump process so that I can have a bit more control. Ideally, the act of generating a dump will do as little as possible to the state of the process that is being dumped, so I have a design in mind in which the process that wants a dump of itself simply signals an event. The external process then does the work and sets a second event that the triggering process is waiting on.
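A rough sketch of that two-event handshake might look like the following. Everything here is an assumption for illustration, not a finished design: the event naming scheme, DumpEventName(), RequestDumpOfThisProcess() and ServeOneDumpRequest() are all hypothetical, the external process is assumed to create the events before the target needs them, and MiniDumpNormal is a placeholder dump type. The Windows-specific part is guarded so that the naming helper compiles anywhere.

```cpp
#include <string>

// Hypothetical naming scheme, e.g. "Local\DumpRequest-1234".
std::string DumpEventName(const char *kind, unsigned long processId)
{
    return std::string("Local\\") + kind + "-" + std::to_string(processId);
}

#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")

// Target side: ask the external process for a dump and wait until it is done.
bool RequestDumpOfThisProcess(DWORD timeoutMillis)
{
    const DWORD pid = ::GetCurrentProcessId();
    HANDLE hRequest = ::OpenEventA(EVENT_MODIFY_STATE, FALSE,
        DumpEventName("DumpRequest", pid).c_str());
    HANDLE hDone = ::OpenEventA(SYNCHRONIZE, FALSE,
        DumpEventName("DumpComplete", pid).c_str());

    bool ok = false;
    if (hRequest && hDone && ::SetEvent(hRequest))
    {
        ok = ::WaitForSingleObject(hDone, timeoutMillis) == WAIT_OBJECT_0;
    }
    if (hRequest) ::CloseHandle(hRequest);
    if (hDone) ::CloseHandle(hDone);
    return ok;
}

// External side: create the events, wait for one request, write the dump.
bool ServeOneDumpRequest(DWORD targetPid, const std::string &dumpFileName)
{
    HANDLE hRequest = ::CreateEventA(nullptr, FALSE, FALSE,
        DumpEventName("DumpRequest", targetPid).c_str());
    HANDLE hDone = ::CreateEventA(nullptr, FALSE, FALSE,
        DumpEventName("DumpComplete", targetPid).c_str());
    HANDLE hProcess = ::OpenProcess(
        PROCESS_QUERY_INFORMATION | PROCESS_VM_READ |
        PROCESS_DUP_HANDLE | SYNCHRONIZE, FALSE, targetPid);

    bool ok = false;
    if (hRequest && hDone && hProcess)
    {
        // Wake on a dump request or on the target exiting, whichever is first.
        HANDLE handles[2] = { hRequest, hProcess };
        if (::WaitForMultipleObjects(2, handles, FALSE, INFINITE) == WAIT_OBJECT_0)
        {
            HANDLE hFile = ::CreateFileA(dumpFileName.c_str(), GENERIC_WRITE, 0,
                nullptr, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, nullptr);
            if (hFile != INVALID_HANDLE_VALUE)
            {
                ok = ::MiniDumpWriteDump(hProcess, targetPid, hFile,
                    MiniDumpNormal, nullptr, nullptr, nullptr) != FALSE;
                ::CloseHandle(hFile);
            }
            ::SetEvent(hDone); // release the target even if the dump failed
        }
    }
    if (hProcess) ::CloseHandle(hProcess);
    if (hDone) ::CloseHandle(hDone);
    if (hRequest) ::CloseHandle(hRequest);
    return ok;
}
#endif
```

In a real implementation I would probably open the event handles in the target at start-up rather than at the point of failure, so that asking for a dump touches as little process state as possible, and I'd loop in the external process rather than serving a single request.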

We’ll see.