Why does Windows hold the loader lock whilst calling DllMain?

I’ve been spelunking around Dll loading recently for a pet project. It’s been an interesting journey, and this evening I solved the final piece of the puzzle. When I did, I wondered, not for the first time, why Windows holds the loader lock when calling DllMain().

Chris Brumme explains this much better than I can, but the loader lock is a per-process lock that the OS holds while it does stuff to its internal tables for that process; things like loading and unloading Dlls. Your DllMain() runs with that lock held, so you need to be careful what you do in there: if you’re not careful you can deadlock on the loader lock and, well, that’s really bad… This has recently caused problems in managed C++ .Net code.

This post is...

(a) very old

(b) wrong in places

Read the comments.

For a while I’ve been of the opinion that you can see the loader lock in action when Windows starts up: masses of apps all start at once, all of them load a mass of Dlls, and everything takes an age to load even though this is the fastest box you could buy. Or when a screaming multi-proc box with masses of memory grinds to a halt for a second or so and the whole world locks up for no apparent reason whilst Explorer goes la-la for a moment. But perhaps there are other reasons for these things…

Anyway, I’ve often wondered why the OS calls DllMain() from within the loader lock when, to me at least, it seems like it would be far better to call it after releasing the lock. The only reason I can think of is that if DllMain() fails then the loader needs to unload the Dll again and to do that it needs to frig with the process tables again and to do that it needs to hold the loader lock again… If that’s the case, surely you can release the lock, run DllMain() and then act on the result. If you need to unload the Dll due to initialisation failure then you just acquire the loader lock again… I expect I’m missing something obvious but…
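To make the "release the lock, run DllMain(), then act on the result" idea concrete, here is a toy model of it. It is only a sketch: `ToyLoader` and everything in it are invented for illustration and say nothing about how the real Windows loader is structured.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>

// Toy model only: none of these names correspond to real Windows internals.
class ToyLoader {
public:
    // Returns true if the module ended up loaded.
    bool load(const std::string& name, const std::function<bool()>& dllMain) {
        assert(dllMain && "an initialiser must be supplied");
        {
            std::lock_guard<std::mutex> guard(loaderLock_);
            modules_.insert(name);      // update the "process tables" under the lock
        }
        // Lock released here, so the initialiser may itself call load().
        const bool ok = dllMain();
        if (!ok) {
            std::lock_guard<std::mutex> guard(loaderLock_);
            modules_.erase(name);       // initialisation failed: take the lock again and unload
        }
        return ok;
    }

    bool isLoaded(const std::string& name) {
        std::lock_guard<std::mutex> guard(loaderLock_);
        return modules_.count(name) != 0;
    }

private:
    std::mutex loaderLock_;             // non-recursive on purpose
    std::set<std::string> modules_;
};
```

With this shape an initialiser can call `load()` again without deadlocking, even though the mutex is not recursive. The price is visible in the sketch too: between the first unlock and the end of `dllMain()`, another thread calling `isLoaded()` sees a module whose initialisation has not finished, which may be part of what the real lock is there to prevent.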

The reason I’m thinking about all this is that I’ve been playing with intercepting functions in Dlls. There’s plenty of info on the web and in my book collection for this kind of thing and I got 80% done in no time. As usual the last 20% took 80% more time… There I was, happily hooking various API functions from my Dll and ‘doing stuff’ when I decided that I should intercept some more functionality. I hooked the new API calls and was surprised to see that my hooks were ignored. It took a while to work out what was going on: the calls to the API were happening before I hooked the Dll. They were happening before LoadLibrary() returned because they were part of static object initialisation within the Dll. When I called LoadLibrary() it loaded the Dll, pulled in all the dependent Dlls, did all the fixup magic and then ran the Dll’s entry point. Since the Dll in question used the standard C runtime, the entry point that was called was actually _DllMainCRTStartup@12(), which deals with starting up the C runtime; that includes calling the constructors for the Dll’s file scope static objects and only then calling DllMain(). Since I, and all the examples that I’ve seen, hook Dlls after LoadLibrary() returns, I was unable to hook the API in question before the Dll called into it.
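The ordering problem is easy to reproduce in miniature. The sketch below is portable C++ rather than a real Dll, and `SomeImportedApi()`, `Widget` and `FakeDllMain()` are made-up stand-ins, but it shows the same effect: a file scope static object gets to call an API before the user-written entry point is given control.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records what ran and in which order. A function-local static, so it is
// guaranteed to exist before any file scope constructor uses it.
std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for an API that the Dll calls during start-up.
void SomeImportedApi(const char* caller) {
    eventLog().push_back(std::string("SomeImportedApi called from ") + caller);
}

struct Widget {
    Widget() { SomeImportedApi("Widget constructor"); }
};

// File scope static: constructed by the runtime start-up code before the
// user-written entry point runs.
static Widget g_widget;

// Stand-in for DllMain(). In a real Dll the CRT entry point would call this
// after it has run the static constructors.
bool FakeDllMain() {
    assert(!eventLog().empty());  // the constructor above has already made its call
    SomeImportedApi("FakeDllMain");
    return true;
}
```

By the time `FakeDllMain()` is entered the log already holds the constructor's call, so a hook installed at that point, or any later point, has missed it.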

Looking at the docs for LoadLibraryEx() I hoped that I could just use DONT_RESOLVE_DLL_REFERENCES to prevent DllMain() being called. DONT_RESOLVE_DLL_REFERENCES does exactly what it says on the tin: it doesn’t resolve any Dll references and it doesn’t call DllMain(). Unfortunately that made it useless to me, as I needed the Dll references resolved before I could hook them…

I searched quite hard for a fix to this problem, thinking that it was an obvious thing that people would want to do. I didn’t find any solutions, so I started to delve into how Dlls are loaded. This research led me to the Microsoft Portable Executable and Common Object File Format Specification, which is the document on PE files (Dlls, exes, etc). The code that I had for hooking already read information out of the Dll’s PE file to locate import address tables and the like, and I hoped there was something else in there that would help me hook DllMain(). I didn’t find what I was looking for, but I found something much more useful.

Early on in my investigation of the PE file format I came across an interesting sounding field in the optional header: AddressOfEntryPoint. This is the address (an RVA, relative to the image base) that’s used to start the image in the PE file; it points to whatever calls main() for exes and whatever calls DllMain() for Dlls. Once I found out about this I decided that I could change this value and make it point somewhere else, so that DllMain() wasn’t called during LoadLibrary() but when I wanted it to be called. I started dusting off my inline assembler skills (didn’t take long, they’re not very big) and thinking about how I could move the entry point address so that it pointed to the end of the real DllMain(), and all kinds of other cunning plans. Then I read the document and saw that the field could be set to 0 for Dlls that didn’t have entry points; like resource only Dlls, I imagine.
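For anyone who wants to look at the field themselves, the following sketch pulls AddressOfEntryPoint out of the raw bytes of a PE file. The offsets come from the PE/COFF specification; the function names are my own, and the code deliberately avoids the Windows headers so that it stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Offsets from the PE/COFF specification.
const std::size_t kLfanewOffset = 0x3C;           // e_lfanew in the DOS header
const std::size_t kSignatureSize = 4;             // "PE\0\0"
const std::size_t kCoffHeaderSize = 20;           // COFF file header
const std::size_t kEntryPointInOptionalHdr = 16;  // AddressOfEntryPoint (PE32 and PE32+)

// Reads a little-endian 32-bit value regardless of host byte order.
static std::uint32_t readU32(const std::vector<std::uint8_t>& bytes, std::size_t at) {
    assert(at + 4 <= bytes.size());
    return static_cast<std::uint32_t>(bytes[at]) |
           (static_cast<std::uint32_t>(bytes[at + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[at + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[at + 3]) << 24);
}

// Hypothetical helper: fetches AddressOfEntryPoint (an RVA) from the raw
// bytes of a PE file. Returns false if the buffer is not a plausible image.
bool readEntryPointRva(const std::vector<std::uint8_t>& image, std::uint32_t& rva) {
    if (image.size() < kLfanewOffset + 4 || image[0] != 'M' || image[1] != 'Z')
        return false;
    const std::size_t pe = readU32(image, kLfanewOffset);
    const std::size_t needed =
        kSignatureSize + kCoffHeaderSize + kEntryPointInOptionalHdr + 4;
    if (pe > image.size() || image.size() - pe < needed)
        return false;
    if (std::memcmp(&image[pe], "PE\0\0", kSignatureSize) != 0)
        return false;
    rva = readU32(image, pe + kSignatureSize + kCoffHeaderSize + kEntryPointInOptionalHdr);
    return true;
}
```

A value of 0 coming back from this means the image has no entry point at all, which is the case the specification allows for Dlls.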

So, my current solution to hooking Dlls so that I get my hooks in place before DllMain() runs is this. First I load the Dll image from disk, read the AddressOfEntryPoint field from the header, resolve the RVA into a real address and write a 0 back into the image. Then I call LoadLibrary(), which loads the Dll and all dependent Dlls, does all the fix ups that are required and doesn’t execute DllMain(). Then I run my hook code, and finally I use a few bits of __asm magic to set the stack up just right and call into DllMain(). This all works fine and achieves just what I want to do, but now I’m nervous. If this works then why doesn’t the OS do it like this in the first place? Why doesn’t Windows release the loader lock before calling DllMain()?
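The patching step can be sketched roughly as follows. This works on a byte buffer holding a copy of the Dll file; the helper names are hypothetical, the header offsets are from the PE/COFF specification, and the LoadLibrary() call and the final jump into the entry point are left out because they need Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads a little-endian 32-bit value from the buffer.
static std::uint32_t readU32(const std::vector<std::uint8_t>& bytes, std::size_t at) {
    assert(at + 4 <= bytes.size());
    return static_cast<std::uint32_t>(bytes[at]) |
           (static_cast<std::uint32_t>(bytes[at + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[at + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[at + 3]) << 24);
}

// Hypothetical helper: finds the file offset of AddressOfEntryPoint, or
// returns false if the buffer does not look like a PE image.
static bool findEntryPointField(const std::vector<std::uint8_t>& image, std::size_t& field) {
    if (image.size() < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return false;
    const std::size_t pe = readU32(image, 0x3C);  // e_lfanew
    // Signature (4) + COFF header (20) + offset within optional header (16) + field (4).
    if (pe > image.size() || image.size() - pe < 44)
        return false;
    if (image[pe] != 'P' || image[pe + 1] != 'E' || image[pe + 2] != 0 || image[pe + 3] != 0)
        return false;
    field = pe + 4 + 20 + 16;
    return true;
}

// Zeroes AddressOfEntryPoint in the buffer and hands back the original RVA
// so that the entry point can be called by hand later.
bool disableEntryPoint(std::vector<std::uint8_t>& image, std::uint32_t& savedRva) {
    std::size_t field = 0;
    if (!findEntryPointField(image, field))
        return false;
    savedRva = readU32(image, field);
    for (std::size_t i = 0; i < 4; ++i)
        image[field + i] = 0;
    return true;
}

// Address of the original entry point once the module is mapped at `base`.
std::uintptr_t entryPointAddress(std::uintptr_t base, std::uint32_t savedRva) {
    return base + savedRva;
}
```

Once the patched copy has been loaded and the hooks are in place, the module handle that LoadLibrary() returned is the base address, so `entryPointAddress()` gives the function to call. It takes the same three arguments as DllMain() (module handle, reason, reserved), and passing DLL_PROCESS_ATTACH as the reason stands in for the call the loader skipped. A correctly typed function pointer is probably a tidier way to make that call than inline assembler, though I have only sketched that idea here, not tested it against a real Dll.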