Why does Windows hold the loader lock whilst calling DllMain?

I've been spelunking around Dll loading recently for a pet project. It's been an interesting journey, and this evening I solved the final piece of the puzzle. When I did, I suddenly wondered, not for the first time, why Windows holds the loader lock when calling DllMain()...

Chris Brumme explains this much better than I can, but the loader lock is a system wide lock that the OS holds while it modifies its internal process tables, for example when loading and unloading Dlls. You need to be careful what you do in your DllMain() because it's easy to deadlock on the loader lock and, well, that's really bad... This has recently caused problems in managed C++ .Net code.

For a while I've been of the opinion that you can see the loader lock in action when Windows starts up: masses of apps all start at once, all of them load a mass of Dlls, and everything takes an age to load even though this is the fastest box you could buy. Or when a screaming multi-proc box with masses of memory grinds to a halt for a second or so and the whole world locks up for no apparent reason whilst explorer goes la-la for a moment. But perhaps there are other reasons for these things...

Anyway, I've often wondered why the OS calls DllMain() from within the loader lock when, to me at least, it seems like it would be far better to call it after releasing the lock. The only reason I can think of is that if DllMain() fails then the loader needs to unload the Dll again and to do that it needs to frig with the process tables again and to do that it needs to hold the loader lock again... If that's the case, surely you can release the lock, run DllMain() and then act on the result. If you need to unload the Dll due to initialisation failure then you just acquire the loader lock again... I expect I'm missing something obvious but...

The reason I'm thinking about all this is that I've been playing with intercepting functions in Dlls. There's plenty of info on the web and in my book collection for this kind of thing and I got 80% done in no time. As usual the last 20% took 80% more time... There I was, happily hooking various API functions from my Dll and 'doing stuff' when I decided that I should intercept some more functionality. I hooked the new API calls and was surprised to see that my hooks were ignored. It took a while to work out what was going on. The calls to the API were happening before I hooked the Dll. They were happening before LoadLibrary() returned because they were part of static object initialisation within the Dll.

I called LoadLibrary(); it loaded the Dll, pulled in all the dependent Dlls, did all the fixup magic and then ran the Dll's entry point. Since the Dll in question used the standard C runtime, the entry point that was called was actually _DllMainCRTStartup@12(), which deals with starting up the C runtime. Part of that involves calling the constructors for the Dll's file scope static objects and only then calling DllMain(). Since I, and all the examples that I've seen, hook Dlls after LoadLibrary() returns, I was unable to hook the API in question before the Dll called into it.
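The ordering problem can be reproduced without any Dlls at all. What follows is only a minimal sketch in portable C++, where the program's own startup stands in for _DllMainCRTStartup() and entry_point() stands in for DllMain(). The names call_log(), hypothetical_api(), EarlyCaller and entry_point() are made up for illustration and are not part of any real Dll or API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records who called the "API", in order. A function-local static is used
// so the log is guaranteed to exist before any other static object uses it.
std::vector<std::string>& call_log()
{
    static std::vector<std::string> log;
    return log;
}

// Hypothetical stand-in for the API function that we would like to hook.
void hypothetical_api(const char* caller)
{
    assert(caller != nullptr);
    call_log().push_back(caller);
}

// A file scope static object whose constructor calls the API. The runtime
// constructs it during startup, before any user entry point gets control.
struct EarlyCaller
{
    EarlyCaller() { hypothetical_api("static constructor"); }
};

static EarlyCaller g_early;

// Hypothetical stand-in for DllMain(): by the time this runs, the call made
// by g_early's constructor has already happened.
void entry_point()
{
    hypothetical_api("entry point");
}
```

Any hook installed once the startup code has finished would only ever see the second call, which is the same situation as hooking after LoadLibrary() has returned.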

Looking at the docs for LoadLibraryEx() I hoped that I could just use DONT_RESOLVE_DLL_REFERENCES to prevent DllMain() being called. DONT_RESOLVE_DLL_REFERENCES does exactly what it says on the tin: it doesn't resolve any Dll references and it doesn't call DllMain(). Unfortunately that made it useless to me, as I needed the Dll references resolved before I could hook them...

I searched quite hard for a fix to this problem, thinking that it was an obvious thing that people would want to do. I didn't find any solutions so I started to delve into how Dlls were loaded. This research ended up with the Microsoft Portable Executable and Common Object File Format Specification, which is the document on PE files (Dlls, exes, etc). The code that I had for hooking already read information out of the Dll's PE file to locate import address tables and the like, so I hoped there was something else in there that would help me hook DllMain(). I didn't find what I was looking for, but I found something much more useful.

Early on in my investigation of the PE file format I came across an interesting sounding field in one of the file header structures: AddressOfEntryPoint. This is the address, stored as an RVA, that's used to start the image in the PE file; it points to whatever calls main() for exes and whatever calls DllMain() for Dlls. Once I found out about this I decided that I could change this value and make it point somewhere else, so that DllMain() wasn't called during LoadLibrary() but when I wanted it to be called. I started dusting off my inline assembler skills (didn't take long, they're not very big) and thinking about how I could move the entry point address so that it pointed to the end of the real DllMain(), and all kinds of other cunning plans. Then I read the document and saw that the field could be set to 0 for Dlls that didn't have entry points; like resource only Dlls, I imagine.
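To make the field concrete, here is a rough sketch of reading AddressOfEntryPoint from a buffer that holds the raw bytes of a PE file. It is portable C++ that walks the headers by offset rather than using the windows.h structures: the PE signature offset lives at 0x3C in the DOS header, and the field sits 16 bytes into the optional header, which follows the 4 byte signature and the 20 byte COFF header. The function names read_u32() and read_entry_point_rva() are invented for this example, and the code assumes a well formed image.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Read a little-endian 32-bit value; the caller must have checked bounds.
static std::uint32_t read_u32(const std::vector<std::uint8_t>& image,
                              std::size_t offset)
{
    assert(offset + 4 <= image.size());
    return  static_cast<std::uint32_t>(image[offset])
         | (static_cast<std::uint32_t>(image[offset + 1]) << 8)
         | (static_cast<std::uint32_t>(image[offset + 2]) << 16)
         | (static_cast<std::uint32_t>(image[offset + 3]) << 24);
}

// Hypothetical helper: fetch AddressOfEntryPoint (an RVA) from the raw
// bytes of a PE file. Returns false if the buffer doesn't look like one.
bool read_entry_point_rva(const std::vector<std::uint8_t>& image,
                          std::uint32_t& rva)
{
    const std::size_t kLfanewOffset = 0x3C;        // e_lfanew in the DOS header
    const std::size_t kFieldFromPe  = 4 + 20 + 16; // signature + COFF + 16

    if (image.size() < kLfanewOffset + 4) return false;
    if (image[0] != 'M' || image[1] != 'Z') return false;

    const std::size_t pe = read_u32(image, kLfanewOffset);
    if (pe > image.size() || image.size() - pe < kFieldFromPe + 4) return false;
    if (std::memcmp(&image[pe], "PE\0\0", 4) != 0) return false;

    rva = read_u32(image, pe + kFieldFromPe);
    return true;
}
```

The field is at the same offset for both PE32 and PE32+ images, so the sketch doesn't need to look at the optional header's magic number.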

So, my current solution to hooking Dlls so that I get my hooks in place before DllMain() runs is this. First I load the Dll image from disk and read the AddressOfEntryPoint field from the header; I resolve the RVA into a real address and write a 0 back into the image. Then I call LoadLibrary(), which loads the Dll and all dependent Dlls, does all the fix ups that are required and doesn't execute DllMain(). Then I run my hook code, and finally I use a few bits of __asm magic to set the stack just right and call into DllMain(). This all works fine and achieves just what I want to do, but now I'm nervous. If this works then why doesn't the OS do it like this in the first place? Why doesn't Windows release the loader lock before calling DllMain()?
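The patching step might look something like the sketch below. It is not the code from the project, just an illustration in portable C++ that assumes the patch is applied to a buffer holding a copy of the Dll file's bytes. clear_entry_point() and rva_to_address() are hypothetical names, and the module base passed to rva_to_address() is assumed to be whatever LoadLibrary() later returns.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: remember AddressOfEntryPoint and overwrite it with 0
// so that the loader treats the image as having no entry point.
bool clear_entry_point(std::vector<std::uint8_t>& image,
                       std::uint32_t& savedRva)
{
    const std::size_t kLfanewOffset = 0x3C;        // e_lfanew in the DOS header
    const std::size_t kFieldFromPe  = 4 + 20 + 16; // signature + COFF + 16

    if (image.size() < kLfanewOffset + 4) return false;
    if (image[0] != 'M' || image[1] != 'Z') return false;

    // Little-endian read of the offset to the "PE\0\0" signature.
    std::size_t pe = 0;
    for (int i = 3; i >= 0; --i)
        pe = (pe << 8) | image[kLfanewOffset + i];

    if (pe > image.size() || image.size() - pe < kFieldFromPe + 4) return false;
    if (std::memcmp(&image[pe], "PE\0\0", 4) != 0) return false;

    // Save the old RVA, then zero the four bytes of the field.
    const std::size_t field = pe + kFieldFromPe;
    savedRva = 0;
    for (int i = 3; i >= 0; --i)
        savedRva = (savedRva << 8) | image[field + i];
    std::memset(&image[field], 0, 4);
    return true;
}

// An RVA is relative to the address the module was loaded at, so the real
// entry point is simply base + RVA.
std::uintptr_t rva_to_address(std::uintptr_t moduleBase, std::uint32_t rva)
{
    assert(rva != 0); // 0 would mean "no entry point"
    return moduleBase + rva;
}
```

Once the hooks are in place, the address from rva_to_address() is what gets called with the usual entry point arguments (module handle, DLL_PROCESS_ATTACH, reserved pointer). That call happens outside the loader lock, which is exactly the part to be nervous about.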

6 Comments

There was an interesting article on the PE format and how it was just flexible enough that it could be hijacked for .NET executables without having to patch the OS and change the format. Can't remember the link though :(

Yeah, I read that. You can pretty much do what you like if you set the entry point to something that knows how to deal with your image - like mscoree.dll does with .Net dlls.

The reason the lock is necessary is that other threads may call GetModuleHandle() and GetProcAddress() while a DLL is being initialized. The lock will block the other threads from accessing the DLL until DLL_PROCESS_ATTACH is finished. If the lock were not present, other threads might execute code in the DLL before it has finished initializing.

For similar reasons using DONT_RESOLVE_DLL_REFERENCES is a bad idea.

Jonathan

Thanks. I figured as much but to be honest I don't see why the lock for that isn't per process. It doesn't need to be global. Then the global lock could be held for just the time required to add the fully initialised DLL to whatever global structures there are...

Anyway, this isn't an issue any more as I switched to doing all of this in a completely different way.

This was posted a long time ago, but just like I found the page, others still can, so I'd like to make an important correction. The loader lock is *not* global across the machine. It is a per-process lock. You're right to be nervous - don't mess around with it. Running DllMains outside of the loader lock can cause a lot of problems (aside from the ones mentioned above).

D'oh, fair enough.
