Crouching Driver, Hidden Message

This issue was a fun one, brought to me due to a single replica in an availability group constantly having an issue where it seemed that SQL Server just stopped processing certain requests. If you’re reading this and thinking to yourself, “Just one replica in the topology? What’s different with that one replica?” then give yourself a pat on the back, you’re the right type of person. If you read that and thought, “I wonder what bug was in SQL this time?” then please stop using technology.

The symptoms of this issue were interesting, every so often the instance would just kind of get “stuck” – at least that is how it was described to me. Nothing would work, cancelling queries, attempting to kill queries, submitting anything new, nothing seemed to really do anything except restarting the service. Once the service was restarted, the instance (and AG) would hum along nicely… until some random time later when submitting different queries would just grind to a halt. Fun.

Since it was a query that seemed to be just waiting forever, no matter what, it’d make sense to get the stack for that thread, which is listed below.

ntdll!ZwWaitForSingleObject
ntdll!ldrpDrainWorkQueue
ntdll!ldrpInitializeThread
ntdll!_LdrpInitialize
ntdll!LdrpInitializeInternal
ntdll!LdrInitializeThunk

Sure looks a whole lot like not SQL Server related items, which is correct, it’s plumbing in Windows dealing with creation of a new thread. In this case it has to do with the loader lock, which had some enhancements done to it in Windows 10+ for parallel loading of libraries. It’s waiting for all outstanding work to be completed in the loader infrastructure before continuing, so what in this process is holding the loader lock (which is process wide) or otherwise not allowing the work to drain properly? Time for a memory dump.

Looking at the dump, there is another thread which is attempting to load a library.

ntdll!ZwWaitForSingleObject
ntdll!ldrpDrainWorkQueue
ntdll!LdrpLoadDllInternal
ntdll!LdrLoadDLL
KERNELBASE!LoadLibraryExW
xpsqlbot!SqlGenericLoadLibrary
xpsqlbot!SsLoadLibraryGeneric
xpsqlbot!SsLoadLibrary
xpsqlbot!xp_qv

SQL Server is attempting to load a library for use, sure, makes sense. Seems pretty normal… but why is this also hanging up on draining the queue? Even if the loader lock was somehow orphaned, the existing threads should continue to execute without an issue – assuming no loading of new libraries or resources. It doesn’t seem as though this is the actual problem, but another symptom of a different issue. Gotta keep looking.

Continuing through the dump, this magnificent beast of a stack showed up. I’ll let you try to figure out what it means.

win32u!ZwUserWaitMessage
user32!DialogBox2
user32!InternalDialogBox
user32!SoftModalMessageBox
user32!MessageBoxWorker
user32!MessageBoxTimeoutW
user32!MessageBoxTimeoutA
user32!MessageBoxExA
user32!MessageBoxA

If you said, hey this shows a modal message box (gui) element, you’d be correct. Now, why would a headless service (SQL Server) which typically runs as a service attempt to show a message box (GUI element)? That seems like a huge bug, because a modal message box is going to stop everything and wait for a response on that thread from the user. If nothing can continue processing on that thread and it’s holding a lock, no one else is going to be able to get past it… and since there is no GUI for SQL Server, no one is going to ever be able to respond to the message box. Ok, so seems like we need to figure out who is trying to do something to stupid, so against everything you’d do for a headless service, that it’s freezing up the rest of SQL Server… which component is causing all these “bugs”. Let’s continue down the stack for this thread and see if we get any hints.

OraOLEDB19

That’s right. The Oracle OLEDB driver loaded in process is attempting to show a message box and completely hosing the instance of SQL Server. So much for this huge magical SQL Server bug…. the crappy Oracle drivers strike again!

4 thoughts on “Crouching Driver, Hidden Message”

    1. 100%. It’s not the first time the Oracle driver attempted to show a message box either :/ It really bothers me that the .net drivers will show authentication windows also.

Leave a Reply

Your email address will not be published. Required fields are marked *