How the DAC saved the day

I got a call this morning from a coworker. One of the database instances was unreachable with the message that the TempDB log file was full. No processes could login and the only way to get things going again might be to restart either server or instance. Restarting the instance should be the last resort, because after a restart SQL Server will plough through all the logfiles to find transactions to either roll forward or roll back. That will take more time that you want. But, we had the famous DAC backdoor installed.

You can find this option under the instance. Right-click the instance and go to facets.

SSMS couldn’t log in, but by starting a new query we did get the login screen. Adding the word ADMIN: before the name of the instance, we got in. This a substantial difference to logging in to the instance. This starts all kinds of stuff that don’t work under the admin connection.

Don’t click connect, click New Query and log in with your ADMIN connection

At that point, all we had was the blinking cursor but in the familiar environment of SSMS. Note that most things you hope work don’t, but the basics do.

First thing we tried was exec sp_who to see if we could see something there. Besides a load of processes, nothing really out of the ordinary. Ok, let’s see if sp_WhoIsActive gives more information. But all that procedure could show was the error of the TempDB logfile being full.

Next we tried DBCC OPENTRAN. This procedure shows open transactions. Lo and behold, there was a transaction running since october 12th!. The procedure shows the SPID as well, so we got to check with sp_who and sp_who2 if it was worth taking the risk to kill it. As the database it was working on was less important than all the processes waiting for TempDB to start working again, we decided to kill that process.

After checking sp_who we saw the process was gone. Next was a check on sp_WhoIsActive. This procedure returned results! Next, we tried logging in to the instance with a regular account. This also worked. Last check was on the procedures and processes that had to start again. And, those started running again as well.

exec sp_who;

exec sp_WhoIsActive;

DBCC OPENTRAN

-- kill ## /* be aware of the process and it's work before you kill it!

exec sp_who;

exec sp_WhoIsActive;

DBCC OPENTRAN

Now that the main pressure was gone, my coworker could continue to find out what process it was that was hugging the TempDB logfile for way too long. Besides, we advised a reboot of the instance (it had been online for way too long) to clear out any remnants in TempDB and other logfiles. As it’s a datawarehouse instance, some downtime of the instance isn’t hurting as long as it’s planned.

I think we were very lucky that there was only one process involved in this issue and that killing it meant all restored to normal.

I hope that if you find this post in your hour of need, it’ll help you on your way to a solution.

Thanks for reading (and good luck!)

One thought on “How the DAC saved the day”

Leave a comment Cancel reply

Share this:

Related

Published by reitsees

One thought on “How the DAC saved the day”

Leave a comment Cancel reply