If you’ve been in IT for more than a few minutes, you’ve been here before. Widget X is burning, the organization is hemorrhaging cash, and we’re all doomed unless some caped infrastructure superhero swoops in to save the day. You’re just the hero for the job, but what are you going to do? I’m going to highlight four steps to confronting any emergency, big or small.
Step One: Recon and Preparation
Two questions: “Do I know what’s actually going on?” and “Do I have what I need to fix this right the first time?” Let’s qualify both.
Sometimes we get good information during a failure. Other times, someone thinks the world is on fire when there isn’t even an issue. Verify the issue(s) and verify the scope. Gather any monitoring data, logs, or informed opinions from colleagues you might need. When you have a functional understanding, you’re ready to…get ready.
Now that you have an informed view of the issue, prepare for success. Make sure you have any tools you’ll need. This includes literal tools, access, your recon details, and personnel. Also, be sure you aren’t starting a new fire by ungracefully halting your current work. Finally, prepare your time and space. Is a director slowing you down by standing over your shoulder and pressing the urgency? Politely rope in your manager or an intern to entertain the director. Now jot down a broad plan, even if only in your head. You can use these steps to give yourself a framework. Include a hard path: hop-by-hop, OSI model bottom up, or anything where you can check off functional areas to be thorough. Now throw that plan out the window, for a few moments, to move on to the next step.
Step Two: Follow Your Gut
This could be the fastest path to a fix, especially if you have experience in your field. There’s a caveat, though. Set yourself some limits. Whether you measure in time spent troubleshooting or lost production, set a hard cutoff to help yourself decide when following your gut isn’t cutting it. If your gut fix checks out, you’ll affirm your credibility. On the other hand, should you get yourself lost chasing ghosts during a crisis, you may have to face the consequences.
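If it helps to make the cutoff concrete, here is a minimal Python sketch of a time-boxed gut check. Everything in it is an illustrative assumption: the name `follow_gut`, the idea of expressing each hunch as a function that returns True when it fixes the issue, and whatever budget you choose. Treat it as a way to picture the limit, not as a prescribed tool.

```python
import time

def follow_gut(hunches, budget_seconds, clock=time.monotonic):
    """Try each hunch in order until one works or the time budget runs out.

    `hunches` is a list of (name, attempt) pairs; each hypothetical `attempt`
    callable returns True if it fixed the issue.
    """
    deadline = clock() + budget_seconds
    for name, attempt in hunches:
        if clock() >= deadline:
            # Hard cutoff reached before this hunch was tried: stop guessing.
            return ("cutoff", name)
        if attempt():
            return ("fixed", name)
    # Every hunch was tried within budget and none of them worked.
    return ("exhausted", None)
```

The cutoff is only checked between hunches, so a single long-running attempt can still overrun the budget. Either a "cutoff" or an "exhausted" result is your cue to move on to the structured plan.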
Step Three: Dive In
Your gut failed you. Now let your troubleshooting plan shine. Whatever model you picked to follow, follow it without skipping around. Fail to be thorough and you could skip over the issue, burying it behind you in your known-good path. If you find you’re moving towards the end of your plan with no solution, or if you start to feel like the issue is over your head after diving in, now is the time to escalate. Don’t wait until you’ve exhausted all your options, as anyone you rope in will have a lead time before they can contribute.
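One way to picture working a plan in order, with an early escalation trigger, is the Python sketch below. The function name `run_plan`, the 75% escalation threshold, and the sample layer checks are all assumptions made up for illustration; swap in whatever ordered model you actually chose.

```python
def run_plan(checks, escalate_at=0.75):
    """Walk an ordered plan without skipping steps.

    `checks` is a list of (area, is_healthy) pairs, ordered for example
    bottom-up through the OSI layers. Each hypothetical `is_healthy`
    callable returns True when that area checks out.
    Returns (faulty_area_or_None, should_escalate).
    """
    total = len(checks)
    should_escalate = False
    for done, (area, is_healthy) in enumerate(checks, start=1):
        if not is_healthy():
            return area, should_escalate
        # Nearing the end of the plan with no fault found: call for help now,
        # since reinforcements need lead time before they can contribute.
        if done / total >= escalate_at:
            should_escalate = True
    return None, should_escalate

plan = [
    ("physical", lambda: True),
    ("data link", lambda: True),
    ("network", lambda: False),  # pretend the fault is here
    ("transport", lambda: True),
]
print(run_plan(plan))  # ('network', False)
```

Because the checks run strictly in order, everything before the reported area is known good, which is exactly what keeps the issue from hiding behind you.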
Step Four: Rinse and Repeat
You did your best, but the issue hid better. You’ve called in reinforcements and now it’s their problem, right? Go home, grab a beverage of your preference, and relax.
Your first responsibility now is to be available to your reinforcements. You may have information they need. Don’t just stand in their cubicle and watch them smash the keyboard, though. It’s time to check your work.
Go through your failed plan and make sure it was honestly thorough. If you find a hole, check it for the fault. If not, come up with a new plan from another angle. From here go back to step three, multiple times if needed. Eventually the troubleshooting effort will widen with perspectives and personnel and expose the solution.
So why is this troubleshooting framework better than calling for all hands as soon as you smell smoke? We’ll answer that and wrap this up with a thought experiment.
Picture a tennis-court-sized room with twenty blindfolded IT pros standing around. Now light a match at a random spot, put it out, and drop it on the floor. One of the pros has to find the match.
One or two people may have been close enough to hear it. They can follow their gut and feel where they heard the match struck.
If that fails, someone who smells the smoke can plan to follow their nose to the match. If that person passes through the strongest waft of smoke without feeling the match, they can ask for others near them to follow their voice until they can do the same. As people join in, the zone of intense scent becomes more filled with searchers and it widens. A critical density of personnel and knowledge around the match will be reached. The match will be found.
Now consider the result if the rules were a free-for-all. Given enough time the match will still be found, but the searchers will be slowed to a literal crawl by errant search patterns, people bumping into each other, and hands being stepped on.
Avoid hands being stepped on with the methods I’ve outlined here.