The Monday morning alarm went off unceremoniously, bleating its doleful 5:15a tale that I had someplace to be other than warm and in my bed. I reached out from under the covers for the Blackberry to review the overnight messages from the network management station. Nothing. All was well with the world. Circuits up. Sites connected. Quiet day expected. The global corporation I support could continue its growth spurt, empowered by the network.
I opted to ignore the alarm clock’s sound advice, and went back to sleep until 6a. Upon checking the Blackberry again, I found I had received several distressing messages. Looked like the head-end wide-area circuit was down. I’d lost the ping point in the telco cloud, as well as all the remote offices. Uh-oh. That had been a problem circuit, but we’d worked with the telco’s “chronics” team, and it had been stable for months. Were we back to having trouble with that thing? I sincerely hoped not, but also started wondering if maybe we’d be wise to shop some other providers for a cheap redundant link. Something to talk to the boss about.
With some haste, I went down to the home office, fired up the laptop, and VPN’ed in. I RDP’ed to my jump box…only…I couldn’t get there. Hmm. So I RDP’ed to the management station. It greeted me with a login screen. I tried another host that should have worked, but it failed. And then the RDP session to the management station froze. And then I lost the VPN connection completely. A little bit of poking around, and I knew we’d also lost the big Internet pipe into HQ. Head-end WAN gone, and now the Internet pipe was gone as well.
NOT GOOD. I headed up to the telco’s web site, and logged trouble tickets about the downed circuits.
I thumbed out a vaguely panicky message on the Blackberry to let some other folks know that Things Were Bad. I was then unsure that they’d be able to get the message on *their* Blackberries, since home base had just become an island. So I called my boss. “Hey, uh, big boss man sir? Your majesty? Sorry to wake you at this hour your eminence, but…um…how do I put this? Let me just get it right out there. We have no network in or out of the castle. Please don’t shoot the messenger. I love my job, highness, and I swear it’s not my fault.”
His eminence sighed, as he is prone to do, and said he’d give it a half-hour before calling an even higher-level potentate to let him know we were all lords in the Kingdom of Fail. And so to our chariots we went, galloping madly towards the castle.
I made a pit stop in my headlong dash towards the castle for fuel: a breakfast sandwich and coffee. A large, caffeinated, flavored coffee with lots of cream and sugar. No sooner had I shut down the chariot and stepped outside than my Blackberry went off. It was a voice call, peculiar in that I don’t get those often. Caller ID suggested it was my liege, and so I answered. “You summon me, Master? It is your servant, the humble packet herder.” The response was underwhelming: silence. Silence? The silly Blackberry had not severed its Bluetooth tether to the chariot quite yet. Grr. I punched the red disconnect button, and waited. When the inevitable return call came, I was informed of grave news. The building power was out, nay, the CAMPUS power was out. Sort of like multiple organ failure. Eventually, the body just isn’t going to make it. I checked out of Chez Local Convenience Store and headed back towards Castle HQ.
The walk from the chariot to the castle was cold. A gray wind blew across the parking lot, rustling what few brown leaves still clung to the fading memory of autumn. The mag locks had failed, so the exterior doors opened freely. The emergency lighting was on. The heat was off; the building was cooling down. The hallways were quiet. The cell repeater on the roof was dead, rendering my connection to the outside world tenuous. The building’s electric soul had departed.
Inside the concrete corpse, pockets of activity defied the surrounding electric necrosis. For instance, our production room was up. Our dev room was up. The diesel generator was in full swing, burning dinosaur bones with the fervency born of urgency. The network closets were also up. The Ethernet was therefore alive, as was the wireless LAN. Users with laptops and charged batteries could work off the local network, assuming they could brave the falling temperatures.
I made a trip into the production room, the glow of blue and green LEDs from the racks lighting my way. I noted that both the Internet and WAN routers were up. That told me that our equipment was not to blame for the dead circuits. Big boss man and I headed to the landlord’s telco room, where we found that the LEC’s gear had mostly lost power. We made a few inquiries, and found that the campus power had gone down in the middle of the night. A few hours later, the LEC’s telco racks also died, their feeble batteries gasping their last despite a valiant effort. We looked at the telco’s gear and saw a world of fail, red lights and major alarms lit on just about everything.
We went back to our desks and waited. Periodically, we received an update from the facilities folks about when the power company might have the issue resolved. And so monitoring stations were monitored hopefully. Blackberries were charged from laptops, just in case. Emergency brainstorming sessions were held. The smell of whiteboard marker filled the air. Alternate remote access strategies were hatched. More diesel was fetched (and spilled). Still we waited.
Around 10am, the building sucked in a mighty gulp of electric air, and its steel heart began beating again. The lights came back on. Ventilation burst forth. Mag locks sealed doors shut. My Blackberry displayed 5 bars. Everyone began firing up their desktop systems with multi-monitor displays to see what was what. I checked routing tables, adjacencies, and traffic loads on the previously dead circuits. All was normal. All was well. Connections were made from around the globe. Business was being done once again.
E-mail began to flow, with many CC’s and “replies-to-all” from folks up and down the food chain answering questions, making comments, expressing outrage, and just generally being heard. A fuller picture of what had happened formed. Another building tenant had contracted for a new line of whatever sort to be brought into their building. The DigSafe people had mismarked the ground, causing a main electrical feed to get nicked some days earlier; we’d experienced that as a mere flicker. Over the course of time, that nick resulted in a catastrophic line failure, taking out electricity to the majority of the campus. The repair was not a quick one, taking roughly 8 hours.
We’d been victimized by construction and equipment we had no control over. It happens, but that didn’t make the impact on the business any less real.
So that was my Monday. In its wake, I’m working with a team to make sure we recover from this sort of event much more quickly in the future. Anyone selling dark fiber?