So Far Away From Me: Managing Network Gear That’s Over The Horizon

A not-especially-new challenge facing network engineers is far-away management: how do you make sure you can always manage gear that’s farther away than a quick car ride? Even smaller networks can have a global spread, making this problem common. Here are a few scenarios I’ve faced recently where I had to think through what I was doing to avoid cutting off my own access to remote devices.

  • Updating the public-facing IP address of a remote firewall, when my normal access to that firewall was via the external address I was changing.
  • Changing the tunnel peer IP of a site-to-site VPN device, when my normal access to that VPN router was via one of the tunnels whose parameters I was changing.
  • Updating a public BGP scheme for a site, when knocking down the BGP session meant knocking down my normal means of accessing the routers I needed to work on.
  • Managing a remote firewall whose anti-spoofing and routing tables needed tweaking after a new path to a remote network was introduced.

There are many answers to these and similar scenarios. Some of them cost money. Some of them come with experience…almost every engineer has a war story that includes a phrase like, “I hit enter, and then the console stopped responding. I felt like I was gonna throw up.” Yeah. My personal favorite is from my early days as a packet pusher, when I typed “debug ip packet” on a campus border router. The console quit responding, and then that section of the campus went red on the management station. “Uh, I’ll be right back. Gotta go reboot the router across town.” Ah, the good ol’ days. Pardon my digression.

So what ARE the strategies for making sure you don’t lose access to that device an ocean away while you’re working on it?

  • Good documentation. Your best defense against doing something stupid is making sure you know what’s going on at that remote site. Make a detailed diagram, modeling every link, every IP address, every VLAN number, WAN circuit IDs, and anything else that could possibly be related to your project. I’m not suggesting you have to manually diagram every access port, but router and switch interconnects are a must, as are all in-path devices: firewalls, IDS/IPS, VPN concentrators, ISP border routers, load-balancers, WAN optimizers, and the like. This should include physical labels on remote devices, which will aid folks at the site if you need them to do something for you.
  • Do your planning while you’re awake. Don’t put yourself in the position of a 2am maintenance window where you have to plan out the details of your work. At 2am, you’re probably tired, and Chargers Chocolate Espresso Beans are not going to bring clarity to your addled brain. Do your planning ahead of time. By “planning”, I don’t mean that you should merely throw a quick task outline on your whiteboard or in a text document, although that’s a start. I mean write out every step and every bit of code that goes with it, and then have a trusted co-worker sanity check you. No trusted co-workers (or even untrusted ones)? Take a potential acolyte, throw ‘em in a conference room, and explain your plan to them. Use the whiteboard, but do not sniff the markers excessively. Gesticulate wildly. You might find that talking through the plan, even with someone who doesn’t know enough to second-guess you, could reveal a fatal flaw.
  • Make sure you’re on the right device. I happen to use a tabbed console program to manage my gear. If I’m not careful, I can paste code into the wrong terminal window or make changes to the wrong firewall policy because at a glance, THEY ALL LOOK ALIKE. Sometimes, devices (especially redundant pairs) have very similar names, which can get confusing as you go back and forth between them. Some tricks I have used to keep myself straight in the midst of a change include using different backgrounds, different font and color combinations, opening certain devices with a read-only account to prevent inadvertent changes, and temporarily renaming devices. Being able to refer back to your awesomely detailed network diagram can also help clear away confusion.
  • Understand exactly what your next command will do. Ask the right questions and answer them in your mind before committing. By that, I mean that you know that you know that you know what’s going to happen when you hit enter. This is especially crucial when making a configuration change that will impact the routing table of the far-away router you’re working on (and potentially other devices). For example, I recently had an uh-oh moment when I shut down the BGP session of a non-important router I was prepping for future service. When I shut down the BGP neighbor, I lost the advertised default route. When I lost the default route, the remote router didn’t know how to get back to me. That was a silly mistake, made because I was in a hurry, and I know better. I just wasn’t thinking about it before I shut down the BGP session. It happens. Don’t let it happen to you.
  • Think through your security scheme. Border routers often have their VTY lines and/or interfaces protected by access lists. Those access lists are generally made up of known hosts or networks where management traffic should originate. Often not included in these border ACLs are the device’s connected networks, as that makes standardization of the management ACL difficult. In practice, that means that even though a troubled device you’re working on might be physically reachable via an adjacent device, your access list could stop you from taking advantage of that adjacency. That’s a bummer if jumping from a device you can reach to the one you otherwise can’t would have saved you.
  • Someone on site. Most errors can be fixed with a power cycle, assuming you didn’t commit broken code to the NVRAM startup configuration first. A human can cover a power cycle for you. Flesh and bone might also be able to connect a console cable to a system you can RDP or SSH to at the remote site, allowing you to undo whatever you did to kill the device in question via serial connection. Hey, it’s a little embarrassing, but better than a snail-mail repair.
  • Scheduled reload. A lot of folks like to schedule a Cisco device reload before a significant change via “reload in X”, where X is the number of minutes until the device reboots itself. That way, if you make a mistake, you just have to wait for the device to reboot and load the startup config that had been working, giving you another shot.
  • Remotely managed power strips. Some fancy and usually expensive power strips allow you to cycle power to a specific socket, which could potentially allow you to remotely reboot a device you’ve bricked.
  • The road less traveled. Multiple paths to the same network give you options you don’t otherwise have. Those additional paths could be in the form of a cheap Internet line you only use for backdoor access, an out-of-band network, a terminal/console server, dial-up, or allowing temporary access to a device interface you would normally not use for management.
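
To make the “understand exactly what your next command will do” point concrete, here is a minimal Python sketch, not a real tool. It models a routing table as a plain dictionary and asks one question before a change: once these prefixes go away, does the remote router still have any route back to my management host? The table contents, the addresses (all from documentation ranges), and the names best_route and still_reachable are made up for illustration; a real check would have to read the table from the device itself.

```python
import ipaddress

# Hypothetical routing table of the remote router: prefix -> how it was learned.
ROUTES = {
    "0.0.0.0/0": "bgp via 203.0.113.1",   # default route learned from the BGP neighbor
    "10.20.0.0/24": "connected",          # local LAN at the remote site
}

# Hypothetical address of the workstation I manage the router from.
MGMT_HOST = "198.51.100.10"


def best_route(table, destination):
    """Return the longest-prefix match as (prefix, source), or None if nothing matches."""
    dest = ipaddress.ip_address(destination)
    best = None
    best_len = -1
    for prefix, source in table.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and network.prefixlen > best_len:
            best = (prefix, source)
            best_len = network.prefixlen
    return best


def still_reachable(table, withdrawn, destination):
    """True if a route to destination remains after the withdrawn prefixes are removed."""
    remaining = {p: s for p, s in table.items() if p not in withdrawn}
    return best_route(remaining, destination) is not None


if __name__ == "__main__":
    # Today, the only way back to the management host is the BGP default.
    print(best_route(ROUTES, MGMT_HOST))
    # Shutting down the BGP neighbor withdraws that default route.
    print(still_reachable(ROUTES, ["0.0.0.0/0"], MGMT_HOST))
```

With this sample table, withdrawing the BGP-learned default leaves no route back to 198.51.100.10, which is the mistake described in the list above. Adding a static route that covers the management network before the change flips the answer, and that is the kind of question worth answering before hitting enter.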

Do you have a favorite trick or technique you’d like to share to keep your remote devices accessible during risky changes?

Ethan Banks
Ethan Banks, CCIE #20655, has been managing networks for higher ed, government, financials and high tech since 1995. Ethan co-hosts the Packet Pushers Podcast, which has seen over 2M downloads and reaches over 10K listeners. With whatever time is left, Ethan writes for fun & profit, studies for certifications, and enjoys science fiction. @ecbanks
  • http://cisconotsysco.wordpress.com jbl76

    Good stuff. btw, what is your tabbed console application of choice?

    • http://packetattack.wordpress.com Ethan Banks

      I live in a Windows world, and use ZOC by Emtec. Tabbed, scriptable, transparency. 30-day free eval, $80 to buy. Can’t live without ZOC.

      http://www.emtec.com/zoc/index.html

    • http://www.mostlynetworks.com Scott McDermott

      I’ve been using SecureCRT for SSH/Telnet/Serial consoles since it was CRT and I was filing enhancement requests for SSH support. Great software.

  • Peter

    For the scheduled reload, don’t forget “reload cancel” when you’re sure it’s all working!

  • Daniel G

    One thing I like to do when making changes to a remote site is to work from that site backwards. For instance, if I need to make changes to BGP or a serial interface that will take down the site, but I know it will come back up once I make the matching changes on my side, I do the remote site first and then my side.

    I also have cheap internet connections (GRE over ipsec tunnels) for out of band access and feasible successors.

  • http://www.mostlynetworks.com Scott McDermott

    I’ve traditionally had a secured modem attached to the console on key pieces of remote equipment. That technique works, but is a bit long in the tooth. I’m looking at switching to the opengear stuff with a 3G card in it for my branches. At $20/month for a 1GB plan, it’s cheaper than a business phone line.

  • http://www.idealnetworkengineer.com Rowell

    Great tips! I found a number of them really helpful for my job. It’s the little tips from experienced engineers I love to hear.

    I’ll be making diagrams all this month!

  • zumzum

    “reload in” is very disruptive and messy… but on some older IOS it’s the only way. Anyway, check this comment out:
    source:http://blog.ioshints.info/2011/01/schedule-reload-before-configuring.html

    schilling2006 said…

    Isn’t Cisco IOS configuration replacement and rollback better without reload?

    http://www.cisco.com/en/US/docs/ios/fundamentals/configuration/guide/cf_config-rollback.html

    The documentation might be hard to read, so here it is in a simple, tested way:

    Conf t

    Archive

    Path flash:myconfig

    Exit

    Wr mem

    Configure replace nvram:startup-config force time 3 (3 minutes)

    conf t

    ##make all your changes

    ##if you lose your session or you don’t want to save your configurations, then after 3 minutes, the configuration will be rolled back to nvram:startup-config

    #if you continue to have session access and want to save the change you made

    Exit

    Configure confirm

  • RichardF

    Once got snagged up on a remote ASA: “reload in 10” worked perfectly, and then the standby unit took over. Lots of swearing, but since then I have always remembered to reload the secondary too.