The End of Year Audit: Things Worth Checking While You Have A Minute

It’s the holiday season, and the screams of GET IT DONE NOW are gently muted, replaced with the quiet sounds of people gone on vacation and leaving you the heck alone. What better time to do some of that network cleanup you’ve been putting off? Let’s make a list of fun things to check.

Etherchannel Dead Links

The wonderful thing about etherchannels aka port-channels aka link aggregation groups is that they are redundant parallel paths between two points. But if one of the link members went down, would you know? That all depends on how well your network monitoring & alerting system is set up. Let’s say it’s possible you missed that one of your etherchannel links is dead. Why not take a little time to verify that all etherchannel links are up?
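One way to run that check across a pile of switches is to collect the summary output and flag any member port that is not bundled. Here is a minimal Python sketch, assuming IOS-style "show etherchannel summary" text in which each member appears as Port(flag) and the flag P means bundled. The sample text and the function name unbundled_members are illustrative only, and the exact output format varies by platform and software version.

```python
import re

# Illustrative sample modeled on IOS-style "show etherchannel summary" output.
SAMPLE = """\
Group  Port-channel  Protocol    Ports
------+-------------+-----------+-----------------------------------
1      Po1(SU)         LACP      Gi1/0/1(P)  Gi1/0/2(D)
2      Po2(SU)         LACP      Gi1/0/3(P)  Gi1/0/4(P)
"""

# A group row: group number, port-channel name with flags, protocol, members.
ROW = re.compile(r"^\s*(\d+)\s+(Po\d+)\((\w+)\)\s+(\S+)\s+(.*)$")
# A member entry such as Gi1/0/1(P).
MEMBER = re.compile(r"(\S+?)\((\w+)\)")


def unbundled_members(text):
    """Return (port_channel, member, flag) for every member not flagged 'P'."""
    problems = []
    for line in text.splitlines():
        row = ROW.match(line)
        if not row:
            continue  # header, separator, or unrelated line
        port_channel, members = row.group(2), row.group(5)
        for port, flag in MEMBER.findall(members):
            if flag != "P":  # assumption: 'P' means bundled in the port-channel
                problems.append((port_channel, port, flag))
    return problems


if __name__ == "__main__":
    for po, port, flag in unbundled_members(SAMPLE):
        print(f"{po}: member {port} has flag {flag}")
```

Anything the function returns is a member worth a closer look, whether it is down, suspended, or running stand-alone.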

Ethernet Duplex Mismatch

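A fading yet still common problem is the simple Ethernet duplex mismatch, along with its cousin, the link that quietly negotiated a lower speed than you expected. Even in a network with gigabit everywhere, it’s possible that inferior wiring caused a link to negotiate to 100Mbps instead of 1000Mbps. And you’ve still probably got a couple of dinosaur servers or printers out there that only support 100Mbps, some of them hard-coded for speed and duplex while the switch port is left to autonegotiate, which is the classic recipe for a mismatch. Take a moment to scan your switches for duplex mismatches and unexpectedly slow links.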
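A scan like that can be roughed out in a few lines. This Python sketch assumes IOS-style "show interfaces status" text, where connected ports list VLAN, duplex, and speed after the status column. The sample rows and the name suspect_ports are made up for illustration, and column layout differs between platforms, so treat the regular expression as a starting point rather than a parser you can trust blindly.

```python
import re

# Illustrative sample modeled on IOS-style "show interfaces status" output.
SAMPLE = """\
Port      Name               Status       Vlan       Duplex  Speed Type
Gi1/0/1   uplink             connected    trunk      a-full a-1000 10/100/1000BaseTX
Gi1/0/2   old printer        connected    20         a-half  a-100 10/100/1000BaseTX
Gi1/0/3                      notconnect   1            auto   auto 10/100/1000BaseTX
Gi1/0/4   server             connected    30           full    100 10/100/1000BaseTX
"""

# Port name, optional description, then "connected" followed by vlan/duplex/speed.
LINE = re.compile(r"^(\S+)\s+.*?\bconnected\s+(\S+)\s+(\S+)\s+(\S+)")


def suspect_ports(text):
    """Return (port, duplex, speed) for connected ports at half duplex or below 1000Mbps."""
    suspects = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # header or a port that is not connected
        port, _vlan, duplex, speed = match.groups()
        half = duplex.endswith("half")
        # An "a-" prefix marks an autonegotiated value; strip it before comparing.
        slow = speed.replace("a-", "") in ("10", "100")
        if half or slow:
            suspects.append((port, duplex, speed))
    return suspects


if __name__ == "__main__":
    for port, duplex, speed in suspect_ports(SAMPLE):
        print(f"{port}: duplex={duplex} speed={speed}")
```

Not every hit is a fault. That old printer may be doing the best it can, but a hard-coded port like the last row is worth comparing against the device on the other end.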

HSRP Consistency

You KNOW all your HSRP groups are fine, right? All HSRP peers are configured, can see each other, and priorities are as intended. Right? How sure are you? Might not be a bad time to review your HSRP groups and search for anomalies.

Spanning-Tree Roots

Without a doubt, you purposely configure your spanning-tree root bridges, and have further protected them with features like root guard. Or…um…perhaps there is a little doubt. If you check your spanning-tree root bridge for all VLANs, is your STP topology what you expect? Or is it possible some switch got added to the network a few months back and was elected as the STP root for one or more VLANs? There was that short but unexplained outage when you turned that switch up, now that you think about it.
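If you keep a record of which bridge is supposed to be root for each VLAN, the comparison itself is trivial. The Python sketch below assumes you have already gathered the observed root per VLAN by whatever means you prefer. The dictionaries, MAC addresses, and the name unexpected_roots are all hypothetical.

```python
def unexpected_roots(expected, observed):
    """Return {vlan: (expected_root, observed_root)} for every VLAN that differs."""
    mismatches = {}
    for vlan, root in sorted(observed.items()):
        want = expected.get(vlan)  # None means the VLAN is not in your records
        if want != root:
            mismatches[vlan] = (want, root)
    return mismatches


# Hypothetical data: intended root per VLAN versus what the network reports.
EXPECTED = {10: "0011.2233.4400", 20: "0011.2233.4400"}
OBSERVED = {10: "0011.2233.4400", 20: "00aa.bbcc.dd00", 30: "00aa.bbcc.dd00"}

if __name__ == "__main__":
    for vlan, (want, got) in unexpected_roots(EXPECTED, OBSERVED).items():
        print(f"VLAN {vlan}: expected root {want}, observed {got}")
```

A VLAN that shows up with no expected root at all is its own finding: either the documentation is behind or someone created a VLAN you did not plan for.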

VPN Extranet Tunnels with no Business Relationship

The end of the year is a great time to scan through your configured site-to-site VPN tunnels and verify that they are still needed. Many of your VPN tunnels will probably go to remote corporate offices that you well know the status of. These probably are not your main concern. Instead, you want to look out for tunnels with extranet business partners that your company might not have a business relationship with anymore. When’s the last time traffic went through the tunnel? Who within your company is the business owner for the relationship? Can you verify how long your company has a contractual relationship with the extranet partner? If you don’t know that information, this would be a great time to gather it and document for future reference.

Dead VPN User Accounts

Along the same lines as the VPN tunnel problem, VPN user accounts also need to be reviewed from time to time. If your VPN users are fully integrated with a directory service and you have a good internal termination process, this might not be a big issue. But if you use a non-integrated authentication method for VPN users, you need to know that all enabled accounts are legitimate. Beyond employees, this is also important for contractors to whom you provided a temporary account. It’s not a bad practice to give contractors (non-employees) an account with an expiration date at the end of the year. Reset the expiration for another year into the future if you can verify their status, and disable the account if not.
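The expiration review is easy to script once you can export the account list. A minimal Python sketch, assuming you can get usernames and expiration dates out of your authentication system somehow. The account names, dates, and the function name review_accounts are invented for the example.

```python
from datetime import date


def review_accounts(accounts, today):
    """Split accounts into (expired, no_expiry) lists of usernames.

    accounts maps username -> expiration date, or None if no expiry is set.
    """
    expired, no_expiry = [], []
    for name, expires in sorted(accounts.items()):
        if expires is None:
            no_expiry.append(name)  # never expires: verify the owner by hand
        elif expires < today:
            expired.append(name)  # past its date: renew or disable
    return expired, no_expiry


# Hypothetical export from a non-integrated VPN user database.
ACCOUNTS = {
    "contractor_a": date(2012, 12, 31),
    "contractor_b": None,
    "employee_c": date(2030, 1, 1),
}

if __name__ == "__main__":
    expired, no_expiry = review_accounts(ACCOUNTS, date.today())
    print("expired:", expired)
    print("no expiration set:", no_expiry)
```

Accounts with no expiration at all are listed separately because they are the ones most likely to outlive the person they were created for.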

Firewall Rules with No Hits

Servers come and go in the DMZ. VLANs rise and fall as corporate network topologies change. Business partners change. Etc. What’s that mean for your firewalls? Probably quite a lot. The end of the year is a great time to give firewall policies a review. Any rules not in use? Disable them, and see if anyone complains over the next month or so. If not, remove the rule.
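Finding the unused rules usually means reading hit counters. This Python sketch assumes ASA-style "show access-list" output, where each expanded rule line carries a (hitcnt=N) field. The sample lines and the name zero_hit_rules are illustrative, and other firewall platforms expose hit counts in entirely different ways. Keep in mind that counters only cover the period since they were last cleared or the device was reloaded, so a zero is a reason to ask questions, not proof.

```python
import re

# Illustrative sample modeled on ASA-style "show access-list" output.
SAMPLE = """\
access-list OUTSIDE_IN line 1 extended permit tcp any host 192.0.2.10 eq www (hitcnt=1532) 0x1a2b3c4d
access-list OUTSIDE_IN line 2 extended permit tcp any host 192.0.2.11 eq smtp (hitcnt=0) 0x5e6f7a8b
"""

# ACL name, line number, and the hit counter at the end of the rule.
RULE = re.compile(r"^access-list\s+(\S+)\s+line\s+(\d+)\s+.*\(hitcnt=(\d+)\)")


def zero_hit_rules(text):
    """Return (acl_name, line_number) for every rule with a hit count of zero."""
    unused = []
    for line in text.splitlines():
        match = RULE.match(line)
        if match and int(match.group(3)) == 0:
            unused.append((match.group(1), int(match.group(2))))
    return unused


if __name__ == "__main__":
    for acl, line_no in zero_hit_rules(SAMPLE):
        print(f"{acl} line {line_no} has no hits")
```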

Dead Firewall Objects

While your sysadmins are probably great about notifying you when new servers need firewall policy updates, they are probably terrible at letting you know when they decommission a server. Therefore, it’s likely you’ve got a firewall policy populated with dead objects that need to be reaped. So put on your black robe, get out your scythe, and reap the dead. How do you know they are dead objects? Look for the objects in the ARP cache of the router or firewall on their segment. If pinging the object fails and you notice an “incomplete” in the ARP cache of the gateway for that object, it’s a candidate for reaping.
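The ARP half of that test can be scripted. A small Python sketch, assuming IOS-style "show ip arp" text where unresolved entries show Incomplete in the hardware address column. The sample addresses and the name incomplete_entries are illustrative. Run it after you have pinged the object, since the incomplete entry only appears once the gateway has tried to resolve the address.

```python
# Illustrative sample modeled on IOS-style "show ip arp" output.
SAMPLE = """\
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.1.1.1                -   0011.2233.4455  ARPA   Vlan10
Internet  10.1.1.20               0   Incomplete      ARPA
Internet  10.1.1.30              12   0011.2233.4466  ARPA   Vlan10
"""


def incomplete_entries(text):
    """Return the IP addresses whose ARP entry is marked Incomplete."""
    dead = []
    for line in text.splitlines():
        tokens = line.split()
        # Expected token order: protocol, address, age, hardware address, ...
        if len(tokens) >= 4 and tokens[0] == "Internet" and tokens[3] == "Incomplete":
            dead.append(tokens[1])
    return dead


if __name__ == "__main__":
    for address in incomplete_entries(SAMPLE):
        print(f"{address}: no ARP reply, candidate for reaping")
```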

VLANs with an Empty ARP Cache

If I had to guess, there’s at least one IP renumbering scheme or other sort of migration from one IP block to another ongoing in your network. Once the renumbering/migration is complete, you should kill the VLAN that was left behind. You can check by looking at the VLAN’s ARP cache. Nothing in there? Double-check with a ping sweep and talk to a sysadmin, but there’s a good chance the VLAN is an empty wasteland, devoid of life. And when you clean it up, that means all of it: the L3 interfaces used as gateways, the L2 VLAN definitions, and related configurations such as spanning-tree topology & VLAN trunk allow statements.

DNS Servers Current in Network Devices

Network gear often gets forgotten when core services like DNS servers get updated. If you rely on domain name resolution in your routers/switches/firewalls, take a moment to validate that the DNS servers they are using match the current company standard. Those old DNS servers won’t be around forever.
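If you already have device configs on file, checking them against the standard is a quick job. This Python sketch assumes IOS-style "ip name-server" config lines. The addresses, the standard list, and the function names are hypothetical. Tokens that are not IP addresses (a VRF keyword, for example) are simply skipped.

```python
import ipaddress


def configured_name_servers(config_text):
    """Collect every IP address found on 'ip name-server' lines."""
    servers = set()
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens[:2] != ["ip", "name-server"]:
            continue
        for token in tokens[2:]:
            try:
                servers.add(str(ipaddress.ip_address(token)))
            except ValueError:
                pass  # skip non-address tokens such as a VRF keyword or name
    return servers


def compare_to_standard(config_text, standard):
    """Return (unexpected, missing) name servers relative to the standard list."""
    configured = configured_name_servers(config_text)
    standard = set(standard)
    return sorted(configured - standard), sorted(standard - configured)


# Hypothetical device config and company standard.
CONFIG = "hostname core1\nip name-server 10.0.0.53 10.9.9.9\n"
STANDARD = ["10.0.0.53", "10.0.1.53"]

if __name__ == "__main__":
    unexpected, missing = compare_to_standard(CONFIG, STANDARD)
    print("not in standard:", unexpected)
    print("missing from device:", missing)
```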

Dead Static Routes

What are you pointing at? Not sure? This is a good time to figure it out, and then document the static routes. Find a static route that doesn’t need to be there anymore? Make it go away. Clean & tidy are the routing configs we like.

Missing Static Routes

In a dual-core environment, it’s not uncommon for static route entries to be mismatched between the two devices. If all remote destinations are accessible, it’s possible no one will notice the mismatch. Take some time to verify that the static routes match between the devices.

In addition, think about what you use static routes for. Floating static routes are often used to point to an alternate path to get to a remote office, if the primary path goes away. As the remote office network has changed, have you remembered to keep up with floating static routes so that your failover works as expected?
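The mismatch check between two cores boils down to a set comparison. A minimal Python sketch, assuming IOS-style "ip route" lines in saved configs. The sample routes and function names are illustrative. It compares whole lines, so routes that legitimately use a different next hop on each core will show up too and need a human to wave them through.

```python
def static_routes(config_text):
    """Return the set of 'ip route ...' lines with whitespace normalized."""
    routes = set()
    for line in config_text.splitlines():
        tokens = line.split()
        if tokens[:2] == ["ip", "route"]:
            routes.add(" ".join(tokens))
    return routes


def route_mismatch(config_a, config_b):
    """Return (only_in_a, only_in_b) as sorted lists of route lines."""
    a, b = static_routes(config_a), static_routes(config_b)
    return sorted(a - b), sorted(b - a)


# Hypothetical configs from a pair of core devices.
CORE_A = """\
ip route 10.1.0.0 255.255.0.0 192.168.1.1
ip route 10.2.0.0 255.255.0.0 192.168.1.1
"""
CORE_B = """\
ip route 10.1.0.0  255.255.0.0 192.168.1.1
"""

if __name__ == "__main__":
    only_a, only_b = route_mismatch(CORE_A, CORE_B)
    print("only on core A:", only_a)
    print("only on core B:", only_b)
```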

Broken Config Backups

If you don’t check your automated config backup system, you could have devices that have become inaccessible. This means that you probably no longer have a current configuration of the device. Take a few minutes to verify that your config library is current. When you find devices that can’t be backed up by your network management system, find out why. Common reasons I’ve run into include SNMP engine crashes (usually on very old devices), firewall policy changes, and overlooked device changes that require the configuration management tool to be updated.
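If your backup tool drops one file per device into a directory, file timestamps give you a quick staleness report. This Python sketch assumes exactly that layout, which may not match your tool. The directory path in the usage line and the name stale_backups are hypothetical.

```python
import os
import time


def stale_backups(directory, max_age_days, now=None):
    """Return names of files in directory not modified within max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            stale.append(name)
    return stale


if __name__ == "__main__":
    # Hypothetical location of the config library; adjust to your system.
    backup_dir = "/var/backups/network-configs"
    if os.path.isdir(backup_dir):
        for name in stale_backups(backup_dir, max_age_days=7):
            print(f"{name}: no backup in the last 7 days")
```

A stale file only tells you the backup stopped. You still have to log in and find out whether the cause is the device, the path to it, or the tool.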

Network Admin Dead Accounts

You remember that guy. He didn’t talk much, and he kinda creeped you out. But he seemed to be able to fix stuff. So, you locked him in the data center with the pretty blinkenlights that enthralled him, and threw in some raw hamburger once in a while to keep the animal fed. All was well until he abruptly gave his notice and took his body odor to Comic-Con to find his soul mate. You have disabled all his accounts, right? I don’t just mean his MS AD account. I mean all the *other* accounts, too. The VPN backdoors. The pseudonyms. The scripts. That guy is probably entwined deeply in your network. So, take the time to clean out all memory of him. Otherwise, he’s a risk you can’t afford. Like a fat guy doing cosplay.

Disk Space

How’s your disk space these days? Oh, switches and routers don’t have disk, you say. Most don’t, fair enough. But what about your load balancers? Firewalls? Log servers? Report servers? NMS? FTP? Etc. How low is your disk getting? You don’t want to find out that your log server can only hold about 3 days’ worth of events because the disk can’t keep up with all the new stuff you started logging to it this past year. Or that the next file that gets dumped on your FTP server will be the last because you can’t throw anything away. This is a good time to take stock that you’re ready for the new year. And if you’re not…hey, those storage guys just put up a new array, didn’t they? Go getcha some terabytes and give the new backup system something to do.
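For the servers you can run a script on, Python’s standard library will report free space directly. A minimal sketch using shutil.disk_usage; the 15 percent threshold, the paths in the usage example, and the name low_disk are arbitrary choices for illustration.

```python
import shutil


def low_disk(paths, min_free_fraction=0.15):
    """Return (path, free_fraction) for each path with less free space than the threshold."""
    report = []
    for path in paths:
        usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
        if usage.total == 0:
            continue  # pseudo-filesystem with no capacity to measure
        free_fraction = usage.free / usage.total
        if free_fraction < min_free_fraction:
            report.append((path, round(free_fraction, 3)))
    return report


if __name__ == "__main__":
    # Check the current directory's filesystem; list your log and FTP volumes here.
    for path, free in low_disk(["."]):
        print(f"{path}: only {free:.1%} free")
```

Appliances such as load balancers and firewalls generally will not run this, so for those you are back to the vendor’s own disk or storage commands.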

Conclusion

This is far from an exhaustive list. Rather, just a few things to get your brain spinning on stuff worth auditing once in a while to avoid that pesky bite on your right glute when you least expect it. Feel free to add your favorite audit items in the comments. Happy holidays to all!

Ethan Banks
Ethan Banks, CCIE #20655, has been managing networks for higher ed, government, financials and high tech since 1995. Ethan co-hosts the Packet Pushers Podcast, which has seen over 2M downloads and reaches over 10K listeners. With whatever time is left, Ethan writes for fun & profit, studies for certifications, and enjoys science fiction. @ecbanks
  • https://twitter.com/douglashanksjr Douglas Hanks

    This is a good list and should be in every operations check-list or automated regression.

  • Billy

    Ethan, This is a great list, and it’s things I can delegate to lower-level techs! We don’t use Cisco in our infrastructure but these items fit in all vendors in one way or another, for instance we use VRRP instead of HSRP. This will go on the SOP document, and probably become a quarterly check vice an annual one. Thanks

  • http://twitter.com/dhanakane Dhana

    What? No change freezes where you work?

  • Ned_Kay

    Excellent list. Also on my end of year schedule is:

    – If your devices are VTP clients, a text dump of all VLANs to quickly return to service should the VLAN database get overwritten. Unlikely, but good to be prepared for the worst.

    – Is every device running the software version you assume it should be? I recall seeing a failover glitch due to a mismatch some time ago. Might be a good time to clean up flash storage if multiple OS versions exist after upgrades too.

  • http://twitter.com/Vegaskid1973 Matt Thompson

    You missed firing up a Call of Duty server and having a LAN party.

  • Jay Swan

    Great list, and something I’ve been banging away at myself lately. Some of the responses I get are great: “Q: Hey, this VPN tunnel hasn’t passed any traffic in six months; can we get rid of it? A: Let’s keep it for another year, just in case.”

  • will

    once fixed, go slap your NMS for not reporting these earlier.

  • http://twitter.com/tomcooperca Tom Cooper

    That’s a broad brush you painted of all network engineers. For the record, I will meet my soul mate at Quakecon – thank you very much ;)

    Kidding aside, this is a great list. Very helpful for that end-of-the-year double-check.

  • Alexandra Stanovska

    I’d also add some I meet pretty often:
    – SNMP/vty ACL entries from unused legacy subnets
    – unused SNMP communities

    – long dead BGP peers and advertised networks kept just in case of fallb… what exactly is the reason today since network migrated 9 months ago? ;-)

    +1 for LAN party.
