Scalable Versus Noisy Configurations

It’s a funny fact of being a network admin that you might need to focus for half an hour to come up with a 3-line configuration. As you climb into the role of architect, that problem gets progressively worse. It’s entirely likely that you’ll occasionally spend over an hour of intensely focused time combing through configurations and configuration guides to figure out what changing a single route tag will do. Then you carefully monitor and repeat pre-change baselines while you make the change, and sometimes find you still missed a spot.

Which is why I implore you to not overengineer your configurations.

I avoid criticizing my predecessors as a rule, and I’m not going to do it here. Unless something is completely off the wall, there might be a reason behind a decision that you just aren’t privy to. It might have worked out to be the wrong call, but who here plays the game of life with perfect information? That said, avoiding overengineering is a rule of thumb that will pay off in spades.

There’s a natural tug-of-war between overengineering and scalability.* I’ve been a part of plenty of “greenfield” deployments and they’re loads of fun. You finally get your wish of being able to construct that perfect vision free of the technical limitations of the legacy hardware and the irritating need for that “uptime” thing your manager keeps crowing on about. Thus it is that so many of us go into teenager-with-credit mode and start blowing all our reasonable configuration lines on luxuries that end up just sitting there until someone throws them out five years down the line.

I’ll give you an example. Have you ever tried to go through a 1,000-line Juniper configuration and get rid of unnecessary communities? I have. I can tell you right now that while it sounds great to craft a masterpiece of scalability before you need it, you MUST RESIST. To remove a single community that you’ve slapped on a prefix as it enters your environment, you need to pull maybe dozens of configurations and examine them to see what, if anything, that community is doing. And that’s to delete one community. What if you tagged four, and they’re different at each entry point? The time it takes to add those communities when you think you need them is trivial compared with the effort to remove them when you realize you don’t.
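
To make that audit concrete, here is a rough Python sketch of the first pass: given each box's configuration exported in set format (the output of "show configuration | display set"), it lists which communities are defined, which are tagged onto routes, and which are actually matched by a policy anywhere. The function names, hostnames, and community names are all made up for illustration, and the patterns only cover the plain policy-options forms. Anything under groups or logical-systems, or matched by a regex-style community, would slip past it, so treat the output as a list of candidates to investigate, not as permission to delete.

```python
import re
from collections import defaultdict

# A community definition in set format, e.g.
#   set policy-options community CUST-A members 65000:100
DEFINE_RE = re.compile(r"policy-options community (\S+) members ")

# A policy term that matches on a community or tags one onto a route, e.g.
#   set policy-options policy-statement IMPORT term 1 from community CUST-A
#   set policy-options policy-statement IMPORT term 1 then community add CUST-A
USE_RE = re.compile(
    r"policy-statement \S+ .*?(from community|then community (?:add|set)) (\S+)"
)


def audit_communities(configs):
    """configs maps a hostname to that box's configuration text (set format).

    Returns {community: {"defined": hosts, "tagged": hosts, "matched": hosts}}.
    """
    report = defaultdict(
        lambda: {"defined": set(), "tagged": set(), "matched": set()}
    )
    for host, text in configs.items():
        for line in text.splitlines():
            defined = DEFINE_RE.search(line)
            if defined:
                report[defined.group(1)]["defined"].add(host)
                continue
            used = USE_RE.search(line)
            if used:
                action, name = used.groups()
                kind = "matched" if action == "from community" else "tagged"
                report[name][kind].add(host)
    return dict(report)


def tagged_but_never_matched(report):
    """Communities some box applies to routes that no policy anywhere matches on."""
    return sorted(
        name for name, hosts in report.items()
        if hosts["tagged"] and not hosts["matched"]
    )
```

Feed it a dictionary of hostname to configuration text for every box in the routing domain, then start with whatever tagged_but_never_matched returns. A community can still matter to a peer or to a device you did not include, which is exactly why the removal takes an hour and the addition takes five minutes.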

So guess what happens – no one cleans them out. You get a few lines in your configuration that go unused – no big deal. WRONG IN THE FACE. That’s a few lines in each box you did that on. A year goes by, you make a few changes. A few more lines go unused, but they’re too abstract to change. Why is there a policy-routing snippet here that no one has ever used? Can we get rid of it? No one knows, management wonders why you care since they don’t, so you shrug and leave it there. The result is a slopheap of complication waiting to sweep the leg during an outage.

So to reiterate my point, since that awesome Karate Kid reference probably just derailed everyone: it takes about 5 minutes to carefully add a new community or an informational route tag; it takes an hour to remove one. By all means build the scaffolding within your greenfield configurations to grow logically, but remember that scaffolding is easy to disassemble, and for all intents and purposes you can see through it. Much of what we consider scalability scaffolding is plain overkill, and would be really easy to add when there’s a demonstrable need.

* Notice that of all the reasons people overengineer a solution, I’m picking the most legitimate one. The price for my thoughts on other motivations is one Newcastle.


  1. says

    I know this feeling. I have numerous configs on devices that I wonder ‘what the hell is this doing exactly?’ – I’d like to just remove a lot of it, but as you say I need to be completely sure what this piece of config is doing to routers far away.

    I have managed to annotate my new junos configs, but the old config is still on there waiting to be removed. There are a couple of notes saying ‘to be removed at next maintenance window’

    Alas my Ciscos and Brocade don’t have this feature :(

    • ktokash says

      This is the main reason I filter “bad” sources via ACL vs null route on Ciscos. You can add a remark in the ACL and remove or leave it a year later when auditing based on real information. I’ve seen null routes piled up and no one knows what any of them are there for.

      I hope Juniper adds a “show | display set-with-annotate” sometime. It’d be pretty helpful.

  2. kevg says

    Thanks for that, as a relative newbie it’s always reassuring to know I am not alone. I like your sympathetic reminder about unknown motives; I am *trying* to remember as I pick my way through configurations that however mad something seems there was probably a reason for it … even if the reason was a rushed project. I am starting to view my task as something akin to network archaeology.
