First off, a big thanks to the guys here at PacketPushers for letting me scratch some words on their wall here on the interwebs. With that, I want to talk about a problem we ran into. I will preface this by saying that by no means am I the smartest guy in the room when it comes to routing, so some of you may read this and just shake your head. But hopefully I end up learning something while I resolve this problem, and some other folks do as well.
Our primary WAN connectivity to our hundreds of clients is MPLS. We do this because we do real-time video interpretation in medical environments (check out LAN’s website if you care to know more), and we need the guaranteed bandwidth, latency and QoS. For most of our sites we simply provide point-to-point video communications, but with the growth of our company we are now adding and expanding video call centers. So for the first time since I joined the company 8 months ago, we are adding a new video call center and an MPLS circuit for both video traffic and internet traffic back through our core network.
The basic connectivity portion of this site turn-up went great. But then we got to the internet portion and it took us a few minutes to track down the issue. Simply put, we use what I consider a pretty basic “float-up routing” methodology, and we were not floating up. What I mean by this is that an edge would default to the MPLS head-end, the MPLS head-end would default to the core, and the core would default to the internet edge as in the following diagram (Goal?). But what we have is the edge defaults to the MPLS head-end, and the MPLS head-end defaults to the provider’s MPLS cloud router as in the diagram (Current).
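To make the difference between the two diagrams concrete, here is a minimal Python sketch that just follows each device's default route hop by hop. The device names are hypothetical labels I made up for illustration, not our real hostnames, and this models nothing beyond the default routes themselves.

```python
# Hypothetical device names; each key maps to the next hop of that
# device's default route.
GOAL = {
    "edge": "mpls-head-end",
    "mpls-head-end": "core",
    "core": "internet-edge",
}
CURRENT = {
    "edge": "mpls-head-end",
    "mpls-head-end": "provider-mpls-cloud",
}


def follow_default(defaults, start):
    """Return the path taken by traffic that only ever matches default routes."""
    path = [start]
    seen = {start}
    while path[-1] in defaults:
        next_hop = defaults[path[-1]]
        path.append(next_hop)
        if next_hop in seen:  # stop if the defaults form a loop
            break
        seen.add(next_hop)
    return path


if __name__ == "__main__":
    print("Goal:    " + " -> ".join(follow_default(GOAL, "edge")))
    print("Current: " + " -> ".join(follow_default(CURRENT, "edge")))
```

In the goal design the path ends at the internet edge; in the current state it ends in the provider's cloud and never reaches our core.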
So with the groundwork laid, we are trying to decipher how bad the situation actually is. How much of our traffic is not covered by a BGP or static route and is instead being caught by our default route? So I posted last night on Twitter: “Is there an IOS command that allows me to see how much traffic is passing through a particular route in the table?” This is why I love Twitter. In no time @santinorizzo posted: “@joshobrien77 Enable CEF accounting: ip cef accounting per-prefix. Then show ip cef detail for all entries, or specify a specific entry.” With that I logged into the VPN and started looking.
The most telling command appears to be:
sh ip cef 0.0.0.0 0.0.0.0 detail, which gave us the same output as sh ip cef 0.0.0.0 0.0.0.0
What we found is that not all of our edge networks were being injected into our BGP table, and that a huge portion of our network was falling back on the default route. At this point we have solved the problem; the information provided by the CEF accounting was a huge help in narrowing down the issue. Ultimately, our MPLS carrier was not passing all of our edge client routes through to our MPLS core as they should have been, so the default route was picking up the slack. Now we are watching sh ip cef 0.0.0.0 0.0.0.0 detail for the next few days to make sure the byte count on the default route stays at zero before we flip our default route.
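For anyone who wants to see why a missing prefix shows up as bytes on the default route, here is a rough Python sketch of longest-prefix matching with a per-prefix byte counter. It is only a toy model of the idea behind per-prefix accounting, not how CEF is implemented, and every prefix, address and byte count in it is made up for illustration.

```python
import ipaddress
from collections import Counter


def build_table(prefixes):
    """Turn prefix strings into network objects (a toy routing table)."""
    return [ipaddress.ip_network(p) for p in prefixes]


def lookup(table, dst):
    """Longest-prefix match: return the most specific route covering dst."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None


def account(table, flows):
    """Add up bytes per matched prefix, like a per-prefix counter would."""
    counters = Counter()
    for dst, nbytes in flows:
        route = lookup(table, dst)
        counters[str(route) if route else "no route"] += nbytes
    return counters


# Hypothetical flows as (destination, bytes); 10.3.0.0/16 is an edge site.
FLOWS = [("10.1.5.9", 1200), ("10.2.0.7", 800), ("10.3.4.4", 5000)]

# Before: the carrier is not passing 10.3.0.0/16, so only the default covers it.
BEFORE = build_table(["0.0.0.0/0", "10.1.0.0/16", "10.2.0.0/16"])
# After: the missing edge prefix is in the table.
AFTER = build_table(["0.0.0.0/0", "10.1.0.0/16", "10.2.0.0/16", "10.3.0.0/16"])

if __name__ == "__main__":
    print("default bytes before:", account(BEFORE, FLOWS)["0.0.0.0/0"])
    print("default bytes after: ", account(AFTER, FLOWS)["0.0.0.0/0"])
```

Once the missing prefix is in the table, the counter on 0.0.0.0/0 stops growing, which is the same thing we are waiting to see on the real routers.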
In talking to the great Greg Ferro @etherealmind on Twitter about this issue, he also brought up the point that quite possibly we want to end up with no default routes in our network at all. This brought me back to one of my early Packet Pushers podcast appearances in which we discussed just this. I think I might just move our network to this design and I will let you all know how it goes.