“How fast is fast?” In the “bad old days,” when routing protocols were young, and we still shot NERF guns at one another in TAC, IGRP was a going concern (not EIGRP, IGRP!). IGRP holds the distinction of being the slowest converging routing protocol (with default timers) ever deployed in real networks. How slow is slow? Well, the worst case IGRP convergence is 270 seconds — four and a half minutes!
In today’s world, your boss could hand you a pink slip before IGRP finished converging. They have the things preprinted nowadays, I think, all ready to go for the first mistake (remember, mean time between mistakes is a much more realistic measure of network availability than mean time between failures!). But how fast is fast? And how can we go faster? For the next couple of weeks, I’m going to examine fast convergence in routing protocols, and hopefully answer some of these questions.
To really understand routing protocol convergence we have to break it down into its component parts, and then put it all back together to understand how those parts interact, and how fast convergence techniques attack “slow” at each stage in a different way.
The first step in convergence is detection. Routing can’t do much to redirect traffic around a failure until the protocol detects a topology change.
The second step is notification. Every router within the same failure domain must be notified of the topology change, so they can react to it. Now maybe you know why I beat up on people about small failure domains all the time!
The third step is calculation. Every router within the failure domain now needs to calculate a new path to any destinations impacted by the topology change.
The fourth step is installation. The new route that’s been calculated needs to be installed in the local forwarding table (CEF/FIB/whatever your choice of three-letter acronym is), so traffic can actually be forwarded along the new path.
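One way to keep the four steps straight is to treat total convergence time as a simple sum, one term per stage. The sketch below is only an illustration: the class name, field names, and every number in it are hypothetical placeholders, not measurements from any real network.

```python
from dataclasses import dataclass


@dataclass
class ConvergenceBudget:
    """Time spent in each of the four convergence stages, in milliseconds."""
    detection_ms: int
    notification_ms: int
    calculation_ms: int
    installation_ms: int

    def stages(self) -> dict:
        # Map each stage name to the time it consumes.
        return {
            "detection": self.detection_ms,
            "notification": self.notification_ms,
            "calculation": self.calculation_ms,
            "installation": self.installation_ms,
        }

    def total_ms(self) -> int:
        # The stages happen one after another, so the total is their sum.
        return sum(self.stages().values())

    def slowest_stage(self) -> str:
        # The stage worth attacking first is the one eating the most time.
        stages = self.stages()
        return max(stages, key=stages.get)


# Placeholder values chosen only to show the arithmetic (assumptions).
budget = ConvergenceBudget(
    detection_ms=40_000,
    notification_ms=100,
    calculation_ms=50,
    installation_ms=200,
)
print(budget.total_ms(), budget.slowest_stage())
```

With numbers shaped like these, shaving milliseconds off calculation does very little while detection dominates, which is why each stage needs its own treatment.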
Get these four into your head (detection, notification, calculation, installation) and you’ll understand fast convergence better than anyone who doesn’t read Packet Pushers. :-) For this post, let’s just deal with detection. In the case of detection, how fast is fast?
The slowest mechanism you can use to detect a topology change is the IGP hello process. If you’re relying on EIGRP hellos set to their default timers, you’re going to be waiting up to 15 seconds (the default hold time) before the neighbor is declared down. If you’re relying on OSPF’s, you’ll be waiting up to 40 seconds (the default dead interval).
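Those two numbers fall out of simple arithmetic: the hold (or dead) timer is a multiple of the hello interval, and it restarts every time a hello arrives. Here is a rough sketch of that calculation. The hello intervals and multipliers (5 seconds times 3 for EIGRP, 10 seconds times 4 for OSPF) are my assumption of the usual defaults on broadcast and point-to-point links, and the function name is made up for illustration.

```python
def detection_window(hello_s: int, multiplier: int) -> tuple[int, int]:
    """Return (best_case, worst_case) seconds before a neighbor is declared down.

    The hold/dead timer is hello_s * multiplier and restarts on every
    hello received from the neighbor.
    """
    dead_s = hello_s * multiplier
    # Failure just before the next hello was due: roughly one hello
    # interval of the timer has already run down.
    best_case = dead_s - hello_s
    # Failure right after a hello arrived: the full timer must expire.
    worst_case = dead_s
    return best_case, worst_case


# Assumed default (hello interval, multiplier) pairs, not taken from the post.
DEFAULTS = {"EIGRP": (5, 3), "OSPF": (10, 4)}

for protocol, (hello, mult) in DEFAULTS.items():
    best, worst = detection_window(hello, mult)
    print(f"{protocol}: between about {best} and {worst} seconds")
```

The spread matters: even in the best case, default hellos leave you blind for many seconds, and you can’t choose when in the hello cycle the link fails.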
So can’t we just make the timers faster? Yes and no. You can make the hello timer faster, but there’s a real danger of overwhelming the processor, since each hello packet must be built and transmitted per interface, and received and processed per neighbor. So what are our other options in the detection phase?
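Before getting to those options, it’s worth putting rough numbers on the processor concern. The sketch below assumes one hello built per interface and one processed per neighbor in every hello interval, as described above. The interface and neighbor counts are hypothetical, and the function name is mine.

```python
def hello_load_pps(interfaces: int, neighbors: int, hello_interval_s: float) -> dict:
    """Estimate hello packets per second the control plane must handle."""
    # One hello is built and sent on each interface every interval.
    tx_pps = interfaces / hello_interval_s
    # One hello is received and processed from each neighbor every interval.
    rx_pps = neighbors / hello_interval_s
    return {"tx_pps": tx_pps, "rx_pps": rx_pps, "total_pps": tx_pps + rx_pps}


# Hypothetical router: 100 interfaces and 200 neighbors (assumed values).
relaxed = hello_load_pps(interfaces=100, neighbors=200, hello_interval_s=10)
aggressive = hello_load_pps(interfaces=100, neighbors=200, hello_interval_s=0.25)

print("10 s hellos:  ", relaxed["total_pps"], "packets/s")
print("250 ms hellos:", aggressive["total_pps"], "packets/s")
```

The load scales inversely with the hello interval, so cutting the timer by a factor of 40 multiplies the hello work by 40, and all of it lands on the same processor that has to run the routing protocol itself.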
First and foremost, remember that event-driven detection is faster and more efficient than polling, no matter how the polling is handled. BFD is an option, since it is much more lightweight than IGP hellos, but it’s still polling, and hence slower than reacting directly to loss of carrier on a directly connected physical interface.
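To see why polling always trails an event, here is a toy timeline in integer milliseconds. It assumes poll packets arrive at exact multiples of the interval, that a packet sent at the very instant of failure still gets through, and that the detection timer (interval times multiplier) restarts on each arrival. The BFD timers (50 ms times 3) and the 10 ms carrier-loss latency are illustrative assumptions, not recommendations or measured values.

```python
def polling_detect_ms(fail_at_ms: int, interval_ms: int, multiplier: int) -> int:
    """Delay from failure to detection when liveness is inferred from missed polls."""
    # Last poll packet that made it across before the link died.
    last_rx_ms = (fail_at_ms // interval_ms) * interval_ms
    # The neighbor is declared down once the detection timer runs out.
    declared_down_ms = last_rx_ms + interval_ms * multiplier
    return declared_down_ms - fail_at_ms


def event_detect_ms(signal_latency_ms: int) -> int:
    """Delay from failure to detection when the interface itself reports the loss."""
    # No timer has to expire; only the signalling latency is paid.
    return signal_latency_ms


FAIL_AT_MS = 1_020  # arbitrary failure instant for the example

print("OSPF hellos (10 s x 4):", polling_detect_ms(FAIL_AT_MS, 10_000, 4), "ms")
print("BFD (50 ms x 3):       ", polling_detect_ms(FAIL_AT_MS, 50, 3), "ms")
print("Loss of carrier:       ", event_detect_ms(10), "ms")
```

However small the polling interval gets, the poller still has to wait for several packets to go missing, while the event-driven path only pays the cost of delivering the signal.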
Next time, we’ll talk about the second phase, notification, in a little more detail, and (hopefully) develop the idea of a convergence budget. Unless I interrupt myself with some other train of thought…