In the last post in this series, I spent some time talking about the process of detecting a link failure (given down detection is always the more important issue in fast convergence); let’s continue by looking at notification. If a router discovers a down link, or a down neighbor, how does it tell all the other routers about this topology change, so they can adjust their forwarding tables as well? Let’s look at IS-IS and EIGRP as examples.
A link state protocol will flood a new LSP (or LSA, if you’re one of those old-fashioned folks still using OSPF! 🙂 ) to all the routers in the same flooding domain. The key question is: how fast do I flood this new LSP? If you think the answer is, “Always as fast as possible,” you’re forgetting that speed is the primary contributor to instability. Control planes must somehow strike a balance: react fast enough to report true changes, but not so fast as to either form a positive feedback loop (and hence fail!), or report something that’s a short transient condition (thus wasting cycles and bandwidth). How does a link state protocol steer the course between fast and stable?
The key mechanism used in link state protocols to steer between fast and stable is the set of timers around LSP generation. There are two timers here, the first determining how long after a state change to wait before building the new LSP, and the second determining how long to wait after the generation of an LSP before generating a new one. Both of these are subject, in most link state implementations, to exponential backoff timers of some sort. This means the first change causes a new LSP to be generated fast, the second causes an LSP to be generated a little more slowly, etc., until we reach a maximum generation time that provides stability no matter how fast the topology is changing.
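To make the backoff idea concrete, here is a minimal sketch in Python. It is not taken from any real IS-IS or OSPF implementation; the class name `LspBackoff`, the doubling rule, and the timer values (50ms, 200ms, 5000ms) are all assumptions chosen just to show the shape of the curve.

```python
from dataclasses import dataclass, field


@dataclass
class LspBackoff:
    """Hypothetical exponential backoff for LSP generation.

    All default values are illustrative assumptions, not vendor defaults.
    """
    initial_ms: int = 50       # assumed wait before the first LSP
    second_ms: int = 200       # assumed wait before the second LSP
    max_ms: int = 5000         # assumed ceiling on the wait
    _events: int = field(default=0, init=False)

    def next_wait_ms(self) -> int:
        """Return the wait before generating the next LSP, then back off."""
        if self._events == 0:
            wait = self.initial_ms
        else:
            # Double the wait on each further change, capped at max_ms.
            wait = min(self.second_ms * 2 ** (self._events - 1), self.max_ms)
        self._events += 1
        return wait

    def reset(self) -> None:
        """Return to the fast timer once the network has been quiet.

        How long "quiet" must last is a policy choice not modeled here.
        """
        self._events = 0


backoff = LspBackoff()
waits = [backoff.next_wait_ms() for _ in range(8)]
print(waits)  # [50, 200, 400, 800, 1600, 3200, 5000, 5000]
```

The first change is reported quickly, and a flapping link quickly pushes the wait up to the ceiling, which is the "fast, then stable" behavior described above.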
When we move to the world of distance vector, we shift gears. The question is no longer how quickly to notify, but how long it takes for the updated information to reach all the impacted routers. The basic time involved here is the amount of time it takes for each router in the path to process the update (so it can calculate what information to send on to its neighbors). When an EIGRP router receives an update, it must process the update, determine a new best path, and send the update to its neighbors. How long will it take to do all of this? Lab testing shows this processing takes around 200ms per router.
But knowing how long it takes a single EIGRP router to process a single update doesn’t give us the total convergence time. Instead, we need to look at the time it takes for every router in the network that needs to calculate a new path to any impacted destinations to receive and process the update. How many routers is this? It all depends on your network design. EIGRP updates stop either when the end of the network is reached, or at a point where aggregation or filtering stops that particular update. Hence, in EIGRP, the speed of convergence is closely tied to failure domain boundaries as defined through route aggregation and filtering.
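As a rough back-of-the-envelope model of this, the sketch below walks a topology hop by hop and adds the roughly 200ms lab figure at each router that has to process the update. The function name `notification_times`, the sample topology, and the choice of router `D` as an aggregation point are hypothetical. The model also assumes the detecting router starts at time zero, that a boundary router processes the update but does not pass it on, and that link delay is negligible.

```python
from collections import deque

PER_HOP_MS = 200  # per-router processing time from the lab figure in the text


def notification_times(neighbors, origin, boundaries):
    """Estimate when each router has processed the update.

    Breadth-first walk from the router that detected the failure. Each
    router reached adds PER_HOP_MS. Routers in `boundaries` (assumed
    aggregation or filtering points) do not forward the update.
    """
    times = {origin: 0}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        if router in boundaries and router != origin:
            continue  # the update stops at the failure domain boundary
        for peer in neighbors[router]:
            if peer not in times:
                times[peer] = times[router] + PER_HOP_MS
                queue.append(peer)
    return times


# Hypothetical five-router chain: A - B - C - D - E
topology = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

with_boundary = notification_times(topology, "A", boundaries={"D"})
without_boundary = notification_times(topology, "A", boundaries=set())
print(max(with_boundary.values()))     # 600
print(max(without_boundary.values()))  # 800
```

In this toy chain, aggregating at `D` keeps `E` out of the failure domain entirely and trims the worst-case notification time from 800ms to 600ms. In a larger network the gap grows with every hop the boundary cuts off.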
But this brings us to EIGRP feasible successors… Why haven’t I said anything about feasible successors? Because that’s a subject for another post discussing calculation — the third step in convergence.