Don’t look now, but you have microloops. How do I know? Because virtually every network with rings larger than three hops, running a link state protocol, will develop a microloop during normal convergence. Okay, so what’s a microloop, and how dangerous is it? Let’s figure this out looking at the (now rather standard) five router ring illustrated below, simply labeled starting at A around to E, running IS-IS.
Assume the (A,B) link fails. Step one in the illustration, at router A, actually consists of two things: a timer kicks off to run SPF, and another timer kicks off to flood the correct fragment of the local LSP with the (A,B) link removed from the topology. Assume both of these timers are set to 100 milliseconds, so at the moment router A starts running SPF, it also floods a new LSP to router E.
It’s going to take some amount of time for the packet containing the LSP to reach router E – it has to be serialized onto the link, queued behind any packet currently being transmitted, then physically carried (at something less than the speed of light, presumably), across the link, and finally clocked into the receive ring at router E, and then pulled off the receive ring through an interrupt into the correct input queue, and then the IS-IS process at router E needs to be scheduled… You get the picture – no matter how fast it all happens, it’s not instantaneous. Let’s set the total time for this new topology information to reach router E to about 100 milliseconds (assume high bandwidth glass and well implemented queuing and process scheduling on both ends).
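So by the time the packet arrives at router E and kicks off the SPF timer there (200 milliseconds after the failure), router A has already finished its own SPF run. If it takes 25 milliseconds (a long time in SPF terms) for both router A and router E to run Dijkstra, router A completes its SPF run 125 milliseconds after the failure. Router E, assuming it waits out the same 100 millisecond SPF timer, completes at 325 milliseconds, a full 200 milliseconds after router A.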
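If you want to check the arithmetic, here is a minimal sketch that lays the events out on a timeline. The timer values come straight from the text above; the variable and function names are mine, and the sketch assumes router E waits out the same 100 millisecond SPF timer once the LSP arrives. The exact figure depends on how E's timers are modeled, so treat the result as one reading of the scenario, not a measurement.

```python
# All values in milliseconds, measured from the moment the (A,B) link fails.
SPF_TIMER = 100    # delay before a router starts SPF (from the text)
FLOOD_TIMER = 100  # delay before A floods its new LSP (from the text)
TRANSIT = 100      # time for the LSP to reach E and be processed (from the text)
SPF_RUNTIME = 25   # time to run Dijkstra (from the text)


def convergence_times(e_spf_timer=SPF_TIMER):
    """Return (a_done, e_done): when A and E finish SPF after the failure.

    e_spf_timer is an assumption: the delay E applies after the LSP arrives.
    """
    a_done = SPF_TIMER + SPF_RUNTIME
    lsp_at_e = FLOOD_TIMER + TRANSIT
    e_done = lsp_at_e + e_spf_timer + SPF_RUNTIME
    return a_done, e_done


a_done, e_done = convergence_times()
window = e_done - a_done
print(f"A finishes SPF at {a_done} ms, E at {e_done} ms")
print(f"window during which A and E disagree: {window} ms")
```

Passing a smaller `e_spf_timer` shows how much of the window comes from E's own delay rather than from flooding.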
Okay, this is all interesting, but why do we care?
What will router A’s best path to B (for instance) be once it runs SPF? (A,E,D,C,B)
What will router E’s best path to B be until it runs SPF? (E,A,B)
See the problem? A is using E as its best path, and E is using A as its best path, during the time differential between the two SPF runs. We have a microloop.
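To make the disagreement concrete, here is a small sketch that computes next hops on the five router ring before and after the (A,B) failure. It assumes every link has the same metric (the post doesn't give metrics), so a breadth first search stands in for Dijkstra; the function names are hypothetical and not from any IS-IS implementation.

```python
from collections import deque

RING = ["A", "B", "C", "D", "E"]


def ring_links(nodes):
    """Undirected links of a ring, each stored as a frozenset of two routers."""
    return {
        frozenset((nodes[i], nodes[(i + 1) % len(nodes)]))
        for i in range(len(nodes))
    }


def next_hop(links, src, dst):
    """First hop on a shortest path from src to dst (unit metrics assumed)."""
    neighbors = {}
    for link in links:
        a, b = tuple(link)
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    # Search outward from dst, so each router records its hop toward dst.
    toward = {dst: None}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in sorted(neighbors.get(node, [])):
            if nbr not in toward:
                toward[nbr] = node
                queue.append(nbr)
    return toward[src]


before = ring_links(RING)
after = before - {frozenset(("A", "B"))}

a_next = next_hop(after, "A", "B")   # A has already run SPF on the new topology
e_next = next_hop(before, "E", "B")  # E is still using the old topology
microloop = (a_next == "E" and e_next == "A")
print(f"A forwards to {a_next}, E forwards to {e_next}, microloop: {microloop}")
```

Once E also runs SPF against `after`, its next hop toward B becomes D and the loop between A and E disappears.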
What’s interesting is that, depending on the size of the ring and the link metrics, this microloop can work its way around the ring, starting at (A,E), moving to (E,D), and then finally to (D,C), before all the routers have run SPF and the entire network has converged.
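Here is a rough sketch of a loop shifting from one link to the next as routers converge one after another. To keep the effect easy to see, it uses a hypothetical seven router ring (A through G, not the five router ring above) with unit metrics, long enough that more than one router was forwarding toward the failed link. The convergence order (A, then G, then F, then E) is an assumption standing in for the LSP flooding hop by hop away from A.

```python
from collections import deque


def next_hops_toward(nodes, links, dst):
    """Map each router to its first hop toward dst (unit metrics assumed)."""
    neighbors = {n: [] for n in nodes}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    toward = {dst: None}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in neighbors[node]:
            if nbr not in toward:
                toward[nbr] = node
                queue.append(nbr)
    return toward


def trace(src, dst, old, new, converged):
    """Forward hop by hop; return the looping link, or None if delivered."""
    visited = [src]
    node = src
    while node != dst:
        table = new if node in converged else old  # stale routers use old view
        nxt = table[node]
        if nxt in visited:
            return (nxt, node)
        visited.append(nxt)
        node = nxt
    return None


nodes = list("ABCDEFG")
links = [(nodes[i], nodes[(i + 1) % len(nodes)]) for i in range(len(nodes))]
links_after = [link for link in links if link != ("A", "B")]

old = next_hops_toward(nodes, links, "B")        # before the (A,B) failure
new = next_hops_toward(nodes, links_after, "B")  # after the (A,B) failure

converged = set()
loops = []
for router in ["A", "G", "F", "E"]:  # assumed order in which routers run SPF
    converged.add(router)
    loops.append(trace("A", "B", old, new, converged))
print(loops)
```

Each entry in `loops` is the link where traffic from A to B is bouncing after that router converges, or `None` once the packet gets through.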
When will this happen? Any time you have a ring larger than three hops in a link state network (one of those interactions between network topology and control plane convergence I’ll be covering in my talk at Interop this year).
How serious is this? Well, to begin with, we’ve been living with this problem ever since Dijkstra wrote the first SPF out on a chalk board (or napkin, as the case might be). We used to have timers in the seconds, not the milliseconds, so seconds-long, shifting microloops used to be common. But before you say, “not serious, then,” applications also expect a lot more out of networks today than they did even just a year ago. Do you want to be the one being remotely operated on while packets are looping around in the ether? Didn’t think so… High speed trading folks also don’t take kindly to hundreds of milliseconds of jitter floating around in their networks while they converge (assuming the looped packets eventually find their way to their destination).
So what can we do about it? There’s always fast reroute… But let’s not talk about that. 🙂 Another option is something called “ordered FIB.” What does ordered FIB do? Glad you asked…
On another note, for those who are interested, The Art of Network Architecture is manuscript complete, and most of the first round of edits are done, as well. So yes, there is a book in progress, and yes, it is getting closer to being done and published. Patience.