This is a continuation from Part 1
At this point we already know that simple LFA doesn’t always provide full coverage and that it’s very topology dependent. The reason is simple: in many cases the best path of the candidate backup next hop goes back through the router calculating the backup next hop. This problem can be solved if we can find a router more than one hop away from the calculating router, from which traffic will be forwarded to the destination without traversing the failed link, and then somehow tunnel the packet to that router. These multi-hop repair paths are more complicated than single-hop repair paths: computation is needed to determine whether such a path exists to begin with, and then a mechanism is needed to send the packet to that router.
So let’s look at a POP with a ring topology like the one below. As you can see in Fig. 15, R3 doesn’t meet inequality #1 (3 < 1 + 2 does not hold), and R3’s best path is through the failed link. As we mentioned earlier, if we can find a node from which traffic will be forwarded to the destination without traversing the failed link, and send the traffic to that node, we can achieve FRR without causing a loop. Okay, so time to introduce some terms:
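The check that R3 fails can be written down in a couple of lines. This is a minimal sketch, assuming inequality #1 from Part 1 is the basic loop-free condition D(N, D) < D(N, S) + D(S, D); the function and argument names are made up for illustration.

```python
def is_loop_free(dist_n_to_d, dist_n_to_s, dist_s_to_d):
    """Inequality #1 (assumed form): neighbor N is a loop-free alternate
    only if its shortest path to D is strictly shorter than going back via S."""
    return dist_n_to_d < dist_n_to_s + dist_s_to_d


# Values quoted in the text for R3 as a neighbor of R2 (S): 3 < 1 + 2
r3_is_lfa = is_loop_free(dist_n_to_d=3, dist_n_to_s=1, dist_s_to_d=2)
print(r3_is_lfa)  # False: the costs are equal, so R3 may send traffic back to S
```

The comparison is strict on purpose: when both sides are equal, at least one of the neighbor’s equal-cost paths leads back through S.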
P-Space: The P-space of a router with respect to a protected link is the set of routers reachable from that specific router using the pre-convergence shortest paths, without any of those paths (including equal-cost path splits) transiting that protected link. In Fig. 15 below, the P-space is the set of routers that R2 (S) can reach without using the R2 (S)-R1 link, which is the R3 and R4 nodes.
Extended P-Space: The extended P-space of the protecting router with respect to the protected link is the union of the P-spaces of its directly connected neighbors with respect to the protected link. In Fig. 16 below, the extended P-space contains the routers that R2 (S)’s direct neighbor, i.e. R3, can reach without using the R2 (S)-R1 link, which is the R4 and R5 nodes. The point of the extended P-space is that it helps increase coverage.
Q-Space: The Q-space of a router with respect to a protected link is the set of routers from which that specific router can be reached without any path (including ECMP splits) transiting that protected link. In Fig. 17 below, the Q-space contains the routers that normally reach R6 (D) without using the R2 (S)-R1 link, which is the R1, R5 and R4 nodes.
PQ-Node: A router that is in both the extended P-space and the Q-space is a PQ-node. Any PQ-node can be a remote LFA candidate, i.e. a router that, once R2 (S) gets the packet to it, will forward the packet to the destination without traversing the R2 (S)-R1 link. In our case R4 and R5 are the PQ-nodes and are considered remote LFA candidates for R2 (S).
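To make the four definitions concrete, here is a rough sketch that computes them with Dijkstra. The figures are not reproduced here, so the topology is an assumption: a six-node ring R1-R2-R3-R4-R5-R6-R1 with every link at cost 1, which is consistent with the numbers quoted in the text. All function names are illustrative, and the root of each SPF is left out of its own set, matching how the sets are listed above.

```python
import heapq


def shortest_distances(graph, source):
    """Plain Dijkstra: cost of the shortest path from source to every node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            candidate = d + cost
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist


# Assumed topology: ring R1-R2-R3-R4-R5-R6-R1, every link cost 1.
RING = ["R1", "R2", "R3", "R4", "R5", "R6"]
GRAPH = {name: {} for name in RING}
for a, b in zip(RING, RING[1:] + RING[:1]):
    GRAPH[a][b] = GRAPH[b][a] = 1

DIST = {name: shortest_distances(GRAPH, name) for name in GRAPH}


def avoids_link(src, dst, link):
    """True if no shortest path src -> dst (ECMP included) crosses link."""
    a, b = link
    cost = GRAPH[a][b]
    best = DIST[src][dst]
    return (best < DIST[src][a] + cost + DIST[b][dst]
            and best < DIST[src][b] + cost + DIST[a][dst])


def p_space(root, link):
    return {n for n in GRAPH if n != root and avoids_link(root, n, link)}


def extended_p_space(s, link):
    # Only neighbors that S itself reaches without the protected link count.
    result = set()
    for neighbor in GRAPH[s]:
        if neighbor in p_space(s, link):
            result |= p_space(neighbor, link)
    return result - {s}


def q_space(dest, link):
    return {n for n in GRAPH if n != dest and avoids_link(n, dest, link)}


S, D, PROTECTED = "R2", "R6", ("R2", "R1")
pq_nodes = extended_p_space(S, PROTECTED) & q_space(D, PROTECTED)
print(sorted(p_space(S, PROTECTED)))           # ['R3', 'R4']
print(sorted(extended_p_space(S, PROTECTED)))  # ['R4', 'R5']
print(sorted(q_space(D, PROTECTED)))           # ['R1', 'R4', 'R5']
print(sorted(pq_nodes))                        # ['R4', 'R5']
```

The strict comparison in `avoids_link` is what excludes ECMP splits: R5 drops out of R2’s P-space because one of its two equal-cost paths from R2 crosses the R2-R1 link.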
Okay, so far we have identified the candidate nodes to which we can safely send packets without the risk of them being routed back to the failed link. The next part of the problem is getting the traffic to those PQ-nodes, which can be achieved through some kind of tunnel. In general, there are various ways to tunnel the traffic, such as IP-in-IP, GRE, LDP, etc., but the most common implementation uses LDP tunnels.
In case of IP traffic protection: If we are protecting IP traffic, then R2 (S) pushes an LDP label on top of the IP packet to reach R4 (assuming R2 (S) picked R4 as the remote LFA node). When R3 receives the packet, it forwards it to R4 as a plain IP packet because of normal PHP behavior. When R4 receives the packet destined to R6 (D), it forwards the packet on towards R5.
In case of protecting LDP traffic: In this case R2 (S) uses a stack of two LDP labels. The outer LDP label X is the label to reach R4, and the inner LDP label Y is the label R4 uses to reach R6 (D). This raises the question: how does R2 (S) know that R4 is using LDP label Y for traffic towards R6 (D)? For the protecting node to know which label a PQ-node uses to forward traffic to the destination (D), it has to establish a targeted LDP session with the PQ-node to get the FEC-to-label mappings. This means that targeted LDP sessions should be enabled on all the nodes for Remote LFA.
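The two cases differ only in what R2 (S) pushes onto the packet. The sketch below models the label stack as a Python list, outermost label first. The label values 100 (X) and 200 (Y) and the function names are hypothetical; it only illustrates the push at R2 (S) and the PHP pop at R3 described above, not any vendor’s forwarding code.

```python
def repair_label_stack(label_to_pq, pq_label_for_dest=None):
    """Labels R2 (S) pushes, outermost first.

    label_to_pq:       label X, learned over the normal LDP session, to reach the PQ-node.
    pq_label_for_dest: label Y, learned over the targeted LDP session, that the
                       PQ-node uses for the destination. None when protecting IP traffic.
    """
    stack = [label_to_pq]
    if pq_label_for_dest is not None:
        stack.append(pq_label_for_dest)
    return stack


def penultimate_hop_pop(stack):
    """PHP at R3: remove the outer label before handing the packet to R4."""
    return stack[1:]


X, Y = 100, 200  # hypothetical label values

ip_case = repair_label_stack(X)
print(ip_case, penultimate_hop_pop(ip_case))    # [100] []  -> R4 sees a plain IP packet

ldp_case = repair_label_stack(X, Y)
print(ldp_case, penultimate_hop_pop(ldp_case))  # [100, 200] [200]  -> R4 sees its own label Y
```

In the LDP case the packet arrives at R4 carrying a label R4 itself advertised, which is why R2 (S) needs the targeted session to learn it.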
So does Remote LFA provide 100% coverage?
So far we have seen how Remote LFA can increase coverage compared to vanilla LFA, but does it provide 100% coverage? Not necessarily. For instance, in Fig. 20 below, if we increase the cost between R6 (D) and R5 enough, we no longer have any nodes in the PQ space.
So what can be done in situations like this to get 100% coverage? The answer is our plain old RSVP tunnels. Juniper’s implementation first tries to find a valid LFA; if there isn’t one, it can create a dynamic RSVP tunnel to R1, the far end of the protected link, and once the traffic reaches R1 it continues from there towards the destination.
One of the drafts I recently came across, https://tools.ietf.org/html/draft-kompella-mpls-rmr-00, tries to tackle the problem of providing 100% coverage in ring topologies using RSVP tunnels without the burden of creating too many tunnels.
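A small experiment shows the PQ set emptying out as the R5-R6 cost grows. This is a sketch under assumptions: the same six-node ring as in the earlier figures with all other links at cost 1, and an arbitrary raised cost of 10 on R5-R6 (the text does not give a value). It uses Floyd-Warshall for brevity, and `pq_nodes` and `ring` are illustrative names.

```python
from itertools import product


def pq_nodes(links, s, far_end, dest):
    """PQ-nodes for source s, protected link s-far_end, destination dest."""
    nodes = sorted({n for a, b, _ in links for n in (a, b)})
    d = {(u, v): (0 if u == v else float("inf"))
         for u, v in product(nodes, repeat=2)}
    for a, b, cost in links:
        d[a, b] = d[b, a] = cost
    for k, i, j in product(nodes, repeat=3):  # Floyd-Warshall, k outermost
        d[i, j] = min(d[i, j], d[i, k] + d[k, j])
    link_cost = next(c for a, b, c in links if {a, b} == {s, far_end})

    def avoids(src, dst):
        # True if no shortest path src -> dst (ECMP included) uses the link.
        best = d[src, dst]
        return (best < d[src, s] + link_cost + d[far_end, dst]
                and best < d[src, far_end] + link_cost + d[s, dst])

    neighbors = {b if a == s else a for a, b, _ in links if s in (a, b)}
    usable = {n for n in neighbors if avoids(s, n)}
    extended_p = {x for n in usable for x in nodes
                  if x not in (s, n) and avoids(n, x)}
    q = {x for x in nodes if x != dest and avoids(x, dest)}
    return extended_p & q


def ring(r5_r6_cost):
    # Assumed ring; only the R5-R6 cost varies.
    return [("R1", "R2", 1), ("R2", "R3", 1), ("R3", "R4", 1),
            ("R4", "R5", 1), ("R5", "R6", r5_r6_cost), ("R6", "R1", 1)]


print(sorted(pq_nodes(ring(1), "R2", "R1", "R6")))   # ['R4', 'R5']
print(sorted(pq_nodes(ring(10), "R2", "R1", "R6")))  # []
```

With the higher cost, R5 and R4 now reach R6 the long way round, across the R2-R1 link, so they fall out of the Q-space and nothing is left in the intersection.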
RSVP-TE vs LFAs
At this point let’s summarize the pros and cons of RSVP-TE versus LFAs.
RSVP-TE
Pros: RSVP-TE provides guaranteed FRR because it has no topology dependence.
Cons: Complicated compared to LFAs.
Sweet Spot for Deployment: Backbone networks where 100% coverage is required.
LFA
Pros:
- Simple to configure and can be deployed incrementally.
- Scales well compared to the full-mesh RSVP deployment model and has less overhead than RSVP soft-state refreshes.
Cons: Very topology dependent, so LFAs may not provide 100% coverage.
Sweet Spot for Deployment: POPs are the best place for LFA deployments.
So are microloops still possible with LFAs?
We saw earlier that microloops are bad (though you may still not care about them). RSVP-TE is not exposed to microloops as the path is locked before use, but MPLS-LDP is exposed to microloops as it inherits the path from the IGP. IP FRR will reduce or prevent microloops close to where the failure happens, but they can still occur upstream of the failure. Let’s take a look at the Internet2 topology in Fig. 22, which is running IS-IS level 2 (or OSPF, it doesn’t matter). Traffic from NYC to DEN takes the NYC->CHI->STL->KYC->DEN path as shown in the diagram.
Now assume that we have IP FRR enabled in the network and the link between STL and KYC fails. STL finds a Remote LFA node, HSTN in our case, and tunnels the traffic towards it. Recall that ATL isn’t a valid LFA, as ATL’s best path to DEN goes through STL (inequality #1 fails: 180 < 60 + 120 does not hold).
So what other events occur as a result of the KYC-STL link failure? STL sends an LSP update to the other nodes about the failure of STL-KYC. Let’s assume the LSP update reaches NYC first: NYC runs SPF, programs the new route (which is through WASH) and starts sending traffic towards WASH. But WASH’s FIB still points to NYC for DEN traffic until WASH receives the LSP update, runs SPF and programs the new route. Until then we have a microloop between NYC and WASH.
So as you can see, LFA/rLFA doesn’t completely eliminate microloops. Ordered FIB is one solution to this problem, and I will leave that research to the reader.
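The transient state can be reproduced with a toy forwarding trace. This is only an illustrative sketch: each router’s FIB is reduced to a single next hop for DEN, using just the next hops stated in the text, and `trace` is a made-up helper.

```python
def trace(fib, start, destination, max_hops=16):
    """Follow per-router next hops; stop at the destination or the first revisit.

    Returns (path, looped).
    """
    path = [start]
    visited = {start}
    while path[-1] != destination and len(path) <= max_hops:
        next_hop = fib[path[-1]]
        path.append(next_hop)
        if next_hop in visited:
            return path, True
        visited.add(next_hop)
    return path, False


# Next hops towards DEN before the failure, as described in the text.
before = {"WASH": "NYC", "NYC": "CHI", "CHI": "STL", "STL": "KYC", "KYC": "DEN"}

# NYC has already reconverged and points at WASH; WASH has not updated yet.
# (The STL entry is stale here, but the trace from NYC never reaches it.)
during = dict(before, NYC="WASH")

print(trace(before, "WASH", "DEN"))
# (['WASH', 'NYC', 'CHI', 'STL', 'KYC', 'DEN'], False)
print(trace(during, "NYC", "DEN"))
# (['NYC', 'WASH', 'NYC'], True)
```

The loop exists only because the two FIBs were updated at different times, which is why local repair at STL cannot prevent it.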
We looked at microloops and how LFA and Remote LFA try to mitigate them. We also looked at various topologies and how well LFAs apply to them. In general, LFA/rLFA is a great and simple solution for NSP POPs, and RSVP-TE is still the best solution for backbone networks where 100% coverage is needed.