If someone tosses you a hot potato, do you want to hold it for a long time? If you like pain, maybe the answer is yes – but how many of us like pain? Hot potatoes apply to the Service Provider environment in much the same way. When a service provider receives a packet destined for another service provider, it doesn't want to keep that traffic in its network for long.
Why? The answer lies in simple economics, including the different types of peering relationships between providers. Before going further into an explanation and design cases for hot, cold, and mashed potato routing, let's take a look at what these arrangements are. Service providers can be grouped as Tier 1, 2, or 3 depending on their topology, traffic, and the geographic separation of their networks.
If a service provider receives a service and/or connection from a provider at a higher tier, this arrangement is called a transit relationship. A Tier 2 SP is the upstream service provider of a Tier 3 SP, and the Tier 2 SP gets its service and/or connection from a Tier 1 provider. Tier 1 providers have their own transmission facilities connecting geographically separated regions.
Service providers pay to transit traffic through a service provider at a different tier; Tier 2 providers pay Tier 1 providers for transit, Tier 3 providers pay Tier 2 providers for transit, and so on. Within the same tier, or among providers that exchange roughly equal amounts of traffic, providers create settlement-free peering relationships. How does all of this relate to hot potato routing?
Service providers don't want to keep their customers' traffic in their network if they can push it off onto another provider's network, especially if the destination is reachable through a peering connection for which they don't pay the other provider, so they move the traffic out of their network and into the peering provider's network as quickly as possible. This is hot potato routing. Hot potato routing aims to carry traffic to the exit point closest to where it enters the network.
I will use Figure 1 to explain some of the concepts in this article. In AS1, there are two route reflectors, which can be in the same or different clusters.
Between AS1 and AS2 there are two links exiting the domain. For traffic toward destinations behind AS2, AS1 wants to send the traffic to the nearest BGP exit point. To accomplish this, the IGP metric to the BGP next hop is used as the tie breaker – in other words, R1 needs to find the closest eBGP speaker that can reach the destination in AS2. If R5 or R6 in AS2 sends the prefix with a MED attribute set, AS1 should remove the MED on the incoming prefixes to get hot potato routing.
Vendor BGP implementations vary in how they handle the MED attribute, although the latest BGP specification (RFC 4271) defines that if the MED is not set by the sending AS, the receiving AS should treat it as the lowest possible value. This can remove the inconsistency.
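As a rough sketch of this tie-breaker, assume local preference, AS path length, and MED are equal (with MED stripped on ingress), so the decision falls through to the IGP metric toward each BGP next hop. The router names and metric values below are hypothetical, not taken from Figure 1:

```python
# Minimal sketch of the hot potato tie-breaker: among otherwise-equal
# paths, the one whose BGP next hop is closest in the IGP wins.

def hot_potato_best_path(paths, igp_metric):
    """paths: list of dicts, each with a 'next_hop' key.
    igp_metric: dict mapping next hop -> IGP cost from this router."""
    return min(paths, key=lambda p: igp_metric[p["next_hop"]])

# Hypothetical example: a router hears the same AS2 prefix via two exits.
paths = [
    {"next_hop": "R3", "prefix": "203.0.113.0/24"},
    {"next_hop": "R4", "prefix": "203.0.113.0/24"},
]
igp_metric = {"R3": 10, "R4": 30}  # IGP cost from this router to each exit

best = hot_potato_best_path(paths, igp_metric)
print(best["next_hop"])  # R3 – the closest exit point wins
```

The same comparison, run on a router sitting nearer to R4, would pick R4 instead – which is exactly why hot potato results depend on where in the topology the best path is computed.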
But to get hot potato routing, the network designer needs to move beyond BGP metrics and work with the internal topology of the provider's network. Inside an AS, three types of BGP topology can exist: full mesh, confederation, or route reflector. A full mesh iBGP topology in which the MED is ignored will naturally choose the exit point closest to the entrance point.
For route reflector topologies, the closer the RRs are to the eBGP speakers along the edge of the network, the more accurately traffic will follow the IGP metrics, and so the closer the AS will come to achieving optimal hot potato routing.
Service providers, especially as their BGP topologies get bigger, implement route reflectors or confederations. Route reflectors increase scaling by hiding alternate paths from their clients, which involves a set of tradeoffs. Instead of having every potential exit point from the AS, any given RR client will now only have the set of exit points the RR sends – the optimal exit points from the RR's perspective. But this best path may not be the best exit from the domain from the internal iBGP device's point of view; it is the best path from the route reflector's point of view.
At least until now, traditional BGP has worked like this. There are a couple of papers out there about path-state vector routing; the idea is to send policy information more than one hop away and overcome BGP's slow convergence. (Here I am comparing the speed with an IGP, not full mesh vs. route reflector in BGP.) But even with that idea, the route reflector's best path selection and advertisement behavior doesn't change.
Three different proposals have been put forward which can be used to resolve this problem: BGP add paths, diverse path, and computing the best path from the client's point of view. Add path and diverse path can be used to send more than one exit point to internal BGP speakers. But with these approaches, the idea is to send more than one best path, as seen by the RR, to the internal iBGP speaker. The iBGP speaker holds these paths; they can be installed into the RIB if multipath is enabled, or can even be programmed into the FIB as an alternate/backup path with BGP PIC.
The problem with add and diverse paths is simply carrying the additional paths. The RR clients don't need to know all possible paths out of the domain; they need to know the closest exit to achieve hot potato routing, or the most optimal exit point for cold potato routing. Although implementations support sending 1, 2, …, n paths, sending fewer than all of them may not give the correct result, while sending all possible paths defeats the purpose of the route reflector. Routers have to keep all of this state in, at the least, their Adj-RIB-In.
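The failure mode here is easy to demonstrate. In the sketch below – with entirely hypothetical router names and metrics – the RR ranks paths by its own IGP distances, so sending only the top two paths hides the exit that is actually closest to the client:

```python
# Sketch of the add/diverse path tradeoff: the RR ranks exits by ITS
# OWN IGP distances, so a top-n advertisement can omit the exit that
# is actually closest to a given client.

def top_n_paths(paths, igp_metric_from_rr, n):
    """Return the n exits closest to the RR itself."""
    return sorted(paths, key=lambda p: igp_metric_from_rr[p])[:n]

paths = ["R3", "R4", "R5"]                        # exit points (BGP next hops)
igp_from_rr = {"R3": 5, "R4": 10, "R5": 20}       # costs as seen by the RR
igp_from_client = {"R3": 30, "R4": 25, "R5": 5}   # the client sits near R5

advertised = top_n_paths(paths, igp_from_rr, n=2)
print(advertised)                                  # ['R3', 'R4']

client_best = min(advertised, key=lambda p: igp_from_client[p])
true_best = min(paths, key=lambda p: igp_from_client[p])
print(client_best, true_best)  # R4 vs R5 – the client's real closest exit was hidden
```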
The alternative idea is to select the best path on the RR from the internal iBGP speaker's point of view, and distribute a best path to each internal speaker based on that speaker's view of the topology. Whenever a BGP route reflector needs to decide which path or paths to advertise to one of its clients, the route reflector virtually positions itself at the client's location in the IGP network in order to choose the right set of paths, based on the IGP metric to the next hops from the client's perspective.
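A minimal sketch of this per-client computation, assuming the RR has the full IGP link-state database available: run SPF rooted at each client rather than at the RR itself, then pick that client's closest exit. The `igp_spf` helper and the topology below are hypothetical stand-ins for a real link-state SPF run.

```python
import heapq

def igp_spf(topology, root):
    """Dijkstra SPF: shortest-path cost from root to every reachable node.
    topology: dict mapping node -> list of (neighbor, link_cost)."""
    dist, heap = {root: 0}, [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue
        for neigh, cost in topology.get(node, []):
            nd = d + cost
            if nd < dist.get(neigh, float("inf")):
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    return dist

def per_client_best(topology, clients, exit_points):
    """For each client, run SPF rooted AT THE CLIENT and pick its closest exit."""
    best = {}
    for client in clients:
        dist = igp_spf(topology, client)
        best[client] = min(exit_points, key=lambda e: dist[e])
    return best

# Hypothetical topology: adjacency lists of (neighbor, cost).
topology = {
    "R1": [("R3", 10), ("R2", 5)],
    "R2": [("R1", 5), ("R4", 10)],
    "R3": [("R1", 10), ("R4", 50)],
    "R4": [("R2", 10), ("R3", 50)],
}
print(per_client_best(topology, ["R1", "R2"], ["R3", "R4"]))
# {'R1': 'R3', 'R2': 'R4'} – each client is steered to its own closest exit
```

An RR running its usual selection would hand both clients the same exit; rooting the SPF at each client is what restores hot potato behavior behind a route reflector.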
This is not a new idea, actually; if you read Russ White's fast convergence article on the Packet Pushers blog, you will see that one method to find an alternate path is to build the SPF tree using a root other than the calculating node itself.
Another approach approximates the distance from each node to another using angular distance rather than the IGP metrics. I will also touch on this method in the next article.
Cold potato routing is used in the opposite situation – when the provider wants to carry the traffic inside its own network for as long as possible, bringing it as close to the actual recipient as possible. Cold potato routing is generally used by content delivery networks, where the target is to manage the user experience. To achieve cold potato routing, R1 needs to know which exit point is topologically closest to the destination.
Why should we use cold potato routing, you may be asking right now. One answer is content delivery networks; another is that service providers prefer to keep traffic destined for paying customers within their own network. For peering, the arrangements can vary: the sending side may pay, payment may be tied to traffic volume, or the relationship may be based entirely on mutual benefit.
In Figure 1, if Service Provider B is receiving transit service from Service Provider A, then Service Provider B can influence which exit point is used for the traffic sent toward it.
Service Provider B could send a MED attribute, or prepend one advertisement while leaving the other without any prepend, to influence inbound traffic. Or it could send a community as a signal to Service Provider A to raise the local preference of one path over another. But all of these attributes can easily be removed from the BGP update, depending on the agreement between the service providers.
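To see why prepending works, here is a simplified sketch of the relevant slice of the BGP best path algorithm – higher local preference wins, then shorter AS path, then lower MED. The AS numbers and attribute values are hypothetical:

```python
# Simplified best-path comparison: prefer higher local preference,
# then shorter AS path, then lower MED (only these three steps of the
# full BGP decision process are modeled here).

def best_path(paths):
    return min(paths, key=lambda p: (-p["local_pref"],
                                     len(p["as_path"]),
                                     p["med"]))

# SP B prepends its own AS on link 1 to pull inbound traffic toward link 2.
link1 = {"local_pref": 100, "as_path": [65002, 65002, 65002], "med": 0}
link2 = {"local_pref": 100, "as_path": [65002], "med": 0}
print(best_path([link1, link2]) is link2)  # True – the unprepended path wins
```

Note the ordering: if Service Provider A honors a community by raising local preference on one path, that decision overrides any prepending or MED that Service Provider B signals, since local preference is compared first.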
Google, Facebook, and Akamai bring their cache engines and servers to IXPs (Internet exchange points) or directly into the service provider's network to avoid hot potato routing toward their networks. This is actually good for the end users, the providers, and Google alike, while reducing the value of the services transit providers sell. When the cache engine is closer to the actual users, service providers can better control the traffic and use the best exit to reach the content.
Hot and cold potato routing send the traffic to either the exit closest to the entry point or the exit closest to the actual destination. Full mesh iBGP and route reflector topologies were discussed, along with fast convergence and optimal traffic flow.
Route reflectors can be deployed to ease operational concerns and to avoid keeping full routing state on every BGP device. This article touched on the suboptimality, the reduced convergence speed due to hidden paths, and the loss of hot potato routing that come with using route reflectors instead of a full mesh.
Route reflectors also need to be fully meshed with each other to allow prefix distribution inside an AS, and a second level of hierarchy can be created for the route reflectors themselves. I am planning to explain this and the other topics mentioned earlier in the next article.