In this post we will explore different aspects of Traffic Engineering (RSVP-TE) from a design perspective, using a fictional ISP as a reference. The intent is not necessarily to recommend a particular solution, but to bring up the different aspects involved in the design. I am assuming that the reader already has some knowledge of MPLS-TE.
Setting the Stage
ACME is a communications company providing communications and data services to residential, business, governmental and wholesale customers. It has around 200 POPs and 50 core locations. Each core location consists of 2 Core Routers (CR) and each POP (connected to a core location) consists of 4 Access Routers (AR), bringing the total number of routers to 900 (100 + 800). ACME uses single-level IS-IS (Level 2) as its IGP. The ACME backbone mostly consists of 10G OC-192 circuits, except for a few locations with 2.5G OC-48 connectivity. ACME offers the following services:
- Layer 3 VPN
- Layer 2 VPN
From a Diffserv perspective, traffic can be divided into primarily three classes:
- Premium Traffic – VOIP (EF)
- Assured – Business Critical (AF41)
- Best Effort (BE)
Fig. 1 ACME Backbone
ACME's guideline for network design is that it should be scalable, flexible and resilient. It should also adhere to the SLA, which is expressed in terms of packet loss, latency and jitter.
ACME has extensive tool sets to measure the SLA for different traffic types between their POPs during normal and catastrophic events. ACME also has the appropriate tools for collecting the internal (intra- and inter-POP) and external traffic matrix. Both SLA measurement and traffic matrix collection are very important aspects of a TE deployment, but are not the focus of this post.
Traditionally ACME has followed a conservative capacity-planning rule of upgrading links when utilization reaches 50%, ensuring at least twice as much capacity as offered load. Following this simple rule has helped them achieve tight SLAs for delay, jitter, and loss. However, aggregate overprovisioning of bandwidth is turning out to be an expensive option. For instance, say ACME has 1 Gbps of VOIP traffic and 5 Gbps of data traffic. Given the 50% rule, they would need twice the sum of the VOIP and data traffic loads, i.e. 12 Gbps, to assure low delay, jitter and loss for the VOIP traffic. Also, in cases of network failures or denial-of-service attacks, all traffic shares the same fate: if unforeseen congestion occurs, it affects all classes.
Diffserv TE provides a solution to this problem: it gives ACME the flexibility to have a different under- or over-provisioning ratio (ratio = offered load / available capacity) for each service class. ACME could thus overprovision the VOIP class capacity by a factor of two, ensuring that the class receives low-delay, low-jitter, and low-loss service, while overprovisioning the data class capacity by a lower factor such as 1.2, which is probably more practical and still offers good service. In our case this would result in 8 Gbps (2 + 6) of total bandwidth. With this approach ACME can delay its backbone upgrade and still adhere to the SLAs required for sensitive traffic like voice.
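The arithmetic above can be sketched in a few lines; the 2x and 1.2x factors are the overprovisioning ratios assumed in the example:

```python
# Back-of-the-envelope comparison of aggregate vs per-class
# overprovisioning for the example traffic mix above.
voip_gbps, data_gbps = 1, 5

# Single aggregate pool with the 50% rule: 2x everything.
aggregate = 2 * (voip_gbps + data_gbps)

# Per-class (DS-TE style): 2x for VOIP, a more modest 1.2x for data.
per_class = 2 * voip_gbps + 1.2 * data_gbps

print(aggregate)   # 12
print(per_class)   # 8.0
```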
In essence, MPLS TE will allow ACME to improve the utilization of all the links due to more control over the path that traffic takes on the network, both under normal and in failure cases. By increasing the average percentage of link utilization, the upgrade of links can be delayed.
Generally, an organization like ACME can look at MPLS-TE for various reasons, such as:
- Minimize the maximum link utilization in the normal working case.
- Minimize propagation delay for delay sensitive traffic.
- Avoid situations where certain parts of network are congested and other parts are underutilized. Improving the utilization of existing resources lowers the investment.
- Give certain traffic priority in the event of a resource crunch, such as a link or node failure, and enable Fast Re-Route.
Reducing congestion is considered one of the main objectives of TE, but the bad news is that TE doesn't create capacity (too bad). The key focus is on prolonged congestion problems rather than on short-lived congestion caused by instantaneous bursts. Congestion normally occurs under two circumstances:
- When network resources do not have enough capacity to contain the offered load.
- When traffic streams are inefficiently allocated to available resources, resulting in subsets of network resources becoming over-utilized while other subsets remain underutilized.
The first congestion scenario can be addressed by increasing the capacity of the network resources. The second, caused by inefficient resource allocation, can be resolved through TE.
[On a side note, I do want to mention that in general having enough bandwidth solves lots of problems, as it allows you to keep things simple, but that's not always possible. MPLS TE is a great option for many NSPs, but it can also get very complicated to operate and maintain. That's one big reason people are so excited about Segment Routing :).]
TE deployment model
Now that we have set the stage for an MPLS TE deployment, let's take a look at the various ways of deploying TE:
Tactical vs Strategic:
There are two approaches to TE: Tactical and Strategic.
Tactical TE: The objective of tactical TE is to address specific performance problems (such as hot spots) that occur in the network, in an improvised and reactive manner. Tactical TE tunnels are selectively deployed in the affected areas of the network.
Strategic TE: Strategic TE tackles the congestion problem from a more systematic and proactive standpoint, taking into consideration the immediate and longer-term outcomes of specific policies and actions. Strategic TE tunnels are deployed throughout the network.
From a complexity perspective, a tactical deployment introduces less complexity than a strategic one, as tactical tunnels are introduced when there is a problem and removed when the problem is gone.
In the case of a strategic deployment, all traffic is subjected to traffic engineering within the core; this is a long-term proactive engineering/planning process aimed at cost savings.

Such a systematic approach requires that a mesh of TE tunnels is configured, hence one of the key considerations for a strategic MPLS TE deployment is tunnel scaling: a router incurs control plane processing overhead for each tunnel that it has some responsibility for, whether as head-end, midpoint, or tail-end of that tunnel. The main metrics considered with respect to TE tunnel scalability are the number of tunnels per head-end and the number of tunnels traversing a tunnel midpoint. There are a few options for deploying a strategic TE mesh:
Outer core mesh:
Consider a full mesh from edge to edge across the core: as MPLS TE tunnels are unidirectional, two tunnels are required between each pair of edge routers, hence N x (N - 1) tunnels are required in total, where N is the number of edge routers (head-ends). Fig. 2 below shows the tunnels that would be required from the edge routers within one POP to form a mesh to the edge routers in other POPs. If TE is required for M classes of traffic, each using Diffserv-aware TE, then M x N x (N - 1) tunnels would be required.
If we look at the ACME network, the number of edge routers is 800, which means we would need 639,200 tunnels; and if we want to deploy Diffserv-aware TE with one LSP for voice and another for data, the number of tunnels doubles to 1,278,400. That is a heck of a lot of tunnels.
Inner core mesh:
Creating a core mesh of tunnels, i.e. from core routers to core routers, makes tunnel scaling independent of the number of edge routers, as normally there are more edge routers than core routers. Fig. 3 below shows the tunnels that would be required from the core routers within one POP to form a mesh to the core routers in other POPs.
If we look at the ACME network, the number of core routers is 100, so the number of tunnels comes to 9,900, with ~99 tunnels per head-end; with Diffserv-aware TE (one LSP for voice and another for data) it comes to around 19,800, i.e. ~198 tunnels per head-end. This number is a lot more realistic and manageable compared to a TE mesh at the edge routers.
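The mesh sizes above all fall out of the N x (N - 1) formula:

```python
# Full-mesh tunnel counts: N x (N - 1) unidirectional LSPs, times M
# when each Diffserv class type gets its own LSP mesh.
def mesh_tunnels(n_routers, n_classes=1):
    return n_classes * n_routers * (n_routers - 1)

print(mesh_tunnels(800, 2))         # outer (edge) mesh, Voice + Data LSPs
print(mesh_tunnels(100, 2))         # inner (core) mesh, Voice + Data LSPs
print(mesh_tunnels(100, 2) // 100)  # ~tunnels per core head-end
```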
Another way of reducing the number of tunnels required, and therefore improving tunnel scalability, is to break the topology up into regions of meshed routers; adjacent tunnel meshes are connected by routers which are part of both meshes, as shown in Fig. 4 below. Although this reduces the number of tunnels required, it may result in less optimal routing and less optimal use of available capacity. For instance, in Fig. 4 the single mesh is broken into three regional meshes, West Coast, Central US and East Coast, with the Central US mesh connecting the other two.
We will look at some other mechanisms for MPLS TE scaling further down in the post.
What is ACME doing?
ACME decided to create a TE mesh between their core routers, as it provides them with a more manageable number of tunnels.
Case for Diffserv aware TE
A disadvantage of the basic MPLS-TE model is that it is not aware of the different Diffserv classes; it operates at an aggregate level across all of them.
Diffserv-aware MPLS-TE refines the MPLS-TE model by allowing bandwidth reservations to be made on a per-class basis. This results in the ability to give strict QOS guarantees while optimizing the use of network resources. The QOS delivered by MPLS Diffserv-TE allows network operators to provide services that require strict performance guarantees, such as voice and video, over a common core.
We saw earlier that a Diffserv-aware model allows ACME to use different overprovisioning ratios for voice and data from a capacity planning perspective. Let's explore a little more how Diffserv-aware TE can help.
Diffserv-aware MPLS-TE (DS-TE) is an enhancement to MPLS-TE that introduces the concept of classes (or class types, to be exact). Each participating link advertises the amount of available bandwidth for each class type on that link, known as sub-pools. Sub-pools are portions of the global (aggregate) bandwidth pool, and can be used for high-priority traffic such as voice, video or other real-time applications.
When the CSPF process is initiated for a new TE tunnel, a bandwidth constraint for a particular class type can be defined as one of the criteria used for path selection. The admission control process using RSVP-TE at each hop is then performed against the available bandwidth of that class type. This capability to fulfill a more restrictive bandwidth constraint results in higher QOS (in terms of delay, jitter, and packet loss) and a better bandwidth guarantee for traffic using the sub-pool.
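As a rough illustration of the per-hop check, here is a hypothetical sketch of a link admitting reservations against a voice sub-pool carved out of its global pool; the class-type numbers, figures and deduction policy are assumptions for illustration, not any vendor's implementation:

```python
# Hypothetical sketch of per-hop DS-TE admission control: class type 1
# (voice) has a sub-pool carved out of the link's global pool, and a
# sub-pool reservation also consumes the global pool (RDM-style).

class Link:
    def __init__(self, global_pool_gbps, subpool_gbps):
        self.avail = {0: global_pool_gbps,  # CT0: aggregate (data) pool
                      1: subpool_gbps}      # CT1: voice sub-pool

    def admit(self, class_type, bw_gbps):
        """Admit a reservation if the class type still has room."""
        if self.avail[class_type] < bw_gbps:
            return False                    # RSVP-TE would signal an error upstream
        self.avail[class_type] -= bw_gbps
        if class_type == 1:                 # sub-pool traffic also uses the link
            self.avail[0] -= bw_gbps
        return True

link = Link(global_pool_gbps=10, subpool_gbps=4)
print(link.admit(1, 3))   # True: first 3G voice LSP fits in the 4G sub-pool
print(link.admit(1, 3))   # False: only 1G left in the sub-pool
```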
There are two different models that define how the sub-pool bandwidth constraints are applied:
- Maximum Allocation Model (MAM)
- Russian Doll Model (RDM)
The choice of bandwidth allocation model depends upon the way in which bandwidth allocation and pre-emption will be managed between the tunnels of different classes. Note that if traffic engineering is required for only one of the deployed traffic classes, e.g. for EF traffic only, then DS-TE is not required and standard single-bandwidth-pool TE is sufficient.
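To make the difference concrete, here is a simplified sketch of the two constraint checks; the class-type numbers and bandwidth figures are illustrative, and real implementations (RFC 4125 for MAM, RFC 4127 for RDM) also track preemption priorities:

```python
# Simplified sketch of the two DS-TE bandwidth constraint models.
# `reserved` maps class type -> Gbps currently reserved; `bc` maps
# class type -> its bandwidth constraint (BC). Numbers are made up.

def mam_ok(reserved, bc):
    """MAM: each class type is capped independently by its own constraint."""
    return all(reserved[ct] <= bc[ct] for ct in reserved)

def rdm_ok(reserved, bc):
    """RDM: class type c plus all higher-numbered CTs must fit within BCc,
    so the pools nest like Russian dolls (BC0 covers the whole link)."""
    cts = sorted(bc)
    return all(sum(reserved[b] for b in cts if b >= c) <= bc[c] for c in cts)

bc = {0: 10, 1: 4}                # BC0 = link total, BC1 = voice sub-pool
print(rdm_ok({0: 7, 1: 3}, bc))   # True: 3G voice + 7G data fills the link
print(rdm_ok({0: 7, 1: 4}, bc))   # False: total would be 11G > BC0
```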
Let's look at an example of how Diffserv-aware TE can be useful. Assume Fig. 5 shows a multi-service IP network carrying voice and data. In order to satisfy the latency, jitter and packet loss requirements for voice, a Diffserv per-hop behavior is implemented with a policy that gives voice traffic a 4 Gbps strict-priority reservation (voice traffic in excess of 4 Gbps will be dropped). Each link in the network has a cost of 10 and a bandwidth of 10 Gbps.
Normally, in the case of non-Diffserv-aware TE, if there were two tunnels, R1→R8 and R2→R8, they would both pick the top route (R1/R2→R3→R6→R7→R8) because it's the shortest path, with a metric of 40, assuming there is available bandwidth.
Let's say the size of each TE tunnel from R1 and R2 to R8 is 4 Gbps. Both tunnels will pick the top route, as it's the shortest path and has sufficient bandwidth available (metric 40, 10 Gbps of bandwidth available, 8 Gbps needed). If the aggregate flow through each tunnel were composed of 1 Gbps of voice traffic and 3 Gbps of data traffic, we would be fine. But if each tunnel instead carried 3 Gbps of voice and 1 Gbps of data, the voice aggregate would come to 6 Gbps, and we would have a problem: the PHB LLQ policy applied on the interfaces caps voice traffic at 4 Gbps, so any excess voice traffic is dropped, 2 Gbps of it in our case.
Now let's take a look at Diffserv-aware TE for the same scenario, assuming we are using RDM, with two tunnels (one for data and one for voice) from each of R1 and R2 to R8. A VOIP class sub-pool of 4 Gbps is configured on each link, together with the standard aggregate pool of 10 Gbps, matching the Diffserv PHB policy for VOIP. The Diffserv-TE constraint-based routing algorithm will then route the voice tunnels so that the 4 Gbps VOIP sub-pool bound is not exceeded on any link. The first 3 Gbps VOIP tunnel will be routed via the top path (R1-R3-R6-R7-R8) and the other 3 Gbps VOIP tunnel via the bottom path (R2-R3-R4-R5-R7-R8). For the data tunnels there is enough bandwidth (1 Gbps each) to be routed via the top path (R1-R3-R6-R7-R8).
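Since Fig. 5 isn't reproduced here, this toy CSPF uses a small hypothetical two-path topology (nodes S, A, B, C, D and all numbers are made up for illustration) to show the mechanics of constraint-based routing: prune links whose remaining sub-pool is below the tunnel size, then run a shortest-path computation over what's left.

```python
# Toy CSPF with a per-link voice sub-pool. The second LSP is pushed to
# the longer path once the first has consumed most of the sub-pool on
# the short one. Hypothetical topology, not the Fig. 5 network.
import heapq

SUBPOOL = {frozenset(l): 4.0 for l in          # 4G voice sub-pool per link
           [("S", "A"), ("A", "D"), ("S", "B"), ("B", "C"), ("C", "D")]}
COST = 10                                       # every link has a TE metric of 10

def neighbors(node):
    for link in SUBPOOL:
        if node in link:
            (other,) = link - {node}
            yield other, link

def cspf(src, dst, demand):
    """Shortest path using only links with >= demand Gbps left in the sub-pool."""
    pq, seen = [(0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, link in neighbors(node):
            if nxt not in seen and SUBPOOL[link] >= demand:
                heapq.heappush(pq, (cost + COST, nxt, path + [nxt]))
    return None

def signal(path, demand):
    """Admit the LSP: deduct its bandwidth on every link along the path."""
    for a, b in zip(path, path[1:]):
        SUBPOOL[frozenset((a, b))] -= demand

p1 = cspf("S", "D", 3); signal(p1, 3)   # first 3G voice LSP takes the short path
p2 = cspf("S", "D", 3); signal(p2, 3)   # only 1G left there, so it detours
print(p1)   # ['S', 'A', 'D']
print(p2)   # ['S', 'B', 'C', 'D']
```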
As you saw, Diffserv-TE allows separate route computation and admission control for different classes of traffic, which enables the distribution of the voice and data class loads over all available voice and data class capacity, making optimal use of the available capacity. In order to provide these benefits, the configured bandwidth for the sub-pools must align with the queuing resources available for traffic-engineered traffic.
What is ACME doing?
In order to meet the SLAs for voice traffic, ACME has decided to implement Diffserv-aware TE with one tunnel mesh for voice and another for data. Voice TE tunnels are set up with higher setup and hold priorities than the data tunnels, ensuring that in the event of congestion, voice tunnels always take the shortest path.
This means ACME will have two meshes of TE tunnels between their core routers: one for voice and another for data.
Continued in Part 2