In the last blog, I briefly mentioned PCE and how BGP-LS can be used as one of the ways to collect topology information. In this blog we will explore PCE in more depth: the problems it tries to solve and its different elements.
But before we go any deeper, I do want to mention that the PCE concept is not new. It just never got much traction in the past, but it has now evolved very quickly to take a central place in various carrier SDN solutions.
Okay, so let’s take a look at a few problems which it promises to solve.
1.1 Intra-Area MPLS-TE:
Let’s look at a few problems faced when running RSVP-TE in a single area.
Bin Packing Problem
The bin packing problem is basically about how to maximize the use of network bandwidth. The order of tunnel setup matters, and since no single router has full visibility of all the tunnels, it can cause bin packing issues. For instance, assume that every link in Fig. 1 below is 10G with an IGP cost of 10, except for the red link, which has a 5G capacity and an IGP cost of 20.
At time T0, tunnel TU1 is signaled with 5G and takes the path R1->R3->R5->R6.
At time T1, tunnel TU2 is signaled with 10G of bandwidth but fails, even though there would have been sufficient bandwidth had the tunnels been placed in a different order.
Bin packing is considered an NP-hard problem (in layman’s terms, no efficient algorithm is known that always finds the optimal solution, but I am digressing as this isn’t a write-up about computational complexity). What can be done here is this: a central entity that controls the tunnel setup ordering and has global LSP state visibility, combined with heuristic analysis, can provide significant improvements compared to routers calculating their own LSPs.
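To make the ordering effect concrete, here is a minimal Python sketch. It does not reproduce Fig. 1: the four-router topology, the `cspf` and `place` helper names, and the simplification that each link has one shared bandwidth pool are all assumptions made for illustration. Each tunnel takes the cheapest path that still has enough free bandwidth, the way a headend running CSPF would.

```python
import heapq

# Hypothetical topology (not Fig. 1): link -> (IGP cost, free bandwidth in Gbps).
TOPOLOGY = {
    ("R1", "R2"): (10, 10),
    ("R2", "R4"): (10, 10),
    ("R1", "R3"): (20, 5),   # the smaller, more expensive link
    ("R3", "R4"): (10, 10),
}


def cspf(links, src, dst, bw):
    """Cheapest path from src to dst using only links with at least bw free."""
    adj = {}
    for (a, b), (cost, free) in links.items():
        if free >= bw:  # prune links that cannot carry the demand
            adj.setdefault(a, []).append((b, cost))
            adj.setdefault(b, []).append((a, cost))
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        dist, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nxt, cost in adj.get(node, []):
            if nxt not in seen:
                heapq.heappush(heap, (dist + cost, nxt, path + [nxt]))
    return None


def place(demands, topology):
    """Signal tunnels one by one, in the given order, reserving bandwidth."""
    links = {k: list(v) for k, v in topology.items()}  # private, mutable copy
    placed = {}
    for name, src, dst, bw in demands:
        path = cspf(links, src, dst, bw)
        placed[name] = path
        if path:
            for a, b in zip(path, path[1:]):
                key = (a, b) if (a, b) in links else (b, a)
                links[key][1] -= bw
    return placed


if __name__ == "__main__":
    tu1 = ("TU1", "R1", "R4", 5)
    tu2 = ("TU2", "R1", "R4", 10)
    print("TU1 first:", place([tu1, tu2], TOPOLOGY))
    print("TU2 first:", place([tu2, tu1], TOPOLOGY))
```

With TU1 signaled first, it grabs the cheap path and leaves no link with 10G free, so TU2 fails. With TU2 first, both tunnels fit. A central entity that sees both demands can pick the second ordering; two independent headends cannot.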
One of the characteristics of MPLS-TE per RFC 3209 is that an LSP isn’t torn down when an attempt to increase its bandwidth fails. This can have ill effects on the in-profile LSPs. For instance, consider a situation where bandwidth demand increases on one LSP that shares links with other (in-profile) LSPs, which aren’t aware of the increase. If the increased demand cannot be satisfied on any other path, the LSP stays on the same path and ends up affecting the other in-profile LSPs, as the total combined traffic exceeds the link capacity. In situations like this, we could at least move the in-profile LSPs to other paths.
A central entity like a PCE can help here if it controls all the tunnels, by moving the in-profile LSPs to another path.
So the idea here is that a central entity like a PCE can help reserve bandwidth between points A and B in advance, for a specific time and duration. For instance, nailing up a tunnel with a certain bandwidth and latency target for a scheduled database replication between two data centers. Now one could ask: couldn’t I just do that with a script (or an NMS) that runs at a specific time and nails up a tunnel between the tunnel endpoints? The problem with simply creating a tunnel is that it doesn’t keep track of whether bandwidth is available between the two endpoints, and it is possible that bandwidth at a midpoint isn’t available because it’s being used by other tunnels. In situations like these a PCE can help, as it has visibility of the network and can move certain tunnels around (by signaling other headends) to make sure there is sufficient bandwidth available to meet the SLAs.
Smarter Auto-Bandwidth Adjustments
So generally the way auto-bandwidth works at the headend node is by keeping track of actual bandwidth usage and triggering a recomputation and re-signaling if a certain threshold is crossed. The headend here is responsible for making the decision based on the local information it has available. A better way could be to hand off the computation request to a central entity which has access to additional information like all the current LSPs in the domain, historical trending of bandwidth usage, application requirements, etc., which allows it to make a smarter decision.
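The headend-local decision described above can be sketched roughly as follows. This is a simplified illustration, not any vendor’s algorithm: the function name `autobw_decision`, the use of the peak sample, and the 10% default threshold are all assumptions.

```python
def autobw_decision(signaled_kbps, samples_kbps, threshold_pct=10.0):
    """Return the new bandwidth to signal, or None if no change is needed.

    Hypothetical logic: compare the peak measured rate over the sampling
    window against the currently signaled bandwidth.
    """
    if not samples_kbps:
        return None  # nothing measured yet
    peak = max(samples_kbps)
    if signaled_kbps == 0:
        # A 0 kbps tunnel has no baseline; any traffic counts as a change.
        return peak if peak > 0 else None
    change_pct = abs(peak - signaled_kbps) / signaled_kbps * 100.0
    return peak if change_pct >= threshold_pct else None


if __name__ == "__main__":
    print(autobw_decision(1000, [900, 1050, 1020]))  # small drift
    print(autobw_decision(1000, [900, 1300]))        # large increase
```

Note that the only inputs are the tunnel’s own counters. A central entity could apply the same trigger but feed the result into a computation that also knows about every other LSP in the domain.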
1.2 Inter-AS/Area MPLS-TE
Inter-AS traffic engineering has always been challenging because a headend node doesn’t have end-to-end visibility. Let’s first look briefly at Inter-AS signaling and computation techniques before we look at a few Inter-AS challenges.
1.2.1 Signaling Techniques:
The contiguous LSP is the most common way vendors implement Inter-AS TE. In this implementation a single contiguous LSP is signaled towards the endpoint (in a different area or AS) using hop-by-hop signaling.
In LSP stitching, different LSPs are stitched together at the ABRs/ASBRs between the endpoints. The stitching is done at the stitching points by mapping labels 1:1 between the domains.
In LSP nesting, an end-to-end LSP is nested inside another LSP. The LSP that nests other LSPs is called the container LSP and is advertised as a forwarding adjacency (FA). Here N LSPs are mapped into a single LSP, unlike LSP stitching, which has a 1:1 mapping. This scales better from a transit domain perspective, as we are mapping multiple LSPs to a single FA LSP, but it has some significant disadvantages.
1.2.2 Path Computation Techniques:
Per-Domain Path computation
In per-domain path computation, a separate path computation is done in each domain. For example, in the case of the contiguous LSP + loose hop ERO expansion technique, the headend is the owner of the contiguous LSP, but since it doesn’t have visibility into other domains, it specifies the ABRs/ASBRs as loose hops, and each of them performs the optimal path computation within its own domain. Contiguous LSP + loose hop ERO expansion is the most common implementation among vendors.
1.3 Problems with Inter-AS per domain path computation
Non Optimal End to End Path:
The problem with per-domain path computation is that even though each domain owner computes an optimal path within its own domain, the result can still be a sub-optimal end-to-end path.
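A small worked example shows how that happens. The ASBR names and costs below are made up for illustration. The headend in AS1 picks the exit ASBR that looks best from its own domain, and AS2 then computes the best path from whichever ASBR it was handed.

```python
# Hypothetical TE costs: headend to each exit ASBR inside AS1, and from each
# ASBR to the tunnel tail-end inside AS2.
AS1_COST_TO_ASBR = {"ASBR-A": 10, "ASBR-B": 20}
AS2_COST_FROM_ASBR = {"ASBR-A": 50, "ASBR-B": 10}


def per_domain_path():
    """Headend chooses the exit using only its own domain's costs."""
    exit_asbr = min(AS1_COST_TO_ASBR, key=AS1_COST_TO_ASBR.get)
    total = AS1_COST_TO_ASBR[exit_asbr] + AS2_COST_FROM_ASBR[exit_asbr]
    return exit_asbr, total


def end_to_end_path():
    """An entity that sees both domains minimizes the summed cost."""
    totals = {
        asbr: AS1_COST_TO_ASBR[asbr] + AS2_COST_FROM_ASBR[asbr]
        for asbr in AS1_COST_TO_ASBR
    }
    best = min(totals, key=totals.get)
    return best, totals[best]


if __name__ == "__main__":
    print("per-domain:", per_domain_path())
    print("end-to-end:", end_to_end_path())
```

Each domain did its job optimally, yet the per-domain result costs 60 via ASBR-A while the end-to-end optimum costs 30 via ASBR-B.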
Path Setup Failures:
In a single domain, when an LSP setup fails due to a race condition (i.e. the resource becomes unavailable between the time the LSP is computed and the time it is signaled), the headend is informed about the failure via a path error message and the IGP floods the updated resource information. The headend then tries to compute another path with the updated information.
In the multi-domain/AS case, a path setup failure caused either by a race condition or by an incomplete TED at the headend becomes harder to handle if the failure happens in one domain and the tunnel headend sits in another. Telling the tunnel headend that the path has failed doesn’t do much good, as it has no visibility into the other domain. In the figure below, for example, there is little point in telling R1 (headend) that there aren’t enough resources between R6 and R8. It’s better to notify R4 (ASBR) about the path failure and let it find a better path in its own domain. If it can’t find one, then take a further step back and let R2 (ASBR) calculate a path towards the tunnel endpoint.
This approach of cranking the computation back one step at a time from the point of failure is called crankback routing.
Though crankback routing is a better idea, it is a trial-and-error approach and doesn’t guarantee that a path will always be found. The other way to handle path failures is route exclusions.
Maintaining path diversity can be a problem for inter-domain TE LSPs. In the figure below, assume I am using a contiguous LSP with the primary LSP going out through R2 (ASBR) in the AS1 domain, and R4 (ASBR) routes the tunnel via R5. In that case, R1 signaling the secondary tunnel via R3 (a different ASBR) doesn’t guarantee an end-to-end diverse path.
A few other problems in the case of inter-domain TE are LSP re-optimization, constraint definitions, FRR, etc.
A PCE is not only able to solve the above problems, but also issues like application-aware path computation, LSP predictability, resource defragmentation, etc. A central PCE is also the way to do traffic engineering in Segment Routing: it makes the path decision and pushes the SR labels to the headend accordingly.
So far we have looked at a few inter- and intra-domain problems and built a case for a central entity to solve them. Now let’s take a deeper look at PCE and PCEP.
2. PCE Introduction:
A Path Computation Element (PCE) is an element (most likely residing on a server) that specializes in complex path computation on behalf of its Path Computation Client (PCC). A PCE can be a router or a server. Historically, AFAIK, there was only one major vendor which had a PCE implementation on the router. In this post we will be focusing on PCE on a server.
So here is the official definition of PCE:
“A Path Computation Element (PCE) is an entity (component, application, or network node) that is capable of computing a network path or route based on a network graph and applying computational constraints.”
A Typical PCE Architecture consists of
Traffic Engineering Database (TED):
A PCE needs network resource information like topology, bandwidth, link costs, existing LSPs, etc., which is stored in the Traffic Engineering Database (TED). This information can be collected by peering with the IGP (OSPF, IS-IS) or via BGP-LS. Most vendor PCE implementations support BGP-LS.
Path Computation Element:
The PCE is responsible for doing the actual path computation based on the constraints provided and communicating the result to the Path Computation Client (PCC). It specializes in complex path computation across various domains on behalf of its PCCs, with enhanced scalability.
Path Computation Client:
A Path Computation Client (PCC) is an element that requests path computation from the PCE.
2.1 Types of PCE:
A stateless PCE doesn’t have knowledge of previously established LSPs. This severely limits the PCE’s ability to optimize network resources.
A stateless PCE provides mechanisms to perform path computations in response to PCC requests. It utilizes only the Traffic Engineering Database (TED) to do this computation.
A stateful PCE keeps track of all the previously established LSPs (in the LSP DB) and the available resources. Keeping a database synchronized with the network state allows a stateful PCE to make better path computation decisions. So basically it has LSPDB+TEDB+PCE, in comparison to a stateless PCE, which has only TEDB+PCE.
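One way to see what the LSP DB adds: a TED can say how much bandwidth is left on a link, but only the LSP DB can say which LSPs are using it and could therefore be moved. Here is a minimal sketch. The dictionary layout and the `lsps_on_link` helper are hypothetical, and only TU1’s path comes from the earlier bin packing example.

```python
# Hypothetical LSP DB: name -> hop list and reserved bandwidth.
LSP_DB = {
    "TU1": {"path": ["R1", "R3", "R5", "R6"], "bw_gbps": 5},
    "TU2": {"path": ["R1", "R2", "R4", "R6"], "bw_gbps": 10},  # assumed path
}


def lsps_on_link(lsp_db, link):
    """Return (name, bandwidth) for every LSP crossing the link, either way."""
    a, b = link
    hits = []
    for name, lsp in lsp_db.items():
        hops = list(zip(lsp["path"], lsp["path"][1:]))
        if (a, b) in hops or (b, a) in hops:
            hits.append((name, lsp["bw_gbps"]))
    return sorted(hits)


if __name__ == "__main__":
    # Which LSPs would have to move to free up the R3-R5 link?
    print(lsps_on_link(LSP_DB, ("R3", "R5")))
```

A stateless PCE has no equivalent of `LSP_DB`, so it can answer "is there room?" but not "what should I move to make room?".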
Passive Stateful PCE
In the case of a passive stateful PCE, the PCC (router) is responsible for initiating path setup and retains control over path updates. The PCE receives the path request from the PCC, does the path computation, and sends the result back to the PCC.
Active Stateful PCE
In this case, a PCC can delegate an LSP to the PCE, or the PCE can initiate an LSP itself. Basically the PCE can drive LSP path setup, hence the term “active” stateful PCE. Most of the work is being done in this area.
PCE Initiated: In this case an active stateful PCE initiates an LSP and maintains the responsibility of updating the LSP.
PCC Initiated: In this case a PCC initiates the LSP and may later delegate control to the active stateful PCE. This is like driving a Tesla: you decide to put it on autopilot, and later you may decide to take control back. Similarly, a PCC can hand control over to the PCE and decide later to take it back. This back-and-forth switching is done with the Delegate bit set in the PCEP messages exchanged.
PCE Protocol (PCEP) is the standard protocol used between PCE and PCC for communication. It’s a simple TCP based protocol (4189 is the default server port). Let’s look at various PCEP messages.
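Every PCEP message starts with the same 4-byte common header defined in RFC 5440: a 3-bit version, 5 bits of flags, an 8-bit message type and a 16-bit total length that includes the header itself. The sketch below packs and parses that header with Python’s `struct` module. The function names are mine, and the message type table lists the base types from RFC 5440 plus the stateful ones (PCRpt and PCUpd from RFC 8231, PCInitiate from RFC 8281).

```python
import struct

PCEP_VERSION = 1
PCEP_PORT = 4189  # default TCP port the PCE listens on

MSG_TYPES = {
    1: "Open", 2: "Keepalive", 3: "PCReq", 4: "PCRep",
    5: "PCNtf", 6: "PCErr", 7: "Close",
    10: "PCRpt", 11: "PCUpd", 12: "PCInitiate",
}


def build_header(msg_type, body_len=0, flags=0):
    """Pack the 4-byte common header; length covers header plus body."""
    ver_flags = (PCEP_VERSION << 5) | (flags & 0x1F)
    return struct.pack("!BBH", ver_flags, msg_type, 4 + body_len)


def parse_header(data):
    """Unpack the first 4 bytes of a PCEP message."""
    ver_flags, msg_type, length = struct.unpack("!BBH", data[:4])
    return {
        "version": ver_flags >> 5,
        "flags": ver_flags & 0x1F,
        "type": MSG_TYPES.get(msg_type, "Unknown"),
        "length": length,
    }


if __name__ == "__main__":
    keepalive = build_header(2)  # a Keepalive is just the common header
    print(keepalive.hex(), parse_header(keepalive))
```

A Keepalive makes a convenient first test against a real PCC because it has no body: the whole message is the four header bytes.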
4. Active Stateful PCE with SR
The PCE is the brain for doing traffic engineering with Segment Routing (SR-TE). It is responsible for doing the path computation and then sending the appropriate label stack (comprised of Node and Adjacency labels) to the headend node. The headend node then pushes that segment list as labels onto the packets.
PCEP was also extended to support SR between PCE and PCC. Essentially, a few ERO sub-objects were extended to carry Node and Adjacency labels.
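As a rough illustration of what the PCE hands to the headend, the sketch below turns a list of Node SID indexes into a label stack. A Node SID label is the SRGB base plus the node’s SID index. The SRGB range of 16000 to 23999 and the index values are assumptions chosen for the example, not values taken from the demo configuration.

```python
# Assumed SRGB (base, top); check what your routers actually advertise.
SRGB = (16000, 23999)


def node_sid_label(index, srgb=SRGB):
    """Label for a Node SID: SRGB base + SID index, bounds-checked."""
    base, top = srgb
    label = base + index
    if index < 0 or label > top:
        raise ValueError(f"SID index {index} falls outside SRGB {srgb}")
    return label


def build_label_stack(indexes, srgb=SRGB):
    """Ordered label stack for a path expressed as Node SID indexes."""
    return [node_sid_label(i, srgb) for i in indexes]


if __name__ == "__main__":
    # Hypothetical indexes for three routers along the desired path.
    print(build_label_stack([1002, 1003, 1004]))
```

The order of the list is the order in which the segments are visited, so the first entry ends up as the top label.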
5. PCEP Demo
All theory and no practice is no fun, right? If you have access to a vendor implementation like Cisco WAE or the Juniper NorthStar controller, then you can play with PCE+TE or PCE+SR-TE. A few months back I didn’t have that access, so I used a Python PCEP implementation to play with SR-TE and RSVP-TE. I am going to use XRv 5.3.2, which comes with SR-TE support.
Below is a topology with 4 XRv routers. The shortest path between XRv1 and XRv4 is the direct path, but we will push an RSVP-TE ERO or SR labels from the PCE via PCEP for the path via XRv2 and XRv3. We will be emulating an active stateful PCE. I still have to implement the path computation and TED components, so for all practical purposes it’s just a PCEP implementation for now. IS-IS is the IGP, and SR and RSVP are configured.
Config on XRv1
mpls traffic-eng
 interface GigabitEthernet0/0/0/0
 !
 interface GigabitEthernet0/0/0/1
 !
 interface GigabitEthernet0/0/0/2
 !
 interface GigabitEthernet0/0/0/3
 !
 pce
  peer source ipv4 172.16.2.10
  peer ipv4 172.16.2.1
  !
  segment-routing
  stateful-client instantiation
 !
 reoptimize 60
 !
 auto-tunnel pcc
  tunnel-id min 1 max 100
 !
interface tunnel-te150
 ipv4 unnumbered Loopback0
 destination 18.104.22.168
 pce
  delegation
 !
!
interface tunnel-te151
 ipv4 unnumbered Loopback0
 destination 22.214.171.124
 pce
  delegation
 !
Basically the notable things in the above config are:
- PCE ip address is “172.16.2.1” and PCC(XRv1) is “172.16.2.10”
- “segment-routing” enables the support for SR-TE
- “stateful-client instantiation” enables support for active stateful PCE and allows the PCE to initiate tunnels.
- “auto-tunnel pcc tunnel-id min 1 max 100” says that tunnels created by the PCE (through PCE Initiate) should have tunnel IDs between 1 and 100.
- The other two tunnels (tunnel-te150 and tunnel-te151) are created on the router, but the PCC (router) delegates them to the PCE (which controls them via PCEP Update). In our demo we will be pushing strict hop EROs to have these tunnels programmed via XRv2 and XRv3.
In the demo, I have a JSON file which contains parameters like the strict hop EROs, the SR node labels (if we want to push SR-TE), the tunnel name, and the tunnel source and destination. Once I have the path computation and TED parts built, I won’t need this file, as things will be handled automatically.
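The actual file isn’t shown in this post, so the following is only a guess at what such an input could look like and how a script might load it. All field names, the `load_tunnels` helper, and the 192.0.2.x source and destination addresses are placeholders. The strict hops and SR labels are the ones that show up in the router output later in the demo.

```python
import json

# Hypothetical input file contents; field names are illustrative only.
EXAMPLE = """
{
  "tunnels": [
    {
      "name": "XRV1_t150",
      "source": "192.0.2.1",
      "destination": "192.0.2.4",
      "ero": ["172.16.2.2", "172.16.3.3", "172.16.4.4"]
    },
    {
      "name": "XRV1_t1",
      "source": "192.0.2.1",
      "destination": "192.0.2.4",
      "sr_labels": [17002, 17003, 17004]
    }
  ]
}
"""

REQUIRED = {"name", "source", "destination"}


def load_tunnels(text):
    """Parse the tunnel list and check each entry carries a usable path."""
    tunnels = json.loads(text)["tunnels"]
    for tunnel in tunnels:
        missing = REQUIRED - set(tunnel)
        if missing:
            raise ValueError(f"tunnel entry is missing {sorted(missing)}")
        if not (tunnel.get("ero") or tunnel.get("sr_labels")):
            raise ValueError(f"{tunnel['name']}: needs 'ero' or 'sr_labels'")
    return tunnels


if __name__ == "__main__":
    for tunnel in load_tunnels(EXAMPLE):
        kind = "SR-TE" if "sr_labels" in tunnel else "RSVP-TE"
        print(tunnel["name"], kind)
```

Validating the file up front is a small design choice that pays off here: a malformed entry fails before the PCEP session comes up rather than halfway through pushing paths to the router.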
5.1) PCC Owned Tunnels
Let’s take a look at the PCC owned tunnel first. At this point you can see both tunnels configured on the XRv1 are down.
RP/0/0/CPU0:XRV1#show mpls traffic-eng tunnels detail

Name: tunnel-te150  Destination: 126.96.36.199  Ifhandle:0x3e80
  Signalled-Name: XRV1_t150
  Status:
    Admin: up  Oper: down  Path: not valid  Signalling: Down

    G-PID: 0x0800 (derived from egress interface properties)
    Bandwidth Requested: 0 kbps  CT0
    Creation Time: Tue Sep 29 15:54:21 2015 (00:02:56 ago)
  Config Parameters:
    Bandwidth: 0 kbps (CT0)  Priority: 7 7  Affinity: 0x0/0xffff
  <Output Omitted for brevity>

Name: tunnel-te151  Destination: 188.8.131.52  Ifhandle:0x3f80
  Signalled-Name: XRV1_t151
  Status:
    Admin: up  Oper: down  Path: not valid  Signalling: Down

    G-PID: 0x0800 (derived from egress interface properties)
    Bandwidth Requested: 0 kbps  CT0
    Creation Time: Tue Sep 29 15:54:21 2015 (00:02:56 ago)
  Config Parameters:
    Bandwidth: 0 kbps (CT0)  Priority: 7 7  Affinity: 0x0/0xffff
    Metric Type: TE (default)
    Path Selection:
      Tiebreaker: Min-fill (default)
  <Output Omitted for brevity>
Once we start our PCE_Controller script, our PCEP session comes up.
RP/0/0/CPU0:XRV1# show mpls traffic-eng pce peer
Tue Sep 29 16:01:59.571 UTC

Address         Precedence   State        Learned From
--------------- ------------ ------------ --------------------
172.16.2.1      255          Up           Static config
As soon as the session is up, the script pushes valid EROs for both tunnel 150 and tunnel 151, and that brings the tunnels up. We can see the valid EROs being pushed by the PCE to the PCC (XRv1). We also see that the tunnels are delegated to the PCE. With a real PCE, if it decides that the path needs to change for whatever reason, it can push new EROs to the PCC, as the tunnels are owned by the PCE as long as delegation is set.
RP/0/0/CPU0:XRV1#show mpls traffic-eng tunnels detail

Name: tunnel-te150  Destination: 184.108.40.206  Ifhandle:0x3e80
  Signalled-Name: XRV1_t150
  Status:
    Admin: up  Oper: up  Path: valid  Signalling: connected

    path option 10, (verbatim) type explicit (autopcc_te150) (Basis for Setup, path weight 0)
  <Output Omitted for brevity>
  Soft Preemption: Disabled
  PCE Delegation:
    Symbolic name: XRV1_t150
    PCEP ID: 151
    Delegated to: 172.16.2.1
  Reopt Trigger: PCE request, Reopt Reason: Soft preemption recovery
  SNMP Index: 39
  Binding SID: 24008
  History:
    Tunnel has been up for: 00:00:17 (since Tue Sep 29 16:00:47 UTC 2015)
    Current LSP:
      Uptime: 00:00:17 (since Tue Sep 29 16:00:47 UTC 2015)
    Reopt. LSP:
      Uptime: 00:00:17 (since Tue Sep 29 16:00:47 UTC 2015)
  Current LSP Info:
    Instance: 2, Signaling Area: PCE controlled
    Uptime: 00:00:17 (since Tue Sep 29 16:00:47 UTC 2015)
    Outgoing Interface: GigabitEthernet0/0/0/0, Outgoing Label: 24016
    Router-IDs: local 220.127.116.11 downstream 18.104.22.168
    Soft Preemption: None
    SRLGs: not collected
    Path Info:
      Outgoing:
        Explicit Route:
          Strict, 172.16.2.2
          Strict, 172.16.3.3
          Strict, 172.16.4.4

      Record Route: Disabled
      Tspec: avg rate=0 kbits, burst=1000 bytes, peak rate=0 kbits
      Session Attributes: Local Prot: Not Set, Node Prot: Not Set, BW Prot: Not Set
                          Soft Preemption Desired: Not Set
    Resv Info: None
      Record Route: Disabled
      Fspec: avg rate=0 kbits, burst=1000 bytes, peak rate=0 kbits
  Reoptimized LSP Info (Install Timer Remaining 3 Seconds):
    Instance: 3, Signaling Area: PCE controlled
    Outgoing Interface: GigabitEthernet0/0/0/0, Outgoing Label: 24013
    Soft Preemption: None
    SRLGs: not collected
    Path Info:
      Outgoing:
        Explicit Route:
          Strict, 172.16.2.2
          Strict, 172.16.3.3
          Strict, 172.16.4.4
  <Output Omitted for brevity>

Name: tunnel-te151  Destination: 22.214.171.124  Ifhandle:0x3f80
  Signalled-Name: XRV1_t151
  Status:
    Admin: up  Oper: up  Path: valid  Signalling: connected

    path option 10, (verbatim) type explicit (autopcc_te151) (Basis for Setup, path weight 0)
    reoptimization in progress
    path option 10, (verbatim) type explicit (autopcc_te151)
  <Output Omitted for brevity>
  Soft Preemption: Disabled
  PCE Delegation:
    Symbolic name: XRV1_t151
    PCEP ID: 152
    Delegated to: 172.16.2.1
  Reopt Trigger: PCE request, Reopt Reason: Soft preemption recovery
  SNMP Index: 44
  Binding SID: 24010
  History:
    Tunnel has been up for: 00:00:17 (since Tue Sep 29 16:00:48 UTC 2015)
    Current LSP:
      Uptime: 00:00:17 (since Tue Sep 29 16:00:48 UTC 2015)
    Reopt. LSP:
      Uptime: 00:00:17 (since Tue Sep 29 16:00:48 UTC 2015)
  Current LSP Info:
    Instance: 2, Signaling Area: PCE controlled
    Uptime: 00:00:18 (since Tue Sep 29 16:00:47 UTC 2015)
    Outgoing Interface: GigabitEthernet0/0/0/0, Outgoing Label: 24012
    Router-IDs: local 126.96.36.199 downstream 188.8.131.52
    Soft Preemption: None
    SRLGs: not collected
    Path Info:
      Outgoing:
        Explicit Route:
          Strict, 172.16.2.2
          Strict, 172.16.3.3
          Strict, 172.16.4.4

      Record Route: Disabled
      Tspec: avg rate=0 kbits, burst=1000 bytes, peak rate=0 kbits
  <>
  Reoptimized LSP Info (Install Timer Remaining 3 Seconds):
    Instance: 3, Signaling Area: PCE controlled
    Outgoing Interface: GigabitEthernet0/0/0/0, Outgoing Label: 24014
    Soft Preemption: None
    SRLGs: not collected
    Path Info:
      Outgoing:
        Explicit Route:
          Strict, 172.16.2.2
          Strict, 172.16.3.3
          Strict, 172.16.4.4

      Record Route: Disabled
  <Output Omitted for brevity>
5.2) PCE Initiated Tunnels
Now let’s take a look at the PCE-initiated tunnels. These tunnels don’t exist on the router and are initiated by the PCE. In our config we have reserved tunnel IDs 1-100 for these auto-tunnels. In our demo we will be pushing an SR-based TE tunnel via PCE Initiate. As you can see, the tunnel ID is 33 and the tunnel is pushed by the PCE, with SR node labels as the ERO. These tunnels don’t show up in the running config, and if for some reason the PCC-to-PCE connection is broken, they will eventually be removed. Hence, it’s important to have PCE redundancy.
RP/0/0/CPU0:XRV1#show mpls traffic-eng tunnels detail
Tue Sep 29 16:24:55.956 UTC

Name: tunnel-te33  Destination: 184.108.40.206  Ifhandle:0x4180 (auto-tunnel pcc)
  Signalled-Name: XRV1_t1
  Status:
    Admin: up  Oper: up  Path: valid  Signalling: connected

    path option 10, (Segment-Routing) type explicit (autopcc_te33) (Basis for Setup)
      Protected-by PO index: none
    G-PID: 0x0800 (derived from egress interface properties)
    Bandwidth Requested: 0 kbps  CT0
    Creation Time: Tue Sep 29 16:23:42 2015 (00:01:13 ago)
  <Output Omitted for brevity>
  Reoptimization after affinity failure: Enabled
  SRLG collection: Disabled
  Auto PCC:
    Symbolic name: XRV1_t1
    PCEP ID: 34
    Delegated to: 172.16.2.1
    Created by: 172.16.2.1
  SNMP Index: 51
  Binding SID: 24017
  History:
    Tunnel has been up for: 00:01:13 (since Tue Sep 29 16:23:43 UTC 2015)
    Current LSP:
      Uptime: 00:01:13 (since Tue Sep 29 16:23:43 UTC 2015)
  Current LSP Info:
    Instance: 2, Signaling Area: PCE controlled
    Uptime: 00:01:14 (since Tue Sep 29 16:23:42 UTC 2015)
    Soft Preemption: None
    SRLGs: not collected
    Path Info:
      Segment-Routing Path Info (PCE controlled)
        Segment0[Node]: 220.127.116.11, Label: 17002
        Segment1[Node]: 18.104.22.168, Label: 17003
        Segment2[Node]: 22.214.171.124, Label: 17004
Things to watch out for:
Obviously PCE is the new way of doing things, so a few things that come to mind to watch out for are:
• Does the PCE support redundancy?
• If it does, how do the PCEs synchronize information between them? Test the failure scenarios thoroughly.
• How does the PCE extract the LSP and topology information? Some vendor PCE implementations may extract both LSP and topology info through BGP-LS, and some may extract LSP info through PCEP (via PCC report messages). Knowing the details only helps you get a better understanding of the system and of the impact if things go south.
• How much control plane stress will a PCE see in a given network from all the reports (and their frequency) sent from PCCs to the PCE, and how much can it handle (scaling)? Obviously mileage will vary based on the network size.
There will be a lot of other considerations which I haven’t thought about, and we will all learn more as we get more operational experience.
So we looked initially at the problems faced in inter- and intra-domain traffic engineering which can be solved by PCE. We then looked at PCE and PCEP details and ended with a PCEP demo. PCE is getting more popular day by day, and folks are finding new use cases that weren’t possible earlier without a PCE. A very basic example is multi-layer path computation, where a PCE co-operates with a VNTM to signal wavelengths if there isn’t enough capacity at the MPLS layer.
As always after writing a long article, I feel like I barely scratched the surface. I hope I have given you some idea about a few of the problem spaces where PCE can help, along with some details about PCE itself.