With the rise of DCI, a new set of requirements has emerged that is not fully addressed by current L2VPN technologies like VPLS. There are three major options for deploying VPLS:
- LDP based VPLS (RFC 4762)
- LDP based VPLS with BGP Auto discovery
- BGP based VPLS (RFC 4761)
Each option has its pros and cons. Let’s look at each option briefly.
LDP based VPLS (RFC 4762)
This requires a full mesh of targeted LDP sessions between PEs and manual provisioning, so it does not scale well in very large networks. H-VPLS tries to alleviate the full-mesh problem by partitioning the network into several edge domains and creating a control plane hierarchy. In doing so, however, it introduces data plane scalability problems on the hub PEs, sometimes referred to as MAC explosion. PBB-VPLS was introduced to address this problem.
LDP based VPLS with BGP Auto discovery
This solution only solves the manual provisioning issue of LDP based VPLS, at the expense of introducing another control plane protocol, i.e. BGP. Signaling is still LDP, and a full mesh of T-LDP sessions is still required between PEs.
BGP based VPLS (RFC 4761)
This provides auto-discovery and reduces signaling overhead by eliminating the need for a full mesh of signaling sessions between PEs. The use of BGP route reflectors (RRs) brings scalability to the solution. One disadvantage of BGP based VPLS is that it wastes label resources.
All of the above VPLS options have serious limitations around redundancy, multicast optimization, provisioning and simplicity. It would be nice to have an L2VPN solution that provides features like:
- Multi-homing with all active forwarding and load balancing from the CE and towards the CE.
- VPLS currently supports multihoming with single-active redundancy mode only.
- Optimized delivery of multi destination frames
- VPLS can provide P2MP trees but no current solution provides MP2MP trees.
- Simplified provisioning
- An ideal solution should not only provide auto-discovery of PEs that are members of the same VPN, similar to VPLS (BGP Auto-discovery), but also discovery of multi-homed PEs and automated DF election.
- Scalability with the use of Route Reflector
- Fast convergence
- An ideal solution should provide MAC independent convergence and the ability to recover from PE-CE and PE failure scenarios.
- Flood Suppression
- An ideal solution should minimize broadcast frames and give flexibility to the operator to choose whether unknown unicast frames are to be dropped or flooded.
EVI: An EVPN instance spanning across the PEs participating in that EVPN
MAC-VRF: A Virtual Routing and Forwarding table for MAC addresses on a PE for an EVI
Ethernet Segment (ES) and Ethernet Segment Identifier (ESI): An Ethernet segment is the set of Ethernet links that attaches a CE to two or more PEs when the CE is multi-homed. Each Ethernet segment MUST have a unique non-zero identifier, the ‘Ethernet Segment Identifier’ (ESI).
Ethernet Tag: An Ethernet Tag identifies a particular broadcast domain, e.g., a VLAN. An EVPN instance consists of one or more broadcast domains. Ethernet tag(s) are assigned to the broadcast domains of a given EVPN instance by the provider of that EVPN. Each PE in that EVPN instance performs a mapping between broadcast domain identifier(s) understood by each of its attached CEs and the corresponding Ethernet tag.
P2MP: Point to Multipoint
Single-Active Redundancy Mode: When only a single PE, among a group of PEs attached to an Ethernet segment, is allowed to forward traffic to/from that Ethernet Segment, then the Ethernet segment is defined to be operating in Single-Active redundancy mode.
All-Active Redundancy Mode: When all PEs attached to an Ethernet segment are allowed to forward traffic to/from that Ethernet Segment, then the Ethernet segment is defined to be operating in All-Active redundancy mode.
Ethernet VPN introduces the concept of BGP MAC routing. It uses MP-BGP for learning MAC addresses between provider edges. Learning between the PE and the CE is still done in the data plane. The BGP control plane has the advantage of scalability and flexibility for MAC routing, just as it does for IP routing.
EVPN provides separation between the data plane and the control plane, which allows it to use different encapsulation mechanisms in the data plane while maintaining the same control plane. Within the L2VPN WG there are three major drafts:
|WG Drafts||Control Plane||Data Plane||Early Implementation|
|NVO||BGP||VXLAN, NVGRE||Let me know if you know one?|
IANA has allocated EVPN a new NLRI with an AFI of 25 (same as L2VPN) and a SAFI of 70.
E-VPN/PBB-EVPN introduces four new BGP Route Types and Communities.
|Type||Route Type||Usage||Applicability||BGP Community|
|0x1||Ethernet Auto-Discovery Route||MAC mass withdraw, aliasing, advertising split horizon labels||E-VPN||ESI MPLS Label extended community|
|0x2||MAC Advertisement Route||Advertising MAC address reachability, advertising IP/MAC bindings||E-VPN and PBB-EVPN||MAC Mobility extended community, Default Gateway extended community|
|0x3||Inclusive Multicast Route||Multicast tunnel endpoint discovery||E-VPN and PBB-EVPN||-|
|0x4||Ethernet Segment Route||Redundancy group discovery, DF election||E-VPN and PBB-EVPN||ES-Import extended community|
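To keep the numbering straight while reading the rest of the post, the route types can be written down as a small lookup. This is only an illustrative sketch: the names `EvpnRouteType`, `PBB_EVPN_ROUTE_TYPES` and `used_by_pbb_evpn` are made up for this example and do not come from any BGP library.

```python
from enum import IntEnum

# Address family values quoted in the text above.
AFI_L2VPN = 25
SAFI_EVPN = 70


class EvpnRouteType(IntEnum):
    """EVPN route types, numbered as in the table above."""
    ETHERNET_AUTO_DISCOVERY = 0x1
    MAC_ADVERTISEMENT = 0x2
    INCLUSIVE_MULTICAST = 0x3
    ETHERNET_SEGMENT = 0x4


# Per the table, PBB-EVPN uses every route type except Ethernet A-D.
PBB_EVPN_ROUTE_TYPES = frozenset(EvpnRouteType) - {
    EvpnRouteType.ETHERNET_AUTO_DISCOVERY
}


def used_by_pbb_evpn(route_type: int) -> bool:
    """Return True if the given route type applies to PBB-EVPN."""
    return EvpnRouteType(route_type) in PBB_EVPN_ROUTE_TYPES
```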
Let’s take a look at each BGP route type in detail.
1) Ethernet Segment Route
In the case of a multi-homed CE device, as in Fig. 1, a set of Ethernet links comprises an Ethernet segment. A unique Ethernet segment identifier (ESI) identifies this Ethernet segment; it can be manually configured or automatically derived. When a single-homed CE is attached to an Ethernet segment, the ESI value is zero.
A couple of different mechanisms are available to derive the ESI automatically, such as snooping LACP packets or BPDUs. Once the ESI for an Ethernet segment is assigned for a dual-homed CE, the PEs advertise it as an Ethernet Segment Route (BGP Route Type 4) with the newly introduced ES-Import extended community (= ESI value) along with the other extended communities. A PE automatically imports the route if its own ESI value matches the ES-Import community. This process is also referred to as auto-discovery and allows PEs connected to the same Ethernet segment to discover each other.
In the above figure, PE2 and PE1 have the same ESI value (=ES1); PE2 advertises its ESI value in the Ethernet Segment Route with ES-Import community set to ES1. PE1 and PE3 will receive that route but only PE1 will import this route, since it has a matching ESI value. This ensures PE1 knows that PE2 is connected to the same CE device.
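The import decision described above can be sketched in a few lines. This is a toy model, not a BGP implementation: the `Pe` class, its fields and the string `"ES1"` standing in for the ESI value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pe:
    name: str
    local_esis: set  # ESIs of locally attached multi-homed segments
    es_peers: dict = field(default_factory=dict)  # ESI -> names of peer PEs

    def receive_es_route(self, originator: str, es_import: str) -> bool:
        """Import the route only if ES-Import matches a locally configured ESI."""
        if es_import not in self.local_esis:
            return False  # route is received but not imported
        self.es_peers.setdefault(es_import, set()).add(originator)
        return True


# PE1 shares ES1 with PE2; PE3 has no multi-homed segment.
pe1 = Pe("PE1", {"ES1"})
pe3 = Pe("PE3", set())

# PE2 advertises its Ethernet Segment route with ES-Import = ES1.
pe1_imported = pe1.receive_es_route("PE2", "ES1")
pe3_imported = pe3.receive_es_route("PE2", "ES1")
```

After the exchange, `pe1.es_peers` records PE2 as a peer on ES1, while PE3 has learned nothing about the segment.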
After auto-discovery, the Designated Forwarder (DF) election happens for multi-homed CEs. The PE that assumes the role of DF is responsible for forwarding BUM frames on a given segment to the CE.
For the DF election, each PE first builds a list of the IP addresses of all PE nodes attached to the segment, sorted in ascending order. Every PE is then given an ordinal that indicates its position in the list, starting from 0. The PE whose ordinal is I = (V mod N) becomes the DF for a given EVPN instance, where V is the Ethernet tag value associated with that instance and N is the number of PEs.
Let’s say that PE1 and PE2 originator IP addresses are 220.127.116.11 and 18.104.22.168 respectively.
|Ethernet Tag value for an EVPN instance||Ethernet Tag ID mod 2|
|300||0|
|301||1|
PE1 becomes DF for Ethernet tag 300 and PE2 becomes DF for Ethernet tag 301.
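The modulo-based election is easy to try out. The sketch below assumes ordinals start at 0 and uses two made-up documentation addresses (192.0.2.1 for PE1 and 192.0.2.2 for PE2) rather than the addresses from the example; `elect_df` is a hypothetical helper name.

```python
import ipaddress


def elect_df(pe_addresses, ethernet_tag):
    """Return the originator IP of the DF for the given Ethernet tag.

    pe_addresses: originator IPs of all PEs attached to the Ethernet segment.
    """
    # Sort numerically (not as strings) so 10.0.0.2 comes before 10.0.0.10.
    ordered = sorted(pe_addresses, key=ipaddress.ip_address)
    # The PE whose ordinal equals (V mod N) is the DF.
    return ordered[ethernet_tag % len(ordered)]


PE1 = "192.0.2.1"  # assumed address, for illustration only
PE2 = "192.0.2.2"  # assumed address, for illustration only

df_for_300 = elect_df([PE2, PE1], 300)
df_for_301 = elect_df([PE2, PE1], 301)
```

With these addresses PE1 sorts first (ordinal 0) and PE2 second (ordinal 1), so odd and even Ethernet tags are split between the two PEs.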
2) Ethernet Auto Discovery Routes
Ethernet Auto-discovery (A-D) routes are type 1 mandatory routes and are used for achieving split horizon, fast convergence and aliasing. Only EVPN uses type 1 routes; PBB-EVPN uses B-MAC to achieve the same functionality.
Multi-homed PEs advertise an auto-discovery route per Ethernet segment with the newly introduced ESI MPLS label extended community. PEs recognize other PEs connected to the same Ethernet segment after the type 4 ES route exchange. All the multi-homed and remote PE routers that are part of the EVI will import the auto-discovery route.
The Ethernet A-D route is not needed when ESI = 0, i.e. when the CE is single-homed. The ESI label extended community has an eight-bit flag field, which indicates “Single-Active” or “All-Active” redundancy mode.
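As a rough sketch of how the flag field could be carried, the snippet below packs and unpacks an eight-octet extended community. The layout is an assumption based on the published EVPN specification (RFC 7432): type 0x06, sub-type 0x01, one flags octet whose low-order bit means Single-Active, two reserved octets, and a three-octet label field holding the 20-bit label in its high-order bits. The function names are invented for this example, and the offsets should be checked against the specification before relying on them.

```python
SINGLE_ACTIVE_BIT = 0x01  # assumed: low-order bit of the flags octet


def encode_esi_label_ext_community(label: int, single_active: bool) -> bytes:
    """Build the 8-octet ESI Label extended community (assumed layout)."""
    flags = SINGLE_ACTIVE_BIT if single_active else 0
    # 20-bit MPLS label placed in the high-order bits of a 3-octet field.
    label_field = (label << 4).to_bytes(3, "big")
    return bytes([0x06, 0x01, flags, 0x00, 0x00]) + label_field


def redundancy_mode(community: bytes) -> str:
    """Read the redundancy mode from the flags octet."""
    return "Single-Active" if community[2] & SINGLE_ACTIVE_BIT else "All-Active"


def decode_label(community: bytes) -> int:
    """Recover the ESI label from the last three octets."""
    return int.from_bytes(community[5:8], "big") >> 4


# 17001 is an arbitrary label value chosen for the example.
community = encode_esi_label_ext_community(17001, single_active=False)
```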
Let’s take a detailed look at how Split Horizon, Fast convergence and aliasing are achieved using the Ethernet A-D route.
In Fig. 2, if CE1 sends a BUM frame to a non-DF PE, let’s say PE1, then PE1 will forward the traffic to all other PEs in the EVPN instance including the DF PE, PE2 in this example. In this case PE2 must drop the packet and cannot forward it to CE1. This is referred to as Split Horizon.
In order to achieve Split Horizon every BUM frame originated from a non-DF PE is encapsulated with an MPLS label that identifies the Ethernet segment of origin. This label is known as the ESI label.
The ESI label is distributed by all the PEs operating in Single-Active (active/standby) or All-Active mode using the Ethernet A-D route per ES. Ethernet A-D routes are imported by all PEs that are participating in the EVPN instance.
In Fig. 3, when PE1 replicates a BUM frame, it adds the ESI label advertised by PE2. When PE2 sees the ESI label it recognizes that the packet was originated from the same ESI and drops it.
For now, you can ignore the Mcast Label in the diagram. We will discuss it in detail in the inclusive multicast section.
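The egress check behind split horizon can be sketched as a simple filter. This is a simplified model: `forward_bum_to_segments`, the segment names and the label value 17001 are hypothetical, and a real PE makes this decision in the forwarding plane rather than in a Python loop.

```python
def forward_bum_to_segments(local_segments, frame_esi_label):
    """Return the local segments an egress PE may flood a BUM frame onto.

    local_segments: name -> (ESI label this PE advertised or None, is_df)
    frame_esi_label: ESI label carried by the frame, or None if absent
    """
    allowed = []
    for name, (esi_label, is_df) in local_segments.items():
        if frame_esi_label is not None and frame_esi_label == esi_label:
            continue  # split horizon: frame originated on this segment
        if not is_df:
            continue  # only the DF forwards BUM traffic to the CE
        allowed.append(name)
    return allowed


# PE2 is DF for ES1 (shared with PE1) and also has a single-homed link.
pe2_segments = {"ES1": (17001, True), "CE3-link": (None, True)}

from_pe1 = forward_bum_to_segments(pe2_segments, 17001)  # frame came from CE1 via PE1
from_pe3 = forward_bum_to_segments(pe2_segments, None)   # frame from an unrelated PE
```

The frame that entered the network on ES1 is kept off ES1 at PE2 but still reaches other local segments, while a frame from elsewhere is flooded to every segment PE2 is DF for.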
Since BGP is used for advertising MAC routes, convergence could be slow in large-scale environments, depending on the number of MAC routes that need to be withdrawn.
To combat this slow convergence, a level of indirection is introduced. In the event of a failure, rather than withdrawing individual MAC routes, the Ethernet A-D per ES route is withdrawn and any MAC routes pointing to that Ethernet segment are marked as invalid by the PE. This is very similar to BGP PIC core in the IP world.
For instance in Fig 4, PE1-CE1 link failure causes PE1 to withdraw its Ethernet A-D route. PE2 reruns the DF/BDF election and becomes DF if it wasn’t already. PE3 removes PE1 as a valid destination for all its MAC Routes.
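The indirection can be illustrated with a toy table on the remote PE (PE3 in the example). The class and method names below are invented for this sketch, as are the MAC addresses; the point is only that one Ethernet A-D withdrawal invalidates every MAC route that resolves through it.

```python
class RemotePeTable:
    """Toy view of a remote PE: MAC routes resolve via Ethernet A-D per ES routes."""

    def __init__(self):
        self.ad_per_es = set()  # (esi, pe) pairs with a live A-D per ES route
        self.mac_routes = {}    # mac -> (esi, advertising pe)

    def add_ad_route(self, esi, pe):
        self.ad_per_es.add((esi, pe))

    def withdraw_ad_route(self, esi, pe):
        # A single withdrawal; no per-MAC messages are needed.
        self.ad_per_es.discard((esi, pe))

    def add_mac_route(self, mac, esi, pe):
        self.mac_routes[mac] = (esi, pe)

    def usable_macs(self):
        """MACs whose (segment, PE) still has a live A-D per ES route."""
        return sorted(
            mac for mac, key in self.mac_routes.items() if key in self.ad_per_es
        )


pe3 = RemotePeTable()
pe3.add_ad_route("ES1", "PE1")
pe3.add_ad_route("ES1", "PE2")
for i in range(1, 4):
    pe3.add_mac_route(f"00:00:00:00:00:0{i}", "ES1", "PE1")
pe3.add_mac_route("00:00:00:00:00:09", "ES1", "PE2")

before = pe3.usable_macs()
pe3.withdraw_ad_route("ES1", "PE1")  # PE1-CE1 link fails
after = pe3.usable_macs()
```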
When a CE is multi-homed to multiple PEs running a multi-chassis LAG between them, it’s possible that only one PE learns the MAC addresses, due to the nature of hashing. This means that only the PE learning the MAC will advertise it to remote PEs, even though more than one PE is attached to the same segment. This behavior prevents load balancing towards the CE.
In order to overcome this shortcoming, aliasing was introduced. Aliasing allows a PE to signal that it has reachability to a given Ethernet segment for a given EVI even though it hasn’t learnt any MAC address on that given EVI/ES. The Ethernet A-D route used in this case is per EVI, which is different than the Ethernet A-D route per ES.
One prerequisite for using the aliasing label advertised by the Ethernet A-D per EVI route is that an Ethernet A-D per ES route must also exist.
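A remote PE’s next-hop selection with aliasing could look roughly like the following. Treat it as a sketch under simplifying assumptions: ESIs are plain integers with 0 meaning single-homed, the route tables are dictionaries, and `next_hops_for_mac` is a hypothetical helper.

```python
def next_hops_for_mac(mac_advertiser, esi, ad_per_es, ad_per_evi):
    """Return the PEs a remote PE may load-balance across for one MAC.

    mac_advertiser: PE that sent the MAC advertisement route
    esi: segment the MAC was learned on (0 means single-homed)
    ad_per_es / ad_per_evi: esi -> set of PEs that advertised that A-D route
    """
    if esi == 0:
        return {mac_advertiser}  # no aliasing for single-homed segments
    # A PE qualifies for aliasing only if both A-D routes are present.
    aliased = ad_per_evi.get(esi, set()) & ad_per_es.get(esi, set())
    # Simplification: the advertising PE is always kept as a next hop.
    return aliased | {mac_advertiser}


ES1 = 1  # arbitrary non-zero ESI for the example

both = next_hops_for_mac(
    "PE1", ES1,
    ad_per_es={ES1: {"PE1", "PE2"}},
    ad_per_evi={ES1: {"PE1", "PE2"}},
)
missing_per_es = next_hops_for_mac(
    "PE1", ES1,
    ad_per_es={ES1: {"PE1"}},
    ad_per_evi={ES1: {"PE1", "PE2"}},
)
```

In the first case PE2 is usable even though it never advertised the MAC; in the second, PE2’s aliasing label is ignored because its per-ES route is missing.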
3) Inclusive Multicast Route:
When sending BUM frames, PEs can use ingress replication, P2MP or MP2MP (mLDP) LSPs. I am going to focus only on ingress replication and save the other two for later.
Every PE participating in an EVI advertises its mcast label during its startup sequence via Inclusive Multicast routes. Inclusive Multicast routes are BGP route type 3. Once a PE has received mcast routes from all the other PEs and a BUM frame arrives, the PE does ingress replication, attaching to each copy the mcast label of the PE it is sent to.
In the above diagram, PE2 (label 16001) and PE3 (label 16006) advertise their multicast labels to PE1. When PE1 receives a broadcast packet, it adds the mcast label 16006 plus the label to reach PE3 and sends the packet to PE3. PE1 also forwards the packet to PE2 by adding the ESI label, the mcast label 16001 and the label to reach PE2. PE3 receives the packet and sees the mcast label, so it treats the packet as a BUM frame. When PE2 receives the packet, it notices the ESI label, which was advertised as part of the Ethernet A-D route, and drops the packet.
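Ingress replication boils down to building one label stack per remote PE. The sketch below takes the advertised mcast labels as PE2 = 16001 and PE3 = 16006; the transport labels (24002, 24003) and the ESI label (17001) are invented values, and `build_replication_copies` is a hypothetical function name.

```python
def build_replication_copies(mcast_labels, transport_labels, esi_labels, source_esi):
    """Return {remote PE: label stack, top label first} for one BUM frame.

    mcast_labels: remote PE -> label from its Inclusive Multicast route
    transport_labels: remote PE -> label of the LSP towards that PE
    esi_labels: remote PE -> {esi: ESI label that PE advertised}
    source_esi: segment the frame arrived on at the ingress PE
    """
    copies = {}
    for pe, mcast in mcast_labels.items():
        stack = [transport_labels[pe], mcast]
        esi_label = esi_labels.get(pe, {}).get(source_esi)
        if esi_label is not None:
            # Only PEs attached to the source segment get an ESI label,
            # pushed at the bottom of the stack.
            stack.append(esi_label)
        copies[pe] = stack
    return copies


copies = build_replication_copies(
    mcast_labels={"PE2": 16001, "PE3": 16006},
    transport_labels={"PE2": 24002, "PE3": 24003},  # assumed values
    esi_labels={"PE2": {"ES1": 17001}},             # assumed value
    source_esi="ES1",
)
```

PE1 ends up sending two copies: one to PE3 carrying only the transport and mcast labels, and one to PE2 that additionally carries the ESI label PE2 uses for its split horizon check.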
Below is an output of a sample inclusive multicast route from Cisco’s PBB EVPN implementation.
Below is a detailed output of an inclusive multicast route and mcast labels advertised by PEs
4) Mac Advertisement Route:
And finally, we have MAC advertisement routes, which are responsible for advertising MAC address reachability via MP-BGP to all other PEs in a given EVPN instance. MAC Advertisement routes are type 2 routes.
In Fig. 9, you can see that learning between the PE and the CE still happens in the data plane. Once PE1 learns MAC M1, it advertises it to the other PEs in the BGP NLRI using a MAC advertisement route. The BGP MAC advertisement route contains the RD, the ESI (zero for a single-homed CE, non-zero for multi-homed cases), the MAC address, the MPLS label associated with the MAC, and an optional IP address field.
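The fields listed above map naturally onto a small record type. This is a sketch of the information carried, not of the wire encoding; the class name, field names and all sample values (RD, MAC, label, IP) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MacAdvertisementRoute:
    rd: str                   # route distinguisher
    esi: int                  # 0 for single-homed, non-zero for multi-homed
    mac: str                  # MAC address learned in the data plane
    label: int                # MPLS label associated with the MAC
    ip: Optional[str] = None  # optional IP address field

    @property
    def is_multi_homed(self) -> bool:
        return self.esi != 0


# PE1 learns M1 from CE1 on a multi-homed segment and advertises it.
m1_route = MacAdvertisementRoute(
    rd="192.0.2.1:100", esi=1, mac="00:00:5e:00:53:01", label=100001
)
# The same kind of route can also carry an IP/MAC binding.
m2_route = MacAdvertisementRoute(
    rd="192.0.2.1:100", esi=0, mac="00:00:5e:00:53:02", label=100001,
    ip="198.51.100.10",
)
```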
Similar to IP world we have different EVPN label allocation modes.
Per EVI label assignment:
This is similar to Per-VRF label allocation mode in the IP world. A PE advertises a single EVPN label for all the MAC addresses in a given EVI. This is the most conservative way of allocating labels, and the tradeoff is the same as with Per-VRF label assignment: it requires an additional lookup on the egress PE.
Per MAC address label assignment:
This is similar to per-prefix label allocation mode in IP. A PE advertises a unique EVPN label for every MAC address. This is the most liberal way of allocating labels, and the tradeoff is memory consumption and the possibility of running out of label space.
Per <ESI,Ethernet Tag> assignment:
In this case, the PE advertises a unique EVPN label per <esi,ethernet tag>. There is some similarity to per CE label allocation mode in IP. In Per-CE label allocation mode, a unique VPN label is assigned to each BGP next-hop. Similar to Per-CE label allocation, Per <esi,ethernet tag> gives us a middle ground by avoiding an additional lookup while conserving label space at the same time.
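The difference between the three modes is really just the key used to look up a label. The allocator below is a minimal sketch to compare label consumption; the class, the mode strings and the starting label value are all made up for this example.

```python
from itertools import count


class LabelAllocator:
    """Toy allocator: the mode decides which key a label is bound to."""

    MODES = ("per-evi", "per-mac", "per-esi-tag")

    def __init__(self, mode, first_label=100000):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self._next = count(first_label)
        self._labels = {}

    def label_for(self, evi, mac, esi, ethernet_tag):
        if self.mode == "per-evi":
            key = (evi,)                    # one label for the whole EVI
        elif self.mode == "per-mac":
            key = (evi, mac)                # one label per MAC address
        else:
            key = (evi, esi, ethernet_tag)  # one label per <ESI, Ethernet tag>
        if key not in self._labels:
            self._labels[key] = next(self._next)
        return self._labels[key]

    def labels_in_use(self):
        return len(self._labels)


# Four MACs in one EVI, spread over two Ethernet segments on the same tag.
macs = [("m1", 1), ("m2", 1), ("m3", 2), ("m4", 2)]
usage = {}
for mode in LabelAllocator.MODES:
    alloc = LabelAllocator(mode)
    for mac, esi in macs:
        alloc.label_for(evi=10, mac=mac, esi=esi, ethernet_tag=300)
    usage[mode] = alloc.labels_in_use()
```

For this small input the per-EVI mode uses one label, the per-MAC mode uses four, and the per-<ESI, Ethernet tag> mode sits in between with two.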
EVPN seems to be a promising technology and has created a lot of interest in the industry. It’s a next-generation L2VPN solution based on BGP MAC routing that alleviates current L2VPN limitations around redundancy, multicast optimization, provisioning and simplicity.