Yes, that’s right, we have another new BGP NLRI: BGP-LS. In this post we will look at BGP with the Link State (LS) extension, which is an integral part of the Carrier SDN strategy. We will look at why we need BGP-LS, its internals and its applications. What I won’t cover are questions like whether we need SDN, whether all network engineers have to become programmers, or any other bigger questions threatening humanity.
Problem Space:
Centralized LSP Path Computation
As you may already know, traditional RSVP-TE has certain problems that come from distributed computation, like bin-packing or optimal path computation for Multi-area/Multi-AS TE. The crux of the issue is that a Head End router has limited visibility: it sees only its own domain (a single AS or single IGP area), and it knows neither the number of LSPs in flight from other head ends (related to bin packing) nor the LSDB of other domains (AS or IGP area). These problems are hard to solve with distributed computation, so it makes sense to move LSP path computation for them to a central controller which has visibility into the entire domain, or into more than one domain. The controller can then calculate an end-to-end optimal path efficiently and signal it to the head-end node.
One example of a central controller is PCE [RFC4655], which can be used to compute MPLS-TE paths within a domain or across multiple domains (multiple areas or multiple ASes). In previously proposed solutions for multi-domain path computation, the source router uses a technique called “loose-hop-expansion” and selects the exit ABR and other ABRs using the IGP shortest path topology. This approach has various disadvantages: it can calculate sub-optimal paths, it makes alternate/backup path computation hard, and it may result in no TE path being found when one exists [I will cover this topic in detail in my future blogposts on PCE/PCEP].
Even in the world of Segment Routing, the traffic steering problem will be solved by a centralized path controller, which calculates the paths and then signals them to the head end node.
But in order for a central controller like PCE to calculate end-to-end optimal paths, it needs a database with topology info on which it runs its path computation. This database is usually known as the Traffic Engineering Database (TED), and in order to build it, a controller needs details about the topology and resource information of the domain, like link bandwidth, available bandwidth, link metric, TE metric etc.
Fig.1
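To make the role of the TED a bit more concrete, here is a minimal sketch of how a controller might run a constrained shortest path first (CSPF) computation over it: prune the links that cannot carry the requested bandwidth, then run a plain shortest-path search on the TE metric. The TED layout, node names and bandwidth figures below are all hypothetical, and a real PCE handles many more constraints than this.

```python
import heapq

# Hypothetical TED: one entry per unidirectional link.
# (local, remote) -> TE metric and available bandwidth (Mbps, assumed unit).
TED = {
    ("A", "B"): {"te_metric": 10, "avail_bw": 100},
    ("B", "D"): {"te_metric": 10, "avail_bw": 100},
    ("A", "C"): {"te_metric": 10, "avail_bw": 1000},
    ("C", "D"): {"te_metric": 20, "avail_bw": 1000},
}


def cspf(ted, src, dst, bw):
    """Dijkstra on TE metric, ignoring links with less than `bw` available."""
    # Prune step: keep only links that can carry the requested bandwidth.
    adj = {}
    for (u, v), attrs in ted.items():
        if attrs["avail_bw"] >= bw:
            adj.setdefault(u, []).append((v, attrs["te_metric"]))
    # Shortest-path step over the pruned graph.
    heap = [(0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, metric in adj.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + metric, nxt, path + [nxt]))
    return None  # no path satisfies the constraint


low = cspf(TED, "A", "D", bw=50)    # small LSP: the cheaper A-B-D path fits
high = cspf(TED, "A", "D", bw=500)  # large LSP: forced onto A-C-D
print(low, high)
```

The point of the sketch is only that the answer depends on resource information (available bandwidth) as much as on the topology itself, which is why the controller needs both.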
One way we can solve this problem, i.e. getting topology and resource info, is by making the controller peer passively with an IGP node to get all the link state information. This is how people have solved the problem of getting IGP info in the past, but this approach has a few problems:
- Practical controller code needs to support both OSPF and IS-IS.
- IGPs tend to be very chatty, so the controller will spend some time processing all the chatty updates.
- In cases where the network consists of multiple IGP domains across geographic areas, it can be challenging to decide where to place the central entities, i.e. the controllers which peer with the IGP.
An alternative approach is to extend BGP by creating another NLRI which can carry all the IGP info over BGP. In this approach we can leverage an in-network BGP speaker that is already participating in the IGP. The BGP speaker can retrieve info from the IGP LSDBs and distribute it to a controller, either directly or via a peer BGP speaker. The BGP speaker can also apply filters before sending the info northbound to the controller.
Advantages of this approach are:
- The controller implementation only has to support BGP.
- BGP tends to be less chatty compared to IGPs.
- In a network with multiple IGP domains, extending peering over BGP is a lot more feasible compared to IGP.
This is what BGP-LS is all about. Once you have the topology info, you can also write an application to draw network topology graphs which are updated dynamically as topology changes occur, for example when a node or link goes down. This may seem minor, but I think it’s pretty handy.
Fig.2
BGP-LS Internals
Now let’s dig deep into the BGP-LS internals. As you know, an IGP consists of topology and IP reachability information. If we want to reconstruct an IGP topology view at the controller based on the data received over BGP-LS, then BGP-LS must have some way to represent topology and IP reachability information in its database. So let’s take a look and see how this is done.
The BGP-LS specification contains two parts:
- Definition of a new BGP NLRI type, which is essentially a set of TLVs that define three objects:
- Nodes
- Links
- IP Prefixes
By combining Node and Link objects one can construct the topology, and the IP Prefix object provides the IP reachability information.
- Definition of a new BGP path attribute (the BGP-LS attribute), which is an optional non-transitive attribute. It encodes the properties of the objects (link, node and prefix). For instance, it could be Node names, IGP metric, TE metric, Available BW etc.
1. BGP-LS NLRI: Format of Link-state NLRI is shown below
Fig.3
As we mentioned earlier, there are basically three kinds of NLRI: Node (Type 1), Link (Type 2) and Prefix (Type 3 for IPv4 and Type 4 for IPv6).
Type 1 Node NLRI: Node NLRI is pretty self-explanatory. It contains the Node descriptor, and the Node attributes travel with it. Typically, the Node descriptor will be the Router-ID, which is carried in the value field of the Local Node Descriptor.
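As a rough illustration of the TLV layout, the sketch below decodes a Node NLRI from raw bytes. The code points (NLRI types 1 to 4, Protocol-IDs, TLV 256 for Local Node Descriptors and sub-TLV 515 for the IGP Router-ID) are the ones listed in RFC 7752; the function names and the sample bytes are my own and hand-built for this example, not taken from a capture. A real Node NLRI would usually carry more sub-TLVs (AS number, BGP-LS identifier and so on), which this sketch simply skips.

```python
import struct

# Code points as listed in RFC 7752.
NLRI_TYPES = {1: "Node", 2: "Link", 3: "IPv4 Prefix", 4: "IPv6 Prefix"}
PROTOCOL_IDS = {1: "IS-IS Level 1", 2: "IS-IS Level 2", 3: "OSPFv2",
                4: "Direct", 5: "Static", 6: "OSPFv3"}
LOCAL_NODE_DESCRIPTORS = 256
IGP_ROUTER_ID = 515


def parse_tlvs(data):
    """Split bytes into (type, value) pairs; type and length are 2 octets each."""
    tlvs, offset = [], 0
    while offset + 4 <= len(data):
        tlv_type, length = struct.unpack_from("!HH", data, offset)
        tlvs.append((tlv_type, data[offset + 4:offset + 4 + length]))
        offset += 4 + length
    return tlvs


def parse_node_nlri(data):
    """Decode NLRI type, Protocol-ID, Identifier and the IGP Router-ID."""
    nlri_type, total_len = struct.unpack_from("!HH", data, 0)
    body = data[4:4 + total_len]
    # Protocol-ID is 1 octet, Identifier is 8 octets, descriptors follow.
    protocol_id, identifier = struct.unpack_from("!BQ", body, 0)
    result = {
        "nlri_type": NLRI_TYPES.get(nlri_type, nlri_type),
        "protocol": PROTOCOL_IDS.get(protocol_id, protocol_id),
        "identifier": identifier,
    }
    for tlv_type, value in parse_tlvs(body[9:]):
        if tlv_type != LOCAL_NODE_DESCRIPTORS:
            continue
        for sub_type, sub_value in parse_tlvs(value):
            if sub_type == IGP_ROUTER_ID:
                digits = sub_value.hex()
                result["igp_router_id"] = ".".join(
                    digits[i:i + 4] for i in range(0, len(digits), 4))
    return result


# Hand-built sample (not a capture): IS-IS Level 2 node, system ID 0000.0000.0001.
sub_tlv = struct.pack("!HH", IGP_ROUTER_ID, 6) + bytes.fromhex("000000000001")
descriptor = struct.pack("!HH", LOCAL_NODE_DESCRIPTORS, len(sub_tlv)) + sub_tlv
body = struct.pack("!BQ", 2, 0) + descriptor
sample = struct.pack("!HH", 1, len(body)) + body

node = parse_node_nlri(sample)
print(node)
```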
Type 2 Link NLRI: Link NLRI represents a link in the network, where the local and remote Nodes are the two endpoints and the Link descriptor identifies the link between them. The link descriptor field is a set of TLVs uniquely identifying a unidirectional connection between a pair of adjacent nodes.
Fig.5
A link described by the Link descriptor TLVs actually is a “half-link”, a unidirectional representation of a logical link. In order to fully describe a single logical link, two originating routers advertise a half-link each, i.e. two link NLRIs are advertised for a given point-to-point link.
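A consumer of BGP-LS therefore has to stitch half-links back together. A minimal sketch of that step, assuming each half-link has already been reduced to a hypothetical `(local, remote)` pair of node names, could look like this. Parallel links between the same two nodes would need the link descriptors in the key as well, which is left out here.

```python
def pair_half_links(half_links):
    """Group unidirectional (local, remote) half-links into bidirectional links."""
    seen = set(half_links)
    full, one_way = set(), set()
    for local, remote in seen:
        if (remote, local) in seen:
            # Both directions were advertised: this is a complete logical link.
            full.add(tuple(sorted((local, remote))))
        else:
            # Only one router has advertised its half so far.
            one_way.add((local, remote))
    return sorted(full), sorted(one_way)


full_links, dangling = pair_half_links([
    ("XRv1", "XRv2"),
    ("XRv2", "XRv1"),
    ("XRv2", "XRv3"),  # the reverse half-link is missing on purpose
])
print(full_links, dangling)
```

A half-link left without its reverse is worth flagging, since it usually means the adjacency is only up in one direction or one side has not advertised yet.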
Type 3 and 4 Prefix NLRI: Types 3 and 4 use the same format and contain the Prefix descriptor and attributes. The ‘Prefix Descriptor’ field is a set of Type/Length/Value (TLV) triplets. ‘Prefix Descriptor’ TLVs uniquely identify an IPv4 or IPv6 Prefix originated by a Node.
Fig.6
2. BGP-LS Path Attributes:
Path attributes carry the attributes needed to characterize the objects described above, i.e. Node, Link or Prefix NLRI. They are again in TLV format and include things like Node Name, Maximum Link Bandwidth, IGP Metric, Unreserved bandwidth, SRLG etc.
- Node Attribute: Node attributes TLVs may be included in the BGP-LS attribute accompanying a node NLRI. This may include things like Node Name, Router-ID, Multi-Topology identifier etc.
- Link Attributes: They are presented together with the corresponding link NLRI describing the link. Link attributes can be sourced from any of the extensions for the IGP routing protocols (IS-IS/OSPF). Examples of Link Attributes are Metric, Max Bandwidth, Unreserved Bandwidth, Available Bandwidth etc.
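To give a feel for how these attribute TLVs are decoded, here is a small sketch covering four of them. The code points and encodings (1026 Node Name, 1089 Maximum Link Bandwidth as an IEEE float in bytes per second, 1092 TE Default Metric, 1095 IGP Metric) follow RFC 7752; the function name and the returned labels are made up for this example, and everything else is passed through undecoded.

```python
import struct


def decode_attribute(tlv_type, value):
    """Decode a handful of BGP-LS attribute TLVs into (label, value) pairs."""
    if tlv_type == 1026:  # Node Name: ASCII string
        return "node_name", value.decode("ascii")
    if tlv_type == 1089:  # Maximum Link Bandwidth: IEEE float, bytes per second
        (bandwidth,) = struct.unpack("!f", value)
        return "max_link_bw_bytes_per_sec", bandwidth
    if tlv_type == 1092:  # TE Default Metric: 4 octets
        return "te_default_metric", int.from_bytes(value, "big")
    if tlv_type == 1095:  # IGP Metric: 1 to 3 octets depending on the IGP
        return "igp_metric", int.from_bytes(value, "big")
    return f"unknown_{tlv_type}", value


# Hand-built values, chosen to resemble the demo further down (metric 10).
print(decode_attribute(1026, b"XRv1"))
print(decode_attribute(1095, b"\x00\x00\x0a"))
print(decode_attribute(1092, struct.pack("!I", 10)))
print(decode_attribute(1089, struct.pack("!f", 125000000.0)))  # 1 Gbps link
```

Note that bandwidth comes across in bytes per second, so a 1 Gbps link shows up as 125000000.0.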
BGP-LS Demo
At this point we have some idea of what BGP-LS consists of at a high level. Let’s look at a sample topology, which may bring more clarity. In my topology (Fig.7), I have four IOS-XR routers running a software version which supports BGP-LS. Juniper supports BGP-LS as well, but at the time of writing I didn’t have access to one, so I was stuck with using only IOS-XR in the topology. All the routers are running IS-IS Level-2 as the IGP on the point-to-point links.
Router XRv1 is peering over BGP-LS with the OpenDaylight Controller (ODL). The end goal here is that XRv1 will send all the IGP topology info over BGP-LS to ODL. Once ODL has that information, we can use it for LSP path computation or for creating an application to visualize the topology. In this post, we will look at a sample app that draws the topology by grabbing the info from ODL over RESTCONF.
Fig. 7
BGP-LS Config on XRv1
Below is the highlighted relevant config under IS-IS and BGP to configure BGP-LS on Router XRv1 which is peering with ODL. Note that I am only redistributing IS-IS Level-2 information into BGP-LS.
Fig.8
Fig.9
After configuring the above lines of config, the BGP-LS neighbor relationship is up with ODL. Now let’s check whether the BGP-LS database has everything needed to correctly derive our IGP topology.
If you look at Fig.10 below, which is a logical representation of our topology in Fig.7, we can infer a few things:
Fig.10
- Number of Nodes: We have 4 IGP Nodes, so I should expect 4 Node descriptors (Type 1) in the BGP-LS database to correctly describe the number of nodes. [Needed for creating Topology]
- Number of Links: We have 3 point-to-point links, so we should have 6 (3×2) Link descriptors (Type 2) to correctly describe all the links. If you recall, each link descriptor represents a half-link, which is why we need 6 and not 3 link descriptors. We should also have additional info about the links, like the available bandwidth, TE metric, IGP metric etc., encoded as BGP-LS path attributes. [Needed for creating Topology]
- Number of Prefixes: There are 10 Prefixes in total in our topology, so we should see 10 Prefix descriptors (Type 3 for IPv4) advertised by the respective Nodes. [Needed for IP Prefix reachability]
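The arithmetic above is simple enough to write down as a sanity check. This is only a sketch with a made-up function name; it assumes every link is point-to-point (so each contributes two half-links) and that all prefixes are IPv4.

```python
def expected_bgp_ls_routes(nodes, p2p_links, prefixes):
    """Expected BGP-LS route counts for a network of point-to-point links."""
    return {
        "node": nodes,          # one Type 1 NLRI per IGP node
        "link": 2 * p2p_links,  # two half-links (Type 2) per logical link
        "prefix": prefixes,     # one Type 3 NLRI per IPv4 prefix
    }


counts = expected_bgp_ls_routes(nodes=4, p2p_links=3, prefixes=10)
print(counts, "total:", sum(counts.values()))
```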
Below is an output from BGP-LS database on XRv1:
Fig.11 [V] denotes Node Descriptor, [E] denotes Link Descriptor and [T] denotes Prefix descriptor.
As you can see in Fig.11 above, we have exactly the number of routes we were expecting, describing the Node, Link and IP Prefix information from which you can derive the IGP topology.
Let’s take a look at the following descriptors in detail to get some more clarity:
- Node Route descriptor for XRv1
- Two Link descriptor (half-link) routes for the link between XRv1 and XRv2
- Two IP Prefix routes for node XRv1 (advertised by node XRv1)
So below is a detailed output of a Node descriptor for XRv1:
Fig.12
Let’s break the above route and see what information we can infer:
- [V]: Tells me that it’s a Node Route descriptor.
- [L2]: Tells me that it is an IS-IS Level 2 node.
- [s0000.0000.0001.00]]: tells me that the Node ISO-System id is 0000.0000.0001.00
- Node-Name: XRv1 Tells me that host-name is XRv1
- IS-IS Area: 49. Ideally it should be 49.0000. I still have to investigate why it only shows 49. Nevertheless, you can combine the IS-IS Area and the ISO-System-ID to derive that the area is 49.0000
- Router-ID is: 1.1.1.1
- And the obvious thing that we are running IS-IS as IGP
Similar information can be extracted from the other three Node Route descriptors.
Below is a detailed output of two Link descriptor routes for the link between XRv1 and XRv2.
Fig.13
Let’s break the above link-route and see what information we can derive:
- [E]: This tells me it’s a Link Descriptor Route.
- [L2]: This tells me that IS-IS link type is L2 (adjacency type is only L2 as there are no other routes present with L1)
- [L[i172.16.2.10][n172.16.2.2]] : This part tells that the local Link is 172.16.2.10 and the neighbor link IP is 172.16.2.2
- [s0000.0000.0001.00]][R[c1][b1.1.1.1][s0000.0000.0002.00]] : This portion describes the half-link between ISO-Node 0000.0000.0001.00(local-Node) and 0000.0000.0002.00(remote Node).
- If you look at the above two points and combine them with the Node info, you can derive that this describes the half-link XRv1 (0000.0000.0001.00) → XRv2 (0000.0000.0002.00), with source interface 172.16.2.10 and remote neighbor 172.16.2.2
- Local TE Router-ID: 1.1.1.1, Remote TE Router-ID: 2.2.2.2 tells me that local and remote TE Router id is 1.1.1.1 and 2.2.2.2 respectively.
- TE-default-metric: 10, metric: 10: tells TE and IGP metric is 10 from XRv1 → XRv2
- It also has details like SRLG (Admin-Group) information and things like Available and Max Reserved BW.
In a similar way, the bottom route gives the details of the other half-link from XRv2 → XRv1. Combining both top and bottom half-links, one can derive that there is bi-directional connectivity between XRv1 and XRv2, along with its properties like IGP Metric, TE attributes etc.
Combining both Node and Link attributes, one can infer the Topology information. What’s missing is the reachability information, which can be derived from the IP Reachability information.
Below is the IP reachability information for node XRv1 from the Prefix descriptor routes:
Fig.14
By looking at the above route we can derive following information.
- [T]: tells me that it’s a Prefix Descriptor Route.
- [P [p1.1.1.1/32]], [s0000.0000.0001.00] and “Link-state: Metric: 10” in the route tells me that Node with ISO-ID s0000.0000.0001.00 is advertising 1.1.1.1/32 with IGP metric 10
- [P[p172.16.2.0/24]] , [s0000.0000.0001.00] and “Link-state: Metric: 10” in the route tells me that Node with ISO-ID s0000.0000.0001.00 is advertising 172.16.2.0/24 with IGP metric 10
In a similar way you can derive the remaining prefixes advertised by other nodes from remaining IP Prefix descriptor routes. By this time, I am hoping this is all making sense to you.
BGP-LS Applications
Alright, so far we have looked at how the Topology and IP reachability details are exported from IS-IS to the BGP-LS database. Since ODL is learning this info over BGP-LS, this information is available for our consumption. As I mentioned earlier, at this point one can use this information for LSP path computation or write an application on top of ODL to draw the IGP topology. Some vendors are already doing both, as that’s their value proposition on top of ODL, and customers can do the same if they have in-house development.
Just as a proof of concept, I wrote a quick script https://github.com/Dipsingh/BGP-LS-Topology-Grapher/blob/master/BGP-LS%20Topology%20Grapher to parse the BGP-LS info received over restconf from ODL and draw the IGP topology.
Fig.15
As you can see the graph has four nodes and the links with their IGP Cost. I am using ISO-System-ID as Node labels.
[ I know that the above graph looks ugly and probably so is my script :). Right now I am learning python and hoping to improve my skills over the years to write better code, so all the python experts looking at my code please bear with me for now]
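For readers who want to try something similar without ODL at hand, here is a separate, minimal sketch of the drawing step. It is not the script linked above: it assumes the half-links have already been extracted into hypothetical `(local_id, remote_id, igp_metric)` tuples, and it emits Graphviz DOT text instead of rendering an image.

```python
def to_dot(half_links):
    """Turn (local_id, remote_id, igp_metric) half-links into Graphviz DOT text."""
    edges = {}
    for local, remote, metric in half_links:
        # Both half-links of a logical link share one undirected edge.
        key = tuple(sorted((local, remote)))
        edges.setdefault(key, {})[local] = metric
    lines = ["graph igp {"]
    for (a, b), metrics in sorted(edges.items()):
        # Label shows the IGP cost in each direction; "?" if one half is missing.
        label = "/".join(str(metrics.get(node, "?")) for node in (a, b))
        lines.append(f'  "{a}" -- "{b}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([
    ("0000.0000.0001", "0000.0000.0002", 10),
    ("0000.0000.0002", "0000.0000.0001", 10),
])
print(dot)
```

The output can be saved to a file and rendered with any Graphviz tool.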
Conclusion:
In this post we saw how BGP-LS can help gather the IGP topology of the network and export it to a central SDN Controller. I hope this post has been informative. In a future post we will look at PCEP and PCE in detail.
References:
https://tools.ietf.org/html/draft-ietf-idr-ls-distribution-11
Very nice post, Diptanshu!
Looking forward to more posts from you. Are you seeing BGP-LS deployed in the networks you are involved in?
Nice script, I’ll have to pick up some Python soon as well.
Thanks Daniel. In an experimental mode yes, I know few networks which are working on so-called SDN thingy and BGP-LS is a component of that.
Nice work and we have similar bios. love networking and it is a great hobby. More knobs to turn and play with. Any protocol with TLV capabilities is expandable. Curious if Open EIGRP can be used too.
Learning Python myself but for other reasons including SDN. Not sure if network engineers should write restful like scripts for even the simplest things considering one would be done with it already with CLI, macro recorder or the NMS should be evolved to handle it anyway.
Jeff, thanks. If I understood your question right, you mean open EIGRP to export IGP (OSPF/IS-IS) information similar to what BGP-LS ?
Thank you Diptanshu, amazing! How else could anybody write a better BGP-LS primer??
The graph is not only perfectly informative but also very pretty. Please don’t say it’s ugly:-)
Thanks Tamihiro for the kind words… I have seen way prettier graphs and that’s why I know it’s not pretty 🙂
That was one good tutorial. I was also looking into bgp lS and your post is so informative.
If you have, can you share the application/XML format of the information that odl displays of the topology you created to my email.
Also if you have taken update message wireshark logs, that would be useful in my research
Cheers,
Sriram
Hey Sriram,
Thanks and i sent you the XML over email, but unfortunately I don’t have the wiresharks logs
Hi,
Very good post! (as usual)
BGP-LS is at heart of most T-SDN (routed) solutions done with ODL, Ericsson included.
Important part – we use BGP-LS not only to export basic topology with nodes and links attributes but also Segment Routing attributes and SID’s, as described in draft-gredler-idr-bgp-ls-segment-routing-extension, unfortunately expired, we will update it in a couple of days.
We also use BGP-LS to export some TE LSP Objects, described in draft-ietf-idr-te-lsp-distribution, as well as smaller pieces of info we need to expose. One of them – relevant to Segment Routing would be maximal label stack depth supported by a node/a link on a node, I described it in draft-tantsura-bgp-ls-segment-routing-msd – in this case it would be used by a PCE as a constrain, obviously it doesn’t make sense to compute a LSP which would require head-end to push more labels than it can push due to HW limitations.
P.S. I still owe (promised )the post on total solution with BGP-LS, SR and PCEP (including various new extensions to PCEP to do some cool stuff :))
Will do some time in September
Thanks!
Hey Jeff,
Thanks for dropping by and for the kind words.
Thanks for highlighting the use cases where it involves extracting more than just IGP attributes. At some point, I do want to pick your thoughts on GMPLS/PCE and its traction in the Transport World.
I am really looking forward for your posts, and see what Ericsson has done in this area.
Thanks
Dip
Amazing post as usual!
cool, thanks.
That’s an awesome explanation of BGP-LS with ODL.
Is this IOSXR image available outside, like how we use one for GNS3 IOS ?, though of having some hands on , on this BGP-LS feature, since we are moving forward to support BGP-LS in our ONOS controller, it would be really helpful for us.
Looking forward for your reply.
thanks
Antony
Hey Antony,
I am not sure, but see if you can download an XRv image (IOS-XR Virtual Image) which can run on VMware. Another option could be VIRL but for that you have to pay for the subscription.
Thanks
Dip
Thanks for the information dip.
Cheers,
Antony
Thanks Diptanshu,
Thats really good explananation of why we need BGP-LS, What it is and how it works. I was looking for something like this and find it very useful.
Great post, can BGP-LS get the jitter, delay, packet loss?
I believe BGP-LS Only “exports” Information which is encoded in link-state database or TE Database , maybe in future.
We are working on it, there’s more stuff to come, look in idr wg for *bgp-ls* drafts
Thanks, It was a really helpful and quite descriptive also.
The best explaination about BGP-LS; Do you have your own blogs? pls send the URL to my email? Thanks!
Martin,
Thanks for the kind words. For now i am writing only at packetpushers and you can check my all the posts here https://packetpushers.net/author/diptanshu-singh/ . At some point i may create my own blog as well but i havent reached there yet.
Again thanks for the feedback.
Dip
short sweet simple and very informative
Hi,
how can i distribute bgp ls into bgp+lu+sr domain? how can i get IGP +SR from mpls core into DC which runs bgp+lu+sr?
Take a look at these presentations and they may answer your questions and let me know if you have any further questions
http://www.sanog.org/resources/sanog26/SANOG26_Conf-Segment_Routing-Mohan_Microsoft.pdf
https://www.slideshare.net/mobile/DmitryAfanasiev1/yandex-nag201320131031
in both resources, i am not really seeing how IGP +SR label from mpls core into DC which runs bgp+lu+sr? if core IGP + SR labels are not distributed into DC i am not seeing how one can mpls TE from the DC all the way to the core? . in the mohan SANOG26 page 23, there is a label stack but the way the path is constructed is not showed. I can clearly see core SR label getting into the ToR, the way it gets in the ToR is not explained.
>>in both resources, i am not really seeing how IGP +SR label from mpls core into DC which runs >>bgp+lu+sr? if core IGP + SR labels are not distributed into DC i am not seeing how one can mpls TE >>from the DC all the way to the core? .
[Dip] You don’t have to redistribute IGP+SR Labels into DC.. This will be bad and no one wants to redistribute core IGP into DC and Vice versa bcz as you know it won’t scale 🙂 .. One of the ways you can connect both DC and Core is via Binding SID and use a PCE to give an End to End Path.. (PCE job is to use Binding SID while constructing the path and giving it to the PCC)..
>>In the mohan SANOG26 page 23, there is a label stack, but the way the path is constructed is not shown. I can clearly see core SR label getting into the ToR, the way it gets in the ToR is not explained.
[Dip] That’s where a PCE come into play and the TOR can either request the label stack from the PCE or PCE can just push the label stack to the TOR.. (Push vs Pull)..
reference made to https://packetpushers.net/yet-another-blog-segment-routing-part3-sr-te/
have u ever posted the below
“In the future posts we will look at the BGP based SR-TE, BGP EPE and BGP Prefix SID (For DC use case)”
anyway it is very very much clear and my mind almost there, only 2 questions remaining .
1. the core Binding SID is advertised to PCE via BGP SR or BGP LS
2. i understand that ToR will construct path from PCE PCC communication, at the ToR what will be the tunnel destination ? most likely the final destination
thanks
>>have u ever posted the below
>>“In the future posts we will look at the BGP based SR-TE, BGP EPE and BGP Prefix SID (For DC use >>case)”
No, I haven’t yet.. Didn’t get a chance to do it..
>>1. the core Binding SID is advertised to PCE via BGP SR or BGP LS
Can be done via PCEP (PCC Report Message). Have to check if BGP-LS can do that or not ..
>>2. i understand that ToR will construct path from PCE PCC communication, at the ToR what will be >>the tunnel destination ? most likely the final destination
yes it will be the final destination as the destination for tunnel.
do you have an implementation where DC SR-TE all the way trough the core ?
I don’t have it yet, I will keep you posted..
i really need to see a case /post of E2E TE from DC to/trough core. thanks
Hope my explanation helped, we might also expand your use case here, so others could benefit .
yes jeff, it has been helpful. from a data plane perspective i have clear view on ToR builds E2E toward the core. just need to see from a control plane perspective how core Binding SID is advertised to PCE .
anyway this whole discussion is enriching , we a use case would be very much welcome.
Hi, thanks for the explanation. 1 doubt: if a link has both v4 and v6 address, then the link descriptor will have both v4 & v6 details? Also a link having only v4, only v6 and both all represents the same physical link right? So how to make it equivalent if some of them are only present?
Hey Salih,
For v4 and v6 there are separate TLVs. If both v4 and v6 addresses are present, then
both TLVs will be present; otherwise only the one matching whatever the interface is configured with (v4 or v6).
Below is the relevant reference from the RFC. Let me know if this doesn’t answer your question.
https://tools.ietf.org/html/rfc7752#section-3.2.2
| TLV Code Point | Description            | IS-IS TLV/Sub-TLV | Reference (RFC/Section) |
| 259            | IPv4 interface address | 22/6              | [RFC5305]/3.2           |
| 260            | IPv4 neighbor address  | 22/8              | [RFC5305]/3.3           |
| 261            | IPv6 interface address | 22/12             | [RFC6119]/4.2           |
| 262            | IPv6 neighbor address  | 22/13             | [RFC6119]/4.3           |
Thanks
Dip
Hi Diptanshu,
Would you know if there is a way using BGP-LS to tunnel all the ISIS TLVs from LSPs? I’m specifically interested to get ISIS TLV 10 (Crypto Authentication) and TLV 13 (Purge Originator Identification) and using GoBGP on the controller side to analyze them.
Thanks,
Hey Philippe,
I skimmed through the BGP-LS drafts/rfc and it doesn’t seem like it’s supported. Even if it would have been then the next level of detail would be on whether it’s implemented on Vendor’s BGP-LS implementation. AFAIK, GoBGP main branch doesn’t support BGP-LS either, that’s why I had to write my own BGP-LS implementation for GoBGP.
What’s the use case?
Hi Diptanshu,
Nice post. I am implementing the same but I have used OSPF instead of IS-IS. I only see four nodes (using show bgp link-state link-state) and there is no prefix and links? Any suggestions?
Thanks in advance.
Hi Diptanshu,
Thanks for the post! you saved my time!! I am clear with the communication between pce and pcc, but can yu explain me how pce queries TED for computing the paths ,how a path is computed ?
Thanks in advance!
Hi,
Thanks a lot for your informative blog. I have clarification with respect to broadcast links. Will the Node/Link/Prefix atttributes change massively?
thank you