Show 44 – The Case for Shortest Path Bridging

Today it’s a special show focusing on SPB – Shortest Path Bridging, the not-quite-finished, any-day-now IEEE standard that offers a successor to Q-in-Q bridging and delivers L2 multipath technology. After a fine rant by Greg in the blog post Rant: Why SPB Doesn’t Get Any Attention, some members of the SPB group got in touch via the comments. Let’s talk it out.

Show Notes.

  • Shortest Path Bridging – IEEE Standard 802.1aq
  • PBB provider backbone bridging – SPBM (MAC-in-MAC)
  • SPBV – VLAN – regular Ethernet (Q-in-Q)
  • Discussion of how VEPA & VEB could readily replace vSwitch in VMware.
  • Some discussion of the “openness” of the IEEE (even though they hide all their proceedings and documentation)

Relevant Links

  • Introduction to SPB – http://en.wikipedia.org/wiki/802.1aq
  • Provider Link State Bridging – http://en.wikipedia.org/wiki/Provider_link_state_bridging
  • Avaya Virtual Enterprise Network Architecture video – http://www.avaya.com/usa/VideoPlayerPopup.aspx?CurrentPath=/master-usa/en-us/resource/assets/videos/data_vena.flv
  • Network Virtualization using SPB – Avaya White Paper – http://www.avaya.com/usa/resource/assets/whitepapers/dn4469%20-%20network%20virtual%20using%20spb%20white%20paper.pdf
  • SPB vs TRILL – http://www.avaya.com/usa/resource/assets/whitepapers/SPB-TRILL_Compare_Contrast-DN4634.pdf
  • The Great Debate: TRILL vs 802.1aq SPB – http://www.nanog.org/meetings/nanog50/presentations/Monday/NANOG50.Talk63.NANOG50_TRILL-SPB-Debate-Roisman.pdf
  • IEEE 802.1aq Shortest Path Bridging – http://www.nanog.org/meetings/nanog50/presentations/Sunday/IEEE_8021aqShortest_Path.pdf

Guests

  • Paul Unbehagen – http://alcatel-lucent.com/ (facebook: punbehagen)
  • Roger Lapuh – http://www.avaya.com
  • Peter Ashwood-Smith – http://huawei.com

Feedback

Follow the Packet Pushers on Twitter (@packetpushers | Greg @etherealmind | Tom Hollingsworth), and send your queries & comments about the show to [email protected].  We want to hear from you!


Greg Ferro
Greg Ferro is a Network Engineer/Architect, mostly focused on Data Centre, Security Infrastructure, and recently Virtualization. He has over 20 years in IT, working as a freelance consultant for a wide range of employers including Finance, Service Providers and Online Companies. He is CCIE#6920 and has a few ideas about the world, but not enough to really count. He is a host on the Packet Pushers Podcast, blogger at EtherealMind.com and on Twitter @etherealmind and Google Plus.
  • Jonathan Hurtt

    Great Show with great Guest… very informational…

  • http://blog.vcider.com Chris Marino

    Greg, another gem here….

    The first part was pretty dense with all those TLAs, for sure. Had to listen to it a couple of times. As it progressed I was asking myself: But what about TRILL? And then, OpenFlow? Glad you got around to covering these with the SPB guys.

    The question that remains, though, is: Even if you can build out these large L2 networks, will anyone do it? Seems to me that all this complexity could be avoided if you just routed whenever necessary. If virtualization is what’s driving this, it’s not obvious to me that the large virtualized environments will use it either. The ones with the largest need for large L2 networks with virtualization support would be the IaaS providers. These guys seem to be building out their data centers routing to top of rack, or even right down to the vHost.

    Also, the multi-tenant aspect of their infrastructure would seem to further undermine the need for this as well (i.e. individual tenants don’t want to share an L2 domain). Maybe I’m overstating it, but maybe they don’t really want big L2 domains.

    To me, it seems that rather than building out a big L2, what you really want is for your L2 to go where you need it.

    • Peter Ashwood-smith

      Chris, glad you enjoyed it. Sorry we got a bit deep at the beginning, I’d be happy to clarify as much as possible here so I’ll take a few of your comments and see if I can explain a bit better.

      “Even if you can build out these large L2 networks, will anyone do it? Seems to me that all this complexity could be avoided if you just routed whenever necessary”

      The key here is the “hot” migration of a virtual machine. If you want to move a virtual machine from one physical server to another, which of course involves copying its data, code, stack etc. and then restarting it on the fly somewhere else .. you can’t change its IP address. The reason you can’t change its IP address is because there is currently no easy way to tell the other devices that are talking to that VM over IP that it has a new IP address and have them switch to it without losing packets. The best way to keep the same address is simply to stay within the same subnet and let the L2 ARP mechanism move it.

      Now if you don’t care about moving a VM while it’s hot then you don’t need a flexible L2.

      “To me, it seems that rather than building out a big L2, what you really want is for your L2 to go where you need it.”

      Well that’s precisely what SPB does!! It creates a logical L2 on top of a physical L2 network. Therefore the logical L2 network only goes exactly where you want it. You could have a 500 node physical SPB network and create some teeny little logical L2 with only three servers and one router port in it and that L2 would only extend to those 4 locations (no matter where they are). So while we call it a big flat L2, that’s not really accurate. It’s a lot of little logical L2’s that can go anywhere sitting on top of a routed Ethernet network. I kind of think of them as little blobs of jello or rubber that can be stretched and shrunk over top of the physical infrastructure.

      Now each of those little logical L2’s is identified by the ISID# (I think of it as the color of the jello or rubber) and you can simply add or remove members to any of those logical L2’s by just telling some new physical or logical port that it too is a member of that ISID (or is no longer a member) and presto, the logical L2 changes shape. That’s why I like the jello analogy .. or rubber for the extent of the subnet.
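The join/leave behaviour described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the class name `LogicalL2Fabric`, the switch and port names, and the ISID value 1001 are all hypothetical, and nothing here talks to real switches or to IS-IS.

```python
from collections import defaultdict


class LogicalL2Fabric:
    """Toy model: an ISID names one logical L2; ports join or leave it."""

    def __init__(self):
        # ISID -> set of (switch, port) attachment points (hypothetical names)
        self.members = defaultdict(set)

    def join(self, isid, switch, port):
        self.members[isid].add((switch, port))

    def leave(self, isid, switch, port):
        self.members[isid].discard((switch, port))

    def extent(self, isid):
        """Switches the logical L2 currently reaches (its 'shape')."""
        return {switch for switch, _ in self.members.get(isid, set())}


fabric = LogicalL2Fabric()
# Three servers and one router port, anywhere in the physical network.
fabric.join(1001, "sw-017", "port-3")
fabric.join(1001, "sw-204", "port-11")
fabric.join(1001, "sw-388", "port-7")
fabric.join(1001, "sw-499", "router-uplink")
shape_before = fabric.extent(1001)

# A VM moves: its old port leaves the ISID and the new one joins.
fabric.leave(1001, "sw-017", "port-3")
fabric.join(1001, "sw-042", "port-9")
shape_after = fabric.extent(1001)
```

The logical L2 only ever touches the switches that have a member port, however large the physical network is, and its shape changes with a single join or leave.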

      If you go a step further and modify something like vSphere to automatically ask for an ISID to be added to some physical port, it can not only move VM’s around but can ask for the physical or logical port attached to the server to join or leave a given subnet. We’ve done this sort of thing and it all works rather nicely.

      Oh, there are lots of other ways of creating the logical L2’s with this kind of jelly behavior. You can run the logical L2 over a physical L2 .. which is what the IEEE 802.1aq/SPB solution does. Or you can run the logical L2 over MPLS which is what VPLS does and is hugely popular in service provider networks, or you can run the logical L2 over something new, which is what TRILL proposes. … or you can simply create L2 tunnels over IP. Each of these different approaches has pros and cons and of course admirers and detractors. We’ve done quite a bit of analysis of this and I’d be happy to discuss it some time.

      So perhaps we should not call SPB a big flat L2 .. it’s really a lot of very flexible logical L2’s on a common routed Ethernet infrastructure.

      Anyway imagine this, you take a whack load of switches and simply interconnect them however you want with 10/40 or even 100GE links. You configure nothing and the physical network forms. You then decide which ports (logical or physical) you want bridged to which others and tell those ports .. presto, you get the logical L2’s created.

      Is that any clearer or did I muddy the waters? If so I’ll try explaining it a different way.

      Anyway we could probably arrange some kind of remote demonstration via the web if there was interest. I’ve done a few of these for some customers and I’d be happy to see if something could be done for a larger audience. It may be kind of fun to do this in conjunction with our upcoming interoperability tests and show people a multi-vendor network at the same time.

      Cheers,

      Peter

      • http://blog.vcider.com Chris Marino

        Thanks Peter, this helps a lot. The use cases you describe are indeed interesting and valuable. However, IMHO, another aspect to getting this deployed is that not only are there multi-vendor issues, this stuff will very likely have to span organizations. I’m sure that certain providers will (legitimately) object to letting in L2 traffic under any circumstances. There, SPB won’t be an option. Trying to address their concerns would simply be recreating more of what already is available in L3.

  • Ahmad

    Great program guys I’m really liking it!!!

    It’s also good to hear from some people in the IEEE. So, some take-home points/questions I got out of this was:

    1. Memberships to a logical LAN is done through the use of a link-state routing protocol. This dynamically does what manually configuring trunks and VLANs used to accomplish.

    2. In the core, switching is done in a way such that the core does not see every possible MAC address hanging off a node. Is this similar to how 2 labels in an MPLS/VPN are used with 1 to reach the PE router and the other label to determine which VRF within that PE to attach the route to?

    3. Does this mean that if a core switch is presented with a particular MAC address it won’t know what to do with it?

  • peter ashwood smith

    “I’m sure that certain providers will (legitimately) object to letting in L2 traffic under any circumstances. There, SPB won’t be an option. Trying to address their concerns would simply be recreating more of what already is available in L3.”

    Yes, you have several excellent options in MPLS today and there is no point recreating them. What you would do is either put MPLS tunnels between two SPB networks and make a bigger SPB network, or you could terminate SPB at the boundary and run the L2 into VPLS and back again, or something like OTV at the boundary could work too .. there are lots of options. In my opinion SPB is just a very nice simple technology but it’s not something we see taking over the world ;) Where it shines is when you have a lot of switches, want to get any port to any port bridged and want good tools to know what is going on.

  • peter ashwood smith

    “It’s also good to hear from some people in the IEEE. So, some take-home points/questions I got out of this was:”

    Actually I don’t consider myself an “IEEE” guy; I work wherever I need to get something new done. In the case of SPB this means IEEE, IETF and of course a LOT of time in the lab and with customers. Most of us are not professional standards people ;) which sometimes slows us down.

    “1. Memberships to a logical LAN is done through the use of a link-state routing protocol. This dynamically does what manually configuring trunks and VLANs used to accomplish.”

    Yup, the link state carries the topology and the membership. Those two pieces of data are fed to a computation that produces forwarding state specific to the membership and which is then delivered to the forwarding engines of the switch. On failure or change simply repeat.
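As a rough illustration of “topology plus membership in, forwarding state out”, here is a minimal Python sketch. It assumes equal-cost links (so a breadth-first search stands in for the shortest-path computation) and breaks ties by sorted neighbour name, which is a simplification and not the tie-breaking defined by 802.1aq. The function names and the five-switch topology are hypothetical.

```python
from collections import deque


def shortest_path_next_hops(topology, source):
    """BFS over an unweighted topology: destination -> first hop from source."""
    next_hop = {}
    visited = {source}
    queue = deque()
    for neighbour in sorted(topology[source]):
        next_hop[neighbour] = neighbour
        visited.add(neighbour)
        queue.append(neighbour)
    while queue:
        node = queue.popleft()
        for neighbour in sorted(topology[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                next_hop[neighbour] = next_hop[node]
                queue.append(neighbour)
    return next_hop


def compute_forwarding(topology, membership, node):
    """Entries for `node`: destination edge switch -> next hop, restricted
    to switches that are members of at least one service instance."""
    edges_in_use = set().union(*membership.values())
    hops = shortest_path_next_hops(topology, node)
    return {dst: hop for dst, hop in hops.items() if dst in edges_in_use}


topology = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "E"},
    "D": {"B", "E"},
    "E": {"C", "D"},
}
membership = {1001: {"A", "E"}}  # ISID -> edge switches with member ports
table_b = compute_forwarding(topology, membership, "B")

# Link C-E fails: rerun the same computation on the new topology.
topology["C"].discard("E")
topology["E"].discard("C")
table_b_after = compute_forwarding(topology, membership, "B")
```

Switch B first reaches E via C; after the C-E link fails, the same computation reroutes E via D with no other configuration change.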

    “2. In the core, switching is done in a way such that the core does not see every possible MAC address hanging off a node. Is this similar to how 2 labels in an MPLS/VPN are used with 1 to reach the PE router and the other label to determine which VRF within that PE to attach the route to?”

    Yup it’s identical. In MPLS the core routers only work on a label to forward and don’t look at the service label or the service specific addressing. In SPB the core switches only forward on the outer MAC, which will be the box MAC of one of the SPB devices, and never look at the service instance or service specific addressing. The edge SPB devices of course need to encapsulate and de-encapsulate/forward based on the service instance and service specific addresses just like an MPLS PE. The difference of course is SPB does this with just ISIS while MPLS requires ISIS or OSPF/LDP and BGP. We can get away with doing this all in ISIS because we are not aiming for the kinds of scale that MPLS is; a few hundred switches is more than enough while MPLS has to work worldwide.
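The split between what the core looks at and what only the edge looks at can be sketched as a pair of Python dataclasses. This models the fields only, not the on-the-wire 802.1ah header layout; the class and function names are hypothetical, and the MAC addresses and identifiers are made-up example values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerFrame:
    dst_mac: str
    src_mac: str
    payload: bytes


@dataclass(frozen=True)
class BackboneFrame:
    b_dst: str   # box MAC of the egress SPB edge switch
    b_src: str   # box MAC of the ingress SPB edge switch
    b_vid: int   # backbone VID
    isid: int    # 24-bit service instance identifier
    inner: CustomerFrame


def encapsulate(frame, ingress_mac, egress_mac, b_vid, isid):
    """Edge behaviour: wrap the customer frame in an outer header."""
    if not 0 <= isid < 2 ** 24:
        raise ValueError("ISID must fit in 24 bits")
    return BackboneFrame(egress_mac, ingress_mac, b_vid, isid, frame)


def core_lookup_key(frame):
    """Core behaviour: only the outer VID and outer destination are used."""
    return (frame.b_vid, frame.b_dst)


def decapsulate(frame):
    """Edge behaviour: recover the service instance and the customer frame."""
    return frame.isid, frame.inner


customer = CustomerFrame("00:00:5e:00:53:02", "00:00:5e:00:53:01", b"hello")
backbone = encapsulate(customer, "02:00:00:00:00:0a", "02:00:00:00:00:0e",
                       b_vid=4001, isid=1001)
```

Like the outer label in MPLS, `core_lookup_key` never touches `isid` or the inner addresses; those matter only at the edge.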

    “3. Does this mean that if a core switch is presented with a particular MAC address it won’t know what to do with it?”

    Let me see if I can explain the forwarding a bit better. First of all remember that a core switch is never seeing a customer VID, so in my explanation below the VID I am referring to, while identical in size, position etc. to a customer VID, is used to isolate protocols and their behaviors in the core SPB network. We have referred to this as a B-VID (backbone VID). Now the edge nodes in the SPB network are responsible for encapsulating with the new MAC and that includes a backbone VID.

    Ok .. so now let’s look at the core switch behavior. First thing it does is look to see if the VID and DA are in its forwarding table (filtering table). If they are, it’s because SPB has programmed them and it will simply do the normal operation (unicast/multicast) as directed. If they are not, well we have disabled learning for a VID which is in use by SPB in the core and so it will discard the frame. Therefore a core node will discard a frame whose DA/VID are not a combination of a core switch box MAC and a VID in use by SPB for core switching. There is one additional twist we add, called a reverse path forwarding check. This is simply turning the reverse learning normally done by Ethernet into a discard in the case of a mismatch. What that means is that the SA/VID are looked up in the forwarding table and if they are not found, or if they are found to point to anything other than the interface on which the packet arrived, we throw the frame away. This is how we can avoid loops with SPB without a TTL and using vanilla Ethernet forwarding in the core switches.
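The core switch decision described above fits in one short Python function. It is a sketch for unicast only, assuming the forwarding table is a plain dictionary keyed by (VID, MAC) that maps to a port number; the function name, table contents, and port numbers are hypothetical.

```python
def core_forward(fdb, in_port, vid, src_mac, dst_mac):
    """Return the egress port for a frame, or None to discard it.

    fdb maps (vid, mac) -> port and is assumed to be programmed by the
    control plane, never learned from traffic.
    """
    # Reverse path forwarding check: the SA/VID entry must exist and
    # must point at the interface the frame arrived on.
    if fdb.get((vid, src_mac)) != in_port:
        return None
    # Normal DA/VID lookup; an unknown combination is discarded,
    # not flooded, because learning is disabled on this VID.
    return fdb.get((vid, dst_mac))


fdb = {
    (4001, "mac-A"): 1,  # edge switch A is reached via port 1
    (4001, "mac-E"): 2,  # edge switch E is reached via port 2
}

ok = core_forward(fdb, in_port=1, vid=4001, src_mac="mac-A", dst_mac="mac-E")
wrong_port = core_forward(fdb, in_port=3, vid=4001, src_mac="mac-A", dst_mac="mac-E")
unknown_da = core_forward(fdb, in_port=1, vid=4001, src_mac="mac-A", dst_mac="mac-X")
unknown_vid = core_forward(fdb, in_port=1, vid=10, src_mac="mac-A", dst_mac="mac-E")
```

Only the first frame is forwarded. A frame arriving on an unexpected interface, which is what a looping frame looks like, fails the reverse path check and is dropped.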

    Anyway that’s perhaps more information than you asked for but hopefully that helps.

    Regards,

    Peter

  • Ahmad

    No, that’s great information Peter. I guess my issue is not understanding the concepts of ISIDs and VIDs properly which I have to define properly in my head first. I guess the Wiki-page is a great starting-off point.

    Thanks,

    Ahmad

  • Ahmad

    One point I didn’t quite understand from the podcast was how the Q in Q was turning into something it wasn’t ever meant to be. Can you guys elaborate on that?

    What are the scalability/protocol design flaws you guys were alluding to?

    • Paul Unbehagen

      Yes, because of the way that Ethernet allows the flooding of unknown and multicast frames in a VLAN, and especially in a Q-in-Q network, core switches can quickly get overloaded with the total aggregate of MAC addresses in the network. Not to mention the complexity of managing numerous levels of SVLANs and CVLANs, which ultimately leads you to investigate different STP options like MSTP and RSTP.

      My statement that “Q in Q was turning into something it wasn’t ever meant to be” was about the use of a VLAN as a VPN ID. This has caused many problems with security, scale and management of networks. So the IEEE created 802.1ah, which introduced the ISID as a true VPN ID on an Ethernet network. This had some nice ramifications… it made Ethernet more secure through the abstraction of MAC-in-MAC encap, gave the admin the ability to build arbitrary topologies with simple end point provisioning (you only config the ISID at the points of connection, not on every link through the core), and it let you grow the number of possible VLANs way beyond 4096, now to 16M.
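The 4096 and 16M figures follow directly from the field widths: a VLAN ID is 12 bits and an ISID is 24 bits. A quick check in Python (the constant names are just labels for this example):

```python
VID_BITS = 12    # width of the 802.1Q VLAN ID field
ISID_BITS = 24   # width of the 802.1ah service instance identifier

vlan_ids = 2 ** VID_BITS    # 4096 possible values
isids = 2 ** ISID_BITS      # 16777216 possible values, the "16M"
ratio = isids // vlan_ids   # 4096 times as many service identifiers
```

In practice slightly fewer than 4096 VLAN IDs are usable, since VID values 0 and 4095 are reserved.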

  • Peter Ashwood-Smith

    “I guess my issue is not understanding the concepts of ISIDs and VIDs properly which I have to define properly in my head first. I guess the Wiki-page is a great starting-off point.”

    Yes Ahmad, it’s amazing how simple bit fields can be so confusing, isn’t it? One thing to keep in mind is that the VID you are familiar with defines not only the route through the network but also the membership, while in SPB we separate those concepts, which is why there is an ISID for membership and a new VID for routing. Anyway this stuff is my day job so I’ll try to elaborate a bit below and save you and hopefully others some time. Thanks for the Wikipedia compliment, we put a lot of work into it for exactly that reason. We did however risk termination by the sinister IEEE cabal and I’ve been looking over my shoulder constantly ever since ;) ;)

    “how the Q in Q was turning into something it wasn’t ever meant to be. Can you guys elaborate on that?”

    I believe that was Paul’s comment. I’ll ping him to come and elaborate on it here and I’ll skip to your next question.

    “What are the scalability/protocol design flaws you guys were alluding to?” [with Q in Q]

    Well Ahmad, the VLAN is only one component that limits the scalability of an Ethernet network, so by adding another VLAN tag (Q in Q) you only address one dimension of the scale problem. Also the VLAN tag serves two purposes (picking a route AND isolating a subnet (aka service)). The other problem, not addressed by Q in Q, is the MAC address learning issue. In a Q in Q network you still have to learn all the MACs along the way. So while Q in Q does definitely help, it only partially (e.g. one out of three problems) addresses the scaling.

    On the other hand, what the mac in mac encapsulation used by SPB does is to address all three problems orthogonally and it does this in a very clean fashion as follows:

    1) SPB addresses the address problem by adding another layer of addressing (the core switch addresses) which it never learns but instead computes ;
    2) SPB adds another vid for the purposes of topology isolation/routing in the core ;
    3) SPB adds an end point service instance identifier ISID (subnet).

    Together these three things mean that the core switch only has to work on the outer address 1) and the outer vid 2) while the edge switches can efficiently and quickly figure out what to do based on a nice big 24 bit service identifier 3). You will note that 2 and 3 are important because they separate routing (vid) from membership (ISID). Having a totally separate ISID also allows you to group ‘your’ vlans into the same subnet (or not) any way you want without the core having to know or care.
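To make the addressing point concrete, here is a back-of-the-envelope Python sketch of how many unicast entries a core switch might hold in each mode. The network size (200 switches, 50,000 end stations) is purely hypothetical, the Q-in-Q figure assumes the worst case where the core switch sees traffic from every station, and multicast state is ignored in both cases.

```python
def core_unicast_entries(mode, end_station_macs, spb_switches, backbone_vids=1):
    """Rough upper bound on unicast entries in one core switch."""
    if mode == "q-in-q":
        # The core still learns end station MACs along the way.
        return end_station_macs
    if mode == "mac-in-mac":
        # The core holds only computed switch addresses, per backbone VID.
        return spb_switches * backbone_vids
    raise ValueError(f"unknown mode: {mode}")


qinq = core_unicast_entries("q-in-q", end_station_macs=50_000, spb_switches=200)
spbm = core_unicast_entries("mac-in-mac", end_station_macs=50_000, spb_switches=200)
```

Under these assumptions the MAC-in-MAC core table scales with the number of switches rather than the number of end stations, which is the first of the three dimensions listed above.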

    So we feel that Q in Q, while a good step, was incomplete since it only really partially addressed the problem.

    In general, any good encapsulation of L2 must provide for different routing, different addressing and different service membership to properly isolate all 3 dimensions and achieve independence and therefore improved scale. If you look again at MPLS you will see that they do this very cleanly which is why MPLS works so beautifully on large scales.

    Hope that helps.

    Peter

  • Kyle

    So if someone wanted to buy some gear that supports this (Avaya switches) how would they go about it?

    Which switches in their portfolio support it? Is there any idea on cost that is public info?

    • Peter Ashwood-Smith

      Kyle, I pinged Roger and/or Paul to respond and point you to the appropriate literature on their products.

    • Paul Unbehagen

      Hey Kyle,

      You can get info on how to buy from this website.

      http://www.avaya.com/usa/how-to-buy/

      In full disclosure, I joined Avaya this week.

      Paul

    • Kyle

      Wow, I finally got in contact with someone @ Avaya and damn… This product is NOT aimed at anything but large service providers/enterprises/data centers. I had hoped to see at least something smaller for everyone else in the world. It seems like it would be a good idea to target a broader audience if they actually want this technology to grow and succeed.

      • Peter Ashwood-Smith

        Kyle, the technology is designed to work from 1 node to 1000 nodes irrespective of the target market of the product it resides on. You could put together a very nice little network of 1U 48 port switches running SPBM.  

        I share some of your frustration with marketing departments however as they often see technologies as specific to a market while most of us that use them just see them as ways to agnostically push packets.

        • Anonymous

          Is there any way someone could actually get a simple 1RU switch that actually supports SPBM right now though? As a regular person or a small company….not a huge SP.

          • zobop

            Avaya VSP 4000

  • JCC

    Peter (or whoever knows the answer),

    I got a little lost when you guys mentioned that in SPBV switches do not necessarily need to learn MACs. Can you please describe when switches in SPBV need to learn clients’ MAC addresses and when not?

    Thank you!
    JCC

    • Ferro Greg

      My understanding:

      Core switches do not learn MAC addresses, they forward frames according to the ISID that is added to the frame at the edge switches. The L2-ISIS protocol distributes routing information about the MAC/ISID database to other edge switches, but the core switches only know how to forward on ISID “labels”.

      Hope that’s right. Maybe Peter or Paul can expand on this.

    • Paul Unbehagen

      Ya, we can sometimes make it sound more complex than it really is. The intent is that you won’t need to know all the gory details and can just use it, like turning the key of a car, no need to know how fuel gets injected into the engine to turn the transmission etc…

      Put simply, SPBV will learn on all switches, while SPBM will only learn on the edge switches. In both cases IS-IS shares the knowledge like it does with IP routes, making it very easy to operate and troubleshoot.

      Another benefit of SPBM is that it protects all your switches (edge and core) from being seen by any end station due to the encap. This was an important security feature of the protocol design for many environments, especially in the campus and data center.

      Paul Unbehagen

  • Peter Ashwood-Smith

    JCC, I had a re-listen and it appears that the response given about the Q-in-Q mode not having to learn end station MACs was incorrect (it sounded like the answer was given by mistake for MAC-in-MAC, which never learns end station MACs in tandem switches). Anyway JCC, you are correct: Q-in-Q mode of course does have to learn on transit switches while the MAC-in-MAC mode does not have to learn on transit switches. Our apologies for the confusion.

    Greg, not quite ;) A useful thing to understand about 802.1aq is that tandem switches just do normal DA/VID forwarding. The only differences are in the learning: MAC-in-MAC mode does not learn but instead asserts the reverse path (to the encapsulating switch) or discards, while Q-in-Q learns the reverse path to the end station.

    Unfortunately I really need a white board to explain this properly, however there are a few good pictures on the Wikipedia page and in particular the IETF draft has an example network and example forwarding tables for both modes. Suggest looking at pages 12-15 here:

    http://tools.ietf.org/html/draft-ietf-isis-ieee-aq-05

  • Alex Demskie

    Wow, talk about DEEP! Interesting talk – too bad it was way too far advanced for me to completely wrap my head around it. Someday soon I’ll be at that level though.

    Inspiration at its best!

    • http://etherealmind.com Greg Ferro

      I hope you do. I’d expect to hear more about SPB in the future as it progresses.

  • Bleesmith

    I ran the test 4 years ago that recovered in 20ms.  Awesome technology!

  • Pingback: Dare To Be Stupid « The Data Center Overlords

  • http://twitter.com/Zeigertelegraf Einar Aleksejev

    The most interesting podcast – Show 44. Listened to it 5 times. Pure pleasure!

  • Pingback: Show 104 – Is SDN A TRILL Killer?

  • http://einaraleksejev.eu/ Einar Aleksejev

    After almost 2 years still the best show on Packet Pushers and SPB makes the most sense as it’s an evolution of existing IEEE and IETF standards. Thank you to all speakers! SPB is a superior technology!

  • Pingback: Show 136: Avaya – Considerations for Turning your Network into an Ethernet Fabric – Sponsored

  • Joseph

    I don’t see the link to the wiki the speakers mentioned, the one that Ivan said he gave up on at about page 200. Can someone post that link?