Does TRILL Stand a Chance at Wide Adoption?

TRILL (TRansparent Interconnect of Lots of Links) is considered by some to be the heir-apparent to spanning-tree’s throne. After all, Radia Perlman was the force behind STP, and her name heads the list of authors for RFC 6325, the base TRILL protocol specification. For that reason alone, it seems a natural progression to move from STP into TRILL, but that’s not what we see happening, at least not yet. Instead, we see a hodge-podge of alternate solutions to the same problem, that of maximizing connected swaths of data center while minimizing hops and associated latency, maintaining a loop-free topology, and not wasting any links in the process.
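To make "not wasting any links" concrete, here is a minimal sketch of how many links a single spanning tree leaves idle in a small full mesh. It is an illustration only: the four switch names are hypothetical, and a plain breadth-first tree stands in for real STP, which elects its root by bridge ID and weighs port costs.

```python
from collections import deque
from itertools import combinations


def spanning_tree_links(switches, links, root):
    """Return the links kept by a breadth-first tree rooted at `root`.

    Simplification: real STP picks the root by bridge ID and uses path
    costs; BFS with unit costs is enough to show the effect.
    """
    neighbors = {s: set() for s in switches}
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    visited = {root}
    kept = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(neighbors[node]):
            if peer not in visited:
                visited.add(peer)
                kept.add(frozenset((node, peer)))
                queue.append(peer)
    return kept


# Hypothetical four-switch full mesh: every switch cabled to every other.
switches = ["sw1", "sw2", "sw3", "sw4"]
links = [frozenset(pair) for pair in combinations(switches, 2)]

kept = spanning_tree_links(switches, links, root="sw1")
blocked = set(links) - kept

print(f"links cabled: {len(links)}, forwarding: {len(kept)}, blocked: {len(blocked)}")
```

In this toy mesh half of the cabled links end up blocked, which is the waste that TRILL, SPB, and the other approaches below are trying to recover.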

The roster of technologies that obviate (or at least reduce) the need for TRILL is lengthy, with many groups representing vendors and standards bodies offering various approaches. For example…

  • Multichassis etherchannel. Spreading an LACP link across two switches does allow for forwarding on all interswitch links, but it’s not a topological “any-any”. In addition, vendors’ MEC solutions are not interoperable. You can’t take an Arista switch, mate it to a Cisco switch, and present a unified MEC uplink to an adjacent node. Examples of MEC include Cisco’s Nexus virtual port channel (vPC) and Arista’s multichassis link aggregation (MLAG).
  • Shortest path bridging. SPB is the IEEE’s answer to IETF’s TRILL, and is seeing some backing from vendors like Avaya and HP. Similar to TRILL in that it uses a routing protocol at layer 2 to calculate a forwarding path, SPB is also experiencing slow adoption.
  • Intelligent Resilient Framework. Built upon a stacking technology, HP’s IRF allows up to four (last I knew) A12500, A10500, A9500, A7500, A58XX, or A55XX switches to act as a single logical switch.
  • Virtual Switch System. Cisco’s VSS with properly equipped Catalyst 6500s allows a pair of the big Cats to act as a single logical switch.
  • Virtual Chassis. Up to 10 of Juniper’s EX4200 line of switches can be connected in various physical combinations via dedicated links.
  • QFabric. Juniper’s data center beast is a proprietary method of building an any-to-any topology using ToR switches meshed through a central fabric interconnect and managed via an external control plane.
  • OpenFlow. OpenFlow-capable switches can be used to build a complex data center topology centrally managed by a controller that programs forwarding tables as directed by an application. Still developing, OpenFlow is looked at askance by data center designers who observe that OpenFlow is not actually open source in the strictest sense, and might fairly consider OF as a corner case solution. Although Google’s recent revelation of their wide OF deployment was a notable success for the fledgling protocol, that was perhaps the ultimate corner case. OF will likely see more traction as vendors begin building holistic solutions that leverage OF as a part of an overall networking solution.

While hardly a comprehensive list of the alternate topology technologies there are to choose from today, even this quick look shows that there’s a whole lot of competing protocols out there seeking mind share. And wallet share. So where does that leave TRILL?

Cisco is probably best positioned to drive market adoption of TRILL, and they are certainly a big TRILL player with FabricPath. However, Tasman Drive is currently positioning FabricPath as a play for big data centers, as opposed to the STP-alternative everyone should embrace. FabricPath is a licensed feature of the Nexus product line, and as such isn’t seeing a groundswell of implementations among the significant numbers of small and mid-level shops that are deploying Nexus to gain 10GbE density.

Brocade is also leveraging TRILL, basing much of their VCS Ethernet fabric technology on the specification, but Brocade does not command enough share of the Ethernet switching market to drive wide adoption. Notable is that both Cisco’s and Brocade’s TRILL implementations are pre-standard (read: proprietary). It’s taken so very long for the TRILL standard to settle that it’s been a moving target for vendors to code into their gear.

Other vendors including Dell, HP, IBM, Extreme, and Huawei have made noises in a TRILL-direction for 2012. “Why so long for other vendors to offer TRILL?” you might wonder. That’s a fair question, and the answer (at least in part) is that TRILL means new hardware. TRILL is an encapsulation technology, so to cram a TRILL frame through the silicon you need…well…different silicon. Broadcom’s BCM56840 series chipset can handle TRILL (as well as SPB), and so vendors deploying merchant-silicon based switches are potentially ready to move ahead in the coming months.
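To show what "encapsulation" means for the forwarding hardware, here is a rough sketch of wrapping a frame in a TRILL header. The field layout (2-bit version, 2 reserved bits, a multi-destination bit, 5-bit options length, 6-bit hop count, then 16-bit egress and ingress RBridge nicknames) and the TRILL Ethertype 0x22F3 follow my reading of RFC 6325. The function names, MAC addresses, and nicknames are made up for illustration, and the sketch assumes no outer VLAN tag and no header options.

```python
import struct

TRILL_ETHERTYPE = 0x22F3  # Ethertype assigned to TRILL data frames


def pack_trill_header(egress, ingress, hop_count,
                      multi_dest=False, version=0, op_length=0):
    """Pack the 6-byte TRILL header (no options) in network byte order."""
    if not 0 <= hop_count < 64:
        raise ValueError("hop count is a 6-bit field")
    if not 0 <= op_length < 32:
        raise ValueError("options length is a 5-bit field")
    if not (0 <= egress <= 0xFFFF and 0 <= ingress <= 0xFFFF):
        raise ValueError("nicknames are 16-bit values")
    first_word = (
        (version << 14)            # V: 2 bits
        | (0 << 12)                # R: 2 reserved bits, sent as zero
        | (int(multi_dest) << 11)  # M: multi-destination flag
        | (op_length << 6)         # Op-Length: 5 bits
        | hop_count                # Hop Count: 6 bits
    )
    return struct.pack("!HHH", first_word, egress, ingress)


def encapsulate(inner_frame, outer_dst, outer_src, egress, ingress, hop_count):
    """Prepend an outer Ethernet header and a TRILL header to a frame."""
    outer = outer_dst + outer_src + struct.pack("!H", TRILL_ETHERTYPE)
    return outer + pack_trill_header(egress, ingress, hop_count) + inner_frame


# Hypothetical values throughout.
inner = bytes(64)  # stand-in for the original Ethernet frame
frame = encapsulate(
    inner,
    outer_dst=bytes.fromhex("020000000002"),
    outer_src=bytes.fromhex("020000000001"),
    egress=0x0002,
    ingress=0x0001,
    hop_count=20,
)
print(len(frame) - len(inner), "bytes of added headers")
```

Every transit switch has to parse these extra headers, look up the egress nickname, and decrement the hop count at line rate, which is why older silicon that knows nothing about this format cannot simply be upgraded in software.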

The question still remains. Does TRILL stand a chance against all of these other competing technologies? Time will tell, but from where I sit, the customers are going to need to want it. But until some technologies prove themselves as contenders or fall by the wayside, it’s tough to pick a winner. And no one with the title of “architect” or “director” wants to back a loser.

About Ethan Banks

Ethan Banks, CCIE #20655, is a hands-on networking practitioner who has designed, built and maintained networks for higher education, state government, financial institutions, and technology corporations. Ethan is a host of the Packet Pushers Podcast, which has seen over one million unique downloads, and today reaches a global audience of over ten thousand listeners. Also a writer, Ethan covers network engineering and the networking industry for a variety of IT publications. He is also the editor for the independent community of bloggers at PacketPushers.net. Follow @ecbanks.

  • Rob Turner

    Part of the issue for TRILL, as you touch on, is that none of the so-called TRILL early-adopters are actually implementing TRILL; they are all ‘TRILL-like’ at best. Cisco’s FabricPath is a hybrid of SPB and TRILL and some proprietary stuff, Brocade’s VCS doesn’t use TRILL’s IS-IS for the control plane, and… And, well, that’s it. For all of the talk about it, no-one is actually doing it.

    And I can sort of see why; it’s an improvement of sorts on STP and it provides multi-pathing, but that’s really about it… Its scalability is questionable, it uses different topology trees for unicast versus multicast, and it has no inbuilt service abstraction and orchestration; oh, and you need new kit to run it… I can see why those that cannot do anything less have taken bits of TRILL and tried to make it into something that does provide value, tried…

    If you actually do a deep-dive and compare TRILL with SPB, there is no competition for what is the next-generation network topology technology.  That’s why the likes of Avaya, Alcatel-Lucent, Huawei are supporting it, and HP and Enterasys are also talking of doing it – SPB actually does something dramatically new and better than STP, it does so much more than TRILL was designed to be able to do, and is therefore worth investing in.  Oh, and it’s a real live standard (both IEEE and IETF) and running in live networks now…

    So, remind me again why we’re actually excited by TRILL..?

    • http://packetpushers.net/author/ecbanks Ethan Banks

      I don’t have an answer there, Rob. I don’t know that we are excited. Interested, at least. I personally am keen on a standardized solution for L2 multipath that sees wide industry adoption so that we can settle into it as a matter of habit and move on. I’d settle for the need for hyperflexible L2 domains to go away, such that we could do some more clever things with L3 that don’t involve overlays. Not that overlays can’t work…but to my mind they introduce a layer of complexity to something that is already complex.

      In some ways, it doesn’t matter which approach you look at, because if you dig deep enough, you’ll find some accountant lurking about and filling in ledgers. Do I want “the best” technical solution to win? Surely, whatever that might be, assuming there’s a one-size-fits-most. But we all know that doesn’t always work out that way (although Novell is still in business). It might come down to the best marketing team.

      • Rob Turner

        Mine was essentially a rhetorical question; interested in L2 multi-pathing, sure, but what network is L2 only; certainly not the mainstream.

        And ‘hyperflexible L2 domains’ are not going away any time soon, so that’s why I need a technology that supports VM connectivity, migration, and dynamic adds/moves/changes.  I’d also like to support L3, and Multicast, and multi-site…  That’s why I like SPB; it’s not an overlay, but a single consolidated and extensible technology.

    • Stp

      “SPB actually does something dramatically new and better than STP”
      Like what? Computing 5 trees in order to use 5 different paths between two bridges? I like that you’re bashing STP when SPB is actually as close to STP as it gets in this arena.

      • Rob Turner

        Just check the date…no, it’s not April 1st… Do you really care how many trees are calculated (at start-up and topology change only) by a highly optimised protocol..? Isn’t it the end-result – the service – that you’re interested in..? Today’s boxes are well-endowed with resources, and the computational requirements of SPB have been well explored.

        STP provides blunt-instrument L2 resiliency (read: loop-free), and that’s it. SPB, like TRILL, provides highly efficient multi-path loop-free resiliency. SPB, unlike TRILL, provides service abstraction, delivers scale (up to 16m IDs), and is extensible to L3 (i.e. mapping VRFs into I-SIDs); I could go on but the various nuances are beyond the scope of a blog comment…

        My point is that SPB, as a single consolidated technology, provides – over-and-above loop-free and multi-path – an Enterprise-friendly solution that is both L2 and L3 capable, optimises Multicast support, delivers faster time-to-service, can be orchestrated through an SDN model, and is primed for end-to-end, user-to-application connectivity (do I hear someone shout: ‘scalable, Enterprise-class personalised video conferencing’?).

        I don’t think I’m spitting into the wind by ‘bashing’ STP; I’m certainly not unique in taking a position that STP now, in this generation of networking, has more downside than up.  SPB is about the perfect storm of optimisation and simplification; do I care where it evolved from..?

        • Stp

          Well, I’m checking the year… yes it’s 2012 ;-) and the IEEE is still building trees, lots of trees. SPB looks like spanning tree on steroids. With MST, we compute several trees and then map VLANs to those trees. With SPB, the only thing that has changed is that the granularity is finer: we map traffic to a tree on a per flow basis, not per VLAN.

          The impression of multipathing is given by computing massive amounts of trees: source trees in order to ensure “Shortest Path Bridging”, multiplied by the number of paths I want to be able to handle in my network (because if I only compute 8 trees for each source and I have 9 paths somewhere in my network: too bad, one path will not be used).

          And no, a big CPU is not an excuse for protocol inefficiency.

          TRILL is much closer to routing and breaks away from the spanning tree logic. You have a routing table and the next hop is determined locally.

          At last, I would say that users of conventional STP-based networks were not crying for the lack of “shortest path bridging”. They were mainly focused on stability. Here, the routing model wins hands down too. I know that the IEEE considers that there is no need for a TTL because trees have no loops (duh!), but fact is most users are happy that TRILL introduces one.

          You’re convinced that SPB is the greatest thing and that it’s unfair the market is ignoring it? It’s just because nobody in the TRILL camp has bothered challenging you on these kinds of blogs. In the end, SPB is not technically superior to TRILL. I would not claim that TRILL is superior to SPB either. SPB is slightly different from TRILL, and this difference will not be enough to tip the balance in its favor.

          • Dave Allan

            There are more than “a few” differences. And as usual nothing is free. If you want ordered delivery, path stability, and meaningful OAM in a bridged overlay environment, there are only so many ways to do it. Most “Ethernet over Foo” routed overlays do not deliver those properties and that includes TRILL. So the IEEE is not stuck, it is simply faithful to its service model.

          • Rob Turner

            As Dave has pointed out, ‘nothing is for free’, and any protocol will need to calculate something to define state and populate tables. SPB happens to base this on Trees (same name as in STP, but different in essence), and it does this so efficiently that it’s genuinely a non-issue. Back in 2007 (with common, 2007-era hardware) this was modeled and tested. With 100s of Nodes, 100s of Links, and 100s of Service IDs (read: VLANs), convergence times were less than 100 msec. This isn’t a science project; this is Carrier-grade technology, proven in real-world scenarios, and re-purposed for the Enterprise, enhanced and extended with L3 and Multicast.

            But rather than debate the semantics of how SPB achieves multi-pathing, I think the business-centric discussion is the more interesting; that of what SPB empowers once the topology is built and providing service (i.e. after the first 100 msec). Now we get into the realm of being able to scale to meet the demands of mega-connectivity, be that within the Data Centre, or beyond. We see the power of abstracting the service from the constraints of a one-dimensional approach, and now that the technology has been opened-up – almost API-like – we can leverage and automate it in concert with applications; this is where abstraction delivers synergy, reduced time-to-service, improved efficiency, and goes a long way to making the network a transparent utility. Service meets simplification.

      • http://www.avaya.com/usa/portfolios/virtual-enterprise-network-architecture/ Roger Lapuh

        The only way to ensure a loop-free topology for broadcast/multicast and unknown-destination packets in an Ethernet broadcast domain is to form a tree. Now if we look at SPB, the trees have fixed roots at the ingress points of the fabric. If we look at TRILL, the root of the tree is determined by an election process. Now tell me which is closer to the traditional Spanning Tree? Isn’t this election process (and what happens if the root bridge fails) one of the major concerns about STP/RSTP/MSTP?

        A good attribute of STP/RSTP/MSTP is that broadcast, multicast, unknown-destination, and unicast packets all follow the same path. The advantage of this is that a ping between source and destination tells you whether you have full connectivity (the same is true for SPB). Now if I look at TRILL: bc/mc and unknown-destination packets follow the non-optimal root bridge path (non-shortest path) and unicast takes a different path. This leads to two concerns: a) how can such a network be debugged if not even a ping response ensures connectivity, let alone in the absence of a standards-based connectivity check? And b) packet ordering for a packet flow is not guaranteed in such a scenario, since unknown-destination and unicast packets follow different paths!
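The path divergence described in this comment can be sketched with a toy topology. Assume a six-switch ring in which switch "R" happens to be the root of the one multi-destination tree; all names are hypothetical, and a single breadth-first tree is a simplification of what a real TRILL campus computes. Unicast takes the shortest path, while flooded traffic is confined to the tree.

```python
from collections import deque


def bfs_parents(graph, root):
    """Breadth-first tree: map each node to its parent on the way to `root`."""
    parents = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in sorted(graph[node]):
            if peer not in parents:
                parents[peer] = node
                queue.append(peer)
    return parents


def path_to_root(parents, node):
    """Walk parent pointers from `node` up to the root of the tree."""
    path = [node]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


def tree_path(parents, src, dst):
    """Path between two nodes when traffic may only use tree links."""
    up = path_to_root(parents, src)
    down = path_to_root(parents, dst)
    common = next(n for n in up if n in down)  # lowest common ancestor
    return up[:up.index(common) + 1] + down[:down.index(common)][::-1]


def shortest_path(graph, src, dst):
    """Shortest unicast path: a BFS tree rooted at the destination."""
    return path_to_root(bfs_parents(graph, dst), src)


# Hypothetical ring: A-B-C-Y-R-X-A, with "R" as the tree root.
links = [("A", "B"), ("B", "C"), ("C", "Y"), ("Y", "R"), ("R", "X"), ("X", "A")]
graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

unicast = shortest_path(graph, "A", "C")
flooded = tree_path(bfs_parents(graph, "R"), "A", "C")

print("unicast path:", " -> ".join(unicast))
print("flooded path:", " -> ".join(flooded))
```

In this sketch known unicast from A to C crosses B, while a broadcast or unknown-destination frame between the same two switches travels through X, R, and Y. A failure on either route can break one kind of traffic and not the other, which is the debugging concern raised above.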

        So what has TRILL been designed for: to enable data center virtualization? But why is it then virtualizing on VLANs and not a service instance? Didn’t it introduce a whole new header, forcing a HW upgrade for all customers? The 4k VLANs are not enough for today’s VM environments. On top of that, how can I connect multiple data centers together which have an overlapping VLAN space? Is putting a translation bridge such as the TRILL “cut set bridge” in between two data centers really optimal? Wouldn’t it be better to virtualize on a service ID, which IEEE 802.1 introduced to Ethernet in 2008 and has been reused in SPB?
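The "4k VLANs" and "16m IDs" figures quoted in this thread fall straight out of the field widths. A quick check, assuming the 12-bit VLAN ID of an 802.1Q tag and the 24-bit I-SID that SPB reuses from 802.1ah (the variable names are mine):

```python
VLAN_ID_BITS = 12  # VLAN ID field in an 802.1Q tag
I_SID_BITS = 24    # service instance ID field from 802.1ah

vlan_ids = 2 ** VLAN_ID_BITS
usable_vlans = vlan_ids - 2  # IDs 0 and 4095 are reserved
service_ids = 2 ** I_SID_BITS

print(f"VLAN IDs: {vlan_ids} ({usable_vlans} usable)")
print(f"I-SIDs:   {service_ids}")
print(f"ratio:    {service_ids // vlan_ids}x")
```

The work on fine-grained labeling in the TRILL working group is aimed at this same ceiling.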

  • ktokash

    I’m not terribly excited by TRILL, but the best man doesn’t always win.  I think there’s good reason to believe that whichever standard gains early momentum will become de-facto, and we’ll be stuck with it (or blessed by it) for decades.  It’s going to take more than a half-hearted (quarter-hearted?) Cisco push though, and after being burned so badly by Brocade’s STP that my last two companies turned it off completely (five years apart!) I don’t want to hear about their plans.

    As Greg Ferro likes to point out, we’re pretty cyclical, and this situation smells like the days of FDDI vs token-ring vs ethernet, with some random proprietary stuff floating around to boot.

    Enough yammering, I guess I’m driving at the point that we’re the poor SoBs who are going to have to live with whatever gains traction.  So how can we make sure the one that’s going to irritate us the least in the next 25 years gets that momentum?

    • http://packetpushers.net/author/ecbanks Ethan Banks

      I know for me personally, I’m not ready to push for any sort of an answer, because I don’t know what’s best. I know some specific problems I see that I’d like solved based on my own experience mostly in largish campus networks, but L2 multipathing by itself doesn’t resolve them, or even most of them. In my mind, the overall challenge is way beyond eliminating STP.

      Chatting with other guys who’ve implemented vPC in Cisco Nexus gear (one way to reduce STP reliance), it’s sort of a joke. The additional complexity that’s required to make a vPC domain functional and safe is frustrating, plus I’m hearing tell of significant data center outages when vPC goes awry, particularly in earlier versions of NX-OS. That’s totally unacceptable, but reality apparently.

      I think the industry could plausibly experience a total reboot of how networks are built – in the form of SDN. Protocols as such are killing us. Protocols are like the never-ending bottle of pills, each one prescribed to remedy the problems introduced by the previous medication. So let’s start over. An all-seeing control plane that abstracts the physical hardware is a conceptual win that perhaps could lay the foundation for getting the network out of the way of building IT engines.

      • ktokash

        I was part of a vPC rollout in 2009-2010 – pair of 7ks, ~dozen 5ks, couple dozen 2ks with a random smattering of other edge switches (to handle 100Mb NICs used for iLO, talk about a PITA). It was really confusing. Everyone on the team who wasn’t part of the buildout kind of fumbled through the infrastructure as needed.

        I think it’s natural as a geek to not want to make the effort to “deep dive” on something like vPC, or TRILL for that matter, until it looks like that technology is going to play a reasonably significant role in the future.  There’s just so much out there it’s frustrating to sink 50 hours into getting to know the nuances of a technology or protocol … then never touch it again.

  • http://twitter.com/the_socialist Jon Hudson

    Couple things.

    1.) This is not an us vs. them thing. This is not TRILL vs. SPB to the death. I understand the desire to make it so, but it’s just not. Both will continue to be improved and hopefully see wide use. We are still very early in the game. 

    2.) Protocols and Standards are not things set in stone. They evolve. Before picking on features you see as lacking in TRILL (or SPB for that matter), consider looking at the work being done.

    http://tools.ietf.org/html/draft-ietf-trill-cmt-00
    http://tools.ietf.org/html/draft-ietf-trill-fine-labeling-00
    http://tools.ietf.org/html/draft-ietf-trill-rbridge-oam-02
    http://tools.ietf.org/id/draft-tissa-trill-oam-req-01.txt

    The above work addresses among other things 4k VLANs, coordinated multicast and of course OAM and includes HP, Huawei and IBM as major contributors, as well as Cisco, Brocade, Intel etc. 

    3.) If the proof of success of a protocol is wide adoption, then I would argue that TRILL is making good progress. 

    Shipping products today from Cisco (TRILL-ish), Brocade (TRILL-ish) and IBM (IETF TRILL). Just between Cisco and Brocade there are well over 1200 production sites. 

    This year you will see TRILL supported on products from Huawei, Dell, IBM, Extreme, and HP. 

    4.) This is a marathon not a sprint. How long did VMware live in the shadows before seeing huge volumes? Even MPLS took years to really take off. For most new technologies it is a 10yr path. 

    5.) On July 9th there is to be a TRILL plugfest at UNH (the second): http://www.iol.unh.edu/services/testing/bfc/grouptest/TRILL_plugfest.php. Hopefully enough info can be gathered so that we start seeing a significant amount of interoperability.

    6.) Finally, one could take the popularity of the recent TRILL Packet Pushers shows 97 and 98 as at least some evidence that there are a few folks interested in TRILL ;-)