Assembly Required – Interconnecting 2 Ethernet Chassis Switches

You’ve been tasked with interconnecting two ethernet chassis switches.  There are lots of reasons you might want to do this.  The link you’re building might be between two core switches acting as your main data center routers.  The link could be connecting a core switch and distribution or access layer switch.  Here’s a brain stream of the pros and cons of various approaches I’ve seen in production environments.

(1) A single cable between the switches.  The simplest to implement, but a single point of failure.  If the cable fails or the ethernet link on either end has an issue, you’ve lost the link.  This could also be a performance bottleneck.  Links in between switches tend to carry a lot of traffic.  If your switches are gigabit, and the interswitch link is 10G, then there might well be enough bandwidth.  If the link is only 1G, then it is probably undersized.  An oversubscribed, undersized link will result in frame discards, where the switch tries to forward across the link, but is unable to do so before the frame buffer overflows.  Frame discards imply that packets were not delivered to their destination, which in today’s predominantly TCP networks means that packets will be resent.  Depending on the severity of the oversubscription, the end user experience will be one of application sluggishness, unexpected disconnects, and other general strangeness.  Users might lose response packets (echo replies) when doing ping tests to destinations across the link.
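To put rough numbers on that sizing question, here is a minimal Python sketch of a worst-case oversubscription ratio.  The 48-port edge count and the function name are assumptions made up for illustration; real traffic rarely has every port sending at line rate at once, so treat the result as an upper bound rather than a prediction.

```python
def oversubscription_ratio(edge_ports, edge_port_gbps, uplink_gbps):
    """Worst-case offered load divided by interswitch link capacity."""
    return (edge_ports * edge_port_gbps) / uplink_gbps


# Hypothetical switch with 48 gigabit edge ports (assumed, not from the article).
print(oversubscription_ratio(48, 1, 10))  # 10G interswitch link -> 4.8
print(oversubscription_ratio(48, 1, 1))   # 1G interswitch link  -> 48.0
```

A ratio of roughly 5:1 may be tolerable depending on the traffic; 48:1 on a busy interswitch link is where the frame discards described above become likely.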

(2) Two cables between the switches.  In this case, this is not an aggregated link (etherchannel or port-channel in Cisco lingo), but simply two parallel ethernet links.  This is an improvement over scenario one, in that you’ve eliminated a single point of failure.  However, a loop has been created between the 2 switches.  As such, spanning-tree will put one of the two links into a blocked state.  Therefore, you’ve got link redundancy, but haven’t done anything to address potential oversubscription concerns.

(3) Two or more aggregated cables between the switches, linecard x to linecard x.  Spanning-tree treats an aggregated link like a single interface.  Therefore, bundling multiple physical interfaces together using LACP (802.3ad) or PAgP (Cisco proprietary) is common practice for interswitch links.  It is possible to bundle links statically (“mode on”), with no aggregation protocol to save you from yourself, but this is a bad idea as it leaves you prone to inadvertent loops during turn-up if wires are crossed.  Crossed wires can happen more often than you might expect, especially if your interswitch link is traversing one or more patch panels.

With an established etherchannel between the switches, the next issue to consider is whether or not there are enough links bundled to adequately handle traffic between the switches.  Customarily, etherchannels are built in even numbers of ports.  Back in the day, even numbers of interfaces was a requirement to form an etherchannel, but typically you can bundle any number you like today.  For purposes of this discussion, let’s say you chose to build the uplink out of eight 1Gbps links, going from module 1, ports 1-8 on one switch, to the same on the other side – module 1, ports 1-8.

Remembering that an interswitch link typically carries a lot of traffic and that oversubscription is a concern, it helps to understand HOW your particular chassis switch moves packets around inside.  Here’s the big question – does the line card you’re using for the interswitch link have a big enough connection to the chassis backplane to handle the traffic you’re planning to send its way?  Typically, you’ll have a certain number of Gbps assigned to each line card slot.  How many Gbps will depend on the chassis and the supervisor engine in the chassis.  Once you know the speed of the backplane connection, you also need to understand how the line card will allocate those Gbps to the ports.  Most line card architectures group ports, and then give each port group some amount of Gbps to be shared.  Certain ports might be reserved as “non-blocking”.  Each card is different, and you’ll need to dig around Cisco’s site to find out how the card you’re working on allocates bandwidth.

I’ve said all of this about chassis backplane and bandwidth allocation to line card port groups to make a simple point: interconnecting two switches using a bunch of ports on one module (like 1-8) to a bunch of ports on a matching module in the other chassis is risky.  Admittedly, you might not have a choice.  You’ve got the cards you’ve got.  You’ve got the available ports you’ve got.  But keep in mind – you’re exposed to a total link failure if the linecard fails.  You’re exposed to a total link failure if you’ve uplinked within a specific port group on a linecard, and that port group fails – I’ve seen this happen, where the line card stays up, but a specific port group on the card turns to junk.  You’re exposed if the line card is oversubscribed on the backplane; you won’t lose the link, but you might be discarding frames due to busy pipes inbound that the backplane can’t keep up with.  If you’ve got the linecards and available ports to deal with, there is another option.

(4) Two or more aggregated cables between the switches, spreading each of the physical links across multiple line cards.  In this scenario, you’ve got the benefits of an etherchannel – no spanning-tree blocked ports, and the link stays up as long as at least one physical port is still lit.  But by spreading the etherchannel across multiple linecards in the chassis, you’ve reduced your exposure to a linecard or port group failure, and you’ve also made it less likely you’ll run into backplane oversubscription.  Of course, this all depends on the architecture of the linecards, chassis, and supervisor engine in play.  You need to read up on the specific hardware you’re configuring and come to your own conclusions as far as managing the bandwidth and spreading your risk around.
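The difference between scenarios 3 and 4 can be sketched in a few lines of Python.  This is only a toy model under stated assumptions: the eight 1Gbps member links come from the example above, while the four-card layout for the spread bundle and both function names are invented for illustration.  It does not model any particular chassis, port group, or backplane.

```python
from collections import Counter

LINK_GBPS = 1  # each member link is 1Gbps, as in the example above


def bandwidth_after_card_failure(members, failed_module):
    """Gbps left in the bundle if every port on failed_module goes dark."""
    return sum(LINK_GBPS for module, _port in members if module != failed_module)


def uplink_gbps_per_card(members):
    """Bundle bandwidth each line card has to push across the backplane."""
    counts = Counter(module for module, _port in members)
    return {module: n * LINK_GBPS for module, n in counts.items()}


# Scenario 3: all eight links on module 1, ports 1-8.
concentrated = [(1, port) for port in range(1, 9)]
# Scenario 4: two links on each of modules 1-4 (hypothetical layout).
spread = [(module, port) for module in range(1, 5) for port in (1, 2)]

print(bandwidth_after_card_failure(concentrated, 1))  # 0: the whole link is gone
print(bandwidth_after_card_failure(spread, 1))        # 6: degraded but still up
print(uplink_gbps_per_card(concentrated))             # {1: 8}
print(uplink_gbps_per_card(spread))                   # {1: 2, 2: 2, 3: 2, 4: 2}
```

Whether 8Gbps on a single card is actually a problem depends on that card’s backplane connection and port grouping, which is why the hardware documentation still matters.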

One final consideration is your etherchannel load-balancing scheme.  Very often, the default etherchannel load-balancing scheme will cause one link to see more traffic than the rest of the links in the bundle.  I have found that, when available as a load-balancing scheme, hashing on source and destination port number offers a fairly even load distribution, but again, this depends on your switching environment.  You need to understand the data flowing across your interswitch link to know what distribution might be the most appropriate…which I suppose is a topic for another article.
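To see why the choice of hash input matters, here is a small Python simulation.  It is a sketch only: CRC32 stands in for whatever hash the switch hardware really uses (an assumption, not the vendor’s algorithm), and the addresses, ports, and function names are made up.  The traffic pattern is one client opening many connections to one server, which is a worst case for address-based hashing.

```python
import zlib
from collections import Counter


def pick_link(flow, fields, n_links):
    """Map a flow to a member link by hashing the chosen header fields."""
    key = "|".join(str(flow[name]) for name in fields).encode()
    return zlib.crc32(key) % n_links  # CRC32 is a stand-in, not a real switch hash


def distribution(flows, fields, n_links):
    """Count how many flows land on each member link."""
    return Counter(pick_link(flow, fields, n_links) for flow in flows)


# 1000 hypothetical flows: one client, one server, varying source port.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "10.0.1.10",
     "src_port": 49152 + i, "dst_port": 443}
    for i in range(1000)
]

by_dst_ip = distribution(flows, ["dst_ip"], 8)
by_ports = distribution(flows, ["src_port", "dst_port"], 8)

print("hash on destination IP:", dict(by_dst_ip))
print("hash on src/dst port:  ", dict(by_ports))
```

Hashing on destination IP alone puts every one of these flows on the same link, because the hash input never changes.  Hashing on the port pair gives the hash something that varies per flow, so the flows land on multiple links.  With a different traffic mix the better choice could easily be different, which is the point about knowing your data.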

Also worth noting is that we focused on making the most of your bandwidth in an oversubscribed world.  We didn’t discuss port characteristics, trunking concerns, VLAN pruning, or QoS schemes, all elements that you need to consider when building interswitch links.

About Ethan Banks

Ethan Banks, CCIE #20655, is a hands-on networking practitioner who has designed, built and maintained networks for higher education, state government, financial institutions, and technology corporations. Ethan is a host of the Packet Pushers Podcast, which has seen over one million unique downloads, and today reaches a global audience of over ten thousand listeners. Also a writer, Ethan covers network engineering and the networking industry for a variety of IT publications. He is also the editor for the independent community of bloggers at PacketPushers.net.

  • http://blog.ioshints.info Ivan Pepelnjak

    Here’s an excellent document explaining various EtherChannel load balancing options on the whole range of Catalyst switches:

    https://www.cisco.com/en/US/tech/tk389/tk213/technologies_tech_note09186a0080094714.shtml

  • http://alouche.net Ali

    “Back in the day, even numbers of interfaces was a requirement to form an etherchannel, but typically you can bundle any number you like today.”

    It is important to keep in mind that an odd number of bundled ports will naturally result in more links being utilized regardless of the load balancing method in place…

    • Peter

      There are some ecmp options where adding another link will result in less available bandwidth. This is more for L3 ecmp with Cisco gear. I’ve seen cases where with 4 paths, one path gets 2x the traffic load of the other 3. Basically, the Cisco flow hash had 5 buckets. It assigned one each to three links and two buckets to the fourth. This means that one link is getting 40% of your traffic. With three links in the same configuration, traffic is split equally 33% each. To get equal balance of traffic (or at least better) one would configure the mls cef simple (or full simple). This is on the 6500, sup720 platform.

      Don’t we all dream that Cisco could figure out how to efficiently use whatever number of links in either a lag bundle or ecmp?

  • http://blog.ioshints.info Ivan Pepelnjak

    Oh, 6500 is a hodgepodge of kludges. Even funnier, your observation is in direct contradiction to what their documentation says. Even with L3 switching, CEF should have 16 buckets.

    • Peter

      Oh, CEF does have 16 buckets, but that does not mean they can ECMP across 16 paths. There is some command that will show you the mapping between available paths and those 16 buckets.

      We noticed that one link in a 4×10 gig ECMP path was running at twice the rate of the other 3. That is when we dug in and found out about the mls simple command. With 802.3ad, I think they use a different algorithm. I can’t remember off the top of my head.

      Without the simple config bit, the max bandwidth would be 5+5+5+10 before one starts dropping packets. 25 gbps. With 3 links and the same config, traffic balances evenly, providing 30 gbps. Ignoring things like clumping some huge flows onto a single link, without the simple command (and without multistage ecmp) 3 links provides more bandwidth.

      Fun stuff. The original article is a good read.

  • http://blog.alwaysthenetwork.com Colby

    Great write-up!