Markku Leiniö linked in one of his articles to a new mLACP feature for the Cisco Catalyst 6500, which naturally caught my eye. To create an etherchannel spanning multiple physical 6500s (desirable to maximize link utilization and for redundancy), you previously needed to use the Virtual Switching System supervisor engine to create a VSS super-chassis. In testing, I didn’t love VSS. It was fussy to get set up with a finicky chassis loading order, dual supervisors in a chassis were not supported, and the code was unstable in the revision Cisco recommended to us at that time. Now, that was in 2009. I have heard some good things since then from people who’ve used VSS in their production environments. However, there’s a risk that’s endemic to any technology where responsibility for a given task floats between two or more devices: split-brain.
Split-brain is a situation where the devices sharing the responsibility in question lose touch with one another. When a peer can’t see his mate, he assumes the worst, and asserts himself as the responsible party. Thus, you’ve got split-brain: two devices believing that they should be performing a given task. In a split-brain VSS super-chassis, Ivan Pepelnjak points out that at the least you’re going to have to cope with one member of the super-chassis reloading once VSS detects the split brain. You want to explain that to your boss? And probably his boss? Neither do I, which is why I never recommended VSS in the “five nines is not enough” environment I used to work in. Sure, in a perfect world, this is a non-issue, because redundancy and failover will work as advertised, and your upstream devices will never know the difference. When you find that perfect world, please let me know. Failover situations are never as simple as “this device lost power”. The situation that leads to a failover seems to always be ugly, and someone always gets hurt.
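To make the split-brain condition above concrete, here is a minimal Python sketch. It is a toy model only, not how VSS or any Cisco code actually works: the `Peer` class, the `evaluate` rule, and the chassis names are all hypothetical, chosen just to show how two healthy devices can both end up claiming the same role once they lose sight of each other.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """One member of a redundant pair (hypothetical model, not VSS code)."""
    name: str
    active: bool = False
    peer_reachable: bool = True

    def evaluate(self) -> None:
        # A peer that cannot see its mate assumes the worst and takes over.
        if not self.peer_reachable:
            self.active = True


def is_split_brain(a: Peer, b: Peer) -> bool:
    """True when both peers believe they own the task."""
    return a.active and b.active


def simulate(link_up: bool) -> bool:
    """Run one evaluation round and report whether split-brain occurred."""
    a = Peer("chassis-1", active=True)   # currently responsible
    b = Peer("chassis-2", active=False)  # currently standing by
    # Both devices stay powered on; only the link between them may fail.
    a.peer_reachable = link_up
    b.peer_reachable = link_up
    a.evaluate()
    b.evaluate()
    return is_split_brain(a, b)


if __name__ == "__main__":
    print("link healthy, split-brain:", simulate(link_up=True))
    print("link lost, split-brain:", simulate(link_up=False))
```

Notice that neither device has failed in the second case. That is exactly the kind of failover that is not as simple as “this device lost power”.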
But I digress. The interesting thing to me is that Cisco is no longer saying “buy VSS” to allow an etherchannel to span multiple 6500s. Well, kinda. I mean, that *is* what Cisco’s saying in a certain sense, but there are some really big catches. The biggest catch in my mind is illustrated in Cisco’s diagram below – do you see my complaint? It’s in that word “standby”.
- First, mLACP is released with the 12.2SXJ code train. SXJ is a new code train, so as always, consider carefully where you are running this code. SXJ is far from proven in my mind, although the SafeHarbor program has given SXJ a “recommended” rating if you find that reassuring.
- Second, although mLACP allows you to uplink portchannel members across two switches, only one of the links will be forwarding. The other will be in an LACP standby state.
- Third, there are a number of hardware restrictions. Only Sup720 or Sup720-10G is supported, and they must be higher than PFC3A, as PFC3A does not support mLACP. VSS sups do not support mLACP (which makes sense). 6500 chassis containing the WiSM do not support mLACP.
- Fourth, only a single uplink from the server to each switch is supported, so you can’t take a quad NIC and split it into 2 uplinks to each 6500 with mLACP. Effectively then, all you can do is dual-home a server.
- Fifth, this does not appear to be an interswitch technology, at least not by intention. This is purely for access-layer host uplinking. If you wanted to dual-home a switch, you could accomplish the same thing with rapid spanning-tree, presuming a well-designed STP domain where you understand your root bridge placement.
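The “standby” complaint in the second point can be sketched in a few lines of Python. This is a simplified model under my own assumptions, not Cisco’s implementation: the `Uplink` class, the switch names, and the “first healthy link wins” rule are hypothetical, and the real LACP selection logic is not modelled. It only shows the behaviour described above, where one link in the dual-homed bundle forwards and the other waits.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Uplink:
    """One server uplink to one switch (hypothetical model)."""
    switch: str
    up: bool = True


def forwarding_link(links: List[Uplink]) -> Optional[Uplink]:
    """Pick the first healthy link; assumed rule, not the real LACP election."""
    for link in links:
        if link.up:
            return link
    return None


def link_states(links: List[Uplink]) -> Dict[str, str]:
    """Label each link as forwarding, standby, or down."""
    active = forwarding_link(links)
    states: Dict[str, str] = {}
    for link in links:
        if not link.up:
            states[link.switch] = "down"
        elif link is active:
            states[link.switch] = "forwarding"
        else:
            states[link.switch] = "standby"
    return states


if __name__ == "__main__":
    bundle = [Uplink("6500-A"), Uplink("6500-B")]
    print(link_states(bundle))
    bundle[0].up = False  # lose the uplink to the first switch
    print(link_states(bundle))
```

In this model you get redundancy but never more than one link’s worth of bandwidth, which is why the word “standby” bothers me.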
What do you think? Does mLACP solve any issues for you? Or are you going to pass on this one?