Convergence: The Early Days
“Convergence” is a buzzword seen constantly in the IT press these days. All convergence means is placing communications that used to ride on their own dedicated networks onto one unified network; Ethernet’s cheapness, ubiquity, and ever-growing link speeds make it the network everything is moving towards. The first big convergence move was to combine voice networks with data networks, using IP telephony. The challenges of a converged voice/data network include prioritizing voice traffic over pretty much anything else during times of link congestion, and keeping call quality high by delivering datagrams within a predictable time (bounded latency) and with a predictable gap between them (low jitter).
Frankly, these problems were and are a big pain in the collective backsides of network engineers everywhere. Ethernet and IP are not transports intended to deliver traffic in a predictable, prioritized fashion. Ethernet is a best-effort frame delivery system that’s only survived as long as it has by using switches to shrink each collision domain down to a single port. IP relies on higher-level protocols for reliability and on underlying devices for prioritization. Therefore, delivering VoIP frames and packets across a converged infrastructure means a carefully designed and deployed quality of service (QoS) plan; the larger and more complex the network environment, the greater the potential pain. Throw in congestion-prone wide-area links of varying transports with their own forwarding nuances (ATM is not Frame Relay is not MPLS is not PPP), and the network engineer must know a lot about a lot to deliver an effective end-to-end QoS design. Even worse, no one hears the scream of the engineer configuring QoS on multiple platforms, all of which might require unique configurations depending on hardware and software to net the same results, even within the same vendor’s product families.
Convergence has evolved with the increasing affordability and adoption of 10-gigabit Ethernet. With the bandwidth and low latency of 10G, the industry has pushed towards adding storage to the converged data center Ethernet, the idea being to eventually eliminate the separate Storage Area Network. Fibre Channel (FC) is the protocol of major concern here, in that other storage protocols like iSCSI are more tolerant of changing network throughput characteristics. FC is not tolerant of a changing environment. FC expects that frames will be delivered on time, every time, and therein lies a significant challenge for the converged data center Ethernet. While 10G is an awful lot of bandwidth, simply adding Even More Bandwidth (the tried-and-true method of capacity management for engineers who don’t want to think too hard) isn’t a safe answer. No matter how much bandwidth is available, an Ethernet carrying Fibre Channel traffic (Fibre Channel over Ethernet, or FCoE) must be able to guarantee a lossless path from host to disk and back.
Don’t Drop The Baby!
Enter Data Center Bridging, or DCB. DCB comprises a set of proposed standards designed to extend Ethernet such that we can:
- Leverage flow control on up to 8 virtual links. (IEEE 802.1Qbb – Priority-based Flow Control) We can issue Ethernet PAUSE frames on specific virtual links, and not interrupt forwarding for the entire physical link. Practically speaking, we could tell everyone but the storage virtual link to shut up for a moment. Edit – 10/22/2010 – Ivan Pepelnjak makes the point that you’d actually stop the storage traffic to make sure it doesn’t get dropped. That’s a different way of thinking about it if you come from a traditional congestion management approach, but having read what Ivan says on the topic, I think I grasp this better. Context is everything, and I haven’t worked with converged storage enough to have run into the challenges storage faces on a congested link.
- Prioritize traffic classes within a virtual link. (IEEE 802.1Qaz – Enhanced Transmission Selection) Here, we can use QoS techniques within a virtual link to prioritize certain kinds of traffic. Voice – you rule! Network engineer’s web surfing – go to the head of the line! BitTorrent from Joe down the hall – tail drop.
- Exchange DCB information with other DCB devices. (also part of IEEE 802.1Qaz – DCBX, and expected to leverage LLDP) Two devices can learn about each other’s DCB-related link characteristics.
- Optionally notify upstream senders of downstream congestion, allowing the sender to mitigate the congestion through rate-shaping. (IEEE 802.1Qau – Congestion Notification) In a backwards way, this reminds me of RSVP, where you can do an end-to-end bandwidth reservation across multiple network devices. I’ll be interested to see how this is implemented.
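To make the per-priority PAUSE idea from the first bullet a bit more concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the `PfcLink` class, the priority numbers chosen for storage and data, and the frame names are all made up for illustration, and real PFC works with pause timers and buffer thresholds in hardware rather than a simple on/off flag. What it does show is the point Ivan makes: pausing the storage priority holds its frames in the sender's queue rather than dropping them, while other priorities keep flowing.

```python
from collections import deque

NUM_PRIORITIES = 8  # PFC allows flow control on up to 8 priorities per link


class PfcLink:
    """Toy model of one sender on a link with per-priority queues (hypothetical)."""

    def __init__(self):
        self.queues = [deque() for _ in range(NUM_PRIORITIES)]
        self.paused = [False] * NUM_PRIORITIES

    def enqueue(self, priority, frame):
        self.queues[priority].append(frame)

    def pause(self, priority):
        # Stands in for receiving a PAUSE for this one priority only.
        self.paused[priority] = True

    def resume(self, priority):
        self.paused[priority] = False

    def transmit_round(self):
        """Send at most one frame from each unpaused priority, highest first."""
        sent = []
        for prio in range(NUM_PRIORITIES - 1, -1, -1):
            if not self.paused[prio] and self.queues[prio]:
                sent.append((prio, self.queues[prio].popleft()))
        return sent


def demo():
    STORAGE, DATA = 3, 0  # assumed priority assignments, for illustration only
    link = PfcLink()
    for i in range(2):
        link.enqueue(STORAGE, f"fcoe-{i}")
        link.enqueue(DATA, f"web-{i}")

    link.pause(STORAGE)             # downstream buffer for storage is filling up
    first = link.transmit_round()   # only the data priority moves
    link.resume(STORAGE)            # downstream has room again
    second = link.transmit_round()  # storage resumes where it left off
    return first, second, len(link.queues[STORAGE])
```

Calling `demo()` shows the storage frames waiting out the pause in their queue and then going out in order once the priority is resumed; nothing on the storage priority is discarded along the way.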
All of this gives us the ability to guarantee that storage traffic has both the forwarding capacity and priority over other traffic flows when needed. Done right, an FCoE frame should never hit the floor. That said, I am interested to see how much hands-on engineering will be required to make this work as intended. Legacy QoS is far from automatic in the real world, and it seems fair to assume that a well-designed DCB implementation will require rather a lot of whiteboarding, implementing, monitoring, and tweaking before it perfectly serves the environment it has been deployed in.
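For the "forwarding capacity" half of that guarantee, a simplified way to picture it is as weighted bandwidth sharing: each traffic class is assigned a percentage of the link, and whatever a class does not use is handed to classes that still have traffic waiting. The Python sketch below models only that idea. The function name `ets_allocate`, the class names, and the percentages are hypothetical, and actual switch schedulers work frame by frame in hardware rather than computing rates like this.

```python
def ets_allocate(link_gbps, shares, demand):
    """Split link_gbps among traffic classes (simplified model, not a real scheduler).

    shares: class name -> configured percentage of the link (must be positive)
    demand: class name -> offered load in Gbps
    Returns class name -> allocated Gbps.
    """
    alloc = {c: 0.0 for c in shares}
    remaining = float(link_gbps)
    # Only classes that actually have traffic compete for bandwidth.
    active = {c for c in shares if demand.get(c, 0) > 0}

    while active and remaining > 1e-9:
        total_weight = sum(shares[c] for c in active)
        budget = remaining  # bandwidth handed out in this round
        satisfied = set()
        for c in active:
            offer = budget * shares[c] / total_weight
            take = min(offer, demand[c] - alloc[c])
            alloc[c] += take
            remaining -= take
            if demand[c] - alloc[c] <= 1e-9:
                satisfied.add(c)
        if not satisfied:
            break  # every active class took its full offer; link is full
        active -= satisfied  # leftover goes to classes still wanting more

    return alloc


# Hypothetical 10G link with three classes.
SHARES = {"storage": 50, "lan": 30, "mgmt": 20}

# Everyone wants more than the link can carry: each class gets its share.
congested = ets_allocate(10, SHARES, {"storage": 8, "lan": 8, "mgmt": 8})

# Storage is quiet: its unused share is lent to the LAN class.
quiet = ets_allocate(10, SHARES, {"storage": 2, "lan": 9, "mgmt": 1})
```

In the congested case storage is held to, but also assured of, its 5 Gbps; in the quiet case the LAN class borrows what storage and management leave idle. Deciding those percentages for a real environment is exactly the whiteboarding and tweaking mentioned above.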
Hey, Pal – Ya Wanna Buy An Enhanced Ethernet?
DCB has a couple of implementations. Data Center Ethernet (DCE) is Cisco’s flavor, and includes an implementation of TRILL. Cisco markets DCE and related advanced data center technologies under the term “FabricPath”, which is baked into the Nexus 7000 product line. Some Cisco documentation describes DCE as a superset of both DCB and Converged Enhanced Ethernet (CEE), which is another implementation of DCB. CEE has a much broader group of networking companies behind it, including Broadcom, Brocade, Cisco, Emulex, HP, IBM, Juniper, and QLogic, and that group is helping to craft how the DCB standards will finally look when ratified.
The point of DCB, then, is to provide an interoperable set of standards whereby storage traffic can be guaranteed the bandwidth and flow characteristics it requires across an Ethernet transport, while co-existing with other traffic that behaves quite differently.
The problem? At this point, the jury is out on what form of converged storage the market will claim. FCoE is seeing increased adoption, but uptake has been slow. iSCSI has been around for a while, is generally well-understood by engineers who deploy it, and is a popular choice. Vendors are pushing various and usually proprietary fabric schemes. Cisco, HP, Brocade, and IBM are all making acquisitions and creating product lines to sell you a single-vendor solution for storage, servers, and a converged network to run it on.
One thing’s for certain: it’s a fun time to be a network engineer.