Occasionally the topic of open sourcing a driver, library, or SDK to a commodity product comes up as more developers start working on/deploying said product. Typically, the vendor makes a concession and opens up the documentation to increase adoption and supportability.
Sometimes this works. Sometimes, it just pisses off the developers to such an extent that they make an alternative solution. This can come in the form of an unofficial driver/library (if the documentation is open enough), a shim layer (abstraction for those that are politically correct), or in the extreme case, an open source product that competes with the closed one.
Each one of these examples has already happened for NICs, video cards, and GPUs.
But what about switching silicon? This is a topic that has been getting more attention as vendors and end users adopt Open Networking with commodity ASICs.
Depending on who you talk to, this topic can be categorized three ways: don’t care, ideological, or game changer. Regardless, I’ll do my best to break it down into small, digestible pieces, if for no other reason than for chewing the fat.
Rather than dance around the subject, I will name the parties involved; not with ill intentions, but to present everything without the BS.
Today, it’s fair to say Broadcom is the #1 provider of merchant silicon for switches, with Mellanox and others chomping at the bit. This is true for traditional vendors (Cisco, Arista, Juniper) and is true for Open Networking vendors (Accton, Celestica, Dell, DNI, HP, Penguin, Quanta, and so on).
To use the silicon, you must possess the Broadcom SDK, which isn’t freely available and must be obtained via a channel (you know: a lawyer to approve the NDA, a tech lead to ensure you really need the SDK, and so on).
Once you have access to the SDK–including source, documentation, and examples–you can build and run your application. But if you want to distribute your product, it can only be in binary form as per the license requirements.
For some vendors, this is fine. For others that develop open source network operating systems (NOSes), you have an open NOS that doesn’t do packet forwarding. Might as well drop the ‘N’ from NOS at this point, or the word ‘open,’ because the forwarding engine is not.
Not only is acquiring the SDK cumbersome and a barrier to entry for open source projects, the SDK doesn’t always come bug-free. For example, suppose you find a bug in the SDK and, like a good developer, you submit a bug ticket to Broadcom. How long do you think it’ll take before an updated version of the SDK with the fix is available? Days, ideally? Weeks, maybe? Months, most likely.
So where does that leave the developer/vendor that wants to ship with the Broadcom chip? Well, the clout of the developer/vendor will influence how fast a fix is supplied.
However, some developers that have intimate knowledge of the Broadcom chips may attempt to work around the issue by reading/writing registers directly. This essentially bypasses what the SDK is supposed to provide.
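To make the register workaround concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the register file is a plain dictionary standing in for the chip, and the address, the bit, and the "buggy SDK call" are hypothetical, not anything from the Broadcom SDK. It only shows the shape of the workaround: read the register, change the one bit you care about, and write it back yourself.

```python
class FakeRegisterFile:
    """Stand-in for an ASIC's register space (hypothetical, not a real chip)."""

    def __init__(self):
        self._regs = {}

    def read(self, addr):
        # Unwritten registers read back as zero in this toy model.
        return self._regs.get(addr, 0)

    def write(self, addr, value):
        # Keep values within 32 bits, like a typical hardware register.
        self._regs[addr] = value & 0xFFFFFFFF


PORT_CTRL_ADDR = 0x1000    # hypothetical register address
PORT_ENABLE_BIT = 1 << 0   # hypothetical "port enabled" bit


def sdk_port_enable(regs, enable):
    """Pretend SDK call with a bug: it overwrites the whole register."""
    regs.write(PORT_CTRL_ADDR, PORT_ENABLE_BIT if enable else 0)


def workaround_port_enable(regs, enable):
    """Bypass the 'SDK': read-modify-write only the bit we care about."""
    value = regs.read(PORT_CTRL_ADDR)
    if enable:
        value |= PORT_ENABLE_BIT
    else:
        value &= ~PORT_ENABLE_BIT
    regs.write(PORT_CTRL_ADDR, value)


# The other bits (0xF0) survive the workaround but not the buggy call.
buggy = FakeRegisterFile()
buggy.write(PORT_CTRL_ADDR, 0xF0)
sdk_port_enable(buggy, True)

fixed = FakeRegisterFile()
fixed.write(PORT_CTRL_ADDR, 0xF0)
workaround_port_enable(fixed, True)
```

The catch, of course, is that the workaround only works if you know the register layout, which is exactly the knowledge locked behind the NDA.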
To make matters worse, the Broadcom SDK documentation reads like an errata appendix: when the SDK changes (say, support for a new chip such as Tomahawk is added), the documentation supplied reads like a patch one sends to a mailing list. To fully comprehend the SDK at a particular point in time, you need to know what the previous versions had in them.
I know this sounds like Broadcom-bashing, but truth be told, all vendors face these same issues to varying degrees.
The Stories History Has To Tell
Before we move forward, let’s reenact an episode of Drunk History featuring two networking stories. You can imagine your favorite celebrities portraying various roles here. Hell, if you know these stories, get smashed and record yourself retelling them. Who knows, we might have a prize for the best videos.
The 56Kbps War
Before DSL and cable modems became commonplace, if you wanted Internet you had a modem and dial-up service (AOL, Prodigy, CompuServe, et al). Two rival technologies were introduced to increase speeds beyond 28K or 33.6K: X2, backed by US Robotics, and K56flex, backed by Rockwell, Lucent, and Motorola.
Of course, each one was incompatible with the other, but they had one thing in common: both needed to be upgradeable to support the upcoming V.90/V.92 standards. Because of this requirement, most vendors that wanted to take market share from Rockwell and US Robotics by selling cheap modems (remember, 56K modems were about $150 to $200 a pop) started selling Winmodems: modems that are just d2a (digital-to-analog) and a2d (analog-to-digital) converters, with all other tasks performed in software (software defined radio, anyone?).
These modems were called Winmodems, rather than the proper term softmodems, because the software used to drive the modem typically only ran on Windows machines (Win95, Win98, and later WinME, the two-peckered goat of the bunch).
However, given the amount of resources required to drive the modem, online gaming became an issue unless you had the best hardware (like a two-socket system: one CPU for your gaming and one for your modem).
So which modem won?
In terms of winning, I’ll base it on modem support rather than units sold. The reason is that I, along with many of you, bought a new computer during this time, and one of the first things we did was throw out that P.O.S. Winmodem and install a real modem, typically from the box we were replacing. This modem was the US Robotics one (or 3Com, since they were merged back then).
Why? Besides being a real hardware-based modem, the modem was well understood. Software developers understood how the 28K and 33.6K modems worked, and because the 56K modem was not all that different, they were able to extend the driver to support the faster speeds with ease.
What does this mean to you and me? USR 56K modem = rock solid performance regardless of OS. This was huge at the time. If you were running something other than Windows (e.g., Linux or BSD), your modem options were already limited to hardware modems; but now Linux users had the same reliability as their Windows counterparts.
Well-understood hardware = great support = adoption across OSes.
In the server market around the early 2000s, we were once again on the verge of moving to faster Ethernet speeds (100Mb to 1Gb Copper, er BASE-T). Of course, there were competing technologies (surprise, surprise) but once the standard was ratified in ’98, all that was left was interoperability testing at UNH-IOL. Awesome, a great place to be in.
Except one thing: a Linux driver. Why does this matter? Well, it turns out during this time, Linux was taking the world by storm, especially in the server arena. Red Hat was the Linux distribution for companies wanting to move away from Sun. At this time in history, when a developer (commercial or open source) wanted to develop for your device, they needed to sign an NDA and receive the technical spec for the product.
Also during this time, the OpenBSD project started making this issue publicly known and started boycotting support for companies that required the NDA to begin with. As history has shown, this method worked for the most part. If vendors wanted their products to work in open source, the documentation had to be readily available.
Intel, along with other companies, started doing just that: free documentation for all. We now had open source drivers replacing closed source drivers. Except for one problem: interoperability and performance (ok, ok…2 problems; sheesh). As you can imagine, Intel wasn’t too happy that their E1000 NIC using an open source driver on Linux was not performing as well as the same NIC on Windows with the closed source driver provided by Intel.
Couldn’t this be solved by using the Intel driver on Linux? Sure, but now they had to figure out which kernel to support (2.2 or 2.4), which distro to support (Red Hat, Slackware, Debian, etc.), and which architecture (x86, Alpha, Itanium, etc.). Each of these questions becomes moot if the driver is provided in source form (say, in a project source tree). Radical thinking, I know, but those were the times back then (and in some cases, it’s still true today).
Being led by pragmatic engineers, Intel did the unspeakable: they open sourced the E1000 driver for the new 2.4 Linux kernel along with BSD support. Imagine, same performance as the closed source binary but now with no strings attached. No binary blob to load into the kernel. The rest, as they say, is history.
Do you know what NIC is the most readily available and supported NIC in the world?
The Intel E1000-based NIC; regardless of whether it’s a physical NIC or virtual NIC, the E1000 is one of the best understood, and therefore most readily available and supported, NICs in history. All because: the documentation is readily available, the source code is available, and the card is understood by many.
What if we applied history lessons to the problem statement?
What if the merchant silicon companies took a page out of history (the E1000 page, not the Betamax or HD DVD ones) and applied that? We could be in a world where you can write a switching app once and run it anywhere (with a recompile, of course, to match your CPU architecture: e.g., x86, ARM, etc.). So, why haven’t they? Surely this problem can’t be tighter than a frog’s ass (that is to say, next to impossible)?
Well, it’s kinda hard to make someone (particularly a market leader) do something they don’t want to do. They need to be encouraged in a variety of ways by both developers and competitors.
Some of this encouragement can come from developers that need to open source some of their components (the particular event I’m thinking of is Facebook open sourcing its FBOSS switching controller at the 2015 OCP Summit). The result was OpenNSL: a binary shim library with GPL knet device drivers coupled with documentation on how to use the shim that is licensed under Apache v2 so that anyone who uses OpenNSL can make their application open source.
This only removes the restriction the Broadcom SDK places on open sourcing applications, and nothing else. Let me explain. As I mentioned before, the shim library is distributed in binary form, which means that at the time of this article only 5 platforms are supported (two OCP switches, one pending OCP switch, and two other Trident-based devices) on x86 (some platforms are 32-bit and the rest are 64-bit).
Which means, if your platform is ARM- or PPC-based with a Trident2, you’re SoL. You’re back to a feature request with Broadcom because you can’t add support for that as a community member. Hardly open source, wouldn’t you say?
So, what are Mellanox and others doing to capitalize on these missteps? Lots. First, they are upstreaming their knet drivers to the Linux kernel and participating in a Linux Kernel NetDev project called switchdev (a kernel-based abstraction framework; here’s a past presentation and upcoming tutorial).
Mellanox is also a founding member of the OCP SAI project (a cross-platform abstraction layer in user space) with the goal of making their implementation available (NOTE: See Update 1 below). What does this mean to developers and end users? It means the end of having to use an SDK to drive an ASIC (be on the lookout for a future post describing kernel-based abstraction and user space abstraction and why we need both projects to succeed).
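To picture what an abstraction layer buys you, here is a rough sketch in Python. To be clear, this is not the SAI API (SAI is a C interface, and none of these names come from it): `SwitchBackend`, `VendorABackend`, `VendorBBackend`, and `provision` are all hypothetical, and the two "vendors" are toy in-memory models. The point is only that the application is written once against a neutral interface, and each vendor hides its own internals behind it.

```python
from abc import ABC, abstractmethod


class SwitchBackend(ABC):
    """Vendor-neutral operations the application is written against."""

    @abstractmethod
    def create_vlan(self, vlan_id: int) -> None: ...

    @abstractmethod
    def add_port(self, vlan_id: int, port: int) -> None: ...

    @abstractmethod
    def vlan_members(self, vlan_id: int) -> set: ...


class VendorABackend(SwitchBackend):
    """Hypothetical vendor that tracks VLAN membership as a set of ports."""

    def __init__(self):
        self._vlans = {}

    def create_vlan(self, vlan_id):
        self._vlans.setdefault(vlan_id, set())

    def add_port(self, vlan_id, port):
        self._vlans[vlan_id].add(port)

    def vlan_members(self, vlan_id):
        return set(self._vlans[vlan_id])


class VendorBBackend(SwitchBackend):
    """Hypothetical vendor that tracks VLAN membership as a port bitmap."""

    def __init__(self):
        self._bitmaps = {}

    def create_vlan(self, vlan_id):
        self._bitmaps.setdefault(vlan_id, 0)

    def add_port(self, vlan_id, port):
        self._bitmaps[vlan_id] |= 1 << port

    def vlan_members(self, vlan_id):
        bitmap = self._bitmaps[vlan_id]
        return {p for p in range(bitmap.bit_length()) if (bitmap >> p) & 1}


def provision(backend: SwitchBackend) -> set:
    """The 'switching app': identical code no matter whose silicon is below."""
    backend.create_vlan(100)
    for port in (1, 2, 5):
        backend.add_port(100, port)
    return backend.vlan_members(100)


members_a = provision(VendorABackend())
members_b = provision(VendorBBackend())
```

Both calls to `provision` return the same membership even though the two backends store it differently, which is the whole idea: the app doesn't care whose ASIC is underneath.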
Essentially, Mellanox is taking the E1000 page from history and attempting to capitalize on Broadcom’s missteps. Others like Cavium XPliant, Barefoot Networks, Centec, and MediaTek are well on their way to either upstreaming their knet drivers, releasing their SAI drivers, or both. The question on everyone’s mind is: will it work? (Hint: pay close attention over the next several months, especially around the OCP Summit).
But I’ll tell you what: without an open source framework to drive merchant silicon, we won’t truly have an open source NOS. (Mind you, OpenSwitch, HP’s open source NOS project, is based on OpenNSL; a more apt name would be Almost-Open Switch.)
As a wise old man once told a younger Carlos: You don’t go stepping on a fresh turd during a hot day. (Translation: Don’t kick someone when they’re down.)
There’s still plenty of time for Broadcom to alter their policies and practices to be more open friendly.
Update 1: 2015-02-01, Indicate that Mellanox’s SAI driver is based on their SwitchX-interfaces API (a shim library to their SDK).
Update 2: 2015-02-01, Add link to in-depth post about SAI and switchdev.