Disclosure: I’m the networking co-chair at the Open Compute Project and I work for Cumulus Networks in my day job. The opinions in this post are all mine, though. Enjoy.
Over the last several weeks, we’ve seen a lot of press releases around a small networking startup called SnapRoute and its software stack, FlexSwitch. But a lot of announcements seem to have fabricated or exaggerated what FlexSwitch is/does so I wanted to call an audible and reset folks’ expectations. So buckle up, buckaroos. It’s going to get bumpy.
What Is FlexSwitch?
FlexSwitch is a set of networking applications written in Go that implement L2/L3 networking protocols and features such as ARP, DHCP, BGP, and VLANs. (Question for the audience: How many times must we rewrite ARP? It’s a little like asking how many licks it takes to get to the Tootsie Roll center of a Tootsie Pop.)
A hardware shim must be used to hardware-accelerate these applications. In FlexSwitch, this piece is called asicd. It is responsible for talking to the ASIC and today is mostly closed source (due to the license encumbrance of using vendors’ SDKs). Other than that, every protocol is a separate user-space application that provides independence from the kernel, much like Arista’s EOS.
To configure and receive status on each protocol, a REST API is provided for each application. If that’s too burdensome, there is a Python-based SDK for people that want to write their own configuration apps. And if that’s too much, there’s the ability to push a JSON config file along with a CLI app for those that are familiar/comfortable with the constraints of a limited shell.
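To make the "REST API plus JSON" idea concrete, here is a minimal sketch of what a tiny configuration client could look like. Everything specific in it is an assumption for illustration: the port, the `/config/<resource>` path, and the `Vlan`/`VlanId` names are hypothetical and are not taken from FlexSwitch's documentation. The function only builds the request; it doesn't send anything.

```python
import json
import urllib.request


def build_config_request(host, resource, settings, port=8080):
    """Build (but do not send) a JSON config request for one protocol app.

    The URL layout and default port are hypothetical placeholders,
    not FlexSwitch's documented API.
    """
    url = f"http://{host}:{port}/config/{resource}"
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical example: ask the VLAN app to create VLAN 100.
req = build_config_request("switch1", "Vlan", {"VlanId": 100})
```

Passing `req` to `urllib.request.urlopen` would actually send it, assuming a switch is listening at that address.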
Sounds good, doesn’t it? So let’s talk about what’s not there.
FlexSwitch does nothing in terms of managing power, temperature sensing, port configuration, link establishment, or anything else a NOS performs. It is dependent on an underlying NOS to perform these functions.
To recap, FlexSwitch is useless without a NOS running.
If that is true, then why run FlexSwitch? The heart of it is to have an alternative L2/L3 stack available.
How FlexSwitch Came Into Existence
A few network engineers wanted to change how we configure and run data center networks, and as such left their jobs and started SnapRoute in September 2015. After about 9 months of development, the team presented and demoed FlexSwitch at an OCP Engineering Workshop and started the contribution process to OCP a week later.
On September 14th, FlexSwitch was accepted into OCP (at the time of this writing, the foundation is working to move all FlexSwitch repos to the OCP organization).
Since then, we’ve seen FlexSwitch involved in a handful of projects, with the latest being the Facebook Voyager project.
Is That Enough To Create A Company Around?
FlexSwitch, by itself, doesn’t seem enough to warrant investing in yet another implementation of ARP or BGP. There has to be more to it.
Over the last several months we’ve seen FlexSwitch become integrated into OCP’s ONL (Open Network Linux), providing full forwarding capabilities with a simple package install. This integration makes sense because ONL has a good portion of the platform drivers required to be a NOS.
However, when it comes to integrating FlexSwitch into OpenSwitch (OPS) or SONiC (or Dell’s OS10, which is based on SONiC), there’s some serious doubt. Lemme explain.
At the beginning of this post I mentioned FlexSwitch is just a set of L2/L3 protocol applications that gets hardware accelerated by a piece of license-encumbered code called asicd. OPS and SONiC have their own architectures that drive the switch ASIC. SONiC uses OCP’s SAI interface, coupled with a switch state service (SWSS) daemon to keep the hardware/software states synchronized.
In OPS, this is done using OpenNSL. The hardware state is managed via OVSDB (NOTE: OPS does support SAI as an optional datapath, but so far it’s not used in its three supported platforms: OCP 5712, OCP 6712, or OCP 7712).
In order for FlexSwitch to integrate into OPS or SONiC one of two things has to happen:
- OPS or SONiC adopts FlexSwitch’s architecture and asicd (long shot)
- FlexSwitch has a separate integration point (essentially a branch) into OVSDB (OPS) or SWSS (SONiC) (in the realm of reality)
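The second option is basically an adapter per NOS. Here is a rough sketch of that shape, under loud assumptions: the class names, the `publish_route` method, and the table names are all made up for illustration, and plain dictionaries stand in for the real OVSDB and SWSS state stores. It is not how FlexSwitch, OPS, or SONiC are actually wired.

```python
from abc import ABC, abstractmethod


class StateBackend(ABC):
    """Hypothetical integration point: where a protocol app publishes state."""

    @abstractmethod
    def publish_route(self, prefix, next_hop):
        ...


class InMemoryBackend(StateBackend):
    """Stand-in for a NOS state store; a dict replaces the real database."""

    def __init__(self, name, table):
        self.name = name      # label only, e.g. "OPS/OVSDB"
        self.table = table    # placeholder table name, not a real schema
        self.rows = {}

    def publish_route(self, prefix, next_hop):
        self.rows[prefix] = {"table": self.table, "next_hop": next_hop}


def announce(backend, prefix, next_hop):
    """The protocol app stays the same; only the backend changes per NOS."""
    backend.publish_route(prefix, next_hop)


ovsdb = InMemoryBackend("OPS/OVSDB", "Route")
swss = InMemoryBackend("SONiC/SWSS", "ROUTE_TABLE")
for backend in (ovsdb, swss):
    announce(backend, "10.0.0.0/24", "192.0.2.1")
```

The point of the sketch is the cost: every NOS gets its own backend to write and maintain, which is why this option amounts to a branch per integration.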
And then, something happened. After OPS had been a Linux Foundation project for about 3 months, HPE pulled out of it and the search for a suitor began. Some may ask, why would HPE pull out of a project it created and ran internally for almost 2 years? Well... we’ll save that topic for another day. And then we read that SnapRoute and Dell are taking the lead on the project.
That’s all fine and dandy but it doesn’t answer how FlexSwitch gets integrated into OPS.
According to the article, Dell will contribute pieces of its OS10 NOS to OPS. This is where things start getting weird. Remember how I said OPS and SONiC have different architectures? So how will code from SONiC (or OS10) integrate into OPS without some serious changes?
Cool…but can people live without OPS for about a year? Probably, since there are plenty of alternatives out there.
And there’s one last thing…legal ownership of FlexSwitch and who manages it.
SnapRoute has contributed FlexSwitch to OCP and OCP has formally accepted it (i.e. it’s no longer SnapRoute’s but under the stewardship of OCP, just like ONIE). So what is there to contribute to OPS besides the pieces that talk to the OVSDB?
(A note to the reader: when code or a project is assigned to The Linux Foundation, all IP is reassigned to The Linux Foundation. At that point, a company cannot just arbitrarily assign copyright to another entity.)
Y’all can expect that if SnapRoute attempts to renege on the OCP acceptance of FlexSwitch and instead contribute it to OpenSwitch, it will be a mess (paperwork, possibly lawyers).
My guess is that they haven’t even thought it all the way through, which just adds more doubt to their strategy.