Big Switch Networks, OpenFlow, and Virtual-Networking

There’s been a lot of movement happening lately under the “Virtual Networking” moniker. I sat down this morning and thought about how I might approach blogging about Virtual-Networking as it is right now. It’s a vast topic. So vast that I decided that all I’m going to blog about going forward is Virtual-Networking from a Network Engineer’s perspective. By this I mean the actual nodes and connections that comprise the network plumbing: routers, switches, network-appliances, and how they support secure multi-tenant/multi-security-zone networks: M(T/SZ) networking within and between Data-Centers. I like the M(T/SZ) acronym because this topic applies to both SP (multi-tenant) and Enterprise (multi-security-zone) environments. Everyone is a Service-Provider now, guys (and ladies)… Even you Enterprise toads. Before we get into that mess, though, I want to revisit my previous OpenFlow and Virtual-Networking blog post…

Back on 9/20 I was able to pay Big Switch Networks a visit in their new office in Palo Alto. Walking through the front door, it’s immediately apparent that this is a start-up. Folding tables, couches, and energy-drinks. The excitement is palpable. These folks really, really believe in what they are doing. If you haven’t read “The Innovator’s Dilemma” by Clayton Christensen, then you should. We are at the top of the S-curve in the Networking Industry. SDN and OpenFlow are the emerging technology and we are at the intersection of the old and new. Big Switch is betting on the problems they can solve and the value they can deliver with OpenFlow.

They have quite the roster too: Rob Sherwood, a leading figure in OpenFlow development. Rob Rodgers, formerly a lead development engineer behind the ASR 1K at Cisco and, before that, a development engineer at Juniper. Balaji Sivasubramanian, Cisco Press author and previous lead product manager for the Nexus 7k. Mansour Karam, PhD EE and former Managing Director of Business Development at Arista. Howie Xu, former head of Networking R&D at VMware. This is only a fraction of the talent they have acquired. With the exception of Howie (the announcement about his move to Big Switch came the day after we were there), these are some of the people who gave us their time during our visit.

Rob Sherwood thought I would show up with my “PacketPushers” hat on, and in a way.. I did.  I’m not a journalist or philosopher.  My posts on PacketPushers are less about speculation and more about the immediate challenges I have as an engineer with M(T/SZ) networks.   So I went straight into how to solve a problem with OpenFlow:  M(T/SZ) networking all the way into the host.  We also briefly touched on waypoint-routing and virtual-topologies:  specifically putting virtualized appliances between VRFs on an MPLS infrastructure and then mapping traffic to the appliance with route-targets.  Could OpenFlow do this better?

After giving my usual anti-VMware rant about how they fail at virtual-networking, we started to draw some diagrams. I explained how we could solve secure multi-tenancy for VMware hosts with pass-thru modules: on the network side, we would do VLAN translation (to resolve VLAN overlap) and terminate VLANs from the host into virtual-router and virtual-switch instances. On the VMware host side we would use port-groups and some hypervisor-based firewall solution to explicitly prevent traffic from passing from one port-group to another. Rob S offered a similar solution using VLAN translation towards the host from a layer of OpenFlow nodes sitting between the hosts and the network. If the OpenFlow nodes and the controller supported MPLS, we could actually turn those nodes into PEs as well and uplink them right into our P core. Or we could stitch up towards existing PEs with QinQ. I offered up that we could build pseudo-wires between data centers on our existing MPLS infrastructure to interconnect the OpenFlow nodes at one data center with the OpenFlow nodes at another. Rob S remarked that this is something they’ve recommended and done already with some customers.
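To make the VLAN-translation piece concrete, here’s a minimal sketch of the mapping we were drawing on the whiteboard. The port names, VLAN numbers, and the function are all hypothetical illustrations of the idea, not any vendor’s API:

```python
# A minimal sketch of the VLAN-translation idea from the whiteboard session.
# Port names, VLAN numbers, and function names are hypothetical illustrations.

# Tenants may reuse the same VLAN ID on their host-facing ports, so we map
# (host_port, host_vlan) to a network-unique VLAN that terminates in that
# tenant's virtual-router/virtual-switch instance.
TRANSLATION_TABLE = {
    # (host_port, host_vlan): (tenant, network_vlan)
    ("eth1/1", 100): ("tenant-a", 2100),
    ("eth1/2", 100): ("tenant-b", 2200),  # same customer VLAN 100, no clash
}

def translate_ingress(host_port, host_vlan):
    """Rewrite the host-side VLAN tag to the network-side tag, or drop."""
    entry = TRANSLATION_TABLE.get((host_port, host_vlan))
    if entry is None:
        return None  # unknown tag: drop rather than leak between tenants
    return entry

print(translate_ingress("eth1/1", 100))  # ('tenant-a', 2100)
print(translate_ingress("eth1/2", 100))  # ('tenant-b', 2200)
```

The network-facing tag, not the customer-facing one, is what keeps tenants separated, so customers are free to reuse VLAN IDs on their side of the wire.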

Take note, virt-wanking OpenFlow hipsters: Big Switch seems to understand the importance of integrating into existing networks. While other companies blatantly ignore the big elephant in the room, Big Switch is talking about it. Instead of answering questions with whitespace and noise, Big Switch tries to solve problems *within* existing networks. You do have to think about the network. Even if it’s built on OpenFlow.

After I insulted Rob Rodgers (to his face) about the RP1 processor on the ASR 1K, he started talking about load-balancing flows across multiple appliances cabled into an OpenFlow network… Waypoint routing on steroids. Per-flow waypoint routing AND load-balancing: much better than our current MPLS method, which really only works at the subnet level, requires a verbose amount of configuration, and only supports manual load-balancing through routing-policy. Then Rob S was describing how you could enforce waypoint-routing policies even in the face of a device moving from one side of the network to another. The controller would see that the device moved and then re-program the forwarding hardware to get the traffic back to the firewall or IPS or load-balancer or… something something something… ahem. Oh, and Juniper created an OpenFlow instance type and Rob took an MX to some show… <rabble> <rabble> <rabble>. Then Rob started talking about FlowVisor. Before I arrived at Big Switch I really wanted to talk about virtualization and OpenFlow. FlowVisor was an important part of that plan. It was too late though. My ability to articulate sentences and my overall perspicacity were dying…

…because my head had melted. I had a powerful NerdGasm right in Big Switch’s conference room. If OpenFlow does provide adequate “separation” for M(T/SZ) networking (as it is commonly understood by Service-Providers, their customers, and auditors), then a whole lot of the complexity we are dealing with now goes away. And *this* OpenFlow company understands the value of integrating with existing networks. And they’re already working with Juniper. This Kool-Aid sounds good. Real good. I imagined Rob Sherwood in a Kool-Aid Man outfit bursting through a wall… “OH YEAH! Drink OpenFlow!”

This naturally led into a conversation about my previous blog post: does OpenFlow have the kind of separation required for real virtual-networking? Except it wasn’t much of a conversation. I can’t remember what Rob said exactly, but it was less than a sentence before my train of thought was completely derailed. In fact, it was basically derailed for the rest of the evening. Not even beer and sweat-inducing spicy Indian food could help me. I tried not to think out loud too much because that generally scares people. Still, they probably thought I was broken. Here I was sitting at the table with an unbelievable amount of talent, people at the forefront of a networking movement, and I was stuck in a “GOTO 10” loop, barely able to form sentences. By the end of the evening, I could only summarize the whole issue with this: “Where does the control-plane end and the forwarding-plane begin, and what implications does OpenFlow’s answer have for M(T/SZ) networking?”

On the one hand, I understood that OpenFlow adds and removes flows in multiple flow tables. On the other hand, are there per-VRF FIB tables? Where does OpenFlow sit? Does it program the forwarding hardware directly with information derived from a FIB abstraction (i.e., do you generate OpenFlow messages from FIB tables on the controller)? Or does it program the FIB abstraction, with the switch vendor programming the hardware from that abstraction (i.e., do you create FIB tables on the nodes themselves with OpenFlow)?

Rob Sherwood would probably make a hilarious Kool-Aid Man, but before he suits up… I needed to sort this out.

In the following weeks I plowed through various books about how routers and switches are built:

1. “Cisco Express Forwarding” by Nakia Stringfield, Russ White and Stacia McKee
2. “Inside Cisco IOS Software Architecture” by Vijay Bollapragada, Russ White and Curtis Murphy
3. “High Performance Switches and Routers” by H. Jonathan Chao and Bin Liu
4. “Network Algorithmics” by George Varghese
5. “Network Processors” by Ran Giladi
6. “Network Systems Design Using Network Processors” by Douglas Comer

***At this point, I’d like to take a moment to apologize to my wife for the amount of money I spent on books in the last few weeks.***

I scanned numerous academic papers on forwarding-plane design. I went through various vendor white-papers and design guides floating about on the internet, such as the “Network Virtualization: Path Isolation Design Guide” on Cisco’s website (which is quite good, actually). I skimmed over some of the ForCES documents as well. Of course, I read the OpenFlow standards themselves quite thoroughly, along with various white-papers about OpenFlow, various academic papers, the mailing-list archives, and the wiki.

I didn’t read everything front to back, of course.  Instead, I searched for the parts that talked specifically about how and why lookup structures in the forwarding-plane are built.   Finally, I turned to the IETF’s L3VPN, Cisco-NSP, and Juniper-NSP mailing lists:  “Why do we need per-VRF FIB separation?  Is it performance or security related?  What about a single table with a field/column for the VRF-ID?”

I’ll summarize the outcome of all of this in two points:

1.  “FIB per VRF” or “FIB per VSI” can mean forwarding entries indexed by instance ID within a single table; it doesn’t strictly mean a separate table or tree for each instance.

The Catalyst 6500/7600 use TCAMs in some of their hardware to forward traffic. In these TCAMs there are multiple tables, but there is no “table-per-VRF.” The forwarding lookup happens against a table whose index is the VRF_ID; the other fields are a prefix and a pointer to an adjacency table. CEF maintains all the usual CEF data structures “above” the TCAM, and the TCAM table entries are built from that information. This general model actually applies to multiple vendors’ platforms. On most QinQ-supporting platforms there is only one MAC table, and its entries are indexed by the outer VLAN tag (the S-VLAN). Yet both TCAM-based systems and QinQ-based separation are accepted as providing adequate separation and transparency.
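Here’s a rough sketch of that “one table, VRF_ID in the key” lookup model, with made-up VRF IDs, prefixes, and adjacency pointers. It illustrates the semantics only; it is not how any TCAM is actually programmed:

```python
import ipaddress

# A sketch of the "one table, VRF_ID in the key" model described above.
# Entries live in a single structure; separation comes from the VRF_ID being
# part of every lookup key, not from per-VRF tables. All values are made up.
FIB = {
    # (vrf_id, prefix): adjacency pointer (index into an adjacency table)
    (10, ipaddress.ip_network("10.0.0.0/8")):  1,
    (10, ipaddress.ip_network("10.1.0.0/16")): 2,
    (20, ipaddress.ip_network("10.0.0.0/8")):  3,  # overlapping prefix, other VRF
}

def lookup(vrf_id, dst):
    """Longest-prefix match scoped by VRF_ID, as the TCAM does in one pass."""
    addr = ipaddress.ip_address(dst)
    candidates = [(net, adj) for (vrf, net), adj in FIB.items()
                  if vrf == vrf_id and addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]

print(lookup(10, "10.1.2.3"))  # 2: the more-specific entry in VRF 10
print(lookup(20, "10.1.2.3"))  # 3: VRF 20 never sees VRF 10's routes
```

Note that the overlapping 10.0.0.0/8 prefixes never collide because the VRF_ID is part of every key; that’s the whole trick.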

OpenFlow v1.1 is loosely based on a serial-pipeline TCAM model. OpenFlow could be implemented with a shared table indexed by a compound key that includes ingress port, VLAN, and MPLS label. That table could be built from information across multiple RIBs. This is an important hypothetical assertion: in reality, this will look different depending on how the controller and the forwarding node are designed.
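To make that hypothetical concrete, here’s a toy model of a single shared flow table keyed on (ingress port, VLAN, MPLS label). The ports, tags, labels, and action strings are all invented for illustration; this is not a real OpenFlow message format:

```python
from dataclasses import dataclass
from typing import List, Optional

# A hypothetical sketch of the compound-key idea: one shared flow table whose
# match key carries the instance context (ingress port, VLAN, MPLS label),
# populated from multiple RIBs by the controller.

@dataclass(frozen=True)
class Match:
    in_port: int
    vlan: Optional[int]
    mpls_label: Optional[int]

FLOW_TABLE = {
    # customer-facing port: impose the VPN label and send toward the core
    Match(in_port=1, vlan=2100, mpls_label=None): ["push_mpls:30001", "output:48"],
    # core-facing port: dispose the label and deliver to the customer VLAN
    Match(in_port=48, vlan=None, mpls_label=30001): ["pop_mpls", "set_vlan:2100", "output:1"],
}

def process(in_port: int, vlan: Optional[int], label: Optional[int]) -> List[str]:
    return FLOW_TABLE.get(Match(in_port, vlan, label), ["drop"])  # table-miss: drop

print(process(1, 2100, None))    # ['push_mpls:30001', 'output:48']
print(process(48, None, 30001))  # ['pop_mpls', 'set_vlan:2100', 'output:1']
```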

For instance, EZchip converts OpenFlow v1.1 flow information into TCAM structures that fit neatly into their existing TOPcore architecture. In their demo, the forwarding node has a thin OPE (OpenFlow Processing Engine) software layer that receives messages from a controller and builds generic OpenFlow tables. From there, the information is broken into pieces and plugged into the TOPcore architecture across multiple NPUs connected via PCI bus to the host module running the OPE process. In their diagram, it looks like some TCAM information is stored local to the NPUs and some is stored in external memory. It doesn’t look like there are per-VRF tables in the TCAMs.

2.  What exactly constitutes the “control plane” and the “forwarding plane,” and how VRF separation is assured, differs from vendor to vendor and platform to platform.

Some view the control-plane as “protocols/policy -> RIB,” while others also include the FIB. In some implementations the structure of the FIB is very close to what is programmed in hardware; in others it is not. The FIB might hold a table while the hardware holds tries. In some cases the FIB remains on the routing-engine, and in other cases it is spread across the linecards of the device. As we already discussed in #1, some implementations have all the VRFs share a lookup table or trie (yes, there are shared-trie implementations, such as Click) in the FIB or in the hardware. In other cases there are per-VRF structures even in the hardware.
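To show how much of this is an implementation detail, here’s a toy compilation of the same per-VRF RIBs into both layouts; the VRF names and routes are made up:

```python
# A sketch of the same per-VRF RIBs compiled into two different
# forwarding-plane layouts. Both deliver "FIB per VRF" semantics to the
# operator; only the implementation below the line differs.

RIBS = {
    "vrf-red":  {"10.0.0.0/8": "adj-1"},
    "vrf-blue": {"10.0.0.0/8": "adj-2"},  # overlapping address space is fine
}

def compile_per_vrf(ribs):
    """Model A: a distinct forwarding structure per VRF."""
    return {vrf: dict(routes) for vrf, routes in ribs.items()}

def compile_shared(ribs):
    """Model B: one shared structure, instance ID folded into the key."""
    return {(vrf, prefix): adj
            for vrf, routes in ribs.items()
            for prefix, adj in routes.items()}

print(compile_per_vrf(RIBS))
print(compile_shared(RIBS))
```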

Where OpenFlow fits into this model will also vary. There could be a FIB on the controller as well as on the forwarding-node. Suppose Cisco offered support for OpenFlow in the 6500 or 7600. If the device were to accept OpenFlow messages from a controller (and behave like a node in that respect), then it might build CEF entries from the messages. However, it might also maintain a separate OpenFlow-like FIB and from that program TCAM entries in its various linecards.

Cisco could also turn any given router or switch into an OpenFlow controller itself. In that case OpenFlow might sit “below” CEF: OpenFlow messages would be derived from CEF and forwarded to remote devices. You could create a distributed Catalyst 6500 with commodity OpenFlow hardware. It would be like QFabric, but with Catalysts and an open standard for CP <-> FP interaction.
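As a thought experiment (in code), here’s roughly what that might look like: the router derives OpenFlow-like “flow add” messages from its own CEF entries and pushes them to remote commodity nodes. Every name and message format here is invented; no such feature exists on any Cisco platform that I know of:

```python
# A thought experiment: a router acting as the controller, deriving
# OpenFlow-like messages from its own CEF entries and pushing them out to
# remote commodity nodes. Entirely hypothetical; all names are invented.

CEF_TABLE = [
    {"vrf": 10, "prefix": "10.1.0.0/16", "next_hop": "port-3"},
    {"vrf": 20, "prefix": "192.168.0.0/16", "next_hop": "port-7"},
]

def cef_to_flow_mods(cef_entries):
    """Translate CEF entries into controller-to-node 'flow add' messages."""
    return [{"op": "add_flow",
             "match": {"vrf": e["vrf"], "ipv4_dst": e["prefix"]},
             "action": f"output:{e['next_hop']}"}
            for e in cef_entries]

for msg in cef_to_flow_mods(CEF_TABLE):
    print(msg)  # these would be pushed to each remote OpenFlow node
```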

hmmm..  What if you did *both* models?  A distributed Catalyst representing itself as a single node to an external controller?  Perhaps that is what Juniper could do with QFabric (at least conceptually.. the QFabric architecture is not based on OpenFlow).

—————————————————————————————————————————————————-

With these two points in mind, I have decided that OpenFlow has what it needs to support real separation for SP-like L2VPN/L3VPN virtualization. With the introduction of support in v1.1 for stacked MPLS and VLAN tags, OpenFlow can also provide for differentiation on the wire and integration with existing M(T/SZ) networks. It will be up to the vendors of the controllers to ensure that the controller meets the IETF’s (and other standards bodies’) security requirements for M(T/SZ) networks. For instance, you must drop packets received on a PE/CE interface that carry invalid VLAN or MPLS tags. There will likely be multiple RIB-like structures, but the implementation details below that will vary greatly from platform to platform. The bottom line is that the controller will have to manage the control- and forwarding-planes appropriately to provide a secure, transparent network-overlay service. Which isn’t much different from how it is today with traditional routers and switches.
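Here’s a sketch of that edge rule, with hypothetical port names and tag values, showing the kind of admission check a controller would install as flow entries on a customer-facing port:

```python
from typing import Optional

# A sketch of the PE/CE edge rule mentioned above: on a customer-facing port,
# only explicitly provisioned tags are forwarded and everything else is
# dropped. A controller would install these as flow entries; the port names
# and tag values here are hypothetical.

VALID_TAGS = {
    # customer-facing port: VLAN tags provisioned for the attached CE
    "ce-port-1": {100, 101},
}

def admit(port: str, vlan: Optional[int], mpls_label: Optional[int]) -> bool:
    """Drop labeled traffic and unprovisioned VLANs at the PE/CE boundary."""
    if mpls_label is not None:
        return False  # a CE must never inject labeled traffic into the core
    return vlan in VALID_TAGS.get(port, set())

print(admit("ce-port-1", 100, None))   # True: provisioned VLAN
print(admit("ce-port-1", 999, None))   # False: unprovisioned VLAN
print(admit("ce-port-1", 100, 30001))  # False: labeled packet from a CE
```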

What of the virtualization of OpenFlow itself? What if we had an OpenFlow network partitioned such that it can be controlled by multiple controllers? Should we settle for existing Service-Provider-type virtualization then? Instead of multiple controllers, could you just do all of that with role-based administration on a single controller? What if you had an API for controller-to-controller communication, so a client controller system could program flows in a slice of a provider’s network through a customer-facing front-end controller? What about Juniper’s OpenFlow instance type? MPLS creates the illusion that all the VRFs that belong to one L3VPN are basically one router, or that all the VSIs in an L2VPN are basically one switch. If there were an OpenFlow instance-type in an MPLS network, could all the OpenFlow instances in an OpenFlow VPN appear as a single OpenFlow node to an external controller? MPLS would transparently stitch the instances together. I see a possible MP-BGP extension that would carry OpenFlow-like messages coming out of this scenario…

Enough on virtualization though… here are two additional items coming out of the Big Switch meeting:

First, OpenFlow v1.1 is problematic as a specification. Unfortunately, I don’t have all the details, but I do know that the implementation of multiple tables was problematic for some vendors. I think there is some debate about the flow diagram for the matching process. There is certainly some disagreement on the use of virtual-ports. For this reason we probably shouldn’t expect to see much in the way of OpenFlow v1.1-compliant nodes. For instance, Open vSwitch will probably skip v1.1 and go to v1.2 (where hopefully these things will be hashed out). Speaking of v1.2, the ONF was chartered to run with the development of the specification. Numerous vendors and potential customers are participating in the ONF. The way they are structured, the committee overseeing the development of the standard is chaired by network operators with field experience. They represent different verticals that can bring insight into the OpenFlow development process. Membership in the ONF costs $30k. If your organization would like to participate, you can go to the ONF web-site and get more information… I am perfectly willing to have someone sponsor me. :-)

The second item was a brief discussion about how, right now, many OpenFlow controllers and nodes operate in an opaque fashion. No system is perfect; there will be bugs. I think it would help tremendously if troubleshooting tools were created that could poll forwarding-nodes for their existing forwarding-rules. It would also help if we could see, on a per-VRF/VSI/tenant/security-zone basis, what flows the controller thinks the node should have. That would be a good start without having to spill the beans too much on how the controller is built. I think something like this will be necessary because of the potential for so much variation in how the forwarding-plane can be constructed between the controller and the forwarding nodes, especially with multiple vendors. It would be nice to know whose throat you need to choke.
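A toy version of what such a tool could report per tenant might look like the following; the flow tuples and the stale entry are invented, and a real tool would pull these sets from whatever APIs the controller and nodes expose:

```python
# A sketch of the troubleshooting idea above: diff what the controller thinks
# a node should have against what the node actually has, per tenant. The
# flow tuples below are invented for illustration.

def diff_flows(intended, actual):
    """Return (flows missing on the node, flows the node shouldn't have)."""
    return intended - actual, actual - intended

intended = {("tenant-a", "10.0.0.0/8", "output:48"),
            ("tenant-b", "172.16.0.0/12", "output:47")}
actual   = {("tenant-a", "10.0.0.0/8", "output:48"),
            ("tenant-b", "172.16.0.0/12", "output:3")}  # stale entry

missing, unexpected = diff_flows(intended, actual)
print("missing on node:", missing)        # tenant-b's flow never made it down
print("unexpected on node:", unexpected)  # ...and a stale one is still there
```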

Overall, it was a great day for geekery at Big Switch, with beer and food included. It’s an exciting time in networking in general, but it seems like Big Switch is on fire right now, with top talent and an upcoming product (which I can’t reveal anything about) that will blow minds and change the way you think about networking.

[there is a follow-up to this post located here]

Cloud Toad
CloudToad is adrift on the great sea of network serenity. - CCIE #15672 (RS, SP) JNCIE-M #721 Twitter: @cloudtoad LinkedIn: http://www.linkedin.com/in/derickwinkworth - Derick's opinions are his own and do not reflect those of the company he works for.