Let’s get straight into it. Below is a diagram depicting a hypothetical (and entirely realistic) OpenFlow network. I pieced this together from various diagrams and bits from the Interwebz. We have an OpenFlow controller and three OpenFlow nodes. I have specified two axes here for consideration: a North-South axis (the control and configuration axis) and an East-West axis (the forwarding/data-plane axis). This post will explore various issues along these lines.
First, and perhaps tediously, I want to discuss virtualization again. As I stated in my previous post, OpenFlow does not differ from current forwarding-plane mechanisms in that secure, transparent partitioning of the network (for multi-tenant/multi-security zone “M(T/SZ)” purposes) is assumed to be handled appropriately by vendors’ implementation choices along the entire North-South axis. I can’t help but think of Chris Hoff’s (@beaker) recent blog post about the potential future of server virtualization. He sees virtualization being driven more and more right into the CPU, potentially eliminating the need for a software hypervisor.
I believe the ONF has all the right people in the room to consider driving virtualization in the network in a similar direction: all the way into the forwarding hardware. In a way, this could be like integrating some of the concepts around FlowVisor right into the spec. Instead of having a software controller with universal visibility and unfettered control of the network manage and orchestrate flows to create the illusion of virtualization, why shouldn’t the ONF explicitly define virtualization in the forwarding-plane? This would get us closer to a reality similar to the one @beaker talks about. I would say this is especially important if applications or hosts will be signalling the creation of flows in the network directly. Having this defined in a standard means that compliant nodes (from whatever vendors) are known to provide secure, transparent virtualization. This would be altogether different from traditional routers and switches. The standard would not just define domains or contexts but also security behaviors relative to them (such as dropping frames with unexpected tags or having the forwarding-node issue an error if a controller attempts to create a flow between domains/contexts directly). Some of this behavior is already spelled out in audit guidelines and RFCs.
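To make those two security behaviors concrete, here is a minimal Python sketch of a forwarding node that knows about domains. Everything in it is an assumption made for illustration: the `ForwardingNode` class, the port-to-domain map, and the `CrossDomainError` name are hypothetical and are not part of any OpenFlow spec or library.

```python
from dataclasses import dataclass, field


class CrossDomainError(Exception):
    """Raised when a requested flow would join ports in different domains."""


@dataclass
class ForwardingNode:
    # Hypothetical mapping of port number -> forwarding domain (tenant or zone).
    port_domain: dict
    # Hypothetical mapping of domain -> the VLAN tags that domain may use.
    domain_tags: dict
    flows: list = field(default_factory=list)

    def install_flow(self, in_port, out_port):
        """Accept a flow only if both ports sit in the same domain."""
        src = self.port_domain[in_port]
        dst = self.port_domain[out_port]
        if src != dst:
            raise CrossDomainError(
                f"port {in_port} ({src}) -> port {out_port} ({dst})"
            )
        self.flows.append((in_port, out_port))

    def accept_frame(self, in_port, vlan_tag):
        """Return False (drop) if the tag is unexpected for the ingress domain."""
        return vlan_tag in self.domain_tags[self.port_domain[in_port]]


node = ForwardingNode(
    port_domain={1: "tenant-a", 2: "tenant-a", 3: "tenant-b"},
    domain_tags={"tenant-a": {100}, "tenant-b": {200}},
)
node.install_flow(1, 2)            # same domain: accepted
try:
    node.install_flow(1, 3)        # crosses domains: the node refuses
except CrossDomainError as err:
    print("rejected:", err)
print(node.accept_frame(1, 100))   # expected tag for tenant-a
print(node.accept_frame(1, 200))   # tenant-b tag on a tenant-a port: drop
```

The point of the sketch is where the check lives: the node itself refuses the flow, so separation does not depend on the controller getting everything right.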
An argument against all this could be that some folks envision the network becoming just a really fast external bus between systems. Flows are created through policy or through APIs that applications can access. In other words, there is no concept of virtualization. The flows themselves are the tenants/security-zones. This is considered ACL-based separation. If VMware is good enough (VMware being a “controller” sitting over a non-virtualized pile of CPU and memory resources), then why not ACL-based separation in the network? Another argument against defining virtualization in the spec is that “forwarding-domains” or “OpenFlow contexts” would take away from OpenFlow’s simplicity.
Lastly on virtualization: Along the EAST-WEST axis, why not use existing headers such as MPLS and VLAN outer-tags to differentiate among forwarding domains or tenants (regardless of where they are defined)? Particularly with MPLS, doesn’t this open up the possibility of replacing the middle forwarding node with an MPLS core (i.e., P-nodes)? There has already been some speculation on doing exactly this out and about on the internet.
The Need for Operational Transparency
As I stated in my previous post, it would be great if we had a way to troubleshoot the forwarding nodes and the controllers such that we can localize the problem and open a ticket with the right vendor. Looking at this diagram, I think this can happen in one of several ways. On the NORTH-SOUTH axis, a mechanism for comparing forwarding information in the controller with forwarding information in the forwarding-node would be fantastic. The RIBs, FIBs, controller flow-tables, and node flow-tables will vary in structure from one vendor and platform to another. There are non-standard OpenFlow extensions that may be supported by one vendor but not another, or supported incorrectly. Consider that on the controller the RIBs and FIBs might come from one software vendor and interact with an OPCE process created by a different software vendor. Then a third vendor (or more) supplies the forwarding-nodes. It will be essential to some network shops to narrow down an issue as quickly and specifically as possible so they can report back up the chain on root cause and contact the right vendor immediately. The ability to verify that there is a consistent view of flow information across these structures will be critical.
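As a rough illustration of what such a consistency check could look like, the Python sketch below diffs a controller's view of the flows against a node's view. It assumes both views have already been normalized into plain dictionaries of match to action, which is exactly the hard part when the structures differ per vendor. The `diff_flow_tables` function and the match/action formats are hypothetical.

```python
def diff_flow_tables(controller_flows, node_flows):
    """Compare two {match: action} views and report where they disagree."""
    missing_on_node = {
        m: a for m, a in controller_flows.items() if m not in node_flows
    }
    unknown_to_controller = {
        m: a for m, a in node_flows.items() if m not in controller_flows
    }
    mismatched = {
        m: (controller_flows[m], node_flows[m])
        for m in controller_flows.keys() & node_flows.keys()
        if controller_flows[m] != node_flows[m]
    }
    return {
        "missing_on_node": missing_on_node,
        "unknown_to_controller": unknown_to_controller,
        "mismatched": mismatched,
    }


# Hypothetical match key: (ingress port, destination MAC).
controller_view = {
    (1, "00:00:00:00:00:0a"): "output:2",
    (2, "00:00:00:00:00:0b"): "output:1",
    (3, "00:00:00:00:00:0c"): "output:4",
}
node_view = {
    (1, "00:00:00:00:00:0a"): "output:2",
    (2, "00:00:00:00:00:0b"): "output:3",  # disagrees with the controller
    (4, "00:00:00:00:00:0d"): "drop",      # controller never asked for this
}

report = diff_flow_tables(controller_view, node_view)
for kind, entries in report.items():
    print(kind, entries)
```

Each of the three buckets points at a different suspect: a flow missing on the node suggests the push failed, an unknown flow suggests stale or foreign state on the node, and a mismatch suggests a translation problem somewhere in between.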
Even on today’s routers, where the control-plane and forwarding-plane are tightly coupled and belong to a single vendor, there is still much visibility and configurability in the forwarding-plane. There are a multitude of CEF “show” commands and TCAM-related commands showing operational and reporting data. There are a number of configurable knobs as well. In JUNOS you can shell right into a linecard to look at forwarding information on the forwarding hardware. This is so because, even in traditional routers built by a single vendor, “stuff” happens and network operators need insight into the internal workings of the forwarding-plane.
On the EAST-WEST axis, how about defining an OAM or TRACE mechanism in the OpenFlow spec? This will ensure that all vendors support specific functionality. Think of it like an FDL or MDL channel on every forwarding-node link. The OAM message might have embedded flow criteria for the receiving forwarding node to match against its lookup structures. You should be able to initiate an OAM trace on some ingress OpenFlow node toward an egress node and have each node in the path report back to the controller with timestamp or other information. The receiving flow-node would package the OAM trace with its node ID and forward it up to the controller. This might reveal that the trace is not following the same path as shown in the forwarding-tables or you might discover that there is excess latency or delay for a flow on a particular link.
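Here is a small Python sketch of what the controller might do with those reports once they arrive. It is only an illustration under some loud assumptions: the `TraceReport` shape is made up, the nodes are assumed to have synchronized clocks, and ordering the reports by timestamp is assumed to be a fair way to reconstruct the path. No such OAM message exists in the OpenFlow spec as far as this post is concerned.

```python
from dataclasses import dataclass


@dataclass
class TraceReport:
    # Hypothetical report a node sends up to the controller for one trace.
    node_id: str
    timestamp_ms: float


def analyze_trace(expected_path, reports, max_link_ms):
    """Rebuild the path from the reports and flag divergence and slow links."""
    ordered = sorted(reports, key=lambda r: r.timestamp_ms)
    actual_path = [r.node_id for r in ordered]
    slow_links = [
        (a.node_id, b.node_id, b.timestamp_ms - a.timestamp_ms)
        for a, b in zip(ordered, ordered[1:])
        if b.timestamp_ms - a.timestamp_ms > max_link_ms
    ]
    diverged = actual_path != expected_path
    return diverged, actual_path, slow_links


# Reports can arrive at the controller in any order.
reports = [
    TraceReport("node-1", 0.0),
    TraceReport("node-3", 7.5),
    TraceReport("node-2", 1.0),
]
diverged, path, slow = analyze_trace(
    expected_path=["node-1", "node-2", "node-3"],
    reports=reports,
    max_link_ms=5.0,
)
print("diverged:", diverged)
print("path:", path)
print("slow links:", slow)
```

In this made-up run the trace follows the expected path, but the hop from node-2 to node-3 exceeds the latency threshold, which is the kind of finding you would want before calling a vendor.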
Understanding how it works and magic sparkly things
If you are considering OpenFlow, it’s time to challenge your understanding of what a control-plane or forwarding-plane is. Look at the diagram and tell me what parts are the CP and what parts are the FP. If you want to go strictly by the definition of “the control-plane ultimately configures the lookup table used to forward packets” then from North to South the control-plane starts at the “Protocol/Policies” block and continues all the way down through and including the OPNE block in the forwarding node. Others might start the forwarding-plane at the OPCE process in the controller or even the FIB above that. Whatever way you look at it, you will need to understand how your combination of vendors is going to look. What does that stack really look like? If you have multiple vendors providing functionality on the controller, how do they tie together?
What version of OpenFlow does each vendor support? Do they support version “y” fully or do they support version “x” and only some parts of “y”? What about extensions to OpenFlow (both open and proprietary)? You might be buying the cheapest possible hardware on a port density basis, but what about performance and operational transparency?
Auditors have guidelines for locking down routers and switches when using certain features such as VLANs, MPLS, or routing-protocols. Since OpenFlow is a generalization of *existing* approaches to the forwarding-plane, you will need to test these things and ensure that you can lock down the network appropriately and that the controller is providing adequate separation where required. How are separate security-zones managed in the configuration? Is transitivity between security-zones explicitly managed or could someone easily bridge two environments together inadvertently? Could anomalies or issues in one security-zone impact another by hanging a process or using too much memory or CPU (on the controller or the node)? How do you configure, monitor, and report on transitivity between security-zones in aggregate?
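One way to think about the transitivity question is as graph reachability: if zone A may talk to B and B may talk to C, then A can reach C whether or not anyone intended it. The Python sketch below assumes you can export the permitted zone-to-zone flows from the controller as a simple adjacency map. That export, the zone names, and the `reachable_zones` helper are all hypothetical.

```python
from collections import deque


def reachable_zones(allowed, start):
    """Return every zone reachable from start by chaining permitted flows."""
    seen = {start}
    queue = deque([start])
    while queue:
        zone = queue.popleft()
        for nxt in allowed.get(zone, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


# Hypothetical policy: which zones each zone may send traffic to directly.
allowed = {
    "dmz": {"app"},
    "app": {"db"},
    "db": set(),
}

for zone in sorted(allowed):
    direct = allowed[zone]
    transitive = reachable_zones(allowed, zone) - direct
    print(zone, "direct:", sorted(direct), "transitive only:", sorted(transitive))
```

Anything that shows up as "transitive only" is a path nobody configured explicitly, which makes it a good candidate for an audit finding or at least a conversation.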
Can you leak packets inappropriately? What happens if you send a spoofed VLAN tag? What happens with man-in-the-middle ARP attacks? Are unused ports locked down? What are the default behaviors of a hybrid forwarding-node (one that can support simultaneous operation of traditional switching on some ports while supporting OpenFlow configuration on other ports)? Just like any networking solution, you will want to Proof-of-Concept a potential OpenFlow deployment. Use an existing security-testing tool and try to overwhelm the controller by sending thousands of packets a second, each with a different source MAC address (if you are using OpenFlow in an Ethernet-type deployment). What about massive route or port fluctuation? What is the impact? Look for breaking points.
If you have to run routing protocols, then how many routes can the controller hold? Does it support 4-byte AS numbers in BGP? What if you send an unsupported community or attribute type? What about those miscellaneous extra OSPF RFCs?
Be ready to troubleshoot once you are operational. You may want to steal one port from each forwarding-node and tie those ports back to a switch attached to a Linux box running Wireshark. Then, using a traffic generator such as iperf, you can troubleshoot hop by hop through your OpenFlow network. Just like today when running protocols, you’ll want to see what those protocol engines are reporting in debug. For SDN magic stuff on the controller, what troubleshooting tools are available? What are the fail points? Can you trace through it?
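If you would rather script the source-MAC test than rely on a tool, the Python sketch below shows the frame-building half of it. It only constructs the frames in memory. Actually putting them on the wire needs a raw socket or a packet library plus the right privileges, and that part is deliberately left out. The helper names are hypothetical. The EtherType used is 0x88B5, which IEEE reserves for local experimental use, and the payload is 46 zero bytes, the minimum Ethernet payload.

```python
import random
import struct


def random_unicast_mac(rng):
    """Return 6 random bytes: multicast bit cleared, locally administered bit set."""
    first = (rng.randrange(256) & 0b11111100) | 0b00000010
    return bytes([first] + [rng.randrange(256) for _ in range(5)])


def build_frames(count, dst_mac, seed=0):
    """Build `count` Ethernet frames, each with a distinct source MAC."""
    rng = random.Random(seed)          # seeded so a test run is repeatable
    macs = set()
    while len(macs) < count:           # the set guarantees no duplicates
        macs.add(random_unicast_mac(rng))
    ethertype = struct.pack("!H", 0x88B5)
    payload = bytes(46)
    return [dst_mac + src + ethertype + payload for src in sorted(macs)]


# Broadcast destination, so every frame is a candidate for a controller punt.
frames = build_frames(1000, dst_mac=b"\xff" * 6)
print(len(frames), "frames of", len(frames[0]), "bytes each")
```

Run this only against a lab network you own. The interesting measurement is not whether the frames get through but what the controller's CPU, memory, and flow-setup latency do while they arrive.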
In conclusion, remember this: OpenFlow is a protocol for CP <-> FP interaction. That’s all it is. Whatever magic is built on the controller, you need to understand (just as you do today with traditional routers and switches) how packets transit an OpenFlow forwarding-plane, how to troubleshoot it, and how it is secured. There may be all kinds of awesome and innovative stuff happening on the controller, but don’t be blinded by the razzle-dazzle of it. An OpenFlow network is still a network.
You still have to design your network smartly. You still have to operate and maintain it. You are still accountable for it.
You still have to think about it.
[ok... seriously. I'm going to take a writing class.]