I will admit that the first time I heard about OpenFlow I thought, “Great. Here’s another vaporware technology that will go nowhere.” I also realize that a lot of network engineers are saying the same thing, or might have difficulty understanding what the fuss is all about. If this is your thought, I hope that showing some examples of how I could solve specific problems I’m seeing in my company will get you thinking about OpenFlow and how it might help you as well.
OpenFlow uses the same underlying mechanism that our autonomous protocols and static configurations use: programming the TCAM. Whatever method you use to solve a problem today, you ultimately end up programming the TCAM within the router or switch, and OpenFlow does the same. This is nice for another reason: just because the Operating System / CLI refuses to parse a particular command doesn’t mean the feature isn’t programmable via OpenFlow.
In short, OpenFlow allows the network designer to solve real-world problems without necessarily having to wait for the vendor to provide a particular feature.
My Problem: OpEx (circuit cost) gets in the way of good design.
Corollary: We don’t always have the idealized network to work with.
We need a way to meet our customers’ needs; OpenFlow is another tool that can help meet those needs.
Enterprise Traffic Engineering (TE)
Unless you have trained staff and the appropriate hardware & licenses to manage MPLS TE tunnels (or, gasp, Policy-Based Routing, “PBR”), you don’t have a good way to engineer traffic within the Enterprise.
My company has a basic remote site design: Internet & MPLS at every site, with a VPN backup. I would like to drive down the OpEx of our MPLS by using the Internet / VPN to route traffic that is either high bandwidth or doesn’t require any sort of QoS. Specifically:
- We have sites in APAC where the cost of an Internet circuit is orders of magnitude less expensive than the MPLS circuit.
- We have sites where our Internet circuit is paid for as part of the lease (i.e., free) and has significantly higher bandwidth than our MPLS circuit.
Since we don’t have the expertise, licensing, or staff to support TE tunnels, and I don’t want to begin using PBR, everything bound for the company traverses the MPLS circuit using standard shortest path routing protocols. I would like to route our Voice and specific Enterprise applications via MPLS, with everything else going via VPN or the Internet. If I could instantiate OpenFlow entries in my remote site routers, I could solve this issue and reduce the need for high-bandwidth MPLS links.
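To make the idea concrete, here is a minimal sketch of the kind of prioritized match/action table I have in mind for a remote site router. It is a plain Python model, not a real controller API: the `FlowRule` class, the port names `mpls` and `vpn`, the use of DSCP 46 to identify voice, and the `10.20.0.0/16` application subnet are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical flow entry: match fields plus an output port."""
    priority: int
    out_port: str
    dscp: Optional[int] = None      # None means wildcard
    dst_net: Optional[str] = None   # None means wildcard

    def matches(self, dscp: int, dst_ip: str) -> bool:
        if self.dscp is not None and self.dscp != dscp:
            return False
        if self.dst_net is not None and ip_address(dst_ip) not in ip_network(self.dst_net):
            return False
        return True


# Assumed policy: voice and one enterprise application subnet ride MPLS,
# everything else falls through to the VPN / Internet path.
RULES: List[FlowRule] = [
    FlowRule(priority=300, out_port="mpls", dscp=46),
    FlowRule(priority=200, out_port="mpls", dst_net="10.20.0.0/16"),
    FlowRule(priority=0, out_port="vpn"),
]


def select_port(rules: List[FlowRule], dscp: int, dst_ip: str) -> str:
    """Return the output port of the highest-priority matching rule."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.matches(dscp, dst_ip):
            return rule.out_port
    raise LookupError("no matching rule")
```

The catch-all rule at priority 0 is what keeps bulk traffic off the MPLS circuit by default; only traffic I explicitly name gets the expensive path.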
Another example: I have 3 sites that are connected in a ‘triangle’, and would like to allow traffic to take unequal cost paths, or route some traffic via a particular link during specific times. I don’t currently have a good way to do this. Here’s a simplified diagram for reference:
In this topology, I would like traffic between the MFG site and the Northern CA site to use not only the directly connected 2GbE link, but also the 2GbE link going via the Southern CA site. This would effectively give them 4GbE of throughput between the MFG and Northern CA sites. The only way to do this on my network today is PBR (unequal cost load balancing is not supported by these devices at this time).
The other feature I’d like to enable in this topology is to pin my backup traffic on the 1GbE link. Again, the only solution I have involves PBR or perhaps some static routing on either side of the link. Either option is bad in my opinion.
OpenFlow can solve these issues by installing flows within the routers that will send all of the matching traffic via the links I specify. The backup traffic can be pinned to the 1GbE link, and I can send all traffic between the MFG and Northern CA site via both links, rather than relying on shortest-hop routing protocols.
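As a rough sketch of the decision a controller might make before installing those flows: pin anything bound for the backup servers to the 1GbE link, and spread everything else across the two 2GbE paths by hashing the flow's 5-tuple so that a given flow always stays on one path. The path names, the backup server address, and the `choose_path` helper are hypothetical, and a CRC32 hash is simply a convenient stand-in for whatever a real deployment would use.

```python
import zlib
from typing import NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


# Assumed names for the links in the triangle; not taken from real configs.
PATHS = ["direct_2g", "via_socal_2g"]
BACKUP_LINK = "link_1g"
BACKUP_SERVERS = {"10.50.0.10"}  # hypothetical backup target


def choose_path(flow: Flow) -> str:
    """Pin backup traffic to the 1GbE link; hash all other flows across PATHS."""
    if flow.dst_ip in BACKUP_SERVERS:
        return BACKUP_LINK
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash()
    key = "|".join(str(field) for field in flow).encode()
    return PATHS[zlib.crc32(key) % len(PATHS)]
```

Hashing per flow rather than per packet avoids reordering within a TCP session, at the cost of an uneven split when a few large flows dominate.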
Security
The ability to drop flows everywhere within an Enterprise can sometimes be useful. We experience phishing attacks, and it would be great to block the outbound control traffic at a flow level instantly across the Enterprise. Typically, a user clicks on a phishing link, which installs a payload on the user’s PC. This program then initiates a C&C connection. The ability to block C&C connections across the Enterprise on a near real-time basis would be very helpful. Right now, we have to access every firewall across our enterprise to install an ACL, and then we have to remember to remove the ACL at a future time. The other possible benefit is that the OpenFlow controller may be able to tell me which hosts have requested that particular flow, and/or it could even be integrated with some sort of IDS / IPS that watches for flow request entries that match known botnets or similar.
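Here is a minimal sketch of the two behaviours I am after: a drop entry that expires on its own (so nobody has to remember to remove it), and a record of which hosts tried to reach the blocked destination. This is a plain Python model of controller-side bookkeeping, not a real OpenFlow library; the `BlockList` class and its methods are hypothetical, and the addresses come from documentation ranges. The expiry is meant to behave like the hard timeout on an OpenFlow flow entry.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DropRule:
    dst_ip: str
    expires_at: float
    hits: Set[str] = field(default_factory=set)  # source hosts that matched


class BlockList:
    """Hypothetical enterprise-wide drop list with automatic expiry."""

    def __init__(self) -> None:
        self._rules: Dict[str, DropRule] = {}

    def block(self, dst_ip: str, now: float, ttl_seconds: float) -> None:
        self._rules[dst_ip] = DropRule(dst_ip, now + ttl_seconds)

    def should_drop(self, src_ip: str, dst_ip: str, now: float) -> bool:
        rule = self._rules.get(dst_ip)
        if rule is None:
            return False
        if now >= rule.expires_at:
            # Rule has aged out: remove it instead of relying on manual cleanup
            del self._rules[dst_ip]
            return False
        rule.hits.add(src_ip)
        return True

    def hosts_seen(self, dst_ip: str) -> Set[str]:
        rule = self._rules.get(dst_ip)
        return set(rule.hits) if rule else set()
```

The set returned by `hosts_seen` is the list of PCs I would want to go and clean up.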
Data Center
I may be a Luddite, but I don’t care much for vendor-specific MLAG technologies. I would much prefer to have a vendor-neutral solution that doesn’t require me to uplift all of the gear I just spent time installing & justifying to management. TRILL has promise, but will require me to reinstall everything in my Data Center. OpenFlow, ostensibly, will not. By allowing the instantiation of flows in our switching gear, we may be able to program our switches to forward on multiple links without the aid of TRILL or vPC or (pick your favorite here). And since it uses existing TCAM, we won’t require any hardware uplift.
Risks
I will admit that there are plenty of risks to OpenFlow, and I’m not convinced that OpenFlow will be THE solution to all the world’s problems. Here are the deployment risks I am most concerned with:
- Forwarding / Control plane mismatch – anyone who has run a large network has inevitably run into problems where the control and forwarding planes don’t match up. Sometimes these issues fix themselves when you reset a port channel or interface. They can be very difficult to troubleshoot and usually involve tech support running special commands to debug the TCAM. As engineers, we will need instrumentation to help us see what’s installed in the TCAM (preferably in a human-readable format).
- South -> North communication – If the hardware doesn’t support a particular TCAM feature, and either the hardware doesn’t raise an exception for the OF entry or the OF controller doesn’t react appropriately, issues can arise. We need a robust signalling mechanism for the switch to communicate back to the OF controller when exceptions arise.
- Fate Sharing – this is the principle (defined by Clark) that all state should be maintained at the Edge of the network, and the Core simply forwards packets (maintaining little or no state). This is a really huge topic that I will attempt to cover in a separate post, but basically if we begin to instantiate state within the Core that is destroyed when a forwarding device blows up, how does the network react? In other words, if we have OpenFlow controlling the forwarding path across the Core, what happens when a Core device dies (and the OpenFlow entries die with it)?
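Going back to the first risk in the list above (forwarding / control plane mismatch), the instrumentation I would want boils down to a diff between what the controller believes it installed and what the switch reports. A minimal sketch, assuming both sides can be dumped as simple (match, action) pairs; the `diff_flows` helper and the string format of the entries are invented for illustration and do not correspond to any real switch output.

```python
from typing import Dict, Iterable, List, Tuple

FlowEntry = Tuple[str, str]  # (match, action), both human-readable strings


def diff_flows(
    intended: Iterable[FlowEntry], installed: Iterable[FlowEntry]
) -> Dict[str, List[FlowEntry]]:
    """Compare the controller's intended flows with what the switch reports."""
    want, have = set(intended), set(installed)
    return {
        "missing": sorted(want - have),      # controller expects it, switch lacks it
        "unexpected": sorted(have - want),   # switch has it, controller never asked
    }
```

An empty result on both keys means the two planes agree; anything else is a place to start troubleshooting.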
