Why I Am Excited About OpenFlow

I will admit that the first time I heard about OpenFlow I thought, “Great. Here’s another vaporware technology that will go nowhere.”  I also realize that a lot of network engineers are saying the same thing, or might have difficulty understanding what the fuss is all about. If this is your thought, I hope that showing some examples of how I could solve specific problems I’m seeing in my company will get you thinking about OpenFlow and how it might help you as well.

OpenFlow uses the same mechanism that our autonomous protocols and static configurations use: programming the TCAM. Whatever methodology you use to solve existing problems, you are ultimately programming the TCAM within the router or switch. This is nice for another reason: just because the Operating System / CLI refuses to parse a particular command doesn’t mean the feature isn’t programmable via OpenFlow.

In short, OpenFlow allows the network designer to solve real-world problems without necessarily having to wait for the vendor to provide a particular feature.

My Problem: OpEx (circuit cost) gets in the way of good design.
Corollary: We don’t always have the idealized network to work with.
We need a way to meet our customers’ needs; OpenFlow is another tool that can help meet those needs.

Enterprise Traffic Engineering (TE)

Unless we have trained staff and the appropriate hardware & licenses to manage MPLS TE tunnels (or, gasp, Policy-Based Routing, “PBR”), we don’t have a way to engineer traffic within our Enterprises.

My company has a basic remote site design: Internet & MPLS at every site, with a VPN backup. I would like to drive down the OpEx of our MPLS by using the Internet / VPN to route traffic that is either high bandwidth or doesn’t require any sort of QoS. Specifically:

  1. We have sites in APAC where the cost of an Internet circuit is orders of magnitude less expensive than the MPLS circuit.
  2. We have sites where our Internet circuit is paid for as part of the lease (i.e., FREE) and has significantly higher bandwidth than our MPLS circuit.

Since we don’t have the expertise, licensing, or staff to support TE tunnels, and I don’t want to begin using PBR, everything bound for the company traverses the MPLS circuit using standard shortest path routing protocols. I would like to route our Voice and specific Enterprise applications via MPLS, with everything else going via VPN or the Internet. If I could instantiate OpenFlow entries in my remote site routers, I could solve this issue and reduce the need for high-bandwidth MPLS links.
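To make the idea concrete, here is a minimal sketch of the kind of priority-ordered flow table that would express this policy. It is plain Python that only models the lookup, not a real controller API. The `FlowEntry` class, the link names, and the 10.20.0.0/16 "enterprise application" subnet are all hypothetical, and it assumes voice is marked DSCP 46 (EF).

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowEntry:
    """Hypothetical, simplified flow entry: optional matches plus an output link."""
    priority: int
    out_link: str
    dscp: Optional[int] = None      # None means "wildcard"
    dst_net: Optional[str] = None   # None means "wildcard"

    def matches(self, dscp: int, dst_ip: str) -> bool:
        if self.dscp is not None and self.dscp != dscp:
            return False
        if self.dst_net is not None:
            if ipaddress.ip_address(dst_ip) not in ipaddress.ip_network(self.dst_net):
                return False
        return True


# Assumed policy: voice and one enterprise subnet ride MPLS, everything else uses the VPN.
FLOW_TABLE = [
    FlowEntry(priority=300, out_link="mpls", dscp=46),                # voice marked EF
    FlowEntry(priority=200, out_link="mpls", dst_net="10.20.0.0/16"), # hypothetical app subnet
    FlowEntry(priority=0, out_link="vpn"),                            # catch-all default
]


def select_link(dscp: int, dst_ip: str) -> str:
    """Return the output link of the highest-priority matching entry."""
    for entry in sorted(FLOW_TABLE, key=lambda e: e.priority, reverse=True):
        if entry.matches(dscp, dst_ip):
            return entry.out_link
    raise LookupError("no matching flow entry")
```

Calling `select_link(46, "198.51.100.1")` picks the MPLS link, while best-effort traffic to the same destination falls through to the VPN default.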

Another example: I have 3 sites that are connected in a ‘triangle’, and would like to allow traffic to take unequal cost paths, or route some traffic via a particular link during specific times. I don’t currently have a good way to do this. Here’s a simplified diagram for reference:

[Diagram: triangle topology]

In this topology, I would like to allow traffic between the MFG site and the Northern CA site on not only the directly connected 2GbE link, but also the 2GbE link going via the Southern CA site. This will effectively give them 4GbE of throughput between the MFG and Northern CA site. The only way to do this currently with my network is by using PBR (unequal cost load balancing is not supported by these devices at this time).

The other feature I’d like to enable in this topology is to pin my backup traffic on the 1GbE link. Again, the only solution I have involves PBR or perhaps some static routing on either side of the link. Either option is bad in my opinion.

OpenFlow can solve these issues by installing flows within the routers that will send all of the matching traffic via the links I specify. The backup traffic can be pinned to the 1GbE link, and I can send all traffic between the MFG and Northern CA site via both links, rather than relying on shortest-hop routing protocols.
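A rough sketch of the decision logic behind those flows, assuming the controller pins backup traffic by host address and spreads everything else across the two paths with a per-flow hash. The link names, the backup server address, and the `choose_path` function are hypothetical; this is ordinary Python modelling the choice, not a controller API.

```python
import zlib

# Hypothetical link names for the triangle topology.
DIRECT = "direct-2GbE"
VIA_SOUTHERN_CA = "via-southern-ca-2GbE"
BACKUP_LINK = "1GbE"

# Hypothetical address of the backup server whose traffic should be pinned.
BACKUP_HOSTS = {"10.50.0.10"}


def choose_path(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                proto: str = "tcp") -> str:
    """Pick the link a flow between MFG and Northern CA should use."""
    # Backup traffic is always pinned to the 1GbE link.
    if src_ip in BACKUP_HOSTS or dst_ip in BACKUP_HOSTS:
        return BACKUP_LINK
    # Everything else is split per flow: hashing the 5-tuple keeps all
    # packets of one flow on one path, which avoids reordering.
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return (DIRECT, VIA_SOUTHERN_CA)[zlib.crc32(key) % 2]
```

Hashing per flow rather than per packet is a design choice: a single large flow still tops out at one link's capacity, so the combined throughput only shows up across many flows.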

Security

The ability to drop flows everywhere within an Enterprise can sometimes be useful. We experience phishing attacks, and it would be great to block the outbound control traffic at a flow level instantly across the Enterprise. Typically, a user clicks on a phishing link, which installs a payload on the user’s PC. This program will then initiate a C&C connection. The ability to block the C&C connections across the Enterprise on a near real-time basis would be very helpful. Right now, we have to access every firewall across our enterprise to install an ACL. Then we have to remember to remove the ACL at a future time. The other possible benefit is that the OpenFlow controller may be able to tell me which hosts have requested that particular flow, and/or it could even be integrated with some sort of IDS / IPS that watches for flow request entries that match known botnets or similar.
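As a sketch of how this could look, the toy model below pushes one drop entry to every device and lets it age out on its own. OpenFlow flow entries do carry idle and hard timeouts, which would address the "remember to remove the ACL" problem; in a real deployment the switch expires the entry itself, whereas here the `expire` method simulates that. The `Controller` and `DropFlow` classes and the device names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DropFlow:
    """Hypothetical drop entry for traffic to one destination address."""
    dst_ip: str
    installed_at: float   # seconds, on whatever clock the caller uses
    hard_timeout: float   # seconds until the entry is removed


class Controller:
    """Toy controller holding one flow table per device."""

    def __init__(self, switches):
        self.tables = {name: [] for name in switches}

    def block_everywhere(self, dst_ip, now, hard_timeout=3600.0):
        # One call installs the drop entry on every device.
        for table in self.tables.values():
            table.append(DropFlow(dst_ip, now, hard_timeout))

    def expire(self, now):
        # Simulates the switch aging out entries whose hard timeout has passed.
        for name in list(self.tables):
            self.tables[name] = [
                f for f in self.tables[name]
                if now - f.installed_at < f.hard_timeout
            ]

    def is_blocked(self, switch, dst_ip):
        return any(f.dst_ip == dst_ip for f in self.tables[switch])
```

For example, `Controller(["fw-a", "fw-b"]).block_everywhere("203.0.113.7", now=0.0, hard_timeout=60.0)` blocks the address on both devices, and the block disappears once `expire` is called with a time past 60 seconds.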

Data Center

I may be a Luddite, but I don’t care much for vendor-specific MLAG technologies. I would much prefer to have a vendor-neutral solution that doesn’t require me to uplift all of the gear I just spent time installing & justifying to management. TRILL has promise, but will require me to reinstall everything in my Data Center. OpenFlow, ostensibly, will not. By allowing the instantiation of flows in our switching gear, we may be able to program our switches to forward on multiple links without the aid of TRILL or vPC or (pick your favorite here). And since it uses existing TCAM, we won’t require any hardware uplift.

Risks

I will admit that there are plenty of risks to OpenFlow, and I’m not convinced that OpenFlow will be THE solution to all the world’s problems. Here are the deployment risks I am most concerned with:

  1. Forwarding / Control plane mismatch – anyone who has run a large network has inevitably run into problems where the control and forwarding plane don’t match up – sometimes these issues fix themselves by resetting a port channel or interface. These issues can be very difficult to troubleshoot and usually involve tech support running special commands to debug the TCAM. As engineers, we will need instrumentation to help us see what’s installed in the TCAM (preferably in a human-readable format).
  2. South -> North communication – If the hardware doesn’t support a particular TCAM feature, and the hardware either doesn’t raise an exception to the OF entry, or the OF controller doesn’t react appropriately, issues can arise. We need a robust signalling mechanism for the switch to communicate back to the OF controller when exceptions arise.
  3. Fate Sharing – this is the principle (defined by Clark) that all state should be maintained at the Edge of the network, and the Core simply forwards packets (maintaining little or no state). This is a really huge topic that I will attempt to cover in a separate post, but basically if we begin to instantiate state within the Core that is destroyed when a forwarding device blows up, how does the network react? In other words, if we have OpenFlow controlling the forwarding path across the Core, what happens when a Core device dies (and the OpenFlow entries die with it)?
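For the first risk in the list above, the instrumentation could be as simple as comparing what the controller intended to install with what the device reports, however that report is obtained. The sketch below assumes both sides can be reduced to a set of (match, action) string pairs; the `audit` function and the string format are hypothetical.

```python
def audit(intended: set, installed: set) -> dict:
    """Compare intended flows with the flows a device reports as installed.

    Each flow is a hypothetical (match, action) tuple of strings.
    """
    return {
        # Flows the controller sent that the device does not report.
        "missing": sorted(intended - installed),
        # Flows on the device that the controller never asked for.
        "unexpected": sorted(installed - intended),
    }
```

An empty result on both keys means the two views agree; anything else is a human-readable starting point for troubleshooting.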
Ryan Niemes

Ryan Niemes is a network engineer and sometimes architect. He has worked in the industry for long enough to remember having to solder ends on Ethernet BNC connectors, fiddle with vampire taps, and has actually used DLSW in a production environment. He is CCIE #7966, and CCDE #20090006.

  • http://twitter.com/edrtz ed rtz

    Great article, but I’m concerned about a few things:
    In my company we have 100+ remote sites with MPLS and DSL backup through various providers, all managed by a third-party company.
    If I distribute different traffic across backup/MPLS, it becomes a nightmare to troubleshoot when a DSL link goes down: there would be 100 different DSL providers involved, and they are not as quick as an MPLS business provider at solving issues. That is why we use DSL to back up MPLS; problems on business circuits usually get resolved sooner than on any DSL. (I had a problem with a DSL-only site, and the DSL provider there operates only MON-FRI, 8-5pm.)

    Another problem I see in my scenario is firewalls/filters: our HTTP traffic has to go through our HQ because everything goes through the firewall/proxy.

    A question I’d like to ask: why do you think PBR is horrible if you really just want to redirect traffic?

    • http://twitter.com/nkrypted Brandon Mangold

      For me, mechanisms like PBR are much harder to “operationalize”. It’s not as intuitive to decipher the anticipated behavior of traffic when PBR is involved, and the troubleshooting tools are literally a hop-by-hop examination of the PBR policy in conjunction with the underlying routing and physical connectivity topologies. It generally doesn’t pass the “2am test”.

      • http://twitter.com/niemesrw ryan niemes

        HA! 2am test, Russ would be so proud.

    • http://twitter.com/niemesrw ryan niemes

      Hey Ed – thanks for the feedback. I can’t really comment on your particular network, only mine – where I do see a potential use case for OF. As for PBR, it’s not necessarily horrible, just not something that I want to support in my network – it’s not autonomous, must be manually configured on every device, and it can thus be error-prone. It’s also not universally supported with the same feature set on all of my devices. OpenFlow’s mechanism could be (potentially more easily than a parsed feature).

  • http://twitter.com/nkrypted Brandon Mangold

    I am going through a similar exercise right now with MPLS vs VPN traffic balancing. What I ultimately want is the ability to route traffic based on intelligent, dynamic & real-time application-level metrics. PfR is another option for dynamic, intelligent traffic routing, albeit not the most palatable option.

    With that in mind, have you thought much about what I feel is probably the most powerful aspect of OpenFlow’s potential, the northbound API? Specifically, the ability to communicate with an orchestration engine that can allow applications to communicate with infrastructure to reserve flow properties through the network, thus allowing the network to generate and build that “reservation” (I love recycling technology).

    A few more notes:
    Security: You still have to remember to remove the blocked flow if its properties are stored in the central flow table.

    Fate Sharing: The biggest potential downfall of OpenFlow. I don’t see how we can live without a hybrid topology in which the switches can operate with some level of autonomy in the event of failure of the central brain.

    • http://twitter.com/niemesrw ryan niemes

      Hey Brandon – thanks for the feedback. You’re right, the NB API is a powerful feature, it’s just not something I see as a use case in my current network.. which is why I didn’t write about it here.

      As for security, that’s a good point. I think that my vision around OF w/resp to security would involve an OF “firewall” of sorts, where the security policy is defined within the controller, and any ‘new’ flows (since they could be punted to the OF controller) would be checked against this policy. The same care & feeding would be necessary as with a normal firewall… and you would still need application inspection. But imagine – could OF replace your need for a border firewall?

      • http://twitter.com/nkrypted Brandon Mangold

        Man, we are on the same wavelength. One of my many soapbox topics is “security is foundational”. I preach a lot within the enterprise I work for about end-to-end security and building the network as a whole into a policy-enforcement mechanism.

        I don’t know about replacing the border ‘firewall’, just because I believe you would still need an aggregated traffic inspection engine (i.e., FW, NGFW, IDS/IPS), but I would certainly love to enhance the ability of the network to be another defense mechanism that can aid in detection of and response to threats.

        • http://twitter.com/niemesrw ryan niemes

          Same here man.. security needs to be layered in order to be effective, OF could be an augmentation rather than a replacement. You would still need something to inspect at > l4 like you suggest.

  • PeterPhaal

    Interesting article. The use cases described primarily focus on static optimizations. Using OpenFlow to perform real-time traffic engineering and optimization driven by measurement also has promise.

    The following article describes how OpenFlow can be used to optimize load balancing of long lived flows in LAG/ECMP groups:

    http://www.bradreese.com/blog/1-13-2013.htm
