Service chaining has been getting a lot of press — and I’m encountering it a lot among the customers I talk to. What’s the big deal? To understand service chaining, let’s look at a really simple example, illustrated below (as always, you can click on the image to get a bigger version).
In the upper part of this image, a host (in this case a tablet, just for illustration) transmits a packet through the network towards some server past D (on the right side of the diagram). This packet is carried through some edge network (LTE, fiber, etc.) and into the edge router at the provider, marked as router B in the diagram. The provider has some set of policies about specific operations that must take place on the packet (or flow) before it can be forwarded through D towards the ultimate destination. To implement these policies, the provider puts an appliance in the path of the packet, such as a firewall — C in the diagram. This is all pretty simple — it’s fairly standard stuff we see in network design on an everyday basis.
But let’s back up a second and consider what that appliance really does. A firewall actually provides a set of services, such as network address translation, deep packet inspection, and access control. What the network operator pays for is the hardware and software that performs each of these actions at wire speed — a complicated task. But why should these services be bundled into a single appliance in this way? Is there some “magic rule of networking” that says all the policies applied to a flow must be implemented on a single box, or even in a set of appliances? No, there’s not.
Let’s split these individual processes up, then, and put them on generic hardware. After all, we have a data center fabric sitting someplace with a ton of compute and storage; it’s easy enough to spin up a VM that can perform CGNAT, for instance. In fact, developing and deploying a set of applications that each, individually, perform one of these three functions — CGNAT, DPI, and AC — onto generic compute resources on a data center fabric is a pretty simple concept. Now if traffic that needs to be DPI’d increases, we can scale out (rather than scaling up) by just spinning up a few more VMs.
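To make the scale-out idea concrete, here’s a minimal sketch of how flows might be spread across however many DPI instances happen to be running. The instance names and the hash-mod-N assignment scheme are my own illustration, not any particular orchestrator’s mechanism:

```python
import hashlib

def pick_instance(flow_id, instances):
    """Deterministically assign a flow to one of the running service VMs
    by hashing the flow identifier and taking it modulo the VM count."""
    h = int(hashlib.sha256(flow_id.encode()).hexdigest(), 16)
    return instances[h % len(instances)]

# Two DPI VMs handling today's load (hypothetical names).
dpi_vms = ["dpi-1", "dpi-2"]
print(pick_instance("10.0.0.1->203.0.113.5", dpi_vms))

# Traffic grows: spin up a third VM, and the same function
# spreads flows across the larger pool — scale out, not up.
dpi_vms.append("dpi-3")
print(pick_instance("10.0.0.1->203.0.113.5", dpi_vms))
```

Note that a naive hash-mod-N scheme reassigns many existing flows when the pool changes; a real deployment would use something like consistent hashing to keep established flows pinned.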
If you’re paying attention, you’ll quickly see the fly in the ointment here… These processes aren’t in the path of the traffic flow I’m trying to apply the services to. How can I solve this? I can’t readily push each of these services into the shortest path between A and the destination, out past D. If I can’t bring the services to the packets, why not bring the packets to the services?
What if I could, at A, determine the set of services through which this particular flow of packets must pass on its way towards the destination, past D? What if I could instruct A to stack a set of MPLS labels, for instance, onto the packet as it’s switched through A, so that it will be sent to each of the various service VMs in turn as it passes through the data center fabric, and before being passed to D to continue its journey to the actual destination?
This is precisely what service chaining does — in the case of the MPLS fabric from my prior post, you could stack a set of labels onto the packet at the network ingress (at the data center border router, or the first leaf node the packet encounters, or even the edge of the network itself). The first label causes the network to forward the traffic to the first service in the chain, which then pops its label and forwards the packet along. The second label causes the packet to be sent to the second service, which then pops the second label and forwards the packet along.
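The pop-and-forward behavior above can be sketched in a few lines. This is a toy simulation — the label values and the per-service transformations are made up for illustration, not taken from any real fabric:

```python
# Map each MPLS label to the service it steers traffic toward (hypothetical
# labels and services, mirroring the CGNAT/DPI/AC split in the text).
services = {
    100: lambda pkt: {**pkt, "nat": True},        # CGNAT rewrites addressing
    200: lambda pkt: {**pkt, "inspected": True},  # DPI examines the payload
    300: lambda pkt: {**pkt, "permitted": True},  # access control applies policy
}

def chain(packet, label_stack):
    """Process a packet through a service chain: at each hop the outermost
    label is popped, the corresponding service is applied, and the packet
    is forwarded along until the stack is empty."""
    stack = list(label_stack)  # top of stack first
    while stack:
        label = stack.pop(0)
        packet = services[label](packet)
    return packet

# The ingress node pushes the full stack; the fabric does the rest.
result = chain({"dst": "D"}, [100, 200, 300])
print(result)  # the packet has now passed through all three services
```

The key property is that the path through the services lives entirely in the label stack the ingress imposes — the service VMs themselves need no knowledge of the overall chain.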
This is a simple, but powerful, idea. It allows us to virtualize services into a generic (and well understood) data center fabric, running the service on standard compute and storage. It allows us to insert and remove services easily, just by spinning up VMs and pushing the right stack of labels at the network entrance. Finally, it allows us to scale services by spinning up VMs, using scale-out rather than scale-up principles.
Which leads to the obvious question: what’s the cost? According to network complexity theory, there’s no such thing as a free lunch — all decisions are tradeoffs, rather than absolutes. What’s the tradeoff here?
The most obvious one is the increased stretch through the network. Rather than passing through the network core (or data center fabric) once, packets have to find their way across the fabric once per service. If the cost of the network is lower than the cost of appliances plus the cost of rolling a truck (or installing new hardware) to deploy a new service, or add/remove a specific policy point for any given user, then service virtualization and chaining is a clear winner. If the cost of the network is high, however…
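The stretch cost is easy to estimate on the back of an envelope: one baseline fabric crossing, plus one more per chained service. The latency figure below is a made-up illustration, not a measurement from any real fabric:

```python
# Hypothetical one-way latency per fabric traversal, in microseconds.
FABRIC_CROSSING_US = 50

def flow_latency_us(num_services):
    """One baseline crossing of the fabric, plus one extra
    crossing for each service chained into the flow's path."""
    return (1 + num_services) * FABRIC_CROSSING_US

print(flow_latency_us(0))  # plain forwarding: one crossing
print(flow_latency_us(3))  # CGNAT + DPI + AC chained in: four crossings
```

With three services chained in, this flow crosses the fabric four times instead of once — a 4x stretch that has to be weighed against the appliance and truck-roll costs it avoids.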
Other tradeoffs to consider are the additional latency and delay through the network, the actual cost of generic hardware (if it takes ten blade servers to replicate a single appliance, is there a real gain?), and the additional complexity of (essentially) traffic engineering per flow. Which way the tradeoffs fall out will all depend…
Remember the age old question of highest importance in the world of network engineering: How many balloons fit in a bag?