Whether they come from heavily regulated industries or simply from enterprises that have kept pace with advances in network security over the last 10 years, engineers arrive with a certain level of expectation about the maturity of the security tools in an environment. So when a network or security specialist from one of those industries first feasts their eyes on the native security offerings of the public clouds, the response is usually “where’s the rest of it?”
Don’t get me wrong, there is certainly a case to be made for running native-only tools (some of the largest cloud deployments have no problem doing it). After all, they’re called 3rd party solutions for a reason: deploying them adds complexity and poses scalability challenges that their native counterparts just don’t suffer from. However, considering that most enterprises bring baggage with them to the cloud and that their legacy applications don’t come with “built-in security”, figuring out how to factor a 3rd party security solution into your cloud network design is a pretty common exercise these days.
When it comes to deploying a modern internet-facing WAF solution or Layer 7 firewall, your options for getting creative are limited (more limited than in your on-premises datacenter). The larger design considerations will be defining the scalability of the solution and, more importantly, how it handles high availability. We’re going to focus on one particular design that has become more and more prominent as of late: the “load balancer sandwich”.
I know what you’re thinking…because it’s exactly what I thought when I first saw this solution. Coming from the design school of keeping topologies as simple as possible, you have to wonder: what hackathon did this originate from? Can you imagine doing something like this in your on-premises datacenter? Let’s dive into the origins of this design and why I’ve come around on it.
First and foremost, as already discussed, let’s agree that no 3rd party solution will be as simple and scalable as its native counterpart. With that out of the way, if you want your solution to be fault tolerant you have two options. The first is the “traditional” way of doing HA, where you deploy a pair of appliances that back each other up and synchronize state to one another. One gotcha with this approach is that the cloud has no concept of a virtual IP (i.e., VRRP) that multiple devices can answer on behalf of, so during a failure scenario you’re stuck moving virtual NICs from one firewall to the other via API calls. These API calls can take a long time and, more importantly, produce inconsistent results, not to mention that this design forces you to run an idle instance at all times (not very cloudy). The second pain point is that appliance pairs cannot span availability zones in the cloud, so you’re forced to deploy a pair per availability zone (did I mention that these 3rd party appliances are expensive?). Finally, the last drawback of doing HA this way is that you’re reliant on a single VM’s capacity to funnel all of your internet traffic through, and you can see how that fails to scale really quickly.
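To make that first option concrete, here’s a rough sketch of the decision logic a vendor’s HA agent has to run: detect that the active appliance is unhealthy, then move its virtual NIC to the standby via API calls. Everything here is illustrative (instance IDs, function names); the AWS API names referenced in the docstring are real, but this is not any vendor’s actual implementation.

```python
def plan_failover(active_healthy, active_id, standby_id, eni_owner):
    """Decide which API calls are needed to fail the virtual NIC over.

    In AWS these steps would map to ec2.detach_network_interface() and
    ec2.attach_network_interface(). Each call can take a while to complete
    and may need retries, which is exactly where the slow, inconsistent
    failover behavior described above comes from.
    """
    if active_healthy or eni_owner == standby_id:
        return []  # nothing to do: active is fine, or the ENI already moved
    return [
        ("detach_network_interface", {"from_instance": active_id}),
        ("attach_network_interface", {"to_instance": standby_id, "device_index": 1}),
    ]
```

A real implementation would then execute this plan and poll until the attachment settles, and until it does, traffic is blackholed, which is the core weakness of API-driven failover.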
This brings us to the aforementioned load balancer sandwich design. Compared to our first option, this one takes away the headache of relying on API calls that may or may not complete within a given period of time. While relying on a load balancer to determine the health of your appliance feels dirty, keep in mind that these are not the same beefy centralized load balancers you deploy in your datacenter. They are highly distributed systems that let you spin up a separate instance for the sole purpose of forwarding traffic to your appliances, without being too concerned about the amount of bandwidth you’re pushing through them. Chances are, their uptime will exceed that of any individual solution deployed on a VM. The need to deploy a pair of appliances per availability zone is also gone; to start, you just need one appliance in each zone.
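The load balancer’s role in the sandwich boils down to a simple rule: only appliances that have passed a run of consecutive health probes receive traffic. A minimal sketch of that gating logic follows; the threshold and appliance names are assumptions for illustration, not any cloud provider’s defaults.

```python
def healthy_targets(probe_history, healthy_threshold=3):
    """Return the appliance IDs that would currently receive traffic.

    probe_history maps appliance ID -> list of recent probe results
    (True = probe succeeded), oldest first. A target counts as healthy
    only after `healthy_threshold` consecutive successes, mimicking how
    cloud load balancers gate traffic on health checks.
    """
    return sorted(
        target for target, probes in probe_history.items()
        if len(probes) >= healthy_threshold
        and all(probes[-healthy_threshold:])
    )
```

No API calls, no NIC moves: an appliance that stops answering probes simply drops out of the rotation, and traffic shifts to the surviving zones.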
From a scalability standpoint, you now have the flexibility to dynamically add appliance instances to increase capacity. Your failure domain arguably shrinks as well, since you’re no longer relying on a single “master” instance at any given time. If your mindset is that components will definitely fail and it’s only a matter of when, then testing how those components fail, early and often, is crucial.
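The scale-out decision itself can be as simple as dividing current throughput by per-appliance capacity, with a floor of one appliance per availability zone. A toy sketch, where every number is made up rather than a vendor spec:

```python
import math

def desired_appliance_count(current_gbps, per_appliance_gbps=5.0,
                            headroom=0.25, num_azs=3):
    """How many appliance instances the auto scaling group should run.

    Keeps at least one appliance per AZ, then adds instances so that
    current throughput plus `headroom` fits within aggregate capacity.
    """
    needed = math.ceil(current_gbps * (1 + headroom) / per_appliance_gbps)
    return max(num_azs, needed)
```

Contrast this with the HA-pair model, where the only way to add capacity is to buy a bigger VM.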
The lower half of the sandwich is one or more separate load balancing instances, meant to be controlled by any team independently of the 3rd party solution. The quick takeaway here is that an application team can continue to operate its pipeline efficiently, with no change in process.
Stepping off the unicorn for a second, there is definitely some duct tape involved in bringing this solution together. Yes, NAT is involved; yes, there is a reliance on Lambda functions to account for various pieces when appliances are scaled up or down; and so on. However, it’s fair to argue that the way appliances do traditional HA is just as complicated; the complexity is simply hidden from you by the vendor. If this is the cloud-friendly approach to deploying 3rd party security appliances going forward, then it definitely makes a solid attempt at playing to the strengths of the cloud (microservices, auto scaling, automated deployments).
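As a final illustration of the duct tape, here’s a hypothetical sketch of what one of those Lambda functions has to do when the auto scaling group launches or terminates an appliance. The action strings mirror real AWS APIs (elbv2.register_targets, ec2.replace_route for the NAT/egress routes), but the handler structure, IDs, and the idea that one function handles both events are purely illustrative.

```python
def scale_event_plan(event, instance_id, target_group_arn, route_table_ids):
    """Return the API actions a lifecycle-hook Lambda would perform
    when an appliance instance is launched or terminated."""
    if event == "launch":
        # New appliance: put it behind the upper load balancer, and
        # repoint any egress/NAT routes that should flow through it.
        actions = [("elbv2.register_targets", target_group_arn, instance_id)]
        actions += [("ec2.replace_route", rt, instance_id) for rt in route_table_ids]
        return actions
    if event == "terminate":
        # Departing appliance: drain it from the target group.
        return [("elbv2.deregister_targets", target_group_arn, instance_id)]
    return []
```

It’s glue code, no question, but it’s glue code you can read, test, and version, which is more than can be said for the failover logic buried inside an HA appliance pair.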