Most large deployments within AWS span multiple regions.
This can be done for added resilience or to be located physically near the consumers of your AWS resources.
While interconnecting virtual private clouds (VPCs) within one region is very simple, AWS doesn’t provide any good tools for connecting VPCs in different regions. As is often the case with AWS, simply interconnecting VPCs across regions is not that hard, but it gets much harder if you want redundant connectivity between your VPCs.
Options for interconnecting AWS regions
There are two main options for managing connectivity between VPCs in different regions: VPNs over the internet, or a private link called Direct Connect. The latter requires you to place equipment in data centers where you can connect to AWS Direct Connect, or to lease private lines to those data centers; you also need your own private line for communication between those data centers. While it provides low latency and high bandwidth, this option tends to be very expensive, so it is not always justifiable for a business.
In this post, I’m going to concentrate on the VPN options.
AWS, in its Whitepaper, outlines the following options for VPN VPC interconnections:
- Software VPN – in this case, you manage EC2 instances yourself and run VPN software on top of them. You can use Linux with Openswan, or a specialized firewall image (pfSense, Check Point, and many others). This option gives the greatest flexibility, but it is also the most complex, as you have to manage high availability (HA) yourself.
- Hardware VPN – in this case, you use the AWS virtual private gateway (VGW) to provide connectivity from the VPCs to your hardware VPN appliance in your data center. HA on the AWS side is handled by Amazon, so you don’t have to worry about it. You do, however, need to worry about the HA setup of your own VPN appliance. This option supports dynamic routing via BGP.
- Hardware to software VPN – in this case, you use an AWS VGW in one of your VPCs and connect it to a software VPN appliance in another VPC. You still have to maintain HA for the software VPN appliance.
Note: Currently, AWS does not support VGW-to-VGW connectivity. If and when that feature is implemented, interconnecting VPCs in different regions will become a trivial task.
I am not going to go into the details of the setup for each of those options, as AWS and many other sites have wonderfully detailed write-ups on bringing up these tunnels. For example:
- Connecting Multiple VPCs with EC2 Instances
- How to configure IPsec VPN tunnel between Check Point Security Gateway and Amazon Web Services VPC using static routes
Challenges with setting up HA connectivity within AWS
Before talking about HA in AWS, it’s important to recall what AWS guarantees, or, more importantly in our case, what AWS does NOT guarantee. Specifically, AWS does NOT guarantee any uptime for any specific instance, nor does it guarantee that any availability zone (AZ) will remain up. AWS does, however, guarantee that no more than one AZ will be down in a region at any given time. This means that any kind of HA setup has to be spread across at least 2 AZs.
Example of HA connectivity challenge
Let’s take the following diagram as an example:
The diagram shows 2 VPCs with 2 AZs in each. The routing table for each AZ is very simple: traffic for the local VPC goes to the local target (that route exists in every AWS routing table by default), and traffic for the remote VPC goes to the VPN instance.
Now imagine that VPN Instance 1 goes down.
In this case, subnet 10.0.0.0/24 loses connectivity to the other VPC, and so does subnet 172.16.0.0/24. To regain connectivity, you have to change the routes towards the remote VPC for both subnet 10.0.0.0/24 (pointing it to VPN2) and subnet 172.16.0.0/24 (pointing it to VPN4).
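To make the route switch concrete, here is a minimal Python sketch of the failover step just described. The routing tables are modeled as plain dicts, and the remote-VPC CIDRs (10.0.0.0/16 and 172.16.0.0/16) plus the assumption that the 172.16 side originally routed via VPN3 are illustrative guesses, not from the original diagram; a real monitor would make an EC2 ReplaceRoute API call rather than a dict update.

```python
def fail_over(route_table, remote_cidr, backup_vpn):
    """Repoint the remote-VPC route at a surviving VPN instance.

    In a real deployment this would be an EC2 ReplaceRoute API call
    (e.g. boto3's ec2_client.replace_route) instead of a dict update.
    """
    route_table[remote_cidr] = backup_vpn
    return route_table

# Routing tables for the two affected subnets before VPN1 fails.
# "local" stands for the implicit local-VPC route AWS adds by default.
routes_10 = {"10.0.0.0/16": "local", "172.16.0.0/16": "VPN1"}
routes_172 = {"172.16.0.0/16": "local", "10.0.0.0/16": "VPN3"}

# VPN Instance 1 goes down: point both subnets at the surviving pair.
fail_over(routes_10, "172.16.0.0/16", "VPN2")
fail_over(routes_172, "10.0.0.0/16", "VPN4")
```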
That’s where a VPN monitor comes in: a script running in each AZ that checks connectivity through the active tunnel and switches the routing table when the tunnel goes down.
Not too bad, is it?
Well, that’s not all. You’d also want your routing tables to revert to normal once the failure clears. But the only system in the AZ with direct access to test whether the tunnel is up and passing traffic (by sending some data over it and checking for a response; a ping would do just fine) is the VPN instance itself. The rest of the instances use the AWS routing table that you have just modified to route around the failure.
However, the initial problem was that one of the VPN instances went down. So, on top of the script on the VPN instance that monitors the tunnel, we need another script that monitors the VPN instance itself and switches the routing tables if the instance is unresponsive. Oh, and that instance may come back up with a different IP, so you also have to make sure DNS is updated properly.
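The two monitoring layers just described (tunnel health checked from the VPN instance itself, instance health checked from elsewhere) boil down to one reconciliation rule. Here is a hedged sketch of that decision logic as a pure Python function; the health checks and the actual route update (an EC2 ReplaceRoute call) are stubbed out, and the instance names are illustrative.

```python
def reconcile(route_table, remote_cidr, primary, backup,
              instance_healthy, tunnel_healthy):
    """One monitor pass: route via the primary VPN instance only when
    both the instance and its tunnel are healthy; otherwise use the
    backup. This automatically restores the primary once the failure
    clears, which is the part a naive failover-only script misses.
    """
    target = primary if (instance_healthy and tunnel_healthy) else backup
    if route_table.get(remote_cidr) != target:
        # A real monitor would call EC2 ReplaceRoute here.
        route_table[remote_cidr] = target
    return target

routes = {"172.16.0.0/16": "VPN1"}

# Primary instance down: fail over to the backup.
reconcile(routes, "172.16.0.0/16", "VPN1", "VPN2", False, False)

# Instance back up and tunnel passing traffic: restore the primary.
reconcile(routes, "172.16.0.0/16", "VPN1", "VPN2", True, True)
```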
Add to that the fact that you may be interconnecting not just 2 VPCs but 20, and you will see why relying on scripts to modify routing tables can become complicated quite fast.
An easier way to set up redundant connectivity for an AWS VPC
All the problems in the previous example came from the fact that we needed to manipulate routing tables via scripts to maintain connectivity in the event of a failure. But that’s exactly what dynamic routing protocols were invented for!
Currently, VPC routing tables have no support for dynamic routing; the VGW, however, does support the dynamic routing protocol BGP.
Here’s the recommended HA setup for redundant VPN connectivity from AWS:
In this particular example, there are 2 VPN appliances on premise that establish VPN tunnels to each AWS tunnel endpoint and exchange routes via BGP. Redundancy within AWS is no longer your concern: there’s only a single route pointing to the VGW, and AWS ensures the VGW itself stays highly available.
Redundancy in your physical network is handled by installing 2 separate devices in geographically diverse locations; if any single device goes down, BGP fails all traffic over to the remaining active link. And you don’t actually have to have physical VPN appliances on premise: you can just as easily create them in another AWS VPC.
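For illustration, the BGP side of one such appliance might look like the following FRR/Quagga-style configuration sketch. The local ASN (65000), the router ID, the tunnel inside addresses, and the advertised prefix are all made-up values; 64512 is only the common default Amazon-side ASN for a VGW, so use the values from your actual VPN connection configuration.

```text
! Hypothetical bgpd fragment for one of the two VPN appliances.
router bgp 65000
 bgp router-id 192.0.2.1
 ! One BGP session per VPN tunnel to the VGW (default Amazon ASN 64512).
 neighbor 169.254.10.1 remote-as 64512
 neighbor 169.254.11.1 remote-as 64512
 !
 address-family ipv4 unicast
  ! Prefix advertised to AWS from this site.
  network 172.16.0.0/16
 exit-address-family
```

With both tunnels up, the VGW simply prefers whichever path is available, which is what removes the need for any route-flipping scripts.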
After that, you can connect as many VPCs as you like to your pair of appliances, resulting in the following connectivity (note that the connection to your corporate data center is optional).
There is no theoretical limit to how many VPCs you can interconnect in this fashion, but in practice you are likely to hit the bandwidth limits of your VPN appliance. That said, if you are pushing that much data (a modern server can handle 10Gbps of VPN encryption fairly easily), you are probably better off getting Direct Connect.
There’s no single right or wrong way to interconnect VPCs across regions, but it’s definitely easier to do so redundantly by utilizing a VGW and BGP. It is important to note that while this method provides more robust and resilient connectivity, it is also more costly.
In the case of on-premise equipment, you pay for the security appliance and for its internet connectivity. In the case of a virtual appliance, you not only pay for the appliance’s CPU cycles, but also pay double for your network bandwidth between VPCs (you are charged for traffic leaving the originating VPC and again for the same traffic leaving the transit VPC).
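A quick back-of-the-envelope illustration of the double-billing effect. The per-GB transfer rate used here is an assumed, illustrative figure, not a real AWS price; check current AWS data transfer pricing before relying on any numbers.

```python
def transit_transfer_cost(gb_per_month, rate_per_gb=0.02):
    """Traffic through a transit VPC is billed twice: once leaving the
    originating VPC and once leaving the transit VPC.

    rate_per_gb is an assumed illustrative figure, not an AWS price.
    """
    return 2 * gb_per_month * rate_per_gb

# 1 TB/month between two spoke VPCs at the assumed rate.
print(transit_transfer_cost(1000))  # -> 40.0
```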
Whether these benefits are worth the money is best answered in the traditional networking way: “it depends”.
This article was originally published on the Datapath.io Blog