This guest post is by ThousandEyes. We thank ThousandEyes for being a sponsor.
Until recently, network designers had limited choices when it came to how their traffic traversed the networks of the Big 3 public cloud providers: AWS, Azure, and Google.
Depending on the provider, a customer’s traffic would either be quickly absorbed onto the provider’s backbone, or sent across the public Internet until it reached a provider’s PoP near the destination.
As you might guess, the providers’ architectures could affect performance because the public Internet is a best-effort medium, while their private backbones are highly optimized to deliver packets.
The performance differences can be stark. According to data reported in ThousandEyes’ 2018 Public Cloud Performance Benchmark Report, latency variations between cloud providers could range from approximately 2x to 5x in particular regions.
For instance, latency for AWS in parts of Asia topped 140 ms, compared with approximately 70 ms for Google Cloud and under 30 ms for Azure. Such variability makes it difficult for engineers and designers to provision connectivity for applications, particularly latency-sensitive apps such as voice and video.
In investigating these discrepancies, ThousandEyes determined the issue came down to the cloud providers’ design decisions.
Azure and Google configured their network to get customer traffic onto their backbones as close to the source as possible, so that customer traffic was carried on their private networks for much of the journey.
By contrast, AWS kept traffic on the public Internet as long as possible. Once traffic neared its destination, AWS would bring it onto its private backbone.
For example, using agents to track network paths, ThousandEyes showed that GCP traffic originating in London and destined for Virginia in the U.S. would enter a London PoP and ride Google’s own network to a Google data center in Ashburn, VA.
Azure’s network design was similar; traffic originating in Milan, Italy and destined for Virginia would hit a Microsoft PoP in Milan and travel Microsoft’s backbone to a data center in Richmond, VA.
By contrast, as seen in the figure below, AWS traffic originating in Milan would traverse the public Internet in Europe and the United States before entering Amazon’s backbone in Virginia.
ThousandEyes’ benchmark report posits this as the primary reason for the performance differences among AWS, Azure, and Google.
New Performance Options
As mentioned earlier, designers had little choice about how cloud providers handled their traffic.
That’s changing. Google and AWS now let customers find a balance between cost and performance by offering network options that take advantage of the providers’ private backbones.
Or, to look at it from another perspective, Google and AWS are monetizing their private fiber in hopes of enticing customers to spend more for better performance.
Google’s offering, currently in beta, is called Network Service Tiers.
AWS’s service, which was announced at its re:Invent conference in November 2018, is called Global Accelerator.
At present, Azure hasn’t announced any for-pay services that leverage its private backbone, but we’ll see if that changes in 2019.
AWS and Google’s new services give engineers and executives more choice about price and performance. The catch, of course, is cost. You’ll pay for the privilege of having your packets ride inside a luxury network, neatly segmented from the jostling crowds commuting via public transit.
What You Get
Google’s Network Service Tier options are pretty straightforward: Premium and Standard.
Premium puts your traffic on Google’s private backbone for as much of the path as possible; traffic exits at the Google PoP closest to the destination and is handed off to a local ISP. Google says it has more than 100 PoPs worldwide, and GCP spans 17 regions and 52 zones globally.
In addition to riding Google’s fiber, the Premium Tier offers global load balancing (that is, across multiple regions) and a global SLA.
The Standard tier relies more heavily on the public Internet to route traffic. It only supports regional load balancing, and doesn’t offer a global SLA.
Google prices Standard and Premium based on the number of gigabytes transferred and the geographic source and destination of the traffic.
Amazon’s Global Accelerator service is typically provisioned per application. When you provision Global Accelerator, AWS sets up two static anycast IP addresses for the application across its edge locations, so that end-user traffic bound for the application should enter AWS’s network at the geographically closest edge.
Traffic is carried across AWS’s backbone to the optimal endpoint group in an AWS Region. An endpoint group can include multiple AWS Network Load Balancers and Application Load Balancers.
AWS says the Global Accelerator service monitors these load balancers, as well as Elastic IP addresses, to ensure that application traffic reaches the healthiest or best-performing endpoint group within a region.
As mentioned, AWS prices Global Accelerator on a per-application basis. Customers pay a fixed hourly fee plus a data transfer fee, which is also applied hourly and is based on the dominant direction of traffic, either inbound or outbound.
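To make the billing mechanics concrete, here is a minimal sketch of an hourly cost model for Global Accelerator. The rate values are purely hypothetical placeholders, not AWS’s actual prices; only the structure (a fixed hourly fee plus a per-GB premium on whichever traffic direction was dominant that hour) follows the description above.

```python
def accelerator_hourly_cost(gb_in, gb_out,
                            fixed_fee=0.025,       # hypothetical fixed fee per hour (USD)
                            per_gb_premium=0.015): # hypothetical premium per GB (USD)
    """Estimate one hour of Global Accelerator cost.

    The data-transfer premium applies to the dominant direction of
    traffic: whichever of inbound/outbound moved more data that hour.
    """
    dominant_gb = max(gb_in, gb_out)
    return fixed_fee + dominant_gb * per_gb_premium

# Example: an hour with 40 GB in and 120 GB out is billed on the 120 GB.
print(round(accelerator_hourly_cost(40, 120), 4))
```

With these placeholder rates, the example hour would cost 0.025 + 120 × 0.015 = 1.825; substitute AWS’s published rates for your region to get real numbers.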
Is It Worth It?
Is the extra cost worth a higher level of performance? Are there regions where a private backbone is simply a better option than the local public infrastructure?
Most importantly, how do you determine if the extra money you’ll pay is really worth it?
The answer is “It depends.” In some regions, and for some application types, the public Internet may be just fine. However, certain applications such as voice and video, or business-critical applications, may benefit from the cloud provider’s private network.
Before you can decide, you need to have clear visibility into existing network and application performance. You also need a good understanding of how performance differs among cloud providers, including by geographical region.
In other words, you need solid metrics to help inform your design choices now. You also need ongoing measurements to make necessary adjustments as your business needs change, and as the infrastructure underlying the public Internet and the cloud providers evolves.
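As a starting point for those ongoing measurements, a sketch like the following can reduce raw latency samples to the metrics that typically drive a tier decision: mean latency, jitter, and tail latency. The sample values here are synthetic illustrations, not measured data from any provider.

```python
import statistics

def latency_summary(samples_ms):
    """Summarize round-trip latency samples (in milliseconds)."""
    ordered = sorted(samples_ms)
    # p95 via the nearest-rank method: the value below which ~95% of samples fall.
    rank = max(0, int(round(0.95 * len(ordered))) - 1)
    return {
        "mean": statistics.mean(ordered),
        "jitter": statistics.stdev(ordered),  # standard deviation as a jitter proxy
        "p95": ordered[rank],
    }

# Synthetic samples: a steady private-backbone path vs. a variable Internet path.
backbone = [31, 30, 32, 31, 30, 31, 33, 30, 31, 32]
internet = [65, 140, 72, 68, 180, 70, 66, 95, 150, 69]

for name, samples in [("backbone", backbone), ("internet", internet)]:
    print(name, latency_summary(samples))
```

A voice or video application cares far more about the jitter and p95 columns than the mean; two paths with similar averages can behave very differently at the tail, which is exactly the gap a private backbone is sold to close.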
ThousandEyes’ 2018 Public Cloud Performance Benchmark Report is a good place to start. This report compares overall and regional network performance of the Big 3 cloud providers, including average, normative, and best-in-class performance.
The report provides a useful glimpse into otherwise opaque and complex systems, and can help inform your planning. Subsequent annual reports will track historical trends as well as provide a current snapshot of cloud performance.
As mentioned, ongoing measurements are also key. ThousandEyes is well positioned to provide these measurements thanks to software agents across the globe that capture and report on critical network and application performance metrics in real time.
To learn more about how to monitor the Internet to deliver a reliable digital experience, click here.