This guest blog post is by Kevin Deierling, VP Marketing at Mellanox. We thank Mellanox for being a sponsor.
A recently published Tolly Report demonstrates that 25, 50, and 100 Gb/s Ethernet switches based on Mellanox Spectrum deliver predictable performance and Zero Packet Loss. By contrast, Broadcom Tomahawk-based switches showed fundamental weaknesses in three areas:
- Fairness: port-to-port bandwidth allocation in many-to-one transfers
- Packet loss: full wire-speed switching for all packet sizes without dropping packets
- Microburst resilience: ability to tolerate temporary incast conditions without dropping packets
The report is quite interesting and worth reading. However, some may wonder whether these low-level problems actually matter at the application level. We decided to find out for ourselves – and it was pretty easy to demonstrate that low-level network performance issues definitely do affect application performance.
To judge application fairness for yourself you can watch these videos which show clients transferring three files to a server. These videos clearly show that the Spectrum Ethernet switch delivers application fairness, while the Broadcom Tomahawk based switch does not:
To measure application performance we used a simple setup: five client machines and one file server (as shown in Figure 1). In the test, a file is transferred from each of the five client machines to the file server. This is a common occurrence in the real world; for example, when periodic backups occur or when a distributed application accesses a central data store.
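To make the shape of this setup concrete, here is a minimal sketch of a many-to-one transfer harness in Python. It is an illustration only: it runs all the "clients" as threads against a loopback server on one machine, so it exercises the measurement logic rather than a switch; in the actual test each sender was a separate client machine connected through the switch under test. All names here (`run_test`, `send_file`, the 8 MB payload size) are our own assumptions, not from the report.

```python
import socket
import threading
import time

CHUNK = b"x" * 65536
TOTAL_BYTES = 8 * 1024 * 1024  # per-client payload; kept small for a quick local run

def serve(server_sock, n_clients):
    """Accept n_clients connections and drain each one on its own thread."""
    def drain(conn):
        while conn.recv(65536):
            pass
        conn.close()
    threads = []
    for _ in range(n_clients):
        conn, _ = server_sock.accept()
        t = threading.Thread(target=drain, args=(conn,))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()

def send_file(port, results, idx):
    """Send TOTAL_BYTES to the server and record throughput in MB/s."""
    s = socket.create_connection(("127.0.0.1", port))
    start = time.time()
    sent = 0
    while sent < TOTAL_BYTES:
        s.sendall(CHUNK)
        sent += len(CHUNK)
    s.shutdown(socket.SHUT_WR)
    s.recv(1)  # wait for the server to finish draining and close
    results[idx] = sent / (time.time() - start) / 1e6
    s.close()

def run_test(n_clients=5):
    """Run n_clients concurrent transfers; return per-client throughput."""
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(n_clients)
    port = srv.getsockname()[1]
    acceptor = threading.Thread(target=serve, args=(srv, n_clients))
    acceptor.start()
    results = [0.0] * n_clients
    clients = [threading.Thread(target=send_file, args=(port, results, i))
               for i in range(n_clients)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    acceptor.join()
    srv.close()
    return results
```

Pointing the client threads at real hosts behind a switch (and comparing the per-client numbers in `results`) is the essence of the fairness measurement described below.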
We focused on just one of the issues the Tolly Report highlighted as a performance problem for Tomahawk-based switches: port-dependent fairness. Here the report examined bandwidth allocation in many-to-one data transfers, to see whether the bandwidth allocated to the participating ports was fair.
At a high level, the results showed that the Tomahawk-based switch showed extremely unfair bandwidth allocation, while the Spectrum-based switch allocated bandwidth fairly regardless of which ports were used. These results are summarized in the graphs shown in Figure 2.
It’s worth noting that the Tomahawk switch behavior is highly dependent on exactly which ports are chosen. If you carefully choose the “right” ports and the “right number” of ports, you can mask the problem and make the switch look better.
The problem with this approach is that, in general, you have no idea which ports are the "right" ones. Even if you could figure that out, it is virtually impossible to account for specific ports when scheduling workloads.
This is particularly true in a virtualized environment, where the virtual machine associated with a workload can migrate to a different physical machine on the fly.
But the question remains whether this low-level result actually manifests itself at the application level.
To answer this question we chose a very simple application: file copy from a centralized server to client machines. This is of course an extremely basic test – but in fact representative of real-world applications which invariably have data files at the core of their operation. Even advanced distributed systems like databases, Hadoop big data analytic platforms, and object-based systems have at their heart a ‘file-like’ shared data repository.
Application Performance Test Results
In an ideal situation, each of the clients would receive a fair (meaning equal) portion of the overall available file system performance. As can be seen in the video of our test, one client receives half of the available file transfer bandwidth, while the other four clients must share the remainder.
Looking at a screenshot of the tests, the file transfer performance is extremely unfair when using the Tomahawk-based switch. One client’s file transfer is nearly finished (94% complete) while the other four clients are only 22%-26% complete.
The actual measured allocation of bandwidth for the file transfer is summarized in Figure 4. As can be seen, the Tomahawk exhibits extreme unfairness, with one client receiving half of the available bandwidth and the other four being forced to share the remaining file transfer bandwidth.
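One way to put a single number on this result is Jain's fairness index, a standard metric for bandwidth allocation (our addition; it is not used in the Tolly Report). The index is 1.0 for a perfectly equal split and approaches 1/n as one flow starves the rest. The sketch below plugs in the allocations described above: an even 20% per client versus one client taking half while the other four split the remainder.

```python
def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 = perfectly fair, 1/n = maximally unfair."""
    n = len(throughputs)
    total = sum(throughputs)
    return total * total / (n * sum(x * x for x in throughputs))

# Fair allocation (Spectrum-like): five clients, 20% each.
fair = [0.20] * 5
# Unfair allocation (Tomahawk-like, per the measurements above): one client
# gets half the bandwidth, the other four split the remainder equally.
unfair = [0.50, 0.125, 0.125, 0.125, 0.125]

print(round(jain_fairness(fair), 2))    # 1.0
print(round(jain_fairness(unfair), 2))  # 0.64
```

An index of 0.64 for five flows is a substantial departure from the ideal of 1.0, which matches the visibly lopsided transfer-completion percentages in the screenshot.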
By contrast, the Spectrum-based switch allocates the file transfer bandwidth fairly across all five client machines.
The issue of fairness is problematic for cloud and service providers trying to offer service-level guarantees, because the degree of unfairness depends on which port a particular customer is connected to.
This is even more problematic in a virtualized environment where the VM associated with a particular customer can migrate to another machine – resulting in completely different physical port connectivity and thereby radically changing the customer user experience. Clearly, fair and predictable application performance is important in multi-tenant situations.
This testing has clearly shown that the low-level fairness issue does indeed impact higher-level application performance. File access and networked data transfer underlie virtually all higher-level applications. Further study is needed to show how other applications such as analytics and transaction processing will be impacted by both the fairness issue and the other low-level problems of Tomahawk-based Ethernet switches.
Contact Mellanox today to get a Spectrum Ethernet switch for a proof of concept evaluation. A very simple test can demonstrate whether your Ethernet switch is impacting your application performance.