My team and I recently faced a situation with our iSCSI SAN that showed how layer 2 flow control can cause performance issues in an IP storage environment. This problem is not unique to iSCSI and can affect NFS as well.
My Database is Running Slow
The issue all started with what appeared to be a smooth firmware upgrade of our Equallogic SAN. The upgrade was performed during a Sunday morning maintenance window.
Monday morning the database team started to see disk busy alarms in their monitoring tool. After investigating, it was very apparent that the SAN was not performing at the same level it was prior to the upgrade. Queue depth had increased from 4 to 40 and read latency had increased from 15ms to 235ms.
We used our first tool of troubleshooting: what changed? The SAN was clearly the suspect in our investigation. We created a case with our vendor and sent the diagnostic logs and monitoring data. Because this is an IP-based storage solution, they also requested the “show tech” output from the Nexus 5000s used for our SAN.
The vendor examined the data and came back to us with their diagnosis. The issue was a large number of pause frames being sent by two VMware ESX servers. All ESX servers in our environment use 10 Gb Ethernet, and all but two of them, the ones causing the problem, use our standard network card. Our standard Ethernet adapter is the Qlogic QLE8042; the cards issuing the pause frames are Intel XF SR 10GB cards.
The two hosts with the Intel cards were issuing over 300 pause frames a second. The other hosts in our datacenter had issued none over the prior 24-hour period.
How Pause Frames Affect IP Storage
The SAN was receiving a huge number of pause frames from the two hosts. This caused the SAN to stop sending data to hosts until each pause frame expired. Queue depth and read latency both increased dramatically; write latency did not change.
Almost all IP storage vendors recommend flow control. They typically recommend that switches be set to receive and that hosts and arrays be set to send. This makes sense if everything is within normal ranges. If an array is sending data to a host faster than the host can process it, the host will issue a pause frame. The pause frame causes the array to stop sending data for a very short period, allowing the host to catch up. Simply put, flow control prevents a fast sender from overwhelming a slow receiver.
Flow control is more effective than relying on TCP alone for controlling congestion in high-bandwidth, low-latency storage environments. Without it, packets are dropped when congestion becomes an issue and TCP has to retransmit them. Data will not be lost, but TCP retransmits are expensive and inefficient.
Flow control falls apart when a host sends a massive number of pause frames. All other hosts requesting data from the array will be paused as well.
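To get a feel for why 300 pause frames a second is enough to matter, the arithmetic can be sketched in a few lines of Python. This is a rough illustration, not a measurement from our environment: a pause frame carries a 16-bit pause time counted in quanta of 512 bit times, but we did not capture what value the Intel cards were actually requesting, so the example assumes the maximum. The function names are made up for this sketch.

```python
# One pause quantum is 512 bit times; pause_time is a 16-bit field in the frame.
PAUSE_QUANTUM_BITS = 512
MAX_PAUSE_QUANTA = 0xFFFF


def pause_seconds(quanta: int, link_bps: float) -> float:
    """Time the link partner is asked to stay quiet for one pause frame."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause_time must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


def paused_fraction(frames_per_second: float, quanta: int, link_bps: float) -> float:
    """Upper bound on the share of each second the sender is held off."""
    return min(1.0, frames_per_second * pause_seconds(quanta, link_bps))


if __name__ == "__main__":
    link = 10e9  # 10 Gb/s
    # ASSUMPTION: every frame asks for the maximum pause time.
    per_frame = pause_seconds(MAX_PAUSE_QUANTA, link)
    print(f"max pause per frame: {per_frame * 1e3:.3f} ms")
    print(f"paused share at 300 frames/s: {paused_fraction(300, MAX_PAUSE_QUANTA, link):.0%}")
```

Under that assumption a single frame can hold a 10 Gb/s sender off for roughly 3.4 ms, so a few hundred frames a second is enough to keep it paused almost continuously. With smaller pause times the share drops proportionally.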
Cisco switches provide a counter for flow control frames. Below is the output of a troubled link. Note the counter for RX pause. This is zero on all other ports without the issue.
Ethernet1/8 is up
Hardware is 1000/10000 Ethernet, address is 000d.ecce.264e (bia 000d.ecce.264e)
Description: ESX08, NIC-VMNic3
full-duplex, 10 Gb/s, media type is 10g
Input flow-control is on, output flow-control is off
20616031 Rx pause 0 Tx pause 0 reset
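If you collect this output with a script rather than by eye, the pause counters can be pulled out with a small parser. The following is a minimal sketch that assumes the counter line looks like the one in the output above; other platforms or software versions may format it differently, and the function name is hypothetical.

```python
import re
from typing import Tuple

# Matches a line such as "20616031 Rx pause 0 Tx pause 0 reset".
# ASSUMPTION: the line format is taken from the sample output above.
PAUSE_LINE = re.compile(r"(\d+)\s+Rx pause\s+(\d+)\s+Tx pause", re.IGNORECASE)


def parse_pause_counters(show_interface_text: str) -> Tuple[int, int]:
    """Return (rx_pause, tx_pause) from 'show interface' text."""
    match = PAUSE_LINE.search(show_interface_text)
    if match is None:
        raise ValueError("no pause counter line found")
    return int(match.group(1)), int(match.group(2))


sample = """Ethernet1/8 is up
Input flow-control is on, output flow-control is off
20616031 Rx pause 0 Tx pause 0 reset"""

if __name__ == "__main__":
    print(parse_pause_counters(sample))  # (20616031, 0)
```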
Most network management tools will provide the ability to monitor this counter via SNMP polling. I would recommend monitoring on a rate of change instead of a raw count.
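As a rough sketch of what monitoring on a rate of change means, the calculation below turns two samples of the RX pause counter into frames per second. How the samples are collected (SNMP, CLI scraping, or something else) is left out, and the function names and the alert threshold are assumptions for illustration only. In our case healthy ports showed no pause frames over 24 hours and the bad ones showed over 300 a second, so almost any small threshold would have caught it.

```python
from typing import Optional


def pause_rate(prev_count: int, prev_time: float,
               curr_count: int, curr_time: float) -> Optional[float]:
    """Pause frames per second between two counter samples.

    Returns None when the interval is not positive or the counter went
    backwards (cleared or wrapped), so the caller can skip that sample.
    """
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        return None
    delta = curr_count - prev_count
    if delta < 0:
        return None
    return delta / elapsed


def should_alert(rate: Optional[float], threshold_per_second: float = 1.0) -> bool:
    """ASSUMPTION: the threshold is a placeholder; tune it against your own baseline."""
    return rate is not None and rate > threshold_per_second


if __name__ == "__main__":
    # Two hypothetical samples taken 60 seconds apart.
    rate = pause_rate(1000, 0.0, 19000, 60.0)
    print(rate, should_alert(rate))
```

A raw count such as 20616031 says little on its own, because it may have accumulated over months. The rate shows whether the port is being paused right now.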
Workaround
To stop the problem in the short term, you must disable flow control on the switch ports of the hosts with bad cards. On the Cisco Nexus platform the command for this is:
no flowcontrol receive on
This will stop the pause frames from being processed by the switch and being forwarded to the array. You will still see the counters for the pause frames increasing.
We also upgraded the firmware and the drivers for the cards, but that did not solve the problem.
Permanent Solution
We replaced the faulty cards as the permanent solution. Equallogic will also be working on providing a detailed explanation of why this issue was not causing any impact with our older SAN firmware.
Recommendations
- If you run an IP storage network, understand what flow control is and why your vendor wants you to turn it on.
- Monitor your switches that have flow control enabled for pause frames.
- Ensure that you collect performance data from your SAN so you have a baseline to understand what “Normal” is.
- Collect all log data from your SAN, hosts, and network to a centralized syslog server like Splunk to assist in problem identification and event correlation. It can make a tremendous difference in troubleshooting.
Further reading
Boche.net
http://www.boche.net/blog/index.php/2010/11/29/flow-control/
Tecnologico De Monteray