What is bufferbloat? Bufferbloat describes a buffer so large that it hurts performance instead of helping it. Why? After a certain amount of time, delivering a packet is pointless. Two examples come immediately to mind.
- A UDP packet for voice traffic is not useful if delivered in an untimely fashion. If the packet arrives hundreds or thousands of milliseconds late, it’s of no value. Therefore, deep buffers are not helpful in this situation.
- A TCP packet of any sort will be retransmitted by the sender if the receiver does not acknowledge receipt in a timely fashion. Overly deep buffers that hold a packet past the sender's retransmission timeout exacerbate the congestion problem. Instead of the packet traversing the congested link once, it now traverses the link twice: once when the original copy finally emerges from the deep buffer, and again when the sender's retransmission arrives. Yuck.
Bufferbloat can occur at any point where there is network congestion and a very large buffer configured. Service providers might configure large buffers in their networks to help with SLA compliance. Assuming the SLA is measuring what I would describe as “the wrong things,” packet delivery might take precedence over latency. In that case, an SP might buffer excessively to ensure delivery, ignoring the latency inherent in such a scheme.
But the greatest bufferbloat concerns aren’t deep inside SP networks. Indeed, there are legitimate reasons for a service provider to buffer traffic in its core — to handle bursts on a typically uncongested link, for instance.
Bufferbloat is felt most painfully in residential routers, where the connection from a fast home WiFi or gigabit Ethernet network meets a cable modem or DSL router with a much slower link to the ISP. Traffic inbound to the residential modem from the faster network can easily overwhelm the slower upstream link.
Large buffers that hold on to packets for hundreds or thousands of milliseconds during moments of congestion cause problems for applications like Skype, gaming, or DNS lookups that are (directly or indirectly) impacted by latency.
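To put rough numbers on that, here’s a quick back-of-the-envelope calculation. The buffer size and uplink speed below are illustrative assumptions, not figures from the show, but they show how quickly a big buffer turns into seconds of delay on a slow upstream link.

```python
# Rough illustration of how buffer size translates into queueing delay.
# The buffer size and uplink speed are assumptions for illustration only.
buffer_bytes = 256 * 1024      # 256 KB of packets sitting in the modem's buffer
uplink_bps = 1_000_000         # 1 Mbps DSL/cable upstream

delay_seconds = (buffer_bytes * 8) / uplink_bps
print(f"Worst-case queueing delay: {delay_seconds:.2f} s")  # ~2.10 s
```

Two seconds of buffering is exactly the “hundreds or thousands of milliseconds” territory that wrecks VoIP, gaming, and DNS lookups.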
The bufferbloat problem is one of queue management — how to decide what gets buffered, how long it’s held, and in what order it’s de-queued. Enterprise network engineers would answer that question with traditional QoS tools like traffic classification, a variety of token bucket algorithms, and tail drop.
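For readers who want to see what a token bucket actually does, here’s a minimal sketch of a generic token-bucket policer in Python. It isn’t any particular vendor’s implementation, and the class and parameter names are my own, but it illustrates the basic idea: traffic is forwarded while tokens remain and dropped when they run out.

```python
import time

class TokenBucket:
    """Minimal token-bucket rate limiter (a generic sketch, not a vendor
    implementation). Packets that arrive when the bucket is empty are
    dropped immediately rather than queued."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0     # refill rate in bytes per second
        self.capacity = burst_bytes    # maximum burst size in bytes
        self.tokens = burst_bytes
        self.last = time.monotonic()

    def allow(self, packet_bytes):
        now = time.monotonic()
        # Refill tokens based on how much time has passed, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= packet_bytes:
            self.tokens -= packet_bytes
            return True                # forward the packet
        return False                   # out of tokens: drop (police) the packet
```

A policer like this caps the rate, but unlike a deep FIFO it doesn’t add latency: excess traffic is dropped on the spot rather than sitting in a queue.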
However, Rich Brown, our guest on today’s podcast, points out that standard QoS techniques don’t necessarily resolve the bufferbloat problem. For instance, tail drop can be useful, but it doesn’t shrink an already-bloated queue. And lumping everything into a single best-effort queue doesn’t serve all of that traffic equally well. Even for best-effort traffic, there’s a user experience to be considered.
The Podcast
Rich Brown chats with Ethan Banks about CoDel, an algorithm specifically designed to minimize the impact of bufferbloat. CoDel is not new. In fact, it’s been baked into the Linux kernel since May 2012. However, CoDel has not had a significant impact on the industry as yet. Residential modem manufacturers deliver their products to market on tight margins, and upgrading to a more modern Linux kernel isn’t high on their list.
So, for the most part, fixing bufferbloat becomes an end-user concern. One solution is OpenWrt, which supports CoDel, for those consumers willing to replace the OS shipped by the vendor.
Rich and Ethan discuss CoDel in quite a bit more detail, explaining how it works, the head-drop principle, sojourn times, TCP ECN, and more. This is a nerdy look at how your modem handles buffering, and how you can make your home networking experience better.
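To make the sojourn-time and head-drop ideas concrete before the demo, here’s a heavily simplified sketch of CoDel’s dequeue logic in Python. The 5 ms target and 100 ms interval come from the published CoDel work, but the class itself is my own simplification: it omits ECN marking and CoDel’s real drop-pacing control law, so treat it as an illustration rather than the actual Linux implementation.

```python
import time
from collections import deque

TARGET = 0.005     # 5 ms: acceptable standing-queue delay (per the CoDel paper)
INTERVAL = 0.100   # 100 ms: how long delay must exceed TARGET before dropping

class SimpleCoDel:
    """Simplified CoDel-style queue.

    Every packet is timestamped on enqueue. On dequeue, the packet's sojourn
    time (how long it sat in the buffer) is checked. If the head of the queue
    has been delayed beyond TARGET for at least INTERVAL, packets are dropped
    from the head until the delay recovers. Real CoDel paces its drops with a
    control law and can ECN-mark instead of dropping; this sketch simply drops
    every over-target packet once the interval has elapsed.
    """

    def __init__(self):
        self.queue = deque()
        self.first_above_time = None   # when sojourn time first exceeded TARGET

    def enqueue(self, packet):
        self.queue.append((time.monotonic(), packet))

    def dequeue(self):
        while self.queue:
            enqueued_at, packet = self.queue.popleft()
            sojourn = time.monotonic() - enqueued_at
            if sojourn < TARGET:
                self.first_above_time = None   # queue delay is acceptable again
                return packet
            if self.first_above_time is None:
                self.first_above_time = time.monotonic()
                return packet                  # above target: start the clock
            if time.monotonic() - self.first_above_time < INTERVAL:
                return packet                  # not above target for long enough yet
            # Persistently above target: head-drop this packet, try the next one.
        return None                            # queue is empty
```

The key difference from tail drop is that CoDel measures time spent in the queue rather than queue length, and it drops from the head of the queue, so the packets that do get delivered are the freshest ones.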
The CoDel Demonstration
In this show, Rich demonstrates in real time how CoDel helps with bufferbloat. Using his own Skype voice session for us to listen to, he saturates the link with netperf to show how voice quality suffers with a bloated buffer. Then he enables CoDel on his connection, and the quality improves dramatically. Note – it’s one of the best real-time demonstrations of technology in action I’ve ever heard.
Worth a Visit
Our thanks to Rich Brown for supplying us with many of these links.
- DSL Reports Speed Test
- CeroWrt
- OpenWrt
- SQM HowTo
- IETF Draft – FlowQueue CoDel (look at page 4 for an informal summary of the fq_codel algorithm)
- Rich Brown’s Blog – Random Neurons Firing
- Cisco PIE vs. fq_codel
- Bufferbloat and ISP speed test results (dslreports.com)
Everyone does speed tests, perhaps even in your lab. A lot of lab switches can do rate limiting. Want to see this first-hand? Make a bandwidth cap of your own. Iperf works in your lab. Try speedtest.net if your ISP has upgraded from barbed wire.
The old Catalyst 3550 can’t do a 5Mb speed test to save its life. The Cat 3560 and 3750 do this based on the link speed of the interface. If that link comes up at 10Mb half-duplex, your 10% limit won’t work so well.
Of course, if you force the link to 100Mb full, you trade one headache for another. But it got fewer complaints, so let’s keep forcing 🙂
The newer Cisco ASICs are sometimes fabulous. The big iron like the 6500 has some you wouldn’t mind using yourself.
Yeah, some of these have nice large queues. Hello, bufferbloat. Most of this is just trying to get a constant speed without having to fire up BitTorrent. But I am giving Cisco a hard time for something outside their core.
That DSLAM or similar fares much better. Their key is simplicity. The whole chassis might have a CPU similar to your home router’s. Those old clunkers could never queue up that many packets.
I do hope that ISPs will start asking their sales reps about this. I gotta assume that changing the buffer at the bandwidth cap would be more effective than a router next to the cap. Considering CoDel works on time, both devices would send similar signals to the hosts.
After hearing this I will do some playing with my current OpenWrt router. Upgraded from a PIX last week and need to futz around with it.
Hello Ethan, and thanks for a great show.
Several posts on the Ubiquiti forum seem to indicate that at least the current beta has fq-CoDel, so they are working on it.
If anyone is up for it, please add fq_CoDel to FreeBSD. And eventually CAKE. This stuff is magic.
Hi, there is some work to add codel, PIE, fq_codel and fq_PIE to FreeBSD
http://comments.gmane.org/gmane.os.freebsd.devel.net/47124
Great show both!
I remember using DD-WRT with an iproute script to try and stop the wife’s downloads from causing me to get shot in Counter-Strike. It wasn’t hugely effective, and not exactly user-friendly either, so it’s good to see a nice straightforward solution is now available. I’ll have to do some reading into CoDel.
I haven’t played any games for years now and haven’t bothered with anything similar. It will probably be my kids moaning soon, though. Your son will definitely thank you for implementing something similar at home!
Hi Ethan,
I have a home router set up with DD-WRT. I tested the ‘queueing discipline’ set to fq_codel and sfq (the packet scheduler being HTB for both).
I see that fq_codel gives the best performance for drop-sensitive traffic like VoIP, or any uploads in general, but somehow it seems to hamper download speeds and latency.
So my default setup has the ‘queueing discipline’ set to sfq, and when I have a scheduled VoIP/video call I change it to fq_codel. Best of both use cases 🙂
I don’t have a router to test with at the moment. But…I’m wondering if the download/latency impact is tied to TCP acknowledgments getting head-dropped due to too long a sojourn time when the link is under load. I never thought to ask Rich whether the algorithm treats TCP ACKs uniquely or not. Then again, you’re saying that just turning on fq_codel is enough to see the issue? Not that the link is stressed?
What do you recommend for gaming?
You can give fq_codel a try if your router supports it. FQ_CoDel can’t make the Internet itself faster, but it should help get latency-sensitive gaming packets out of your home router and onto the Internet more quickly.
That assumes you’ve got upstream congestion, like if you’re seeding on BitTorrent and a lot of people are leeching from you. If you’ve got no congestion, no QoS scheme will make anything faster, because you’re already going as fast as possible. Experiment. It can’t hurt to try things out and see what works.