bufferbloat for the impatient

After wandering around for a bit, I’ve settled back in San Francisco on a more or less permanent basis. Part of the moving process was finding an ISP and it seems like Comcast is the best option (for my situation). I signed up for their standard residential service, and remote teleworking continued on quite merrily… except for one tiny wart.
Stores don’t always advertise their extra online deals, such as free shipping or 20 percent off your entire purchase, but coupon services have these codes. Even 10 percent off your purchase is worth the extra five minutes it will take you to search

We use Google Plus hangouts quite extensively on my team including a daily standup with attendance that hovers between 5 to 10 people. The first time I tried a hangout with my new Comcast service, it was unusable with extreme lag everywhere, connection timeouts, and general unhappiness.

I had a strong hunch that I was suffering from bufferbloat, and a quick ping test confirmed it (more on that later). Obviously I wanted to fix the problem, but there is a lot of text to digest for someone that just wants to make the problem go away.

After a bit of irc whingeing and generous help from people smarter than me, here are my bufferbloat notes for the impatient.

background
Bufferbloat is a complex topic, go read the wiki page for excruciating detail.

But the basic conceptual outline is:

  • a too large buffer on your upstream may cause latency for sensitive applications like video chat
  • you must manage your upstream bandwidth to reduce latency (which typically means you intentionally reduce upstream bandwidth)
  • use QoS in your router to globally reduce upstream bandwidth (not for traffic shaping!)

diagnosis
Ensure your internet connection is idle. Then, start pinging google.com. Observe the “time” field, which will give you a value in ms. Watch this long enough to get an intuitive feel for what is a normal amount of latency on your link. For me, it hovered consistently around 20ms, with some intermittent spikes. You don’t need to be exact. If the values swing wildly, then you’ve got other problems that need to be fixed first. Stop reading this blog and call your ISP.

While the ping is running, visit http://testmy.net/upload and kick off a large upload, say 15MB or more.

If your ping times increase by an order of magnitude and stay there (like mine did to around 300ms), then you have bufferbloat.

This isn’t as rigorous as setting up smokeping and making pretty graphs, but trust me, it’s a lot faster and way easier. Thanks to Alex Williamson for this tip.

mitigation
You will need a router that can do QoS.

The easiest solution is to spend $100 and buy a Netgear WNDR3700 which is capable of running CeroWRT. Get that going and presumably you’re done, although I can’t verify it since I am el cheapo.

I didn’t want to spend $100 and I had an old Linksys WRT54GL lying around. Install Tomato onto it. (Big thanks to Paul Bame for helping me (remotely!!) recover a semi-bricked router.) Now it’s time to tune QoS.

In the Tomato admin interface, navigate to QoS => Basic Settings. Check the “Enable QoS” box and for the “Default class” dropdown list, change it to “highest”.

Figure out your maximum upload speed. You should be able to obtain this number after a few upload tests at testmy.net that you did in the previous step. Enter your max upload speed into the “Outbound Rate / Limit” => “Max Bandwidth” field. Make sure you use the right units, kbits/s please!

Finally, in the “Highest” QoS setting under Outbound, set your lower and upper bounds. I started with 50% as a lower bound and 60% as an upper bound.

Put a large fake number in for “Inbound Limit” and change all the settings there to “None”. These settings don’t seem to affect latency.

Click “save” at the bottom of the page — you do not need to reboot your router.

Re-run the google.com ping test + large upload test at testmy.net. Your ping times under load should remain relatively unchanged vs. an idle line. Congrats, you’ve solved your bufferbloat problem to 80%.

Update (7/29/2012): Thanks to John Taggart for pointing out a more rigorous page on QoS tuning for tomato.

Now you can experiment with increasing the lower and upper bounds of your QoS settings to get more upstream bandwidth. As always, make a change, save, re-run the ping + upload test, and check the results. Remember, the goal is to keep latency under load about equal to what it is on an idle line.

Now your colleagues will thank you for the increased smoothness of your video chats, although remembering to brush your teeth and put pants on is the “last mile” problem I can’t solve for you.