Monday, July 11, 2011

TCP: Conflicting Goals

David Singleton writes in "Why mobile apps suck when you're mobile (TCP over 3G)":
TCP assumes that the connection has a more or less constant RTT and assumes delays are losses due to congestion somewhere on the path from A to B.
This struck a special chord with me, because I had recently read about TCP algorithms designed to combat bufferbloat: instead of backing off strictly on packet loss, they treat a rising RTT as a sign that their packets are queueing in buffers along the path, and back off early to avoid both the eventual loss and the added latency.
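To make that concrete, here is a minimal sketch of a delay-based controller in Python, in the spirit of Vegas/LEDBAT; the class name, constants, and threshold are my own illustrative assumptions, not any real stack's code:

    class DelayBasedController:
        def __init__(self, cwnd=10.0, queue_threshold=0.050):
            self.cwnd = cwnd                        # congestion window, in packets
            self.base_rtt = float("inf")            # lowest RTT seen so far
            self.queue_threshold = queue_threshold  # tolerated queueing delay (s)

        def on_ack(self, rtt):
            # base_rtt approximates pure propagation delay; anything above
            # it is presumed to be time spent queueing in buffers.
            self.base_rtt = min(self.base_rtt, rtt)
            queueing_delay = rtt - self.base_rtt
            if queueing_delay > self.queue_threshold:
                # RTT is inflating: assume our packets are piling up in a
                # buffer somewhere, and back off before anything is dropped.
                self.cwnd = max(2.0, self.cwnd * 0.9)
            else:
                # Path looks uncongested: probe gently for more bandwidth.
                self.cwnd += 1.0 / self.cwnd
            return self.cwnd

Note what happens on a 3G-style trace: on_ack(0.100) grows the window, but a single on_ack(2.0) spike, caused by link-layer retransmission rather than any full buffer, shrinks it anyway.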

Since 3G attempts to implement reliable delivery itself, TCP-over-3G has performance characteristics similar to the TCP-over-TCP problem explained in Avery Pennarun's sshuttle README.  (sshuttle avoids that problem by extracting data from one TCP connection and copying it into a technically distinct connection, rather than wrapping one inside the other.)  And actually, I see that Singleton linked to another source going into more detail, which I skipped reading the first time around.
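For illustration, the copy-instead-of-wrap trick looks roughly like this simplified Python sketch (not sshuttle's actual code): terminate one TCP connection and re-send its bytes on a second, independent connection, so each side runs its own retransmission instead of stacking two loss-recovery loops on top of each other.

    import socket
    import threading

    def pipe(src, dst):
        # Copy raw bytes between two *independent* TCP connections.
        # Each socket keeps its own sequence numbers and retransmit
        # timers, so a stall on one side never compounds with the
        # other's retransmissions the way wrapped TCP-over-TCP does.
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

    def relay(client_sock, remote_host, remote_port):
        remote = socket.create_connection((remote_host, remote_port))
        threading.Thread(target=pipe, args=(client_sock, remote), daemon=True).start()
        threading.Thread(target=pipe, args=(remote, client_sock), daemon=True).start()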

So not only is 3G a bad transport for that reason, but the variable RTT its delivery mechanism introduces also sinks TCP algorithms that try to use increased RTT to avoid queueing in buffers.  The buffer-avoidance logic can't distinguish between "bad" buffers, like those in a cheap home router that take huge chunks of data off the Ethernet at 100 Mbps and then dribble it out at 0.6 Mbps to the Internet at large; and "good" buffers, like those in the 3G system that are unclogging the spectrum rather than crowding other users of the tubes.

Singleton proposes some mitigations for app developers; I'd rather try to "fix" TCP so that it gracefully handles variable RTT.  It may violate the perfect conceptual segregation of the OSI Seven Layer Model, but simply making the phone's TCP stack aware of the wireless interface itself would go a long way toward mitigating the problem.  Perhaps if the 3G hardware could indicate "link restored" and "backlog cleared", TCP could skip using the RTT of packets received between those events in its congestion avoidance.
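If such events existed, the filtering could be as simple as this sketch; the on_link_restored / on_backlog_cleared hooks are entirely hypothetical, since no real chipset or TCP stack exposes them:

    class RttFilter:
        def __init__(self):
            self.replaying = False

        def on_link_restored(self):
            # Hypothetical radio event: the link is back, and queued
            # packets are about to arrive with hugely inflated RTTs.
            self.replaying = True

        def on_backlog_cleared(self):
            # Hypothetical radio event: the retransmission backlog has
            # drained, so RTT samples are meaningful again.
            self.replaying = False

        def usable_rtt(self, rtt):
            # Return None for samples congestion avoidance should skip.
            return None if self.replaying else rtt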

It seems like WiFi would need some mitigations as well.  It is particularly prone to alternating between periods of "solid" packet loss, occasionally severe enough to destroy the beacon signal and kick everyone off, and periods of fairly reliable reception.  When you do get reception back, though, the data pours in without significant degradation in speed, so the underlying issue is a bit different.  Still, a connection always seems to be particularly slow if it has the bad luck of being started during a period of loss.

In the end, the problems seem to come from allowing the endpoints, but not the network, to attach flow-control metadata like receive windows.  TCP views the network as a dumb thing it can draw conclusions about purely from end-to-end behavior.  Yet the increasing prevalence of wireless, and of sending TCP over wireless links, suggests that "the network" should be able to add metadata to packets (probably at the IP level, since the network is conceptually unable to peek inside IP data) indicating that delivery of a packet was delayed for the sake of reliability.  Unfortunately, rogue devices could set that bit for their buffer-bloated packets, so it's about as practical as the Evil Bit.
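Purely to make the idea concrete anyway: imagine a "delayed for reliability" flag in the spirit of ECN's congestion-experienced codepoint, which the receiver's stack uses to discard misleading RTT samples.  The flag and the field it lives in are inventions for this sketch; no such bit exists in IP:

    # Hypothetical flag a cooperating network element could set; it has
    # no real counterpart in the IP header.
    DELAYED_FOR_RELIABILITY = 0x1

    def rtt_samples_for_congestion_control(packets):
        # Keep only RTT samples from packets the network did not mark,
        # so link-layer retransmission delay isn't mistaken for a
        # congested queue.
        return [p.rtt for p in packets
                if not (p.flags & DELAYED_FOR_RELIABILITY)]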
