Wednesday, January 11, 2012

Layer 7 Routing: HTTP Ate the Internet

In the beginning was TCP/IP, and the predominant model was that servers would listen for clients using a pre-established port number.  Then came Sun RPC, in which RPC servers were established dynamically, and listened on semi-random ports (still, one port per service provided); the problem was solved by baking the port mapper into the protocol.  The mapper listens on a pre-established port, and the client first connects there to inquire, "On what port shall I find service X?"

Then came HTTP, the layer 6 protocol masquerading as layer 7.


Doing One Thing

Shortly after I started this post, I ran across The Lost Art of Telnet, which explains how the Internet used to be a bunch of stuff, listening on different ports, each running their own simple text protocol at layer 7.  gopher, telnet, MUDs and their friends, and IRC: all text based.

That was the way HTTP started, as well.  To this day, you can often make a successful request with nothing more than the request line and an appropriate Host header.

Layer Six

The funny thing is that in all my classes that ever touched on the OSI Model, there was never any example for anything at layer 6.  Of course, the world was IP by then, so we weren't staring at the implementation the model was designed for, but I think TLS (nee SSL) is the obvious candidate here.

And today, I'd extend HTTP down there as well, so that it would be spanning layers 6-7.

Wikipedia tells us that layer 6 included "capabilities such as converting an EBCDIC-coded text file to an ASCII-coded file...."  This would be irrelevant, except that HTTP explicitly permits servers to translate texts, which I learned while looking for the correct JavaScript MIME type.  In fact, Wikipedia lists MIME itself as a layer 6 protocol in the TCP/IP table.

HTTP's various encoding options, such as chunked and/or compressed encodings, also clearly pertain to layer 6.

Layer 7 is still clearly involved, though, since that's where entity bodies fall, as well as some of the metadata relating to them like Last-Modified.

I'm actually not sure what layer HTTP redirection would match up with.  If the application allows transparent redirection, then it would appear as part of level 6, but most libraries allow the application to be informed and specify further action when a redirect code is received.

Routing and REST

Something happened to HTTP over time.  The request-line itself provides almost no semantics, which means you can add your own.  Either through verbs, the WebDAV/DeltaV way, or by tunneling your application through some other verb, Siri ACE style.  (Or POST with _method=put for all you web browsers out there.)  Although the spec says some operations are idempotent (have the same result if they are repeated multiple times), you don't have to respect that.  You can make state-changing GET requests, for instance those /cgi-bin/counter.cgi images that were all the rage a decade ago.

I think I first noticed the extension of semantics to URLs in the days of Rails 1.1, when URLs were mapped into method calls, such as POST /bread/slice/3 which was handled by BreadController#slice.  What happened was that the URL space was fully virtual, not just some filesystem view, and it could have application-level semantics associated with it.

User Space Networking

HTTP had become, essentially, user-space networking; in contrast to TCP/IP and the data link layers, which were all in the kernel and had comparatively limited configuration available.  You just can't slap Perl into the TCP processing in the way you can put a PerlAuthenHandler into Apache with mod_perl.

The ability to perform any action on any URL and have this translated into some real effect was actually envisioned by the designers of HTTP itself.  Many of the response codes, such as 202 Accepted, were created for this "HTTP as an interface to other systems" vision.

Firewall Evasion

There's another factor involved in the rise of HTTP as a near-universal transport, which is the predominance of the old paradigm of stateless firewalls filtering by TCP port.  You could block ports 6667 and 119 and nobody would be able to waste time.  Well, that was the theory.  In those times, processing power and available memory didn't allow for much more advanced filtering than port numbers anyway.

In any case, the web turned out to be so important that it was usually unfiltered by these firewalls, so HTTP worked "everywhere" from the perspective of a client.  Thus, RTMP, SOAP, and probably many other protocols have defined standards for communicating over HTTP.

The obvious result of allowing only "secure" communication in old times has been to make "insecure" communications over "secure" channels.  Therefore, it's now required to run deep packet inspection tools to sort out whether a given HTTP message appears to be genuine HTTP or some tunneled protocol.

You could carry nearly any layer inside HTTP if you wanted.  As a thought experiment, I just designed SOCKS over HTTP, which is amusingly absurd, but not much worse than an L2TP VPN.  And if you wanted to cheat, there's always the Upgrade functionality.

Dumb Pipes are Still Dumb on a Smart Network

One last interesting bit of HTTP is the caching architecture.  If you obey the standards for idempotent verbs and provide caching headers, then intermediaries can perform their own processing on your requests.  The network is no longer a truly dumb pipe, shuttling data mindlessly between origin server and client.

However, the pipes between the nodes remain somewhat dumb.  All they're dealing with are packets.

OSI Goes Fractal

Wrapping up, HTTP bears characteristics of layers 5 (through cookies), 6, and 7, and virtually any other layer except 1 could, in theory, be tunneled over HTTP.  There are a lot of tunnels defined already for various levels, even IP-in-IP, and the obvious cases of VPNs.

Therefore HTTP is not "just" layer 7, and anything being carried at layer 7 may not be "just" application data: HTTP or SMTP may be acting as layer 6 for a SOAP message.  You could technically be tunneling your Ethernet (and the TCP/IP it's carrying) inside those SOAP messages.  There's really no end to the amount of insanity you can come up with by wrapping tunnels inside each other.

The disadvantage to such tunneling is that the ultimate application is affected by the performance characteristics of the tunnel, such as overhead.  Or in the case of TCP, tunneling it over a reliable transport interferes with its heuristics and yields unreliable performance.

1 comment:

Chrisjh said...

Reminds me: Some years back (before I became assimilated) I penned an article “I hate REST and you should too!”. Bleeding application data and process across protocol layers seemed just wrong. :)