Friday, September 26, 2014

An AWS VPN: Part I, IPSec

I recently set up an IPSec/L2TP VPN.  This post is about how I did it, and how I debugged each individual part.  In honor of how long it took me, this will be a three-part series, one for each day of work I spent on it.  Part II will cover L2TP/PPP, and Part III will get into routing and DNS.

First things first: IP addresses have been changed.  Some details apply specifically to Linux, OpenSWAN, xl2tpd, pppd, iproute2, dnsmasq, or road-warrior configurations, but the theory is applicable cross-platform.  I want to cover both the specifics and the theory, because the theory is invaluable for debugging the practice.

This also covers a few details about putting the VPN server on an AWS instance in EC2 Classic with an elastic IP.  But first, let's take a look at the general structure we have to work with:

A client connecting from a network we don't control, over networks that are possibly evil (hence the need for a VPN), to a server that provides access to vague "other resources" in non-globally-routable space behind that server.  We don't know where the client is going to be, and the corporate net is large enough, that we can't use dirty NAT tricks to move each network "into" the other.  (But, spoiler, we will take some inspiration from there.)

Without further ado, here's Part I: IPSec.  Parts II and III are forthcoming.


Configuring IPSec

Our first step is to establish an IPSec connection in transport mode.  This layer provides the security for the L2TP layer and all other traffic being carried over the VPN.

Here's a brief view of the global section of ipsec.conf:
version 2.0

config setup
  dumpdir=/var/run/pluto/
  nat_traversal=yes
  virtual_private=...
  oe=off
  protostack=netkey
  #plutostderrlog=/var/log/pluto.log
  #plutodebug=control

I hear that iOS does not recognize the need for NAT traversal due to a bug, which means we'll need to use forceencaps in the actual connection, and that depends on the nat_traversal setting here.

virtual_private defines the private IP space that a client is allowed to connect from.  This is generally a list of 10.x.x.x, 192.168.y.y, and 172.{16-31}.z.z, and sometimes IPv6 addresses are included.  I brazenly allow it all, although most configurations on the Internet add a rule like %v4:!10.1.2.0/24 to exclude the corporate LAN.  I think the only client addresses that will not work are ones the server has a direct route for; although EC2 seems to use a lot of the 10.0.0.0/8 space, the actual server is configured with a /26 netmask, which doesn't collide with any of the client sites I can influence the numbering of.
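For illustration, the common shape you'll see in howtos looks something like this (a sketch rather than my exact value, with the corporate-LAN exclusion included):

  virtual_private=%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12,%v4:!10.1.2.0/24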

oe is "opportunistic encryption" through the DNS; I'm much more interested in having encryption all of the time, not just some of it.  In any case, I'm not setting up anything in the DNS, so it's not going to work.

protostack specifies which backend runs the encryption system.  Since Amazon's kernel didn't provide KLIPS support (the older of the two stacks), I'm using the kernel's native NETKEY stack.  I pretty much just looked at the logs, saw there wasn't KLIPS support, and that made the decision for me.
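As a shortcut, Openswan ships a sanity checker that reports (among other things) which kernel IPsec stack it found; it's a quicker first stop than reading the logs:

  ipsec verify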

The last two lines allow for getting unlimited output from pluto.  My server happens to be using rsyslog which rate-limits messages aggressively, and pluto can get extremely chatty, especially so with debug=all.  By setting up an alternate log file, I can dump all the messages without involving rsyslog, which might even be faster.

Now, to cover the actual connections, we continue in /etc/ipsec.conf:
conn vpnpsk-nat
  rightsubnet=vhost:%priv
  also=vpnpsk

conn vpnpsk
  left=%defaultroute
  leftid=ELASTIC_IP_HERE
  leftprotoport=17/1701
  right=%any
  rightprotoport=17/%any
  compress=no
  authby=secret
  auto=add
  pfs=no
  keyingtries=3
  rekey=no
  ikelifetime=8h
  keylife=1h
  type=transport
  dpddelay=30
  dpdtimeout=180
  dpdaction=clear
  forceencaps=yes

Wow, that's a lot of stuff.  We start by setting up the server as "left" and the client as "right"; the leftid is the public IP of the server that's used in the authentication exchange.  Because EC2 provides an elastic IP using NAT, the interface on the server does not know that it has this public IP.  Therefore, we provide it here.  It's needed to make IKE work, but after that, it MUST NOT be provided elsewhere!

The leftprotoport ends up specifying what traffic we allow over this tunnel; 17 is the protocol number for UDP and 1701 is the port for L2TP.  Meaning, the only traffic this IPSec tunnel will encrypt is traffic traveling to/from the L2TP server.  This also means that port 1701 MUST NOT be opened in a firewall, as traffic destined for it MUST travel via IPSec (which will be using other ports) in order to be secured.
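If you want the host firewall to enforce that rule as well, something like this sketch works, assuming iptables with the policy match module available (this is optional; simply not opening the port in the security group achieves the same thing):

  # Accept L2TP only if it arrived through an IPSec policy; drop cleartext.
  iptables -A INPUT -p udp --dport 1701 -m policy --dir in --pol ipsec -j ACCEPT
  iptables -A INPUT -p udp --dport 1701 -j DROP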

As for compression, I like to let the highest layer compress data (it knows the most about what it's sending) and trust the others to shuttle bytes.  Doubly so, since IPSec's choice of compression algorithm is limited by having to deliver packets in a timely fashion.

authby=secret specifies that this is a pre-shared key, auto=add loads the connection without initiating it (the client initiates), and then there's a block of magic I pulled off the internet.  I haven't tested how much of that is irrelevant, but the documentation claims OpenSWAN will never refuse PFS if the client proposes it, even when pfs=no is set.
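The pre-shared key itself lives in /etc/ipsec.secrets, which I haven't shown.  A minimal sketch (the selector and the secret are placeholders; the idea is that the server side matches the leftid above):

  ELASTIC_IP_HERE %any : PSK "replace-with-a-long-random-secret"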

That brings me to type=transport, which sets this connection in transport mode.  That means, it's only usable between the two hosts that set it up, but in exchange, some overhead is reduced.  Specifically, the "original IP header" from IPSec's point of view does not need to be preserved, because arriving at this host (with a valid security association) is proof that the data is addressed to the host.  All it needs is the TCP/UDP protocol inside to finish delivery.

The DPD (dead peer detection) and forceencaps settings work around battery optimizations and potentially a bug, as noted above.  I do not know what iOS versions this applies to.

Debugging IPSec

The first thing to note is that IPSec comes in two parts: a security "association" and a security "policy."  When the internet says to check associations with ip xfrm state, that command lists the current associations.  When an IPSec negotiation succeeds, an association for the pair of hosts appears in this list; if IPSec itself fails, the list can be empty.

It's the security policy that determines which packets go into the IPSec channel to begin with.  (I'm trying to avoid the more natural word "tunnel" for this, to avoid confusion with tunnel mode, which is not relevant to this post.)  Because of leftprotoport, only packets coming to/from the L2TP server on the server's own idea of its IP address will be encrypted.  This is visible by viewing the policy output with ip xfrm policy.
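In practice I look at both with something like the following, paying more attention to what shows up than to the exact output format:

  # Associations: expect one entry per direction, "mode transport", and
  # (for NAT-T) UDP encapsulation on port 4500.
  ip xfrm state
  # Policies: expect selectors restricted to proto udp and port 1701,
  # i.e. the leftprotoport/rightprotoport restriction from ipsec.conf.
  ip xfrm policy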


Because the policy, not the routing table, determines which packets go into the channel, neither endpoint's routing table will change.  This is normal: routes can't distinguish by protocol and port the way the policy needs to, and does.

If I understand it correctly, IPSec processing happens in the Linux kernel before nat PREROUTING for inbound traffic, and after nat POSTROUTING for outbound traffic.  That ordering makes sense: the security association to use depends on the machine the packets are traveling from/to, and it's much more useful to filter on decrypted packets.

Building the association happens in phases, with plenty of round-trip traffic along the way; as I understand it, the first phase happens on udp/500 for IKE/ISAKMP and is more or less a pre-negotiation to determine how to set up the real connection.  With NAT-T, the rest of the exchange (and the encrypted traffic itself) moves to udp/4500.  Each phase has its own set of encryption proposals, and the client and server must agree on one for the connection to succeed.  Both ports 500 and 4500 must be open on the server's firewall/security group for this to work.  I'm a bit fuzzy about where non-NAT-T traffic would go (raw ESP, IP protocol 50, rather than a UDP port, presumably), but since I always need NAT-T (most of my clients, and always my server, are NAT'd), I do not have to worry about it in my setup.
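In iptables terms, the holes the IPSec layer needs look roughly like this (a sketch; translate to your security group as appropriate):

  iptables -A INPUT -p udp --dport 500 -j ACCEPT    # IKE/ISAKMP, phase 1
  iptables -A INPUT -p udp --dport 4500 -j ACCEPT   # NAT-T: floated IKE plus ESP-in-UDP
  # Without NAT-T you would also need raw ESP (IP protocol 50):
  # iptables -A INPUT -p esp -j ACCEPT
  # Note that udp/1701 is deliberately absent; L2TP only arrives inside IPSec.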

My first specific problem with my VPN turned out to be that I was overzealous with the elastic IP; the policy as established would encrypt traffic sent from that elastic IP, but since the server doesn't actually have that address on its interface, the outgoing traffic never matched the policy.  Therefore, L2TP responses went out in the clear and completely failed NAT traversal: from the client NAT's point of view, nothing had ever been sent out from the port those responses were addressed to, because the client's traffic had gone out on port 4500, inside the IPSec channel, instead.

Once the IPSec channel is established, each peer SHOULD have a policy matching traffic between its own private IP and its peer's public IP.  The outgoing NAT will replace the private IP with the public one, and the NAT at the other end will recognize the public IP and de-NAT it back to that end's private IP.  Meanwhile, using the private IP allows the IPSec association to be found and the packet encrypted before leaving the host.  When this is all working correctly, L2TP traffic can be properly exchanged in the IPSec channel--and as mentioned above, that means only the IPSec layer needs to have holes punched in the firewall for it.

Although I used tcpdump a bit (at both client and server ends) to debug the IPSec layer, it's not a very efficient approach.  It's much more profitable to look at the log files left by the client (such as OS X's detailed log setting), or pluto's log files, and search for errors and failures.  tcpdump can demonstrate a failure of IPSec (such as L2TP responses not going back to the client through the secure channel) but has no information about why.
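That said, if you do reach for tcpdump, filters along these lines keep the noise down (eth0 is a placeholder for the real interface):

  tcpdump -ni eth0 'udp port 500 or udp port 4500'   # the IKE exchange and the encrypted channel
  # L2TP visible outside the IPSec channel; server replies showing up here
  # in the clear are exactly the failure described above (depending on the
  # stack you may also see legitimately decrypted inbound packets, so pay
  # attention to direction).
  tcpdump -ni eth0 'udp port 1701'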

I also briefly tried tunnel mode when transport mode "wasn't working," but that failed even faster because OS X won't establish an association in tunnel mode for an L2TP VPN.

To Be Continued

I do believe that's all for IPSec.  Stay tuned for Part II (L2TP and pppd) and Part III (routing and DNS).
