Friday, May 18, 2018

Why we have a memcached-dynamo proxy

Previously, I mentioned that we have a proxy that speaks the memcached protocol to clients and stores data in DynamoDB, “for reasons.”  Today, it’s time to talk about the architecture, and those reasons.

One major constraint is that we still have Perl CGI running, and when we built the proxy, loading the DynamoDB libraries into memory at the start of each request took too long.  It was seriously a full second of overhead, and without it, our response times on that hardware were 100 to 250 ms.
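
(For a concrete way to see that overhead, a tiny script like this times how long a module takes to compile and load; Amazon::DynamoDB here is only a stand-in for whichever DynamoDB client stack you actually measure.)

    use strict;
    use warnings;
    use Time::HiRes qw(time);

    # Measure the compile-and-load cost that a CGI process pays on every request.
    my $t0 = time;
    require Amazon::DynamoDB;    # substitute the client module you actually use
    printf "loaded in %.3f s\n", time - $t0;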

The memcached client libraries loaded a lot faster.  This not only allowed Perl to stay responsive, but also enabled the PHP sister site to configure memcached-based sessions, completely transparently to user code.
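
On the PHP side, that transparency is purely configuration.  Something along these lines in php.ini points the stock session handler at the proxy; the host, port, and choice of memcached extension here are examples, not our exact settings.

    ; php.ini: sessions go to whatever answers the memcached protocol,
    ; which in our case is the proxy rather than a real memcached.
    session.save_handler = memcached
    session.save_path    = "sessions.internal:11211"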

For a while, we used ElastiCache, but the smallest single-node cluster was still too large for our scale.  We stored sessions in it, and after months and months, nothing had ever been evicted.  We were paying a lot for memory we didn’t need, and we still couldn’t go straight to DynamoDB because of the CGI Problem.

To shut down ElastiCache, I had the idea of building the proxy.  I found Memcached::Server and Amazon::DynamoDB on CPAN and plugged them into each other.  The two are built on different async frameworks, but the major Perl event loops can run on top of one another, so we ended up with IO::Async underneath and AnyEvent attached to it.
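
The overall shape of the wiring looks something like the sketch below.  It is a from-memory approximation, not our production code: the constructor arguments and callback signatures are my best recollection of those two modules’ APIs, and the table and attribute names are placeholders.

    use strict;
    use warnings;
    use IO::Async::Loop;
    use Memcached::Server;    # speaks the memcached protocol to clients (AnyEvent-based)
    use Amazon::DynamoDB;     # speaks to DynamoDB, returning Futures

    # AnyEvent is pointed at the IO::Async loop underneath it,
    # e.g. by setting PERL_ANYEVENT_MODEL=IOAsync in the environment.
    my $loop = IO::Async::Loop->new;

    my $ddb = Amazon::DynamoDB->new(
        access_key => $ENV{AWS_ACCESS_KEY_ID},
        secret_key => $ENV{AWS_SECRET_ACCESS_KEY},
        host       => 'dynamodb.us-east-1.amazonaws.com',
        scope      => 'us-east-1/dynamodb/aws4_request',
        ssl        => 1,
        version    => '20120810',
    );

    my $server = Memcached::Server->new(
        open => [ [ '127.0.0.1', 11211 ] ],
        cmd  => {
            get => sub {
                my ($cb, $key) = @_;
                $ddb->get_item(
                    sub {
                        my $item = shift;
                        $item ? $cb->(1, $item->{data}) : $cb->(0);
                    },
                    TableName => 'sessions',
                    Key       => { id => $key },
                )->on_fail(sub { $cb->(0) });
            },
            set => sub {
                my ($cb, $key, $flag, $expire, $value) = @_;
                $ddb->put_item(
                    TableName => 'sessions',
                    Item      => { id => $key, data => $value },
                )->on_done(sub { $cb->(1) })
                 ->on_fail(sub { $cb->(0) });
            },
        },
    );

    $loop->run;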

Through brute force, I eventually got it working well enough.  We ran into problems along the way: IO::Async’s default loop is built on IO::Poll, which turned out to be broken with SSL connections.  The failure mode was spinning the CPU, so I added a loadavg monitor that would kill the proxy daemon and let the process supervisor restart it whenever the load average climbed.  Then I replaced Poll with EV and things were mostly okay, aside from occasional errors in the logs about things being undefined.  Today, Epoll is working smoothly, and the bug in Poll is still open (so I doubt it’s been fixed).
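
Swapping loops, at least, is close to a one-line change once the alternative loop module is installed from CPAN.  Either of these should do it, as far as I know:

    # Ask for a specific loop implementation explicitly...
    use IO::Async::Loop::Epoll;              # or IO::Async::Loop::EV
    my $loop = IO::Async::Loop::Epoll->new;

    # ...or leave the code alone and pick the loop from the environment:
    #   IO_ASYNC_LOOP=Epoll perl your-proxy.pl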

We don’t have CGI::Session do session locking; the session only gets written from one endpoint.  But we do have PHP do locking, so the proxy also takes care of recognizing PHP’s lock requests and updating a lock attribute on the underlying session.  Also, when PHP 7 switched to the binary memcached protocol and changed the session lock key, I configured it back to the classic mode.
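
The recognition itself is mostly key inspection.  Here’s a hypothetical sketch of the idea; the “.lock” key suffix and the helper functions are illustrative, not lifted from the PHP extension or from our proxy:

    # Hypothetical: route a store either to the lock path or the normal path.
    sub handle_store {
        my ($key, $value) = @_;
        if ($key =~ /^(?<sess>.+)\.lock\z/) {
            # acquire_session_lock() stands in for a conditional DynamoDB
            # update that sets a "lock" attribute on the session item only
            # if no one else already holds it.
            return acquire_session_lock($+{sess});
        }
        return store_session($key, $value);    # an ordinary session write
    }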

The proxy uses the exact same DynamoDB table format that the AWS SDK for PHP would use, to ease compatibility.  The long-range goal is to eliminate all Perl, which would technically allow shutting down the proxy.  However, I can’t tell yet whether we will actually “make PHP use the AWS SDK’s DynamoDB session system,” or whether we’ll leave a memcached-dynamo proxy in the architecture forever, because of its transparency.
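
For reference, an item in that table looks roughly like this.  It’s my recollection of the PHP SDK session handler’s layout (a string “id” hash key plus data and expiry attributes, with a lock attribute while a request holds the session); check the SDK docs before depending on the details.

    # One session, approximately as it lands in DynamoDB:
    my ($session_id, $payload, $lifetime) = ('abc123', 'cart|a:0:{}', 1440);
    my %item = (
        id      => $session_id,          # partition key (string)
        data    => $payload,             # serialized session data
        expires => time() + $lifetime,   # epoch seconds, used for expiry
        lock    => time(),               # present only while PHP holds the lock
    );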

(The CGI Problem is smaller than it used to be in absolute terms, because we’re on more powerful hardware that can load the DynamoDB modules in 320 ms.  However, it would still be a noticeable regression, since regular request latency has also dropped.  I’m unwilling to add even 50 ms per request now.)
