Monday, July 30, 2012

DynamoDB Performance

Things I learned this past week:
  1. AWS Signature V4 no longer requires temporary credentials.  If you aren’t caching your tokens, this can give you a nice speedup because it cuts IAM/STS out of the connection sequence.
  2. AWS service endpoints use SSL.  If you open a lot of fresh connections, you may pay significant handshake overhead on each one.
  3. Net::Amazon::DynamoDB and CGI are terrible things to mix.
Read on for details.

AWS Signature V4

I happened to learn about IAM’s Security Token Service (STS) while researching why our DynamoDB performance was so miserable.  Because of our environment (which is a whole other story), each DynamoDB “request” actually made two requests: one to generate temporary security credentials behind the scenes, then one to perform the actual DynamoDB operation.

The PHP SDK and the Perl library I was using both hide the generation and caching of credentials from their callers.  That left me blissfully unaware of this dance in the background, and of the fact that the temporary credentials were being silently discarded at the end of each request/response cycle, even though they are issued with a 12-hour lifespan by default.
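
Keeping those credentials around for their lifespan is just a matter of persisting them and checking the expiry before reuse.  Below is a minimal sketch in Python (not the Perl or PHP code the app actually used), assuming the credentials arrive as a dict with a hypothetical "expiration" key holding an ISO 8601 timestamp with a UTC offset; the cache path and the fetch_temporary_credentials callable (a stand-in for whatever actually calls STS) are hypothetical as well.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone

# Hypothetical cache location; any path the CGI process can write to will do.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "sts_credentials.json")
# Treat credentials as expired a little early so a request never goes out
# signed with a token that lapses in flight.
SAFETY_MARGIN = timedelta(minutes=5)


def load_cached_credentials(path=CACHE_PATH, now=None):
    """Return cached credentials if present and still valid, else None."""
    now = now or datetime.now(timezone.utc)
    try:
        with open(path) as f:
            creds = json.load(f)
        # Assumed format: ISO 8601 with a UTC offset, e.g. "2012-07-30T12:00:00+00:00".
        expires = datetime.fromisoformat(creds["expiration"])
    except (OSError, ValueError, KeyError, TypeError):
        return None
    if expires - SAFETY_MARGIN <= now:
        return None
    return creds


def save_credentials(creds, path=CACHE_PATH):
    """Write via temp file + rename so concurrent CGI processes never read a partial file."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(creds, f)
    os.replace(tmp, path)


def get_credentials(fetch_temporary_credentials, path=CACHE_PATH):
    """Reuse cached credentials; call the (hypothetical) STS fetcher only on a miss."""
    creds = load_cached_credentials(path)
    if creds is None:
        creds = fetch_temporary_credentials()
        save_credentials(creds, path)
    return creds
```

A file on local disk only helps when the next request lands on the same machine, which is exactly the trade-off against a non-sticky setup.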

As noted above, Signature V4, currently the latest signing method supported by AWS, removes the need for temporary credentials and allows requests to be signed directly with the account’s main credentials.  Ultimately, however, this all turned out to be a red herring for me.  All I managed to do by caching the temporary credential by hand (sneaking it out of the class, saving it as JSON, and slipping it back in on a subsequent request) was shave off a little time, at the cost of failing to meet the “non-sticky session” goal.
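
Signing directly with the main credentials works because Signature V4 derives its signing key locally from the long-term secret key with a chain of HMAC-SHA256 operations, so no STS round trip is needed.  The sketch below shows only that derivation and the final signature in Python; building the canonical request and the string to sign is omitted, and the function names are illustrative rather than part of any SDK.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, msg: str) -> bytes:
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the Signature V4 signing key from a long-term secret key.

    date_stamp is YYYYMMDD.  Everything happens locally; no call to STS.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sigv4_signature(signing_key: bytes, string_to_sign: str) -> str:
    """Hex-encoded HMAC-SHA256 of the string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # AWS's documented example secret key; not a real credential.
    key = sigv4_signing_key("wJalrXUtnFEMI/K7MDENG+bPxRfiCYEXAMPLEKEY",
                            "20120730", "us-east-1", "dynamodb")
    print(sigv4_signature(key, "AWS4-HMAC-SHA256\n..."))
```

The signing key depends only on the date, region, and service, so a client could compute it once per day rather than once per request.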

SSL Connections

Another avenue of investigation was SSL.  Opening a connection without a cached session ID incurs at least two round trips (ClientHello/ServerHello, then ClientKeyExchange/ChangeCipherSpec), and a test with a local machine and Wireshark showed around 100 ms of latency for the SSL handshake alone.  However, there didn’t seem to be any good way to mitigate this in either language, as none of the backends offered an obvious or well-documented way to set up an SSL session cache.
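
The cost is easiest to see by counting round trips.  This rough Python model assumes TLS 1.2 behavior (one RTT for TCP, two for a full handshake, one for an abbreviated handshake that resumes a cached session) and ignores the CPU cost of key exchange; the strategy names are made up for illustration.

```python
# Round trips needed before the first request byte can be sent (TLS 1.2).
TCP_RTTS = 1
TLS_FULL_RTTS = 2     # ClientHello/ServerHello..., ClientKeyExchange/ChangeCipherSpec/Finished
TLS_RESUMED_RTTS = 1  # abbreviated handshake using a cached session ID


def total_setup_ms(rtt_ms: float, requests: int, strategy: str) -> float:
    """Connection-setup latency summed over `requests` sequential requests.

    strategy:
      "fresh"     - new TCP connection and full TLS handshake for every request
      "resumed"   - new TCP connection per request, TLS session resumed after the first
      "keepalive" - one connection reused for all requests
    """
    first = rtt_ms * (TCP_RTTS + TLS_FULL_RTTS)
    if strategy == "fresh":
        return requests * first
    if strategy == "resumed":
        return first + (requests - 1) * rtt_ms * (TCP_RTTS + TLS_RESUMED_RTTS)
    if strategy == "keepalive":
        return first
    raise ValueError(f"unknown strategy: {strategy}")


if __name__ == "__main__":
    for s in ("fresh", "resumed", "keepalive"):
        print(s, total_setup_ms(50, 10, s))
```

With an illustrative 50 ms RTT and 10 requests, this gives 1500, 1050, and 150 ms of setup time respectively; the two full-handshake round trips alone would account for 100 ms at that RTT.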

This points out an interesting ambiguity in the use of “session” in the forum thread “Is DynamoDB slower than RDS for reads?”  The term could refer to DynamoDB caching credential lookups, or to keying a lookup off the SSL session information; either could reasonably be called “the session.”  DynamoDB doesn’t return a cookie, so it can’t mean an HTTP session, either.

Ultimately, this turned out to be another red herring for me, but if you’re interested in seeing DynamoDB’s alleged latency advantages, you will need to build your app so that it can make use of the SSL session cache.

Net::Amazon::DynamoDB and CGI

For reasons that seemed good at the time and haven’t been revisited for lack of priority, the Perl side of the app is structured as ye olde CGI-handler code.  It turns out that the cost of the actual DynamoDB requests (which I was measuring, and managed to cut in half) was far outweighed by the part I wasn’t measuring: loading Net::Amazon::DynamoDB in the first place.

I took a different approach to benchmarking and measured, from a remote client’s point of view, the wall time required to service a request that reads and/or writes some data in DynamoDB.  All of a sudden, the mean service time was not the 400 ms logged in my earlier measurements, but 1600 ms!  Loading Net::Amazon::DynamoDB takes a lot of CPU because it depends on Moose, which is fairly heavyweight in its own right.
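
The gap between the two numbers comes from where the clock starts.  As a local approximation in Python, the sketch below spawns a fresh interpreter per “request” the way CGI does; the child logs only its own request time (like in-handler logging), while the parent measures wall time including interpreter startup and module loading.  The imported module and the sum(range(...)) workload are stand-ins for Net::Amazon::DynamoDB and the real DynamoDB call.

```python
import subprocess
import sys
import time

# Child "CGI handler": load a module, then time only the request work,
# the way in-handler logging would.
CHILD = """
import importlib, sys, time
importlib.import_module(sys.argv[1])
start = time.perf_counter()
sum(range(100000))  # stand-in for the actual DynamoDB request
print(time.perf_counter() - start)
"""


def measure(module: str, runs: int = 5):
    """Return (mean logged ms, mean wall ms) over `runs` fresh processes."""
    logged, wall = [], []
    for _ in range(runs):
        t0 = time.perf_counter()
        out = subprocess.run([sys.executable, "-c", CHILD, module],
                             capture_output=True, text=True, check=True)
        wall.append(time.perf_counter() - t0)
        logged.append(float(out.stdout))
    return sum(logged) / runs * 1000, sum(wall) / runs * 1000


if __name__ == "__main__":
    logged_ms, wall_ms = measure("json")
    print(f"logged {logged_ms:.1f} ms, wall {wall_ms:.1f} ms")
```

A real client-side measurement would also include network time, but even this local version separates the work the handler sees from the startup cost it never logs.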

In fact, the recipe that builds our AWS server takes approximately 9 minutes to build the Perl dependencies which aren’t included in the distribution repository; adding Net::Amazon::DynamoDB takes another 16 minutes, almost twice as long as everything else combined.  (Don’t try this on a micro instance.)

So people hitting the DynamoDB-backed instance found the site very slow, because every one of their requests was delayed by an additional 1.2 seconds.

Toward a Solution

For now, we’re switching over to ElastiCache for session data and dropping the DynamoDB table.  Traffic would need to grow by a factor of 3,000 before a week’s worth of session data couldn’t be stored in the smallest possible ElastiCache configuration.
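
The headroom figure is a simple ratio of cache capacity to a week of session data.  The post doesn’t give session counts, sizes, or node memory, so every input in this Python sketch is hypothetical, including the overhead multiplier for per-item memcached bookkeeping.

```python
def headroom_factor(sessions_per_week: float, bytes_per_session: float,
                    cache_bytes: float, overhead: float = 1.5) -> float:
    """How many times traffic could grow before a week of sessions overflows the cache.

    overhead is an assumed multiplier for keys, item headers, and slab waste.
    """
    needed = sessions_per_week * bytes_per_session * overhead
    return cache_bytes / needed
```

For example, with made-up figures of 1,000 sessions a week at 100 bytes each, no overhead, and a 100 MB cache, headroom_factor(1000, 100, 100_000_000, overhead=1.0) returns 1000.0.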

Some last-second tests before I left work today suggested I can load Cache::Memcached::Fast in 33 ms, which is better than Cache::Memcached (pure Perl, and about twice as slow to load) and far, far better than trying to mix DynamoDB and CGI.
