Friday, December 12, 2014

DynamoDB's Greatest Missing Feature

I wrote a while ago about DynamoDB in the trenches, but that was in 2012 and now 2015 is nigh.

The real pain point of DynamoDB is that everything hangs off a hash key.  Across hash keys, it's not sorted, at all.  Ever.

A local secondary index isn't the answer; it's a generalization of range keys, and as such, is still subordinate to the hash key of the table.  An LSI does not cover items with different hash keys.

A global secondary index isn't helpful in this regard, either.  It essentially builds and maintains a second table for you, whose hash key is the attribute the index is on, and which stores the key of the target element (and optionally, any other attributes from that element) from the 'parent' table.  But as a hash key, it doesn't support ordering...  You can Query it for one exact hash value, but the only operation that covers the whole index is a Scan.

DynamoDB still offers nothing else.
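
To make the complaint concrete, here is a toy model in Python.  It is only a sketch of the observable behavior, not of how DynamoDB is implemented, and `ToyTable` and its method names are made up for illustration.  Items are grouped by hash key and sorted by range key inside each group, so an ordered lookup works within one hash key and nowhere else.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ToyTable:
    """Toy model: items grouped by hash key, sorted by range key per group."""

    def __init__(self):
        # hash key -> list of (range_key, item), kept sorted by range_key
        self._partitions = defaultdict(list)

    def put(self, hash_key, range_key, item):
        rows = self._partitions[hash_key]
        keys = [r for r, _ in rows]
        i = bisect_left(keys, range_key)
        if i < len(rows) and rows[i][0] == range_key:
            rows[i] = (range_key, item)      # same key: overwrite
        else:
            rows.insert(i, (range_key, item))

    def query(self, hash_key, low, high):
        """Ordered range lookup, but only inside a single hash key."""
        rows = self._partitions.get(hash_key, [])
        keys = [r for r, _ in rows]
        return [item for _, item in rows[bisect_left(keys, low):bisect_right(keys, high)]]

    def scan(self, predicate):
        """Anything that spans hash keys has to touch every item."""
        touched = 0
        matches = []
        for rows in self._partitions.values():
            for _, item in rows:
                touched += 1
                if predicate(item):
                    matches.append(item)
        return matches, touched


table = ToyTable()
table.put("alice", 100, {"user": "alice", "expires": 100})
table.put("alice", 300, {"user": "alice", "expires": 300})
table.put("bob", 200, {"user": "bob", "expires": 200})

one_user = table.query("alice", 0, 150)                      # cheap: one hash key
expired, touched = table.scan(lambda i: i["expires"] < 250)  # reads everything
```

"Everything that expires before 250" is a question about ordering across hash keys, so in this model, as in the real thing, it can only be answered by reading the whole table.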

Some ideas that would seem, on their surface, to be perfect for DynamoDB actually don't work out well, because there's no efficient query for ordered metadata.  Most frequently, this bites me for data that expires, like sessions.  Sessions!  The use case that's so blindingly obvious, the PHP SDK includes a DynamoDB session handler.

Update (2023-01-28): DynamoDB has support for an 'expiration time' attribute these days.  We are using it for sessions now.  Due to encoding issues, our sessions are stored as binary instead of string type, but otherwise, it's compatible with the PHP SDK's data format.  Our programming languages actually use memcached but the data is stored in DynamoDB. End of update.
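
For the curious, a session item with an expiration attribute might look roughly like the sketch below.  The attribute names (`id`, `data`, `expires`) and the `build_session_item` helper are assumptions for illustration, not a copy of our code.  The parts I'm sure of: the expiration attribute has to be a Number holding a Unix timestamp in seconds, and in the low-level item format numbers travel as strings.

```python
import time


def build_session_item(session_id, payload, lifetime_seconds, now=None):
    """Build a low-level DynamoDB item for a session (hypothetical helper).

    payload is bytes, stored as the Binary ("B") type; the attribute
    names are assumptions and should match whatever your table uses.
    """
    now = int(time.time()) if now is None else int(now)
    return {
        "id": {"S": session_id},
        "data": {"B": payload},
        # Expiration attribute: Number type, epoch seconds, sent as a string.
        "expires": {"N": str(now + lifetime_seconds)},
    }


item = build_session_item("abc123", b"\x00serialized-session", 1440, now=1000)
```

Expired items are not deleted the instant they expire, so the code reading a session should still compare `expires` against the current time.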

We store SES message data in there to correlate bounces with the responsible party.  We have a nightly job that expires the old junk, and just live with it sucking up a ton of read capacity when it has to do that.  On the bright side, unlike sessions, the website won't grind to a halt (or log people out erroneously) if the table gets wedged.
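
A nightly purge like that boils down to something like the following sketch.  The `client` is assumed to behave like a boto3 DynamoDB client (only its `scan` method is used), and the table layout, attribute names, and `find_expired_keys` function are hypothetical.  The filter is applied after the items have been read, which is exactly why the job pays read capacity for the entire table no matter how few items have expired.

```python
def find_expired_keys(client, table_name, now, key_attr="id", expires_attr="expires"):
    """Scan the whole table and return the keys of items expiring before `now`.

    `client` is anything with a boto3-style scan(**params) method.
    Attribute names are assumptions; adjust them to the real table.
    """
    keys = []
    params = {
        "TableName": table_name,
        # Filtering happens after the read, so it saves bandwidth, not capacity.
        "FilterExpression": "#e < :now",
        "ExpressionAttributeNames": {"#e": expires_attr},
        "ExpressionAttributeValues": {":now": {"N": str(int(now))}},
    }
    while True:
        page = client.scan(**params)
        keys.extend({key_attr: item[key_attr]} for item in page.get("Items", []))
        last = page.get("LastEvaluatedKey")
        if not last:
            return keys
        # Results are paginated; resume where the previous page stopped.
        params["ExclusiveStartKey"] = last


class FakeClient:
    """Stand-in for a real client so the sketch runs without AWS."""

    def __init__(self, pages):
        self.pages = pages
        self.calls = []

    def scan(self, **params):
        self.calls.append(dict(params))
        return self.pages[len(self.calls) - 1]


fake = FakeClient([
    {"Items": [{"id": {"S": "a"}, "expires": {"N": "1"}}],
     "LastEvaluatedKey": {"id": {"S": "a"}}},
    {"Items": [{"id": {"S": "b"}, "expires": {"N": "2"}}]},
])
expired_keys = find_expired_keys(fake, "ses-messages", now=100)
```

The returned keys would then be fed to delete calls, which cost write capacity on top of the reads.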

I'd like to put some rarely-changing configuration data in there (it's in SimpleDB and cached in memcache because SimpleDB had erratic query performance), but I wouldn't be able to efficiently search over it when I wanted to look up an entry for editing in the admin interface.  And really, I still want to put that in LDAP or something.

Honestly, if all I get with DynamoDB is a blob of stuff and a hash key, why not just store it as a file in S3?  How is DynamoDB even a database if it doesn't actually do anything with the data, and doesn't let you have any metadata?  Am I the crazy one, or is it the NoSQL crowd?

Related: Tim Gross on Falling In And Out Of Love with DynamoDB.
