Tuesday, November 27, 2012

DynamoDB in the Trenches

Amazon's DynamoDB, as they're happy to tell you, is an SSD-backed NoSQL storage service with provisioned throughput.  Data consists of three basic types (number, string, and binary) in either scalar or set forms.  (A set contains any number of unique values of its parent type, in no particular order.)  All lookups are done by hash keys, optionally with a range as sub-key; the former effectively defines a root type, and the latter is something like a 1:Many relation.  The hash key is the foreign key to the parent object, and the range key defines an ordering of the collection.

But you knew all that; there are two additional points I want to add to the documentation.

1. Update is Upsert

DynamoDB's update operation actually behaves as upsert—if you update a nonexistent item, the attributes you updated will be created as the only attributes on the item.  If this would result in an invalid item as a whole, then you want to use the expected-value mechanism to make sure the item is really there before the update applies.
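To make the upsert behavior concrete, here is a toy in-memory model in Python. It is only a sketch: `ToyTable`, `ConditionFailed`, and the `expected` parameter are made-up names that imitate the idea of the expected-value mechanism, not the real DynamoDB API, and the attribute names are assumptions.

```python
class ConditionFailed(Exception):
    """Raised when an expected-value check does not hold."""


class ToyTable:
    """In-memory stand-in for a table with a hash key only (hypothetical)."""

    def __init__(self):
        self.items = {}  # hash key -> dict of attributes

    def update(self, key, changes, expected=None):
        """Apply `changes` to the item at `key`, creating it if absent.

        `expected` maps attribute names to values the stored item must
        already hold; a missing or different value rejects the update.
        """
        current = self.items.get(key, {})
        for name, value in (expected or {}).items():
            if name not in current or current[name] != value:
                raise ConditionFailed(name)
        merged = dict(current)
        merged.update(changes)
        self.items[key] = merged
        return merged


table = ToyTable()

# Plain update on a missing item: it is silently created,
# with only the updated attribute on it.
table.update("user-1", {"last_seen": 1354000000})
print(table.items["user-1"])

# Guarded update on a missing item: rejected, nothing is created.
try:
    table.update("user-2", {"last_seen": 1354000000},
                 expected={"email": "u2@example.com"})
except ConditionFailed as err:
    print("refused, expected attribute not matched:", err)
```

The guard is what keeps a stray update from leaving a half-formed item behind.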

2. No Attribute Indexing or Querying

NoSQL is at once the greatest draw to non-relational storage, and also its biggest drawback.  On DynamoDB, there's no query language.  You can get items by key cheaply, or scan the whole table expensively, and there's nothing in between.  There's no API for indexing any attributes, so there's no API for querying by index, either.  You can't even query on the range of a range key independently of the hash (e.g. "find all the posts today, regardless of topic" on a table keyed by a topic-hash and date-range).

If you need lookup by attribute-equals more than you need consistent query performance, then you can use SimpleDB.  RDS could be a decent option, especially if you want ordered lookups on a non-key column (as in DELETE FROM sessions WHERE expires < NOW(); when the primary key is "id").

A not-so-good option would be to add another DynamoDB table keyed by attribute-values and containing sets of your main table's hash keys—but you can't update multiple DynamoDB tables transactionally, so this is more prone to corruption than other methods.
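A toy Python sketch of why that second table is fragile. Two plain dicts stand in for the two tables, and the `topic` attribute, the function names, and the simulated crash are all assumptions for illustration; nothing here touches DynamoDB itself.

```python
main_table = {}    # hash key -> item attributes
index_table = {}   # attribute value -> set of main-table hash keys


def put_post(key, item, crash_between_writes=False):
    """Write the item, then the index entry, as two separate steps."""
    main_table[key] = item
    if crash_between_writes:
        # Nothing ties the two writes together, so a failure here
        # leaves the item stored but missing from the index.
        raise RuntimeError("simulated crash between the two writes")
    index_table.setdefault(item["topic"], set()).add(key)


def posts_by_topic(topic):
    """Look items up through the index table only."""
    return [main_table[k] for k in sorted(index_table.get(topic, set()))]


put_post("post-1", {"topic": "perl", "title": "First"})
try:
    put_post("post-2", {"topic": "perl", "title": "Second"},
             crash_between_writes=True)
except RuntimeError:
    pass

print(sorted(main_table))       # both posts were stored
print(posts_by_topic("perl"))   # but the index only knows about one
```

Reversing the order of the writes just moves the problem: the index would then point at an item that doesn't exist.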

And if you want to pull together two sources of asynchronous events, and act on the result once both events have occurred, then the Simple Workflow service might work.  (I realized this about 98% of the way through a different solution, so I stuck with that.  I might have been able to store the half-complete event into the workflow state instead, no DynamoDB needed, but since I didn't walk that path, I can't vouch for it.)

Thursday, November 22, 2012

Perl non-CGI: The Missing Overview

After living in mod_php and Perl CGI for far too long, it was time to look at reworking our application to fit something else.  Although we had mod_perl installed and doing little more than wasting resources, I didn’t want to bind the app tightly to mod_perl in the way it was already tightly bound to CGI.  That meant surveying the landscape and trying to understand what modern Perl web development actually looks like.

But first, some history!

Wednesday, November 21, 2012

Because I Can: SMTPS + MIME::Lite monkeypatch

I had an app that calls MIME::Lite->send() to send an email, which until recently was using an SMTP server slated for decommissioning Real Soon Now.  It was my job to convert it to Amazon SES, and I figured it would be easier to tell MIME::Lite to use SES's SMTP interface instead of importing the web side's full Perl library tree just for one module out of it.

Ha ha!  SES requires SSL, and neither MIME::Lite nor Net::SMTP has any idea about that.  They were both written before the days of dependency injection, so I had to go to some length to achieve it.  I've golfed it a bit for you:
package MyApp::Monkey::SMTPS;
use warnings;
use strict;
use parent 'IO::Socket::SSL';

# Substitute us for the vanilla INET socket
require Net::SMTP;
@Net::SMTP::ISA = map {
  s/IO::Socket::INET/MyApp::Monkey::SMTPS/; $_
}  @Net::SMTP::ISA;

our %INET_OPTS = qw(
  PeerPort smtps(465)
  SSL_version TLSv1
); # and more options, probably

# Override new() to provide SSL etc. parameters
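sub new {
  my ($cls, %opts) = @_;
  # Our settings win over whatever Net::SMTP passed in
  $opts{$_} = $INET_OPTS{$_} foreach keys %INET_OPTS;
  $cls->SUPER::new(%opts);
}

1; # explicit true value, so require() doesn't depend on the statements above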
PeerPort overrides the default of smtp(25) built into Net::SMTP; I needed a port where the whole connection is wrapped in SSL instead of using STARTTLS, and 465 is the one suitable choice of the three that SES-SMTP supports.

The main caveat about this is that it breaks Net::SMTP for anyone else in-process who wants to send mail to a server that lacks a functional port 465.  But as you may have guessed, that's not a problem for my script, today.

Thursday, November 15, 2012

Some vim hacks

1. BlurSave

" Add ability to save named files when vim loses focus.
if exists("g:loaded_plugin_blursave")
 finish
endif
let g:loaded_plugin_blursave = 1
let s:active = 0

function BlurSaveAutocmdHook()
 if s:active
  silent! wa
 endif
endfunction

autocmd FocusLost * call BlurSaveAutocmdHook()
command BlurSaveOn let s:active = 1
command BlurSaveOff let s:active = 0
Save to ~/.vim/plugin/blursave.vim (or vimfiles\plugin\blursave.vim for Windows) and you now have a :BlurSaveOn command: every time your gvim (or Windows console vim) loses focus, named buffers will be saved.

My plan here is to develop a Mojolicious app in Windows gvim, with the files in a folder shared with a VirtualBox VM.  With blursave, when I Alt+Tab to the browser, vim saves and morbo reloads.

2. Graceful Fallback

The vim function exists() can test just about anything. I now have this stanza in my ~/.vim/after/syntax/mkd.vim:
" Engage UniCycle plugin, if loaded
if exists(":UniCycleOn")
    UniCycleOn
endif
Now, whenever I'm writing a new blog post for propaganda, I don't have to remember to run :UniCycleOn manually.

3. Extension remapping

Systems disagree about which extension Markdown files should use.  Nocs for iOS offers just about every option except .mkd, which happens to be the preferred extension for the syntax file I have (actually named mkd.vim).  So I also link .md to the mkd syntax in ~/.vimrc:
" .md => markdown
autocmd BufRead,BufNewFile *.md  setlocal filetype=mkd
This lets me make Nocs-friendly Markdown files and still have vim highlight them.

Tuesday, November 13, 2012

Autoflush vs. Scope (CGI::Session)

CGI::Session writes its session data to disk when DESTROY is called, possibly during global destruction, but the order of global destruction is non-deterministic.  It will generally work when CGI::Session is writing to files, since it doesn't depend on anything else to do that, but when using other storage like memcached or a database, the connection to storage may have been cleaned up before CGI::Session can use it.  Then your session data is mysteriously lost, because it was never saved to begin with.

Another object-lifetime interaction occurs when there are multiple CGI::Session objects: unless both have identical data, whichever one gets destroyed last wins.  At one point, I added an END {} block to a file which had my $session declared.  All of a sudden, that END block kept $session alive until global destruction, and the other CGI::Session instance, into which I recorded that a user was in fact logged in, now flushed first.  Because the logged-in state was then overwritten by the session visible from the END block (even though the block itself never used it), nobody could log in!

Yet another problem happened when I pulled $session out of that code and stored it in a package.  The END block had served its purpose and been deleted, so moving $session to a package once again extended its life to global destruction: a package variable stays around forever, because the package itself is a global resource.  However, since the login path had flush() calls carefully placed on it, what broke this time was logout.  The delete() call couldn't take effect because the storage was gone by the time the session was cleaned up.

Friday, November 9, 2012

CGI::Session and your HTTP headers

For CGI::Session to work, you must send the Set-Cookie header (via $session->header() or otherwise) when the session's is_new method returns true.  I discovered this by tripping over an awesome new failure mode today:
  1. Restart memcached (or otherwise create new session storage).
  2. Nothing stays saved in the session.  Can't log in.
When CGI::Session receives a session ID that doesn't exist in session storage, it changes the session ID to prevent session fixation attacks.  That means if you only send the header when the browser sent no cookie, data is written to the new ID, but the browser keeps its old cookie and re-submits the old ID on the next request.
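Here is the failure mode as a toy Python simulation.  It is a sketch of the behavior described above, not CGI::Session itself: `ToySessionStore`, `handle_request`, and the `send_when_new` flag are invented for illustration.

```python
import uuid


class ToySessionStore:
    """Stand-in for session storage that was just wiped (hypothetical)."""

    def __init__(self):
        self.data = {}  # session id -> session contents

    def load(self, claimed_id):
        """Return (session_id, is_new); unknown ids get a fresh id."""
        if claimed_id in self.data:
            return claimed_id, False
        new_id = uuid.uuid4().hex
        self.data[new_id] = {}
        return new_id, True


def handle_request(store, cookie_id, send_when_new):
    """Log the user in; return the cookie the browser holds afterwards."""
    session_id, is_new = store.load(cookie_id)
    store.data[session_id]["logged_in"] = True
    if send_when_new:
        send_cookie = is_new             # the rule from this post
    else:
        send_cookie = cookie_id is None  # the broken shortcut
    return session_id if send_cookie else cookie_id


store = ToySessionStore()
cookie = "stale-id"  # left over from before storage was wiped

# Broken: each request writes to a fresh ID the browser never learns.
for _ in range(2):
    cookie = handle_request(store, cookie, send_when_new=False)
orphaned = len(store.data)

# Fixed: the new ID is sent once, and the next request finds it.
cookie = handle_request(store, cookie, send_when_new=True)
cookie_again = handle_request(store, cookie, send_when_new=True)
print(orphaned, cookie == cookie_again, store.data[cookie])
```

In the broken loop the browser keeps presenting "stale-id", so every request creates another session that nothing will ever read.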

(It turns out my real problem was the stupidly simple error of 'trying to write to the wrong memcached server,' but the above did happen to my test page while I was trying to figure out why memcached wasn't saving anything.)

Tuesday, November 6, 2012

It's All Programming

Programming codifies a process into something that can be executed on a machine.  But this is psychologically no different than codifying any other process into a set of rules for any interpreter, not necessarily mechanical.

The link between programming code and law has been noted in the past: the laws try to leave no room for argument, so they become long, and subject to similar problems as computer code.  Particularly unintended consequences: witness the spate of sexting prosecutions that try to brand teens as sex offenders for a decade for sending nude—or sometimes even just swimsuit—pics to their significant other.

Laws writ small are the ordinary rules of everyday life.  Those Dilbert moments where you receive multiple conflicting rules?  Those are bugs.

Friday, November 2, 2012

The Pointlessness of sudo's Default Run-As User

Amazon Linux ships with the default configuration*:
ec2-user ALL = NOPASSWD: ALL
Which means, ec2-user is allowed to run any command, without providing a password, on any host where this sudoers file applies.  But only as root: since the Runas_Spec is missing, the default of (root) is assumed.

This is entirely pointless because it also ships with the common PAM configuration, in which /etc/pam.d/su contains:
auth sufficient pam_rootok.so
So the game of Simon Says, in order to bypass the root-only sudo restriction so you can run as any user, password-free, without touching files in /etc in advance, becomes:
sudo su -s /bin/bash $TARGET_USER <pwn.sh
Normally, su uses the shell for the user as listed in /etc/passwd, but if we're interested in a /sbin/nologin account, then we can set any other shell listed in /etc/shells with the -s flag.

When you give any account root access, they probably have the whole machine.  I'm not sure what sudo was hoping to accomplish by "limiting" the default Runas_Spec to root.

* It also ships with Defaults requiretty which means you actually need someone to allocate you a controlling terminal for sudo to work, even though ec2-user doesn't need a password, and visiblepw is disabled by default.

Thursday, November 1, 2012

Bugs in Production

The amount that a bug hitting production annoys me turns out to be proportional to log(affected_users / time) * stupidity_of(bug).  If nobody can use the core functionality of the app because of something that would have failed a perl -c check, that yields a lot more angst than "some non-critical task doesn't work for one (uniquely configured) client when the day of the month is 29 or more," even though the latter is often more difficult to diagnose.

Yeah.  I crashed our site the other day over a trivial logging change, intended to gather debugging information for a rare condition of the latter sort.  It was so trivial it couldn't possibly go wrong, meaning stupidity_of(bug) was quite large.