Sunday, June 2, 2024

Stateful Deployment was Orthogonal

I used to talk about “stateful, binary” deployment, thinking that both things would happen together:

  1. We would deploy from a built tarball, without any git pull or composer install steps
  2. We would record the actual version (or whole tarball path) that was deployed

This year, auto-deploy picked up pushed code that wasn’t ready often enough that we finally decided we had to solve the issue. It turned out not to matter that we weren’t deploying from tarballs.

We introduced a new flag for “auto mode” for the instance-launch scripts to use. Without the flag, deployment happens in manual mode: it performs the requested operation (almost) as it always has, then writes the resulting branch, commit, and (if applicable) tarball overlay as the deployed state.

In contrast, auto mode simply reads the deployed state and applies that exact branch, commit, and overlay, as if they had been requested.

I say “simply,” but watch out for what happens to a repository which doesn’t have any state stored.  This isn’t a one-time thing: when adding new repositories later, their first deployment won’t have state yet, either.  This can disrupt both auto and manual deployments.
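A minimal sketch of the two modes and the missing-state case, in Python. Everything here is hypothetical: the names (DeployedState, auto_target, manual_starting_branch), the JSON encoding, and the plain dict standing in for the real store. The choice to fail loudly in auto mode and fall back to a default branch in manual mode is one possible policy, not necessarily what our deployer does.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class DeployedState:
    branch: str
    commit: str
    overlay: Optional[str] = None  # tarball overlay, if the site uses one


class NoDeployedState(Exception):
    """Auto mode was asked to deploy a repository with no recorded state."""


def save_state(store: dict, repo: str, state: DeployedState) -> None:
    # The dict is a stand-in for the real storage backend.
    store[repo] = json.dumps(asdict(state), sort_keys=True)


def load_state(store: dict, repo: str) -> Optional[DeployedState]:
    raw = store.get(repo)
    return None if raw is None else DeployedState(**json.loads(raw))


def auto_target(store: dict, repo: str) -> DeployedState:
    """Auto mode: apply exactly what was recorded, or refuse."""
    state = load_state(store, repo)
    if state is None:
        # Assumed policy: a new repository needs one manual deployment first.
        raise NoDeployedState(repo)
    return state


def manual_starting_branch(store: dict, repo: str, default_branch: str = "main") -> str:
    """Manual mode: start from the last-deployed branch, if there is one."""
    state = load_state(store, repo)
    return state.branch if state is not None else default_branch
```

The point of the sketch is that "no state yet" is an ordinary return value from the read, so both modes have to decide what it means before doing anything else.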

Storage engine

We decided on SSM Parameter Store as the storage backend, since we anticipated a “read-mostly” usage.  We will have a handful of manual updates, and in between, every single instance launch and AMI build process will read from the store.  I didn’t want to worry about correctly sizing a DynamoDB table, and I definitely wanted to isolate it from transactional data.

At some point, we may get ambitious, and build a dashboard to show what commits (and their dates) are running in production, which would add more read traffic.

Handling branch and commit

Manual deployment offers two flags (branch and commit) that can influence what gets deployed.  Without them, the tip of the last-deployed branch is used, which is normally the main branch.  If given a branch but no commit, the tip of that specific branch is used.  If given a commit, it is checked out only if it is reachable from the tip of the current or given branch.  A commit that is not on the branch is an error.
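The rules above can be sketched as a small pure function. This is an illustration, not the deployer's code: resolve_target, CommitNotOnBranch, and the two injected callables are hypothetical names. In a real checkout, tip_of could wrap `git rev-parse <branch>` and is_ancestor could wrap `git merge-base --is-ancestor <commit> <tip>`, which exits with status 0 when the commit is reachable from the tip.

```python
from typing import Callable, Optional, Tuple


class CommitNotOnBranch(Exception):
    """The requested commit is not reachable from the branch tip."""


def resolve_target(
    last_branch: str,
    branch: Optional[str],
    commit: Optional[str],
    tip_of: Callable[[str], str],
    is_ancestor: Callable[[str, str], bool],
) -> Tuple[str, str]:
    """Return the (branch, commit) pair that a manual deployment should use."""
    # No branch flag: stay on the last-deployed branch loaded from storage.
    target_branch = branch or last_branch
    tip = tip_of(target_branch)
    # No commit flag: deploy the tip of the chosen branch.
    if commit is None:
        return target_branch, tip
    # Commit flag: it must be reachable from that branch's tip.
    if not is_ancestor(commit, tip):
        raise CommitNotOnBranch(f"{commit} is not on {target_branch}")
    return target_branch, commit
```

Passing the git operations in as callables keeps the decision logic testable without a repository on disk.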

It’s “almost” how it has always worked, because we have the error checking now, and because the branch is loaded from storage instead of “whatever was last moved to in the reflog.”

Tarball overlays

We may not build a package for the entire site, but we do build a package of our JavaScript code.  React components get bundled up, put into an assets directory, and then that gets packed into a tarball and uploaded to S3. Deployment of such sites then pulls the tarball from S3 and unpacks it over the document root to add those pre-built files.
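As a rough sketch of the unpack step, assuming the tarball has already been downloaded from S3 to a local path: unpack_overlay and its arguments are hypothetical names, and the path check is a minimal guard rather than a complete defense against hostile archives (it does not inspect link targets, for example).

```python
import tarfile
from pathlib import Path
from typing import List


def unpack_overlay(tarball: Path, docroot: Path) -> List[str]:
    """Unpack a pre-built overlay on top of an existing document root."""
    docroot = docroot.resolve()
    with tarfile.open(tarball, "r:*") as archive:
        members = archive.getmembers()
        for member in members:
            # Reject member paths that would land outside the document root.
            dest = (docroot / member.name).resolve()
            if dest != docroot and docroot not in dest.parents:
                raise ValueError(f"unsafe path in overlay: {member.name}")
        # Files already in the docroot stay put unless the overlay replaces them.
        archive.extractall(docroot, members=members)
    return [m.name for m in members if m.isfile()]
```

Because the overlay only adds or replaces files, the git checkout underneath it is otherwise untouched.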

Logically, it’s a small jump from overlays to full binary deployment (“unpack a tarball and go”), but in practice, we haven’t made it.  The whole deployment system has been built around git for a decade.

No push/pull separation

For historical reasons, the deployer does not have a specific control node. “Writing to the deployed state” is attempted from every instance that deployed, and SSM Parameter Store performs another full transaction even if there’s no change to the data being stored.  Therefore, the deployer does a random wait after it finishes updating.  Then, it reads SSM again, and writes the new state back only if SSM still contains the original state.

(I put this verify-then-write cycle on as a finishing touch, which promptly broke state updates when I forgot to handle ParameterNotFound exceptions from the read.  The outer loop would catch it, consider it a failed deployment, and prevent rotating the result into service.  It became a very intensive no-op.)
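Here is a minimal sketch of that verify-then-write cycle, including the missing-parameter case that bit me. It is written against a hypothetical in-memory store so it runs anywhere; with boto3, the read would be `get_parameter`, the write would be `put_parameter`, and the exception would be the SSM client's `ParameterNotFound`. The names record_state and read_or_none, the max_wait value, and the extra "nothing to change" check are assumptions for illustration.

```python
import random
import time
from typing import Callable, Dict, Optional


class ParameterNotFound(KeyError):
    """Stand-in for the store's 'no such parameter' error."""


class InMemoryStore:
    """Hypothetical stand-in for the parameter store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self.writes = 0

    def get(self, name: str) -> str:
        if name not in self._data:
            raise ParameterNotFound(name)
        return self._data[name]

    def put(self, name: str, value: str) -> None:
        self._data[name] = value
        self.writes += 1


def read_or_none(store: InMemoryStore, name: str) -> Optional[str]:
    # A repository that has never been deployed has no parameter yet.
    # Treat that as "no state" instead of letting it fail the deployment.
    try:
        return store.get(name)
    except ParameterNotFound:
        return None


def record_state(
    store: InMemoryStore,
    name: str,
    original: Optional[str],
    new: str,
    max_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Write `new` only if the store still holds what we read before deploying."""
    # Random wait so instances finishing together do not all write at once.
    sleep(random.uniform(0, max_wait))
    current = read_or_none(store, name)
    if current != original or current == new:
        # Another instance already recorded the result, or nothing changed.
        return False
    store.put(name, new)
    return True
```

The read and the write are still two separate calls, so this only makes duplicate writes unlikely; it does not make them impossible.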

I could add a separate control node, of course.  In that case, “manual” deployment would mean we determined what change to make to the state, updated SSM Parameter Store with it from the control node, and then pushed out an “auto-deploy now” message for the instances to consume.  However, we don’t have the scale to justify the effort at this point.
