Saturday, January 28, 2023

Experiences with AWS

Our core infrastructure is still EC2, RDS, and S3, but we interact with a much larger number of AWS services than we used to.  Following are quick reviews and ratings of them.

CodeDeploy has been mainly a source of irritation.  It works wonderfully to do all the steps involved in a blue/green deployment, but it is never ready for the next Ubuntu LTS after it launches.  As I write, AWS said they planned to get the necessary update out in May, June, September, and September 2022; it is now January 2023 and Ubuntu 22.04 support has not officially been released. Ahem. 0/10 am thinking about writing a Go daemon to manage these deployments instead.  I am more bitter than a Switch game card.

CodeBuild has ‘environments’ thrown over the wall periodically.  We added our scripts to install PHP from Ondřej Surý’s PPA instead of having the environment do it, allowing us to test PHP 8.1 separately from the Ubuntu 22.04 update.  (Both went fine, but it would be easier to find the root cause with the updates separated, if anything had failed.) “Build our own container to route around the damage” is on the list of stuff to do eventually.  Once, the CodeBuild environment had included a buggy version of git that segfaulted unless a config option was set, but AWS did fix that after a while.  9/10 solid service that runs well, complaints are minor.

CodeCommit definitely had some growing pains.  It’s not as bad now, but it remains obviously slower than GitHub.  After a long pause with 0 objects counted, all objects finish counting at once, and then things proceed pretty well.  The other thing of note is that it only accepts RSA keys for SSH access.  6/10 not bad but has clearly needed improvement for a long time.  We are still using it for all of our code, so it’s not terrible.

CodePipeline is great for what it does, but it has limited built-in integrations.  It can use AWS Code services… or Lambda or SNS.  8/10 conceptually sound and easy to use as intended, although I would rather implement my own webhook on an EC2 instance for custom steps.

Lambda has been quarantined to “only used for stuff that has no alternative,” like running code in response to CodeCommit pushes.  It appears that we are charged for the wall time to execute, which is okay, but means that we are literally paying for the latency of every AWS or webhook request that Lambda code needs to make.  3/10 all “serverless” stuff like Lambda and Fargate are significantly more expensive than being server’d.  Would rather implement my own webhook on an EC2 instance.

SNS [Simple Notification Service] once had a habit of dropping subscriptions, so our ALB health checks (formerly ELB health checks) embed a subscription-monitor component that automatically resubscribes if the instance is dropped.  One time, I had a topic deliver to email before the actual app was finished, and the partner ran a load test without warning.  I ended up with 10,000 emails the next day, 0 drops and 0 duplicates.  9/10 has not caused any trouble in a long time, with plenty of usage.

SQS [Simple Queue Service] has been 100% perfectly reliable and intuitive. 10/10 exactly how an AWS service should run.

Secrets Manager has a lot of caching in front of it these days, because it seems to be subject to global limits.  We have observed throttling at rates that are 1% or possibly even less of our account’s stated quota.  The caching also helps with latency, because they are either overloaded (see previous) or doing Serious Crypto that takes time to run (in the vein of bcrypt or argon2i).  8/10 we have made it work, but we might actually want AWS KMS instead.

API Gateway has ended up as a fancy proxy service.  Our older APIs still have an ‘API Definition’ loaded in, complete with stub paths to return 404 instead of the default 403 (which had confused partners quite a bit.) Newer ones are all simple proxies.  We don’t gzip-encode responses to API Gateway because it failed badly in the past. 7/10 not entirely sure what value this provides to us at this point.  We didn’t end up integrating IAM Authentication or anything.

ACM [AWS Certificate Manager] provides all of our certificates in production.  The whole point of the service is to hide private keys, so the development systems (not behind the load balancer) use Let’s Encrypt certificates instead.  10/10 works perfectly and adds security (vs. having a certificate on-instance.)

Route53 Domains is somewhat expensive, as far as registering domains goes, but the API accessibility and integration with plain Route53 are nice.  It is one of the top-3 services on our AWS bill because we have a “vanity domain per client” architecture.  9/10 wish there was a bulk discount.

DynamoDB is perfect for workloads that suit non-queryable data, which is to say, we use it for sessions, and not much else.  It has become usable in far more scenarios with the additions of TTL (expiration times) and secondary indexes, but still remains niche in our architecture.  9/10 fills a clear need, just doesn’t match very closely to our needs.

CloudSearch has been quietly powering “search by name” for months now, without complaints from users.  10/10 this is just what the doctor ordered, plain search with no extra complexity like “you will use this to parse logs, so here are extra tools to manage!”

That’s it for today.  Tune in next time!

No comments: