Tuesday, January 31, 2023

Argon2id Parameters

There are no specific recommendations in this post. You will need to choose parameters for your specific situation.  However, it is my hope that this post will add deeper understanding to the recommendations that others make.  With that said…

The primary parameter for Argon2 is memory. Increasing memory also increases processing time.

The time cost parameter is intended to make the running time longer when memory usage can’t be increased further.

The threads (or parallelism, or “lanes” when reading the RFC) parameter sub-divides the memory usage.  When the memory is specified as 64 MiB, that is the total amount used, whether threads are 1 or 32.  Adding threads shortens the running time, but synchronization overhead makes the speedup sub-linear, and this is more pronounced with smaller memory sizes.  SMT cores offer even less speed improvement than the same number of non-SMT cores, as expected.
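
To make the "sub-divides" point concrete, here is a small sketch of the block arithmetic as I read it in RFC 9106: memory is counted in 1 KiB blocks, rounded down to a multiple of 4 × lanes, and split evenly between the lanes.  The function name and the returned dictionary keys are my own, for illustration only.

```python
def lane_layout(memory_kib: int, lanes: int) -> dict:
    """Split an Argon2 memory budget (in KiB) across lanes, per RFC 9106."""
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    if memory_kib < 8 * lanes:
        raise ValueError("memory must be at least 8 KiB per lane")
    # Blocks are 1 KiB each; the total is rounded down to a multiple of 4 * lanes.
    total_blocks = 4 * lanes * (memory_kib // (4 * lanes))
    return {
        "total_blocks": total_blocks,
        "blocks_per_lane": total_blocks // lanes,
    }


if __name__ == "__main__":
    for lanes in (1, 3, 32):
        print(lanes, lane_layout(64 * 1024, lanes))
```

With 64 MiB and 32 lanes, each lane works on 2048 blocks (2 MiB), and the total is still 64 MiB.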

I did some tests on my laptop, which has 4 P-cores and 8 E-cores (16 threads / 12 physical cores.) The 256 MiB tests could only push actual CPU usage to about 600% (compared to the 1260% we might expect); it took 1 GiB or more to reach 1000% CPU.  More threads than cores didn’t achieve anything.

Overall then, higher threads allow for using more memory, if enough CPU is available to support the thread count.  If memory and threads are both in limited supply, then time cost is the last resort for extending the operation time until it takes long enough.
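
As a rough sketch of that "last resort" step: once memory and threads are fixed at whatever the machine can spare, time cost can be raised until one hash takes long enough.  The measure callable below is hypothetical; it stands in for "run one hash with these parameters and return the elapsed seconds" using whatever Argon2 binding is on hand.  The cap of 64 is an arbitrary assumption as well.

```python
from typing import Callable


def pick_time_cost(
    measure: Callable[[int, int, int], float],
    memory_kib: int,
    threads: int,
    target_seconds: float,
    max_time_cost: int = 64,
) -> int:
    """Return the smallest time cost whose measured duration reaches the target.

    measure(memory_kib, threads, time_cost) is a caller-supplied (hypothetical)
    function that performs one hash and returns how long it took, in seconds.
    """
    for time_cost in range(1, max_time_cost + 1):
        if measure(memory_kib, threads, time_cost) >= target_seconds:
            return time_cost
    # The target was never reached; give up at the cap rather than loop forever.
    return max_time_cost
```

This deliberately does not pick memory or threads; those depend on the situation, as noted at the top.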

Bonus discovery: in PHP, the argon2id memory is separate from the memory limit.  memory_get_peak_usage() reported the same number at the beginning and end of my test script, even for the 1+ GiB tests.

Saturday, January 28, 2023

Experiences with AWS

Our core infrastructure is still EC2, RDS, and S3, but we interact with a much larger number of AWS services than we used to.  Following are quick reviews and ratings of them.

CodeDeploy has been mainly a source of irritation.  It works wonderfully to do all the steps involved in a blue/green deployment, but it is never ready for the next Ubuntu LTS after it launches.  As I write, AWS said they planned to get the necessary update out in May, June, September, and September 2022; it is now January 2023 and Ubuntu 22.04 support has not officially been released. Ahem. 0/10 am thinking about writing a Go daemon to manage these deployments instead.  I am more bitter than a Switch game card.

CodeBuild has ‘environments’ thrown over the wall periodically.  We added our scripts to install PHP from Ondřej Surý’s PPA instead of having the environment do it, allowing us to test PHP 8.1 separately from the Ubuntu 22.04 update.  (Both went fine, but it would be easier to find the root cause with the updates separated, if anything had failed.) “Build our own container to route around the damage” is on the list of stuff to do eventually.  Once, the CodeBuild environment had included a buggy version of git that segfaulted unless a config option was set, but AWS did fix that after a while.  9/10 solid service that runs well, complaints are minor.

CodeCommit definitely had some growing pains.  It’s not as bad now, but it remains obviously slower than GitHub.  After a long pause with 0 objects counted, all objects finish counting at once, and then things proceed pretty well.  The other thing of note is that it only accepts RSA keys for SSH access.  6/10 not bad but has clearly needed improvement for a long time.  We are still using it for all of our code, so it’s not terrible.

CodePipeline is great for what it does, but it has limited built-in integrations.  It can use AWS Code services… or Lambda or SNS.  8/10 conceptually sound and easy to use as intended, although I would rather implement my own webhook on an EC2 instance for custom steps.

Lambda has been quarantined to “only used for stuff that has no alternative,” like running code in response to CodeCommit pushes.  It appears that we are charged for the wall time to execute, which is okay, but means that we are literally paying for the latency of every AWS or webhook request that Lambda code needs to make.  3/10 all “serverless” stuff like Lambda and Fargate is significantly more expensive than being server’d.  Would rather implement my own webhook on an EC2 instance.

SNS [Simple Notification Service] once had a habit of dropping subscriptions, so our ALB health checks (formerly ELB health checks) embed a subscription-monitor component that automatically resubscribes if the instance is dropped.  One time, I had a topic deliver to email before the actual app was finished, and the partner ran a load test without warning.  I ended up with 10,000 emails the next day, 0 drops and 0 duplicates.  9/10 has not caused any trouble in a long time, with plenty of usage.

SQS [Simple Queue Service] has been 100% perfectly reliable and intuitive. 10/10 exactly how an AWS service should run.

Secrets Manager has a lot of caching in front of it these days, because it seems to be subject to global limits.  We have observed throttling at rates that are 1% or possibly even less of our account’s stated quota.  The caching also helps with latency, because the service is either overloaded (see previous) or doing Serious Crypto that takes time to run (in the vein of bcrypt or argon2i).  8/10 we have made it work, but we might actually want AWS KMS instead.
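
For illustration, the general shape of such a cache is a small time-to-live wrapper around the real lookup.  This is a minimal sketch and not our production code: the SecretCache name, the fetch callable (which would wrap the actual Secrets Manager request), and the 300-second default are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Remember fetched secrets for ttl_seconds to avoid repeated remote calls."""

    def __init__(
        self,
        fetch: Callable[[str], str],       # hypothetical: performs the real lookup
        ttl_seconds: float = 300.0,        # assumed default, tune to taste
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]                # still fresh: no remote call
        value = self._fetch(name)          # expired or missing: one remote call
        self._entries[name] = (now + self._ttl, value)
        return value
```

The clock is injectable only so that expiry can be tested without sleeping.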

API Gateway has ended up as a fancy proxy service.  Our older APIs still have an ‘API Definition’ loaded in, complete with stub paths to return 404 instead of the default 403 (which had confused partners quite a bit.) Newer ones are all simple proxies.  We don’t gzip-encode responses to API Gateway because it failed badly in the past. 7/10 not entirely sure what value this provides to us at this point.  We didn’t end up integrating IAM Authentication or anything.

ACM [AWS Certificate Manager] provides all of our certificates in production.  The whole point of the service is to hide private keys, so the development systems (not behind the load balancer) use Let’s Encrypt certificates instead.  10/10 works perfectly and adds security (vs. having a certificate on-instance.)

Route53 Domains is somewhat expensive, as far as registering domains goes, but the API accessibility and integration with plain Route53 are nice.  It is one of the top-3 services on our AWS bill because we have a “vanity domain per client” architecture.  9/10 wish there was a bulk discount.

DynamoDB is perfect for workloads that suit non-queryable data, which is to say, we use it for sessions, and not much else.  It has become usable in far more scenarios with the additions of TTL (expiration times) and secondary indexes, but still remains niche in our architecture.  9/10 fills a clear need, just doesn’t match very closely to our needs.

CloudSearch has been quietly powering “search by name” for months now, without complaints from users.  10/10 this is just what the doctor ordered, plain search with no extra complexity like “you will use this to parse logs, so here are extra tools to manage!”

That’s it for today.  Tune in next time!

Thursday, January 26, 2023

FastCGI in Perl, but PHP Style [2012]

Editor's note: I found this in my drafts from 2012. By now, everything that can be reasonably converted to FastCGI has been, and a Perl-PHP bridge has been built to allow new code to be written for the site in PHP instead. However, the conclusion still seems relevant to designers working on frameworks, so without further ado, the original post follows...

The first conversions of CGI scripts to FastCGI have been launched into production. I have both the main login flow and six of the most popular pages converted, and nothing has run away with the CPU or memory in the first 50 hours. It’s been completely worry-free on the memory front, and I owe it to the PHP philosophy.

In PHP, users generally don’t have the option of persistence. Unless something has been carefully allocated in persistent storage in the PHP kernel (the C level code), everything gets cleaned up at the end of the request. Database connections are the famous example.

Perl is obviously different, since data can be trivially kept by using package level variables to stash data, but my handler-modules (e.g. Site::Entry::login) don’t use them. Such handler-modules define one well-known function, which returns an object instance that carries all the necessary state for the dispatch and optional post-dispatch phases. When this object is destroyed in the FastCGI request loop, so too are all its dependencies.

Furthermore, dispatching returns its response, WSGI style, so that if dispatch dies, the FastCGI loop can return a generic error for the browser. Dispatch isn’t allowed to write anything to the output stream directly, including headers, which guarantees a blank slate for the main loop’s error page. (I once wrote a one-pass renderer, then had to grapple with questions like “How do I know whether HTML has been sent?”, “How do I close half-sent HTML?”, and “What if it’s not HTML?” in the error handler.)
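
The real code is Perl, but the shape of the loop is easy to sketch in Python, since the idea is borrowed from WSGI anyway.  Everything named here (handle_request, GENERIC_ERROR, the make_handler factory) is invented for illustration; the point is only that dispatch returns its response instead of writing it, so the loop still has a blank slate when something dies.

```python
from typing import Any, Callable, List, Tuple

# (status code, headers, body): the handler returns this, it never writes output.
Response = Tuple[int, List[Tuple[str, str]], bytes]

GENERIC_ERROR: Response = (
    500,
    [("Content-Type", "text/plain")],
    b"Internal Server Error\n",
)


def handle_request(make_handler: Callable[[Any], Any], request: Any) -> Response:
    """Run one request; the handler object lives only for this call."""
    try:
        handler = make_handler(request)   # carries all per-request state
        return handler.dispatch()         # returns the response, WSGI style
    except Exception:
        # Nothing has been sent yet, so a clean error page is always possible.
        return GENERIC_ERROR
    # On return, the handler goes out of scope along with its dependencies.
```

The main loop then writes whatever handle_request returned, and is the only code that touches the output stream.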

Sunday, January 22, 2023

PHP’s PDO, Single-Process Testing, and 'Too Many Connections'

Quite some time ago now, I ran into a problem with running a test suite: at some point, it would fail to connect to the database, due to too many connections in use.

Architecturally, each connection sent a PSR-7 Request through the HTTP layer, which caused the back-end code under test to connect to the database in order to fulfill the request.  All of these resources (statement handles and the database handle itself) should have been out of scope by the end of the request.

But every PDOStatement has a reference to its parent PDO object, and apparently each PDO keeps a reference to all of its PDOStatements.  There was no memory pressure (virtually all other allocations were being cleaned up between tests), so PHP wasn’t trying to collect cycles, and the PDO objects were keeping connections open the whole duration of the test suite.

Lowering the connection limit in the database engine (a local, anonymized copy of production data) caused the failure to occur much sooner in testing, proving that it was an environmental factor and not simply “unlucky test ordering” that caused the failure.

Using phpunit’s --process-isolation cured the problem entirely, but at the cost of a lot of time overhead.  This was also expected: with the PHP engine shut down entirely between tests, all of its resources (including open database connections) were cleaned up by the OS.

Fortunately, I already had a database connection helper for other reasons: loading credentials securely, setting up exceptions as the error mode, choosing the character set, and retrying on failure if AWS was in the middle of a failover event (“Host not found”, “connection refused”, etc.) It was a relatively small matter to detect “Too many connections” and, if it was the first such error, issue gc_collect_cycles() before trying again.
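
The helper itself is PHP, but the retry logic is small enough to sketch in Python, where gc.collect() plays the role of gc_collect_cycles().  The TooManyConnections exception and the connect callable are hypothetical stand-ins for the driver's error and the real connection code.

```python
import gc
from typing import Callable, TypeVar

T = TypeVar("T")


class TooManyConnections(Exception):
    """Hypothetical stand-in for the driver's 'Too many connections' error."""


def connect_with_gc_retry(connect: Callable[[], T]) -> T:
    """Try to connect; on the first 'too many connections', collect cycles and retry."""
    try:
        return connect()
    except TooManyConnections:
        # Unreachable connection/statement cycles are still holding sockets open.
        # Collecting them runs their finalizers, which frees the connections.
        gc.collect()
        # Second attempt; if this one fails too, the error propagates to the caller.
        return connect()
```

Only the first failure triggers a collection, so a genuinely exhausted server still produces an error rather than a retry loop.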

(Despite the “phpunit” name, there are functional and integration test suites for the project which are also built on phpunit.  Then, the actual tests to run are chosen using phpunit --testsuite functional, or left at the default for the unit test suite.)

Wednesday, December 28, 2022

Linux Behavior Without Swap

We had a runaway script clog all of the memory on a micro EC2 Ubuntu instance. Not enough that the kernel OOM killer would do anything, and not enough that the script itself hit the PHP memory limit, but enough to make the instance become unresponsive for 45 minutes.

I have sent Linux into thrashing, back in the old days when typical desktop RAM sizes were less than 1 GB and SSDs weren’t available yet.  What surprised me was just how similar “running out of RAM” is in modern times, even with the OOM killer.  It let the system bog down instead of killing a process!

We chose to mitigate the issue at work by expanding the instance, so that it has more RAM than memory_limit now.  It will take more than one simultaneous runaway script to bring it down in the future.  (We also fixed the script.  I don’t like throwing resources at problems, in general.)

Then one day, via pure serendipity, I found out about earlyoom.  I have added it to our pet instances, and I’m considering it for the cattle template, but it hasn’t been well-tested due to our previous mitigations.  The instance simply doesn’t run out of RAM anymore.

At home, I first set up swap on zram so that Ubuntu Studio would have a place to “swap out” 2+ GB (out of 12 GB installed), and then recently added a swap partition while I was restructuring things anyway.  It’s not great for realtime audio to swap; but “not having swap” doesn’t appear to change the consequences of memory pressure, so I put some swap in.  With a dedicated swap partition added, I reduced the zram area to 512 MB.  I still want to save the SSD if there’s a small-to-moderate amount of swap usage.

Tuesday, December 27, 2022

debootstrap’ing a Recovery Partition

One of the nicer things about trying Fedora on my work laptop is that when it broke the boot loader, there was a functioning Recovery Mode.

My desktop relies on a particular driver for WiFi, and upgrading the kernel (e.g. from Ubuntu 22.04 to 22.10) requires fully reinstalling it.  But what if the kernel upgrades to a version that isn’t supported by the copy of the driver I happened to have on disk?  And I didn’t want to “just” (disassemble and move the PC to) plug in an Ethernet cable?

I used the Fedora live environment to make a little bit of room for an 8 GiB partition at the end of the disk (and a 2 GiB swap partition, as long as I was there), and then I ran debootstrap to fill it in.  This post is about what surprised me along the way.

tl;dr: debootstrap is a lot more aggressively minimalist, more like Arch, than I would have expected.

Saturday, December 17, 2022

Container Memory Usage

How efficient is it to run multiple containers with the same code, serving different data?  I am most familiar with a “shared virtual hosting” setup, with a few applications behind a single Web frontend and PHP-FPM pool.  How much would I lose, trying to run each app in its own container?

To come up with a first-order approximation of this, I made a pair of minimal static sites and used the nginx:mainline-alpine image (ID 1e415454686a) to serve them.  The overall question was whether the layer would be shared between multiple containers in the Linux memory system, or whether each image would end up with its own copy of everything.

Updated 2022-12-19: This post has been substantially rewritten and expanded, because Ubuntu would not cleanly reproduce the numbers, requiring deeper investigation.