Sunday, February 16, 2025

My Experience with Switching from Psalm to PHPStan

Because Psalm could not yet be installed alongside PHP 8.4 or PHPUnit 11 at the time (January 15, 2025, before the Psalm 6.0 release), I finally gave PHPStan a try.

The difference that has caused the most trouble is that PHPStan wants iterable/container types to document their contents explicitly.  Any time a method returns array, PHPStan wants to know: an array of what?  Psalm was happy to infer the contents from what the method actually put into the returned array, and to treat that de facto type as the developer’s intention.
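
For example, here is the shape of code Psalm would type on its own but that PHPStan flags until the docblock spells out the value types (a minimal sketch; the class, method, and data are invented for illustration):

    final class UserRepository
    {
        /**
         * Without the @return annotation below, PHPStan reports
         * "return type has no value type specified in iterable type array".
         *
         * @return array<int, string> user IDs mapped to display names
         */
        public function displayNames(): array
        {
            return [7 => 'Ada', 12 => 'Grace'];
        }
    }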

Outside of the smallest library repositories, that rule got ignored.  It is responsible for maybe 75% of issue reports.  If I can take it from 1200 down to 275 with a single ignore, that is the difference between “there are too many things to deal with” and “I can make a dent in this today.”
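
Assuming a PHPStan new enough to support ignoring by error identifier (that arrived somewhere in the 1.x series, if I remember right), the single ignore is one entry in phpstan.neon:

    parameters:
        ignoreErrors:
            -
                identifier: missingType.iterableValue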

The next obvious difference has been that PHPStan is much more interested in handling the potential null returned from preg_replace('/\\W+/', '', $str); calls.  Psalm expected that giving three string arguments to preg_replace() would always result in a string of some sort.
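
Satisfying PHPStan means acknowledging the null branch somewhere.  A minimal sketch, assuming PHP 8 for preg_last_error_msg():

    $clean = preg_replace('/\W+/', '', $str);
    if ($clean === null) {
        // preg_replace() returns null only when the engine itself fails
        // (invalid pattern, backtrack limit exceeded, etc.); "no matches"
        // still returns the input string unchanged.
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }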

There’s also a class of issues reported by PHPStan due to a disagreement in the other direction. Psalm seemed to think that number_format() returned string|false, requiring an is_numeric() check on the variable.  PHPStan thinks that is redundant, i.e. that number_format() has already returned a numeric-string.
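
The disputed pattern, reconstructed for illustration (the variable names are mine):

    $price = number_format($total, 2, '.', '');
    // Psalm wanted a guard against a possible false here; PHPStan reports
    // the same guard as dead code, because it already types $price as a
    // numeric-string, for which is_numeric() is always true.
    if (!is_numeric($price)) {
        throw new UnexpectedValueException('unexpected number_format() result');
    }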

I don’t have a sense yet for how effective PHPStan is at finding problems overall.  In code that was previously checked with Psalm, many defects visible to static analysis have already been removed, leaving little fruit behind for PHPStan to pick.

As of early February, the migration to PHPStan can be considered a success.  I haven’t touched PHPStan Pro, but I may try it if I ever want to fix those hundreds of issues with array types.

Sunday, February 9, 2025

The Ruthless Elimination of Differences

I am excited for image-based Linux.  Yes, I usually complain about people upending things just when they get stable, but I think there’s a real benefit here: the elimination of differences.

Why, exactly, does installing Ubuntu have to unpack a bunch of .deb files inside a system? Thousands or millions of machines will consume CPU running maintainer scripts, all to (hopefully) produce identical output, when most of the desired result could have been saved as an image in the first place.  Upstream should know what’s in ubuntu-minimal!  Looking through a different lens, Gentoo already distributes its base system this way, as a stage3 tarball.

In theory, an installation CD could carry the minimal image, the installer overlay, and the flavor’s overlay.  The installer’s boot loader would bring up the kernel, use the minimal+installer pair as the root file system, and the installer would then unpack the minimal+flavor images onto the new disk partition.

“Image-based Linux” more or less takes this one step further, running the entire system directly from the images (or a single combined image).  Everyone gets to use the same pre-made images, and bugs become less dependent on the history of package operations.

If any of this sounds like Puppy Linux, that’s not entirely accidental.

This is also the space where things like ABRoot are being introduced.  Image-based Linux lends itself well to an integrated rollback/recovery pathway.  Even on my non-image systems, having “a recovery partition” has been more valuable than I ever anticipated.  It let me test backups without having to work very hard to simulate a disaster, and I originally created it when I was still using a Realtek USB WiFi device, to avoid being stranded without internet.  (Word to the wise: use MediaTek instead, or an Intel PCIe card is a good non-USB option.)

Image-based Linux and the tools around it are poised to make real improvements to the repeatability and reliability of our systems.  I don’t know when I, personally, might benefit (my daily driver is macOS now), but I am very excited about the progress being made here.

Sunday, February 2, 2025

We Finally Put Up a WAF

Someone threw an awful lot of requests at a system for long enough that management noticed.  Working with the responsible admin, I ended up proposing AWS WAF “to see what would happen.”

What happened: the WAF blocked 10,000 requests per minute, and someone got the message.  That released the pressure on the DynamoDB table behind the system, letting its provisioned capacity drop straight from max to min (1/16th of max) after fifteen minutes.

It seems some automated vulnerability scanner had gotten into an infinite loop.  There were a lot of repeated URLs in the access logs, as if it wasn’t clearing pages from its queue when they got an OK response but unexpected data.  The reason “everything” returns OK is that any unknown URL (outside of a specific static-content prefix) returns a page with the React app root, and lets JavaScript worry about rendering whatever should be there.

I went ahead and put the same WAF on my own systems, promptly breaking them.  Meanwhile, our automated testing provider started reporting failures, with every request to the original system returning Forbidden.

The testing platform… is a bot.  I had to write them an exception.
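
An exception like that can be a higher-priority allow rule keyed on something the platform sends consistently.  In AWS WAFv2 JSON it looks roughly like this (a sketch, not our actual rule; the header value and metric name are placeholders, and allowlisting on a spoofable header is only tolerable because this is a courtesy carve-out, not a security boundary):

    {
      "Name": "allow-testing-platform",
      "Priority": 0,
      "Action": { "Allow": {} },
      "Statement": {
        "ByteMatchStatement": {
          "FieldToMatch": { "SingleHeader": { "Name": "user-agent" } },
          "SearchString": "ExampleTestingBot",
          "PositionalConstraint": "CONTAINS",
          "TextTransformations": [ { "Priority": 0, "Type": "NONE" } ]
        }
      },
      "VisibilityConfig": {
        "SampledRequestsEnabled": true,
        "CloudWatchMetricsEnabled": true,
        "MetricName": "AllowTestingPlatform"
      }
    }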

Turning my attention back to my systems, I put together a second WAF so I could have different policies.  My system includes an API or two, so I needed to allow HTTP Libraries and non-browsers. I linked in the exception for the testing platform as well.  Things went much more smoothly after that.
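
Those categories correspond to rules in the AWS Managed Rules Bot Control rule group; assuming that is what does the blocking, allowing such clients amounts to overriding the relevant rules to Count.  A sketch of the rule’s Statement, from memory rather than a tested config:

    {
      "ManagedRuleGroupStatement": {
        "VendorName": "AWS",
        "Name": "AWSManagedRulesBotControlRuleSet",
        "ManagedRuleGroupConfigs": [
          { "AWSManagedRulesBotControlRuleSet": { "InspectionLevel": "COMMON" } }
        ],
        "RuleActionOverrides": [
          { "Name": "CategoryHttpLibrary", "ActionToUse": { "Count": {} } },
          { "Name": "SignalNonBrowserUserAgent", "ActionToUse": { "Count": {} } }
        ]
      }
    }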

I know that the WAF is fundamentally “enumerating badness,” but it is clearly better than zero filtering.  It is also much less effort and risk than enumerating goodness would be, which is why this sort of thing persists.