Sunday, April 12, 2026

Server Management: We Should Be Able to Recreate Pets, Too

For anyone not familiar with the analogy, servers (or private instances) come in two types.  “Cattle” exist en masse and have names like atl-prod3-24; if one has problems, the first solution is to replace it.  “Pets,” on the other hand, are unique instances that matter a lot and often have whimsical names.  When a pet has problems, we work to cure what ails it.

Once upon a time, we had a single pet server that acted as our bastion host, cron job runner, and SFTP file-exchange point with third parties.  We split those roles apart; the cron jobs now reach the SFTP files (for input or output) through a shared EFS mount.  The SFTP machine everyone uses can therefore be firewalled off from the rest of our codebase and AWS resources.  (Not that I expected zero-days, but there was a notable close call a couple of years after this split was made.)

After a couple of Debian upgrades, the SFTP host developed some network issues.  While investigating, I found a post that blamed “the image builder” and told the person asking (who, unfortunately, had my exact question) that because of that, they were on their own.

It seems likely, then, that we’ll want to rebuild this pet from scratch at some point.  I fixed everything this time, but who knows what might happen on the next upgrade?  It would probably be far more stable to re-customize a clean base image than to keep doing a series of in-place upgrades.  That’s especially true if Debian’s release upgrades don’t account for the customizations specific to its AWS cloud images.

What follows is a long aside about the networking problems I faced.

Two issues came up, both due to the replacement of isc-dhcp-client.  The package was abandoned upstream, and Debian withdrew security support for it.  The timeline doesn’t seem to have allowed for a well-planned transition; instead, apt-listchanges just says “Users are encouraged to find a different option.”

In my case, I looked around, found ifupdown installed, and saw that dhcpcd-base could integrate with it, so I installed dhcpcd-base.  A quick test of “restarting the networking service” then reported failures.  I narrowed those down to some ifupdown hook scripts whose names suggested they came from cloud-init.  These scripts were trying to call ISC’s /usr/sbin/dhclient.

But that’s where I learned it was “the image builder” who had put them there, and dpkg -S agreed that no package owned them.  I backed them up, removed them from ifupdown’s hook directories, and moved on to other tasks… and then the instance fell offline until I rebooted it.
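The search for orphaned hook scripts can be scripted.  Here is a minimal Python sketch, assuming the leftovers live in the four standard ifupdown hook directories under /etc/network; the helper names scripts_mentioning and owning_package are hypothetical.  It looks for files that mention /usr/sbin/dhclient, then asks dpkg -S whether any package owns each one.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Iterable, List, Optional

# Standard ifupdown hook directories (assumption: the leftover scripts live here).
HOOK_DIRS = (
    "/etc/network/if-pre-up.d",
    "/etc/network/if-up.d",
    "/etc/network/if-down.d",
    "/etc/network/if-post-down.d",
)


def scripts_mentioning(needle: str, dirs: Iterable[str] = HOOK_DIRS) -> List[Path]:
    """Return files in the given directories whose contents contain needle."""
    hits = []
    for directory in map(Path, dirs):
        if not directory.is_dir():
            continue  # skip hook directories that don't exist on this host
        for entry in sorted(directory.iterdir()):
            if entry.is_file() and needle in entry.read_text(errors="replace"):
                hits.append(entry)
    return hits


def owning_package(path: Path) -> Optional[str]:
    """Return the package dpkg reports as owning path, or None if unowned.

    Also returns None when dpkg isn't installed, so run this on the Debian host.
    """
    if shutil.which("dpkg") is None:
        return None
    result = subprocess.run(
        ["dpkg", "-S", str(path)], capture_output=True, text=True
    )
    if result.returncode != 0:
        return None  # dpkg -S exits nonzero when no package owns the path
    # Typical output is "package: /path"; keep the part before the first ": ".
    return result.stdout.splitlines()[0].split(": ", 1)[0]


if __name__ == "__main__":
    for script in scripts_mentioning("/usr/sbin/dhclient"):
        print(script, "->", owning_package(script) or "(no owning package)")
```

Treating a nonzero exit from dpkg -S as “unowned” keeps the sketch simple, but it can’t tell that case apart from other dpkg errors; anything it flags is worth a manual look (and a backup) before deletion.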

What seems to have happened is that dhcpcd-base didn’t take over the DHCP lease properly, so when the time came to renew it, nothing happened, and the system simply stopped using its address.  In hindsight, that may be because I never did a final, successful restart of the networking service after removing the scripts.
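That also helps explain why an outage like this can show up a while after the change rather than immediately.  Under RFC 2131, a client holding a lease is supposed to start renewing at T1 (by default 50% of the lease time), fall back to rebinding with any server at T2 (by default 87.5%), and stop using the address once the lease expires.  If no client is managing the lease, none of those steps happen.  This minimal sketch computes the default timers; the one-hour lease in the usage example is an assumed value, not what AWS actually hands out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# RFC 2131 section 4.4.5 defaults, used when the server sends no explicit T1/T2.
T1_FRACTION = 0.5
T2_FRACTION = 0.875


@dataclass(frozen=True)
class LeaseTimers:
    renew_at: datetime    # T1: unicast DHCPREQUEST to the server that granted the lease
    rebind_at: datetime   # T2: broadcast DHCPREQUEST to any server
    expires_at: datetime  # lease over: the client must stop using the address


def lease_timers(acquired: datetime, lease_seconds: int) -> LeaseTimers:
    """Compute default renewal, rebinding, and expiry times for a lease."""
    lease = timedelta(seconds=lease_seconds)
    return LeaseTimers(
        renew_at=acquired + lease * T1_FRACTION,
        rebind_at=acquired + lease * T2_FRACTION,
        expires_at=acquired + lease,
    )


if __name__ == "__main__":
    # Assumed one-hour lease acquired at 09:00; real lease times vary by network.
    timers = lease_timers(datetime(2026, 4, 12, 9, 0), 3600)
    print("renew: ", timers.renew_at)    # 2026-04-12 09:30:00
    print("rebind:", timers.rebind_at)   # 2026-04-12 09:52:30
    print("expire:", timers.expires_at)  # 2026-04-12 10:00:00
```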

(My VPS recently went offline because its emulated hardware layout changed, so I wasn’t interested in making a much bigger move away from ifupdown on our production system.)
