Friday, September 29, 2023

AWS: Requesting gp3 Volumes in SSM Automation Documents

I updated our EC2 instance-building pipeline to use the gp3 volume type, which offers more IOPS at lower costs.

Our initial build runs as an SSM [Systems Manager] Automation Document.  The first-stage build instance is launched from an Ubuntu AMI (with gp2 storage), and produces a “core” image with our standard platform installed.  This includes things like monitoring tools, our language runtime, and so forth.  The core image is then used to build final AMIs that are customized to specific applications.  That is, the IVR system, Drupal, internal accounting platform, and antique monolith all have separate instances and AMIs underlying them.

Our specific SSM document uses the aws:runInstances action, and one of the optional inputs to it is BlockDeviceMappings.  Through some trial and error, I found that the value it requires is the same structure as the AWS CLI uses:

- DeviceName: "/dev/sda1"
  Ebs:
    VolumeType: gp3
    Encrypted: true
    DeleteOnTermination: true

Note 1: this is in YAML format, which requires spaces for indentation.  Be sure “Ebs” is indented two spaces, and the subsequent lines four spaces.  The structure above is a 1-element array, containing a dictionary with two keys, and the “Ebs” key is another dictionary (with 3 items.)

Note 2: the DeviceName I am using comes from the Ubuntu AMI that I am using to start the instance.  DeviceName may vary with different distributions.  Check the AMI you are using for its root device setting.

The last two lines (Encrypted and DeleteOnTermination) may be unnecessary, but I don’t like leaving things to chance.

Doing this in a launch template remains a mystery.  The best I have been able to do, when trying to use the launch template, Amazon warns me that it’s planning to ignore the entire volume as described in the template.  It appears as if it will replace the volume with the one from the AMI, rather than merging the configurations.

I know I have complained about Amazon in the past for not providing a “launch from template” operation in SSM, but in this case, it appears to have worked out in my favor.

Thursday, September 28, 2023

Using Cloudflare DNS over TLS system-wide on Ubuntu

My current Linux distributions (Ubuntu 23.04 and the Ubuntu-derived Pop!_OS 22.04) use NetworkManager for managing connections, and systemd-resolved for resolving DNS queries.  I’ve set up Cloudflare’s public DNS service with DoT (DNS over TLS) support twice… and I don’t really have a solid conclusion.  Is one “better?”  🤷🏻

Contents

  • Per-connection mode with NetworkManager only
  • Globally with systemd-resolved / NetworkManager
  • Useful background info

Wednesday, September 20, 2023

Update on earlyoom

Back in Linux Behavior Without Swap, I noted that the modern Linux kernel will still let the system thrash.  The OOM killer does not come out until it is extremely desperate, long after responsiveness is near zero.

It has been long enough since installing earlyoom on our clouds that I did another stupid thing.  earlyoom was able to terminate the script, and the instance stayed responsive.

I also mentioned “swap on zram” in that post.  It turns out, the ideal use case for zram is when there is no other swap device. When there’s a disk-based swap area (file or partition), one should activate zswap instead.  zswap acts as a front-end buffer to swap, storing the compressible pages, or letting others go to the swap device.

One other note, zswap is compiled into the default Ubuntu kernels, but zram is part of the rather large linux-modules-extra package set.  If there’s no other need for the extra modules, uninstalling them saves a good amount of disk space.

Saturday, September 9, 2023

Upgrading a debootstrap'ped Debian Installation

For context, last year, I created a Debian 11 recovery partition using debootstrap for recovering from issues with the 88x2bu driver I was using.

This year, I realized while reading that I have never used anything but Ubuntu’s do-release-upgrade tool to upgrade a non-rolling-release system.  My Debian 12 desktop felt far less polished than Ubuntu Studio, so I reinstalled the latter, and that means I once again don’t need the 88x2bu driver.

Therefore, if I trashed the Debian partition, it wouldn’t be a major loss.  It was time to experiment!

Doing the upgrade

The update process was straightforward, if more low-level than do-release-upgrade.  There are a couple of different procedures online that vary in their details, so I ended up winging it combining them:

  1. apt update
  2. apt upgrade
  3. apt full-upgrade
  4. apt autoremove --purge
  5. [edit my sources from bullseye to bookworm]
  6. apt clean
  7. apt update
  8. apt upgrade --without-new-pkgs
  9. apt full-upgrade
  10. reboot
  11. apt autoremove

DKMS built the 88x2bu driver for the new kernel, and userspace appeared to be fine.

Fixing the Network

The link came up with an IP, but the internet didn’t work: there was no DNS.  I didn’t have systemd-resolved, named, dnsmasq, nor nscd.  Now, to rescue the rescue partition, I rebooted into Ubuntu, chroot’ed to Debian, and installed systemd-resolved.

Fixing Half of the Races

One of the Debian boots left me confused.  Output to the console appeared to have stopped after some USB devices were initialized.  I thought it had crashed.  I unplugged a keyboard and plugged it in, generating more USB messages on screen, so I experimentally pressed Enter.  What I got was an (initramfs) prompt!  The previous one had been lost in the USB messages printed after it had appeared.

It seems that the kernel had done something different in probing the SATA bus vs. USB this time, and /dev/sdb3 didn’t have the root partition on it.  I ended up rebooting (I don’t know how to get the boot to resume properly if I had managed to mount /root by hand.)

When that worked, I updated the Ubuntu partition’s /boot/grub/custom.cfg to use the UUID instead of the device path for Debian.

It seems that the kernel itself only supports partition UUIDs, but Debian and Ubuntu use initrds (initial RAM disks) that contain the code needed to find the filesystem UUID.  That’s why root=UUID={fs-uuid} has always worked for me!  Including this time.

os-prober (the original source of this entry) has to be more conservative, though, so it put root=/dev/sdb3 on the kernel command line instead.

The Unfixed Race

Sometimes, the wlan0 interface can’t be renamed to wlx{MAC_ADDRESS} because the device is busy.  I tried letting wlan0 be an alias in the configuration for the interface (using systemd-networkd) but it doesn’t seem to take.

I resort to simply rebooting if the login prompt doesn’t reset itself and show a DHCP IP address in the banner within a few seconds.

You have to admire the kernel and systemd teams’ dedication to taking a stable, functional process and replacing it with a complex and fragile mess.

A Brief Flirtation with X11

I installed Cinnamon.  It ran out of space; I ran apt clean and then continued, successfully.  This is my fault; I had “only” given the partition 8 GiB, because I expected it to be a CLI-only environment.

Cinnamon, however, is insistent on NetworkManager, and I was already using systemd-networkd.  It’s very weird to have the desktop showing that there is “no connection” while the internet is actually working fine.

Due to the space issue, I decided to uninstall everything and go back to a minimal CLI.  I would definitely not be able to perform another upgrade to Debian 13, for instance, and it was unclear if I would even be able to do normal updates.

In Conclusion

The Debian upgrade went surprisingly well, considering it was initially installed with debootstrap, and is therefore an unusual configuration.

Losing DNS might have been recoverable by editing /etc/resolv.conf instead, but I wasn’t really in a “fixing this from here is my only option” space.  Actually, one might wonder what happened to the DHCP-provided DNS server?  I don’t know, either.

Trying to add X11 to a partition never designed for it did not work out, but it was largely a whim anyway.