Wednesday, December 28, 2022

Linux Behavior Without Swap

We had a runaway script clog all of the memory on a micro EC2 Ubuntu instance. Not enough that the kernel OOM killer would do anything, and not enough that the script itself hit the PHP memory limit, but enough to make the instance become unresponsive for 45 minutes.

I have sent Linux into thrashing, back in the old days when typical desktop RAM sizes were less than 1 GB and SSDs weren’t available yet.  What surprised me was just how similar “running out of RAM” was in the modern times, even with the OOM killer.  It let the system bog down instead of killing a process!

We chose to mitigate the issue at work by expanding the instance, so that it has more RAM than memory_limit now.  It will take more than one simultaneous runaway script to bring it down in the future.  (We also fixed the script.  I don’t like throwing resources at problems, in general.)

Then one day, via pure serendipity, I found out about earlyoom.  I have added it to our pet instances, and I’m considering it for the cattle template, but it hasn’t been well-tested due to our previous mitigations.  The instance simply doesn’t run out of RAM anymore.

At home, I first set up swap on zram so that Ubuntu Studio would have a place to “swap out” 2+ GB (out of 12 GB installed), and then recently added a swap partition while I was restructuring things anyway.  It’s not great for realtime audio to swap; but “not having swap” doesn’t appear to change the consequences of memory pressure, so I put some swap in.  With a dedicated swap partition added, I reduced the zram area to 512 MB.  I still want to save the SSD if there’s a small-to-moderate amount of swap usage.

UPDATE: This was imperfect, as it turns out; if you have a swap partition, you should remove zram, and use zswap instead.

Tuesday, December 27, 2022

debootstrap’ing a Recovery Partition

One of the nicer things about trying Fedora on my work laptop is that when it broke the boot loader, there was a functioning Recovery Mode.

My desktop relies on a particular driver for WiFi, and upgrading the kernel (e.g. from Ubuntu 22.04 to 22.10) requires fully reinstalling it.  But what if the kernel upgrades to a version that isn’t supported by the copy of the driver I happened to have on disk?  And I didn’t want to “just” (disassemble and move the PC to) plug in an Ethernet cable?

I used the Fedora live environment to make a little bit of room for an 8 GiB partition at the end of the disk (and a 2 GiB swap partition, as long as I was there), and then I ran debootstrap to fill it in.  This is about what surprised me doing that.

tl;dr: debootstrap is a lot more aggressively minimalist, more like Arch, than I would have expected.

Saturday, December 17, 2022

Container Memory Usage

How efficient is it to run multiple containers with the same code, serving different data?  I am most familiar with a “shared virtual hosting” setup, with a few applications behind a single Web frontend and PHP-FPM pool.  How much would I lose, trying to run each app in its own container?

To come up with a first-order approximation of this, I made a pair of minimal static sites and used the nginx:mainline-alpine image (ID 1e415454686a) to serve them.  The overall question was whether the layer would be shared between multiple containers in the Linux memory system, or whether each image would end up with its own copy of everything.

Updated 2022-12-19: This post has been substantially rewritten and expanded, because Ubuntu would not cleanly reproduce the numbers, requiring deeper investigation.

Saturday, December 3, 2022

Failing to Install Fedora 37

Waiting for yet another 80+ MB download of a .deb file, I decided to try to dual-boot Fedora on my Pop!_OS laptop (darp8).  Because drpms exist.  That’s it, that’s the whole reason.

[Update 2022-12-04: Because all resize-related commands use “as much space as possible” by default, I decided to try again, shrinking the filesystem/LV/PV more than was necessary, and then re-expanding them after the partition was changed.  I got Fedora installed, but the system76 extension for the power manager doesn’t work on Gnome 43. Gnome claims it’s not their fault that the system they created frequently breaks extensions in general; I assume they feel the same about the specific case here, where they changed the menu API instantly and completely.]

[Update 2022-12-19: I pretty quickly gave up on Fedora.  It had a tendency to result in the laptop rebooting into recovery mode after using the Fedora install.  Booting Linux is clearly not important enough to get standardized.  Oh well!]

Friday, November 25, 2022

The Devil is in the Details

We had an initial vision for canceling a contract: choose the contract, input an effective date, and click Cancel.

Then users wanted to get a preview of the cancellation, with all fully-calculated refund values.  We had the preview write everything to the database, and if the user confirmed it, we would mark the cancellation as “complete.”  A user could also “revoke” cancellation, which would delete the pending cancellation.

Soon, we had a system to recover abandoned cancellations by reminding users they had one pending, and auto-revoke it after a deadline.

Then it became clear that there could be race conditions.  What if the cancellation is processed between receiving the notification and following the link enclosed?  What if someone else was running the cancellation on the same contract simultaneously, and ended up with the same results?  We needed to show the new status.

What if someone made a mistake with the inputs?  We needed an “Edit” button that would go back, as well.  If there’s an Edit button, and someone comes in from the main menu, should we reload the inputs, or skip the input screen entirely?

It wasn’t long before Accounting wanted a “reinstate” button.  Suddenly, the “final” state was no longer final.  In the mean time, we had integrated automatic data pushes to the actual accounting software, which meant a huge mess if they wanted to reinstate one that was on the official record as canceled.

If there’s a moral to this, it’s that any software pipeline should be fully reversible, even into the parts where “we sent money out, and we have to ask for it back as a consequence.”

Monday, October 17, 2022

systemd User Service Sandboxing

systemd-analyze security --user SERVICE is handy, but inaccurate.  The analysis (on the version bundled with Ubuntu 22.04 “Jammy”) has a few items that are marked as not applicable to user services, but also recommends setting a few items that apparently cannot be set in a user service. Attempting to do so will fail to start the service, with a log that the capabilities could not be set (regardless of the specific option causing the error.)

This unfortunately restricts the following: CapabilityBoundingSet, PrivateDevices, ProtectClock, ProtectKernelLogs, and ProtectKernelModules.

Using ProtectHostname with RestrictNamespaces=true logs a warning every time the service starts up, because the UTS namespace can’t be configured. It’s only a warning, which I think means that ProtectHostname is redundant in this situation, so I took it out of the service.  It’s more important to have clean logs than a good systemd-analyze security score.

The list of capabilities I was able to set were: KeyringMode=private, LockPersonality=true, MemoryDenyWriteExecute=true, NoNewPrivileges=true, PrivateMounts=true, PrivateTmp=true, ProcSubset=pid, ProtectControlGroups=true, ProtectHome=read-only, ProtectKernelTunables=true, ProtectSystem=strict, RestrictNamespaces=true, RestrictRealtime=true, RestrictSUIDSGID=true, SystemCallArchitectures=native, SystemCallFilter=@system-service, UMask=077, and:


Note that this is tailored to the service I was configuring, an HTTP proxy written in Go.  For instance, I wanted to run the binary and load its configuration file from ~/.local, which prevented me from using a stronger setting for ProtectHome.  Likewise, I can’t really turn off networking, because it’s a network program.

Friday, October 14, 2022

An Incomplete API: PSR-7 and PSR-18

Consider some code that is using http-factory-discovery to avoid depending directly on specific PSR-7 and PSR-18 implementations. However, the concrete classes may default to an infinite request timeout, in an environment where external forces (AWS ALB) will time out in 60 seconds. Therefore, in order to return a nicer error message, or get a chance to retry, one wishes to set a timeout like 25 seconds on the request.  Can it be done?

Not really!

A PSR-18 HTTP client has an interface consisting of a single defined method: sendRequest(RequestInterface $request): ResponseInterface.  There is no concept of options here.  The only hope would be to carry them as attributes on the request.

Unfortunately, there are no standard request attributes for HTTP options like this.  PSR-7 defines the attribute mechanism, but is silent about its usage. On the “real world usage” side, Guzzle 7 does not define any way to carry HTTP client options on the Request.

This makes the APIs easier to implement, but leaves a gap in them.  The only way out is to use a specific, concrete implementation.  And then what is the point of the discovery library?  If the application must contain its own code to work with underlying libraries, then all discovery does is add the possibility of returning an unrecognized type.  At that point, the app can only choose between running with degraded features, and crashing entirely.

Wednesday, October 12, 2022

Pop!_OS ‘Unboxing’ Experience

Editor’s Note: This article does not appear to have been published when written, about two weeks into using and customizing Pop!_OS.  This entry contains the original draft, followed by the two-month update.

Initial Thoughts

It took me a while to get settled into Pop!_OS on the new work laptop.  Of course, after leaving a fairly vanilla Ubuntu 10.04 LTS “Lucid Lynx” for Windows in 2012, and using Mac OS X (Mountain Lion through Catalina) at work over roughly the same time period, there was bound to be an adjustment period.  I will focus on bits of the user experience that really stand out as being subpar here; no “oh no, the shortcut is different.”

(For a snapshot of the times: Ubuntu 10.04 would have been using Gnome 2.30, Firefox 3.6, Python 2.6, and Linux 2.6.32.  KDE 4.4 was available, marking Canonical’s first LTS with a KDE 4 version.  It was the first to abandon brown-and-orange as the color theme.  Judging by reviews, the Zune Marketplace still existed at that time, too.)

Perhaps the first thing I noticed was the lack of a maximize button on the windows.  This seemed particularly weird, because the functionality was still present with a double-click in the title bar, or Super+M.  It would be a few days before I would accidentally scroll the settings pane and reveal toggles for both Minimize and Maximize buttons.  The settings window and the hidden scrollbar had conspired to look exactly like a non-scrollable window.

Another peculiarity is the lack of some settings within Settings: both the “Extensions” and “Startup Applications” are completely separate.  Extensions have their own settings pages within that app, except that COSMIC components may not have settings, or may have a settings page with a message that they can be configured from the Desktop section in the Settings app.

Bear in mind that I mostly haven’t used Gnome in a decade, particularly because Gnome 3 upended everything.  (I would later learn that some of this was aping the Mac instead of aping Windows; hurrah for open source “innovation.”) Thus, I’m not entirely sure of what the boundary between Gnome and Pop!_OS’ customizations are, but the end result is rather confusing.  Is this layout logical to someone, or am I looking at a manifestation of Conway’s Law?

Two Month Update

I think it’s Conway’s Law.  It’s rather under-documented which component provides which features, but through experimentally turning Pop Shell off and on, I have learned that it is responsible for:

  1. The good/nice launcher on Super+/
  2. The “focus the window in a direction” shortcuts, Super+(Arrow key)
  3. The window-tiling menu/feature, shown in the system menu area

Getting information on Pop!_OS has been a bit of a problem.  Because it partially customizes Gnome, information on the internet for Gnome may not apply.  Because it is semi-modular, if any of the Pop-specific components are turned off, then information about Pop may also not apply.  Although, that is kind of my fault for not leaving everything on.

However, there’s also a curious benefit to having multiple systems in use: if one breaks, the other may still work.  At some point, launching Flatpaks (which I have put in the system repository) via the Dock started freezing the UI for a bit, not even responding to mouse-move. On the other hand, launching them through Pop Shell’s Super+/ launcher causes no trouble whatsoever.

Having lived at many places on the “up-to-date” vs “actually works” spectrum, and being old and grumpy, my own preference is for something more-working than Pop!_OS, even if it’s less up-to-date.  It might be a while before I go to the effort, though, partly because this is my first EFI/secure boot system, and I don’t want to break it.  Work depends on this hardware.

Sunday, October 9, 2022

Void Linux first impressions

Note: Void Linux may have updated Firefox since this post was originally drafted on 2022-09-25.  The point remains that it was terribly outdated when observed.

I tried Void Linux… twice.  The first time, in VirtualBox, I got DenverCoder’d: the guest additions package was missing, and none of the incantations would bring it out of hiding.  I even copied and pasted the package name to ensure no typos were made.

The next attempt, I put it in virt-manager.  After installing, there was an surprisingly large system update required, which only brought Firefox forward to 91.10.0 ESR.  Today, the entire 91.x branch is no longer being updated (as Firefox 105 and 102.3 ESR have been released), and even “.10” is three releases behind.

This does not bode well for the security of the piece of software that will face the most hostile code.

Beyond that, the keyboard layout was an extreme usability problem.  I haven’t typed on Qwerty as a primary layout in 20 years, and it turns out that I’m losing the ability to switch my brain back to Qwerty at all.  Despite choosing the Dvorak variant in the installer (and the live environment via setxkbmap), the installed system needed some particularly arcane config editing to get Dvorak fully in place.  X11 runs without an Xorg.conf, so one is expected to create /etc/X11/xorg.conf.d and drop a fragment in there, instead, which must be written from scratch.  There were no on-disk examples found.

I will note that I could have set XFCE itself to use Dvorak instead of the system settings, but that would not have altered the login screen.

Despite the XFCE+musl installer being over 800 MB, it did not install an email client.  In fact, it seems like there are only settings, Firefox, and a terminal.  Maybe the installer has great potential for size reduction.

I still don’t know what the point of XBPS is.  It doesn’t seem to be particularly interesting from a user perspective.  The options remind me of pacman but that’s about it.

In any case, like Puppy, I’ve reached the point where I don’t know what to do with the installation next, because I didn’t have plans for it from the start. I just wanted to get a feeling for it.  I know they wrote their own package manager (which looks suspiciously similar to pacman at the UI level), but I don’t really know why. Nothing jumps out as being particularly better/different about it.

Thursday, September 22, 2022

Puppy Linux First Impressions

I’ve used a lot of mainstream systems.  For the past 10 years, my primary desktops have been Mac (work) and Windows (home); before then, I spent over a decade moving through Red Hat 7, FreeBSD 4, Gentoo, Kubuntu 7.04, and regular Ubuntu through 10.04.  I spent memorable chunks of time using fwvm2, WindowMaker, Gnome 1, and KDE 3.  After an experiment with Amazon Linux 1, work standardized on Ubuntu for servers, starting with 14.04.

I say all this to say, I had some warning that Puppy Linux™ was different; there was something about filesystem “layers” and, in theory, being able to build your own like a mixtape. Well, it delivered on being different.

Because first, I had to set up my network.  I found out that a decent chunk of the help is actually online, so I was greeted with Pale Moon telling me “server not found” a few times.  I had the driver files available, but I didn’t have the devx or kernel-sources SFS layers.  I managed to find them on SourceForge, and once the driver installed (quite seamlessly, no-dkms mode), I finally got a network configured with the most-advanced wizard.  It seems that is the only place to be offered to allow WPA2 to be used with the device, since it (or its driver?) isn’t allow-listed already.

I updated Pale Moon, and then, since I was online, why not visit tumblr?  The dash rendered fine, albeit chewing on 100% of a CPU core, but then asking to make a new post crashes (white screen) once the editor loads.  I did not get to announce to the world that I was using Puppy, from Puppy.

(Because my toolchain for Decoded Node posts is on Windows, I am not making this post from Puppy, either.)

The next thing I need to do is figure out what I want to do with this thing. I don’t really have a goal for it yet.

Tuesday, August 30, 2022

Downgrading macOS Catalina to High Sierra

According to Apple Support, my old iMac can be used in Target Display Mode if it’s running High Sierra or earlier. However, I had upgraded it to Catalina.  Getting it back down to High Sierra proved to be quite the challenge.

Part 1: Easy-to-find stuff, cataloged for posterity

I was able to find the installers via Internet search, but after downloading it to the iMac, it wouldn’t run. It claimed that Catalina was too old, which seems like a bad error message for a two-sided version check (Catalina is actually too new, but they didn’t expect they would need to say that.)

However, once again, the Internet came to the rescue: I could use the command- line script bundled with the installer to write it to a USB stick.  It seems that Apple has this answer as well; in my case, it was this:

sudo "/Applications/Install macOS High" --volume /Volumes/BIT_BUCKET

(Any new-lines in the above command should be entered as spaces at the Terminal prompt.)

The command took about five minutes, but during one dead-end, I found it takes 30 minutes to copy to a USB 2.0 stick.

Part 2: The other roadblocks

The High Sierra installer acted like it couldn’t handle the Catalina disks, and did not quite know how to error out properly.  The installer would ask for the password twice, then… would not unlock the disk.  Clicking “Unlock…” to attempt to proceed would then do nothing.  Trying to mount the volumes first, using Disk Utility, didn’t improve the situation.

I tried to erase the volumes using the installer’s Disk Utility, but it hung on “deleting volume.”  Going to choose a startup disk when the volume was hypothetically deleted didn’t show it, but rebooting would prove that the volume was there.  Incidentally, quitting the startup disk selection would remove the menu bar and leave a featureless gray screen.  Oops.

I ran First Aid, turned off FileVault, and even reinstalled Catalina along the way, but none of that seemed to do anything.

What ended up working was to boot into Catalina’s recovery environment, select “show all devices” in its Disk Utility, then erase the entire APFS container and replace it with a “macOS Extended (Journaled)” disk.  After that point, booting the iMac would bring it into a new menu that looked like “choose startup disk” with no disks.

I had to unplug and re-plug the USB stick for it to show up, and then I could boot from it and install High Sierra in the usual manner.  Success!

Part 3: The cable failure

(Section added 2022-09-03.)

I missed the information that Target Display Mode specifically requires a Thunderbolt 2 cable. The Thunderbolt 3 to mini DisplayPort cable I actually bought fits physically, but does not function.

Moreover, the standard was completely revamped for version 3, so it requires an active adapter for Thunderbolt 2, and a specific Thunderbolt 2 cable. Between them, the current price is about $140, while a new monitor can be had for $200.

I have decided to declare the experiment a failure, and abandon the effort to have an external display for the laptop.

The future may eventually hold a 5K display, shared between the work laptop and my personal desktop, once that requires an upgrade.  My current CPU (the third PCIe GPU died years ago, so I quit putting them in) is rated for 2K output over a port that the motherboard doesn't have, or 1920x1200 otherwise.

Bonus chatter

My USB stick was named BIT_BUCKET because, once in a while, the stick would occasionally forget it had data, or even a filesystem.  But, it managed to survive long enough to get the macOS installer running.

Saturday, August 6, 2022

Podman Fully Qualified Image Names

I installed Podman.  Then, of course, I tried to pull an image, but not the example ubuntu image.  This immediately generated an error:

Error: short-name "golang" did not resolve to an alias and no unqualified-search registries are defined in "/etc/containers/registries.conf"

(The “ubuntu” example the tutorial used is actually configured as a short-name in the file /etc/containers/registries.conf.d/shortnames.conf on my system. It’s also possible that later Podman versions will prompt for a repository.)

Luckily, there’s a fairly straightforward way to convert the image names to fully-qualified ones that Podman will accept.

Two basic concepts apply:

  1. Docker images are always on
  2. Docker images without a prefix (as in “golang” or “postgres”) are under library

That is, once we rewrite Docker commands for Podman, they might look like:

podman pull
podman pull

These are the equivalent to pulling postgres and gitea/gitea with Docker.

If you’re looking at an image on Docker Hub, and its URL is something like, then the _ is the “lack of prefix.”  It indicates that the URL for Podman will be

If the URL is, instead,, then that shows the prefix (user gitea) and container (lgtm) after the /r/: the final URL for Podman will be

The final thing to note is that this information applies to the FROM directive in a Dockerfile as well, and it’s compatible with Docker.  Docker and Podman both happily accept FROM

Monday, July 25, 2022

How does macOS Keychain determine which app is using it?

Thinking about the differences between macOS and Gnome keyrings, I began to wonder… how does macOS know what the app making the access is?  It obviously can’t be by the filesystem path, or malware could steal keys by naming itself aws-vault and sneaking into the keychain.

According to this Stack Overflow answer, Apple is using the app’s bundle ID to determine this, authenticated by code signing.

That’s an infrastructure that doesn’t really exist on Linux.  Each distro signs its packages, but the on-disk artifacts are generally neither signed nor checked.  That turns it into a much larger problem for Gnome Keyring to tie secrets to individual apps, because first, we’d need to build a secure way to identify apps.

And then, culturally, Linux users would reject anything resembling a central authority.  Developers (including me) don’t usually bother even using GPG for decentralized code signing, and that’s not even an option with the fashionable curl|bash style installers.

Friday, July 22, 2022

Rescuing an Encrypted Pop!_OS 22.04 Installation

I wanted to access my encrypted Pop!_OS installation from a rescue environment.  I was ultimately successful, and this is how things unfolded.  Any or all of this information may be specific to version 22.04, or even 22.04 as it stands in July 2022.  Nonetheless…

I booted with SystemRescue.  At least as of version 9.03, if you want to change the keymap, read everything first because the UI erases the screen irretrievably (the console does not have scrollback.)

With that out of the way, I was kind of at a loss as to how to proceed.  I could fdisk -l /dev/nvme0n1 and determine that nvme0n1p3 was my root volume.  I tried mounting it with -t auto, to see if that could chain together everything on its own, but it simply failed with an error: mount: unknown filesystem type 'crypto_LUKS'

The command to decrypt it is fairly straightforward.  The final name appears to be arbitrary.  (Although the help/error seemed to indicate that it is optional, a name is definitely required.)

cryptsetup luksOpen /dev/nvme0n1p3 cryptodisk

Now, that created /dev/mapper/cryptodisk, but I couldn’t mount that; it was some sort of LVM type.  Ah.  I searched the web again, and arrived at the commands:

vgchange -ay

These found the volume groups and brought them online, where they could be examined with vgs to get the group names, and lvs to get the volume names. As it turned out, there’s only one of each.  The VG was named data and the volume within it was named root, which makes the final command:

mount -t ext4 /dev/data/root /mnt

Had it not been inside LVM, the vg* commands would have been unnecessary, and I could have mounted it with something like mount -t ext4 /dev/mapper/cryptodisk /mnt (with “cryptodisk” being the name I gave it during the cryptsetup luksOpen command.)

Tuesday, July 19, 2022

Gnome Keyring Security

In the ongoing process of moving to a new work laptop, I have been working to protect secrets from hypothetical malware on the device.  I am using Pop!_OS 22.04 at the moment, examining the default environment: Gnome and gnome-keyring[-daemon].

Using Secrets

I had assumed, given the general focus on security in Linux, that it would be broadly similar to macOS.  To illustrate the user experience there, starting iTerm2 requires the login password to open the password manager for the first time.  The login password is also required to view the passwords through Keychain Access, regardless of whether iTerm2 is running.

I found exactly one Linux terminal with a password manager, Tilix, so of course I installed it. Within Tilix, no password is required to use the password manager.  Using Seahorse (the apparent equivalent to Keychain Access), the passwords Tilix has stored appear in the “login” keyring, unlocked by default, and no password is needed to reveal them.

In short, the keyring in Gnome is glorified plain text.  Access to the session bus is unconditional access to all secrets in all unlocked keyrings.

Who’s There?

Another major difference is that macOS seems to associate keyring entries with owners, such that each individual program gets its own settings in the OS about whether it can access particular secrets.  I can “always allow” aws-vault to access secrets in the aws-vault keyring, but I presume if awsthief tried to access them instead, I would get a new prompt.

Furthermore, if I uncheck “remember this password” on the Mac, it stays unchecked the next time the keyring is unlocked.  In Gnome, for the past 8 years, it re-checks itself every time, waiting for a moment of inattention to make the security of the alternative keyring (awsvault, of course) entirely moot.  It may be locked, but you can have D-Bus fetch the key from under the doormat.

Locking Up

I’m not certain yet whether the Gnome keyring can auto-lock collections, either.  My previous post on macOS’ security command includes how to lock the keyring after a timeout, or when the system is locked.  These capabilities are missing from Seahorse, but I haven’t fully analyzed the D-Bus interface.  (Still, I shouldn’t need to do so.)

Copying Microsoft Good Enough?

A cursory Web search suggests that the way Gnome handles the keyring is exactly like Windows.  Not only is Gnome chasing taillights, but it has chased the easiest ones to catch.

Overall, the quality of Gnome (GNU …) keyring lives up to the heuristic for bad cryptography.

Monday, July 18, 2022

Locking one single keyring in gnome-keyring from the terminal

Updated 2022-07-22: The original code didn't work in practice.  The default for --type changes if --print-reply is given, so it stopped working when I removed the latter after testing.  The command below has been updated to provide explicit values for all options: --type to make it work, and --session to future-proof it.  The original text follows.

I'm moving to a new work laptop, where I wanted to lock the aws-vault keyring when I close a shell/terminal window.  Previously, I did this on macOS, using security lock-keychain aws-vault.keychain-db in my ~/.zlogout file.  (I switched to zsh, then discovered zsh-syntax-highlighting, which is immensely useful.)

So anyway, what's the equivalent CLI command in Pop!_OS 22.04 (derived from Ubuntu 22.04)?

dbus-send --session --type=method_call --dest=org.gnome.keyring \
  /org/freedesktop/secrets \
  org.freedesktop.Secret.Service.Lock \

(Backslash-continuations and line breaks added for readability.  For debugging, adding the --print-reply option may be of use.  Also, the interface can be explored with D-Feet; first, point it to the session bus, then search for secrets, to get the needle out of the haystacks.)

Now, if they ever fix that 8-year-old issue in gnome-keyring, we'll have fully-usable secondary keyrings.  I mean, what is the point to having another keyring, if the option is "No, but ask me again every time?"

Sunday, May 15, 2022

awk can match patterns

The common pattern of grep X | awk '{...}' can sometimes be shortened to awk '/X/ {...}' instead. I'm not super clear on the details of awk vs. grep regex syntax, but relatively basic patterns work unchanged.

For instance, when trying to print the names of packages on the system that have been removed but left their config files behind (they are in "residual config" state), this snippet embeds the subcommand:

dpkg -l | grep '^rc' | awk '{print $2}'

Moving the pattern into awk to avoid running grep, we get:

dpkg -l | awk '/^rc/ {print $2}'

I know I shouldn't worry about system calls and "extra" processes being forked, but some old habits die hard.

Tuesday, May 3, 2022

Ubuntu Paper Cuts: 22.04

I installed Ubuntu 22.04 “Jammy Jellyfish” as a guest in VirtualBox.  Some observations on the installer itself:

  1. The default BIOS boot yields an 800x600 screen that does not fit the installer’s window. The progress bar is hidden from view.
  2. EFI boot yields a better 1024x768 screen, but the graphics updating is very bad. (This could be VirtualBox’s fault—it looks a lot like the VirtualBox UI when it forgets to draw stuff until mouseover. Also, the resulting desktop in EFI works fine.)
  3. The so-called minimal installation produces a full installation, then removes packages with the package manager, bloating the size of the VDI file backing the VM and placing unnecessary wear on the physical medium.
  4. Even normal installation installs “everything,” then removes the other language packs.

Booting into the system reveals the following:

  1. [edited to add] The loading animation disappears fairly early during EFI boot, leaving only two static logos (Ubuntu's, and one from the firmware) showing.
  2. Asking the installer to download updates didn’t avoid having “Updates Available” immediately appear on the desktop.
  3. Installing the updates didn’t avoid having Software show a dot (for 3 more Snap-related packages) on its Updates tab immediately thereafter.
  4. Following that, Software had yet another dot on the Updates tab, but no updates.
  5. Software doesn’t have an icon for the Health and Fitness category; it looks more like the “Windows default EXE” icon than anything.
  6. The main page of Software had, as its top/largest app, an ad for Slack, in which some of the black text of the description overlapped the dark purple background of the Slack icon, rendering it partially unreadable.

All this, and I hadn’t even done anything yet.

Sunday, April 24, 2022


The currently published snap for ripgrep is version 12.1.0; the snap was released 2020-05-17.  The current version, 13.0.0, was released on 2021-06-12. There’s no contact for the publisher of the snap.

The currently published snap for alacritty is version 0.8.0, with the snap released 2021-06-17.  Meanwhile, the current version is 0.10.1; the snap packagers (the “Snapcrafters”) haven’t packaged 0.9.0 released 2021-08-03 or any later version.  The contact information listed for this snap is a Github repository that doesn’t allow issues to be reported; indeed, the contact is the pull request page.  [Updated 2022-07-29: fixed a typo in the snap release date.  Verified that neither snap has been updated since publishing.]

The snap store itself only recognizes legal reasons to “report” a snap, due to violations of either copyright/trademarks, or the terms of the Snap store. There’s no way to signal to the publishers that an update is desired.

It’s clear through their actions where Canonical stands, here: once again, they are interested in taking the work of the FOSS community and siloing it in a place where Canonical can profit, and nobody else can.

The real issue isn’t about snap itself; it’s Canonical’s core strategy. Consider their other initiatives, like upstart, Unity, and Mir.  They wanted to become owners of critical infrastructure, walled off from the whims of the community by a licensing agreement that gave Canonical—especially and only Canonical—the right to profit off the works of others which they did not pay for.

With snaps, Canonical attempts to further erode the community, by letting anyone take over any project’s name, publish under it, and (regardless of intentions) leave it to rot.  Then, they don’t even want to clean up the results, unless they are legally bound to do so.  This helps with the number of “available snap packages,” to be sure, but sinks the utility of the whole system.  Users can’t place their trust in it.

It appears that I may need to shift distributions again, to something that is neither Ubuntu nor downstream from it.

Sunday, April 3, 2022

Research in progress: SSH

Part 1: Certificates

I read up on how to use ssh certificates.  I still ran into a couple of surprises:

My known_hosts file is hashed.  The easiest way to figure out the entry for the server is to connect with ssh -v and look for the message where the host key was found "in known_hosts:4".  That means line 4 of the file.  But, it also includes the whole key for lookup, such as '[]:222' for sshd listening on a non-standard port. With that, you can use the official command instead of counting line numbers:

ssh-keygen -R '[]:222'

Another part I did not understand going into this, is that the certificate doesn't replace the keypair.  It certifies the keypair.  The keypair itself is still used, and it must be available to the client.  Having been in computing for so long, it's odd for a word to have its actual English meaning.

As far as I can tell for creating the certificates, everyone is on their own for building a signing infrastructure.  Which is why companies like smallstep or teleport will provide that piece, for a fee.

Part 2: Multiplexing

I finally got around to looking at sslh a bit, but it didn't exactly work.

Whether I start sslh or nginx first, the other thinks the address is already in use.  But starting two netcat processes, one on and one on, seems to work fine.  I could use nginx's stream module on my personal site, but the day job uses Apache, so I'm not sure how transferable all this is.

It would be nice to get it going from a "neat trick!" perspective, but it's not entirely necessary.  Mainly, it would let us access our git repositories from corporate headquarters without having to request opening the firewall, but I've been there a total of one time.

Updated 2022-04-24: I ultimately chose to set up another hostname, using proxytunnel to connect, and having Apache terminate TLS and pass the CONNECT request on to the SSH server. This further hides the SSH traffic behind a legitimate TLS session, in case it's not just port-blocking that we're facing.

Wednesday, March 23, 2022

Are containers light weight?

I read a thing claiming that containers are “light weight.”  But that’s only compared to a hardware virtual machine!  Containers seem light only through the historical accident of their path to popularity.  They are nearly at the end-point of heavyweight distribution methods.

Once upon a time, we programmers were able to handle a bit of version skew. We’d use libraries like GTK+ which maintained backward compatibility—at the ABI level, even—so that code compiled against 2.4.x would run against 2.4.x or later 2.x releases, without changes.  We’d install something like Smarty to the global PHP include path, and use a single copy of it from all our projects, for space efficiency.  Nothing was vendored!

(We could semi-vendor things in scripting languages by playing with the include path.  Install a major upgrade to, say, “lib-v4”, then in the v4-aware application, prepend the “lib-v4” directory to the include path at runtime. When all the applications were converted, remove the old version from the global path, move the v4 code there, and remove the include-path code from the apps again.  It’s a lot like gradually updating a database column.  It wasn’t a great approach for C code, though.)

Portability across operating systems, even in “POSIX” land, was a mess, but we all learned how to do it.  Virtually all open-source code dealt with it, so we had plenty of examples, and we largely respected the user’s choice of platform to run on.  Even if it were Solaris…

This also produced a pressure for minimal dependencies; the less we required of a user, then the more likely they were to run our code.  I still think that Java largely failed on Linux because every user had to fetch the JRE from Sun’s atrocious website themselves.  (Blackdown and later OpenJDK would change this, long after the ship had sailed.  The Apache Foundation’s Java-based projects are a notable exception from the general attitude, but they are also not desktop software.)

Today’s environment is the complete antithesis.  We pack entire OS distributions, possibly a language interpreter, all of our libraries, and our application code into a gigabyte-plus wrapping (partially shared, but still a minimum).  Then, we call it “lightweight” because it doesn’t have a guest kernel in there.

The old times weren’t perfect; it was an incredibly painful experience to make Linux binaries that worked across distributions, because of variance in the filesystem layout and the need to rely on old libraries to cover the older systems people might run the binary on.  And sometimes, there was no choice but to make multiple builds, because distributions might only package one of incompatible library versions. But largely, to support an app of a few megs, we shipped a few megs, not a few hundred, and we certainly didn’t call “a near-complete disk image” lightweight.

Saturday, March 19, 2022

Elliptic-curve SSH key types

I meant to write a detailed post about the differences between ECDSA and Ed25519, but it became rather short:

Don’t use ECDSA.

It’s not fast, especially if implemented securely. ECDSA and the elliptic curves that it uses are provided by the NSA.  Standardized in 2000, ECDSA is basically 1990s crypto, which we should be aggressively replacing/upgrading.

[Updated 2023-01-28: I believe there are now improved functions for these curves that don't have "points at infinity," which had been the major cause of performance/safety problems.  However, with Ed25519, there's no need to dive so deep into the implementation to determine its characteristics.]

Ed25519 is a later elliptic-curve algorithm, designed to avoid all known or suspected problems of ECDSA and the NIST curves.  It was published in 2011. As far as I know—which, admittedly, is primarily from the author’s own site about ECC security—there are no new attacks that weaken Ed25519, nor the underlying Curve25519 specifically.

Friday, March 18, 2022

The Unexpected Thoroughness of `tasksel remove`

I decided to cut down a little on the number of packages installed on my Ubuntu Studio 21.10 system, so I tried running a command.  I'm pretty sure I asked for removal of ubuntustudio-video only, but I don't really know... whatever I ran proceeded to thoroughly remove most of the ubuntustudio meta packages, including ubuntustudio-audio—the prime reason for the system to exist—and even ubuntustudio-desktop itself.  Other notable packages caught in the crossfire were sddm and dkms.

Of course, as soon as I saw ardour6 being uninstalled early on, I mashed Ctrl+C, but it has no effect on tasksel.  Rude.

The damage to sddm was fairly simple and obvious: the display went black mid-process.  It took down the desktop session and Konsole with it, but tasksel continued on its path of complete destruction.  The loss of dkms is much more concerning; had I not noticed, at some point, I would have rebooted into a new kernel, and then I wouldn't have had WiFi anymore, with no idea why.

I had carefully requested a test run from tasksel first, except that it didn't actually list out packages, just a vague apt command in a format I'd never seen before.  That failed to adequately prepare me for what really happened.

(I got everything back using the text console.  Except I didn't bother with ubuntustudio-video.  Mission successful?)

Wednesday, February 9, 2022

The Pace of Change

I’m not the first, nor the only, person to complain about the pace of technical change.  But what are the actual problems?

We risk losing perspective.  We will forget that the fad of today is just another fad; blockchains and containers are destined to be the next XML, relatively soon in their life, then carried forward for thirty years because changing infrastructure is too risky for the business.

We risk losing the wisdom of the past, assuming even our own younger selves were but naked savages, coding in Perl or PHP. We will not know what made Perl, Perl; we will not bring any of the good ideas forward.

Truly, we risk losing experts. It took me a good 10 or 15 years to really appreciate the sheer amount of knowledge that makes an expert, an expert; if we burn our world down every five years, then we will never come to know anything deeply.  We will have no experts.

Where I used to worry about becoming “a dinosaur,” it now seems that dinosaurs going extinct are the larger problem.

But what is the actual problem?

Pride, perhaps?  Are we too snobby to learn about what came before, to understand our place in history, and to meet the present where it’s at?  Do we think there is nothing to learn about a system in reading its code, in making improvements to it, that we must replace it outright?

Is it ignorance?  Or is it the deep, white-guy need to fall into the pit himself, before he believes it to be there?  Do we really believe that it was jQuery that created the spaghetti, and not ourselves?  Will abandoning one library for another genuinely improve our own capabilities… or is it a convenient deflection?

I am inclined to shout, “just deal with it!” at people.  They probably want to shout it back to me.

Wednesday, January 26, 2022

Amazon CloudSearch Security Policies

I have been looking into CloudSearch to introduce a "search everything" feature to my employer's core app.  The interaction of user and resource policies was a little bit confusing, but I think it works as follows.

A CloudSearch resource policy is needed to allow access outside the owner's account, or to restrict access by IP address.  A blank CloudSearch policy is a perfectly functional option for the owner.  Although the UI says, "this will only allow access through the console," it actually means that only access policies set on the IAM user/role making the request are relevant.  "The console" just happens to be running as the logged-in user, with those IAM permissions.

As I understand it, once CloudSearch is accessed, the permission checks proceed along these lines:

  1. Does the CloudSearch resource policy allow the account?  If there's no policy, only the owner is permitted; otherwise, the policy is used to determine whether cross-account or anonymous access is permitted.  (Caution: it's not clear to me whether a policy that specifies cross-account access, but doesn't include the owner, will deny the owner.)  If the account is not permitted, then the request is denied.
  2. Is there an IP restriction?  If so, and the caller's IP is not permitted, then the request fails.  If there is no IP restriction, then the connection is permitted.
  3. Does the caller have permission through IAM (user or role) in their account to make this request?  If there's no explicit grant, then the request is, once again, denied.

Putting my own AWS account ID in as the allowed account, with no IP address restrictions, did not seem to be sufficient to grant access.  When I gave my instance access through an IAM role, no CloudSearch resource policy was necessary to allow access the domain.

The documentation notes that IP address restrictions must be given in the CloudSearch resource policy.  I believe this arises because the IP can only be checked once a connection is made to CloudSearch.  Prior to that point, the caller's IP address is not available for checking.

Likewise, if I understand the documentation properly, cross-account access needs both a CloudSearch resource policy set to allow access from the caller's AWS account, and the caller's IAM also needs to allow access to the CloudSearch domain.  However, we only have the one account, so I haven't fully tested this scenario.

Wednesday, January 5, 2022

A New SSH Key Philosophy

I rolled over my SSH keys this morning; but more than that, I generated two additional keys, so now I have four distinct domains:

  1. Local systems
  2. EC2 systems
  3. AWS CodeCommit (read/write)
  4. AWS CodeCommit (read only) – this one already existed

Previously, I would load “the SSH key” into ssh-agent, and do everything with one key.  CodeCommit would always be accessed in read/write mode; even systems that only needed read-only access for testing would access the read/write key through the forwarded agent.

Because there was only one skeleton key, which I needed frequently, it would be available to any executing code, any time I was working. All hosts it would work on were clearly indicated in .ssh/config and .git/config files. Any code on a guest VM would also be able to access it, through the forwarded agent.  The guest’s configuration also included the hosts, because I once developed from the guest environment.  Back then, the host was Windows, making it more comfortable to access EC2 and git from the guest.

The first two keys I generated represent different frequencies of access. I access the local systems much more frequently than EC2, to debug code under development running on them.  Splitting those keys means that the EC2 key will be unlocked and available much less frequently.

As for CodeCommit, I decided to separate “shell” keys from “git” keys. Our threat model generally gives up on security if an attacker has shell access, so if I can keep local malware away from remote shell access, it adds a layer to the defense.  In addition, this key is also accessed more frequently than the EC2 key.

Finally, I quit forwarding the SSH agent by default to test systems.  They already have the (sufficient) read-only key installed when their base image is created, so all I had to do was “stop changing the configuration” in my personal setup scripts for those guest VMs.  This reduces unnecessary trust (and coupling) between host and guest.

Of note, I am already guarding the AWS keys with aws-vault, which I wrote about previously. I was sufficiently pleased with aws-vault that I’m using it on my home Linux installation, as well.

Overall, it took me an hour or two to work out the plan and reconfigure everything, but I’m happy with the improved security posture.

Saturday, January 1, 2022

Why I use ext4

Re: Why I (still) use ext4 for my Linux root filesystems

I use ext4 for all of my Linux filesystems.  It’s safety in numbers: Files are fraught with peril notes that it is tricky to support different journal modes on ext4, let alone different filesystems.  btrfs may reorder directory operations, which other filesystems don’t.

So, it seems to be safer to make the expected choice.  For me, using Ubuntu, that’s ext4 in ordered mode.

Does it lock me into the decisions of the past?  I don’t know.  The filesystem developers could have “done it wrong,” but ext4 implemented extent-based allocation tracking, reaching feature parity with other common filesystems of the time.  That was probably the whole reason to raise the version number.

The performance story is pretty much a wash.  Whether one filesystem beats another or not, the numbers are typically pretty close.  In what is hopefully closer to real-world usage, I failed to find a difference in VM boot time using ext4 vs xfs on the file system.  If I’m not going to be running filesystem benchmarks as my primary workload, and the benchmark performance doesn’t translate to a real-world effect, then why bother?

I also don’t take snapshots for backups; I’m interested in surviving a complete disk failure. I’ve lost two disks that way, although one had the decency to limp along just long enough to get the important parts of /home copied from it.  Backups are now “I copy important data to a USB disk.”  One of those disks is only rarely connected, for ransomware resistance.