Wednesday, January 5, 2022

A New SSH Key Philosophy

I rolled over my SSH keys this morning; but more than that, I generated two additional keys, so now I have four distinct domains:

  1. Local systems
  2. EC2 systems
  3. AWS CodeCommit (read/write)
  4. AWS CodeCommit (read only) – this one already existed

Previously, I would load “the SSH key” into ssh-agent, and do everything with one key.  CodeCommit would always be accessed in read/write mode; even systems that only needed read-only access for testing would access the read/write key through the forwarded agent.

Because there was only one skeleton key, which I needed frequently, it would be available to any executing code, any time I was working. All hosts it would work on were clearly indicated in .ssh/config and .git/config files. Any code on a guest VM would also be able to access it, through the forwarded agent.  The guest’s configuration also included the hosts, because I once developed from the guest environment.  Back then, the host was Windows, making it more comfortable to access EC2 and git from the guest.

The first two keys I generated represent different frequencies of access. I access the local systems much more frequently than EC2, because that is where I debug code under development.  Splitting those keys means that the EC2 key will be unlocked and available much less often.
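
For concreteness, generating the three new keys might look like this; the file names and key type are my own hypothetical choices, not anything the tooling requires:

    # Separate keys per domain, each with its own passphrase, so unlocking
    # one does not unlock the others.
    ssh-keygen -t ed25519 -f ~/.ssh/id_local -C "local systems"
    ssh-keygen -t ed25519 -f ~/.ssh/id_ec2 -C "EC2 systems"
    ssh-keygen -t ed25519 -f ~/.ssh/id_codecommit_rw -C "CodeCommit read/write"
    # (The read-only CodeCommit key already existed, so it is not regenerated.)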

As for CodeCommit, I decided to separate “shell” keys from “git” keys. Our threat model generally gives up on security if an attacker has shell access, so if I can keep local malware away from remote shell access, it adds a layer to the defense.  This key is also accessed more frequently than the EC2 key.

Finally, I quit forwarding the SSH agent by default to test systems.  They already have the (sufficient) read-only key installed when their base image is created, so all I had to do was “stop changing the configuration” in my personal setup scripts for those guest VMs.  This reduces unnecessary trust (and coupling) between host and guest.
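
After the changes, the relevant parts of the host-side ~/.ssh/config look roughly like this sketch; the host patterns, key paths, and the APKAEXAMPLEKEYID value are all placeholders:

    # Local development systems: the frequently-used key only.
    Host *.local.example
        IdentityFile ~/.ssh/id_local
        IdentitiesOnly yes

    # EC2 systems: a separate, less frequently unlocked key.
    Host *.compute.amazonaws.com
        IdentityFile ~/.ssh/id_ec2
        IdentitiesOnly yes

    # CodeCommit over SSH: the read/write "git" key, kept apart from shell keys.
    Host git-codecommit.*.amazonaws.com
        User APKAEXAMPLEKEYID
        IdentityFile ~/.ssh/id_codecommit_rw
        IdentitiesOnly yes

    # Test guest VMs: no agent forwarding; the read-only key is already
    # baked into their base image.
    Host testvm-*
        ForwardAgent no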

Of note, I am already guarding the AWS keys with aws-vault, which I wrote about previously. I was sufficiently pleased with aws-vault that I’m using it on my home Linux installation, as well.

Overall, it took me an hour or two to work out the plan and reconfigure everything, but I’m happy with the improved security posture.

Saturday, January 1, 2022

Why I use ext4

Re: Why I (still) use ext4 for my Linux root filesystems

I use ext4 for all of my Linux filesystems.  It’s safety in numbers: Files are fraught with peril notes that it is tricky to support even the different journal modes of ext4, let alone different filesystems.  btrfs, for instance, may reorder directory operations in ways that other filesystems don’t.

So, it seems to be safer to make the expected choice.  For me, using Ubuntu, that’s ext4 in ordered mode.
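
If I want to double-check which mode a filesystem actually mounted with, the kernel logs it at mount time; something like this shows it (the device name will differ, and dmesg may need sudo):

    # The mount-time log line names the data journaling mode in use.
    sudo dmesg | grep 'EXT4-fs' | grep 'mounted filesystem'
    # e.g. EXT4-fs (nvme0n1p2): mounted filesystem with ordered data mode.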

Does it lock me into the decisions of the past?  I don’t know.  The filesystem developers could have “done it wrong,” but ext4 implemented extent-based allocation tracking, reaching feature parity with other common filesystems of the time.  That was probably the whole reason to raise the version number.

The performance story is pretty much a wash.  Whether one filesystem beats another or not, the numbers are typically pretty close.  In what is hopefully closer to real-world usage, I failed to find a difference in VM boot time between ext4 and xfs.  If I’m not going to be running filesystem benchmarks as my primary workload, and the benchmark performance doesn’t translate to a real-world effect, then why bother?

I also don’t take snapshots for backups; I’m interested in surviving a complete disk failure. I’ve lost two disks that way, although one had the decency to limp along just long enough to get the important parts of /home copied from it.  Backups are now “I copy important data to a USB disk.”  One of those disks is only rarely connected, for ransomware resistance.

Thursday, December 9, 2021

The Best Tool for the Job?

I've written three generations of memcached-to-DynamoDB server now.  That is, a server that speaks the memcached text protocol to its clients, but stores data in Amazon DynamoDB instead of memory.  (Why?)  Perl is dying, so generation 2 was written in Python, using the system version that was already in our base images.  But cultural issues plagued it, so I began thinking about generation 3.  What language to write it in?

Perl is still dying.  PHP doesn't have a great async story; I could use stream_select, but I was hoping not to build even more of the server infrastructure myself.  Outside of that, we don't have any other interpreters or runtime environments pre-installed in our image, and I didn't want to bloat it more.
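
For context on what that would entail: even a minimal single-process stream_select server has to juggle the listening socket, every client socket, and partial reads by hand.  A rough sketch, not any generation of the real server, with the memcached parsing and DynamoDB calls omitted:

    <?php
    // Sketch of a stream_select() loop: accept clients, read a line, respond.
    // Partial-line buffering and real command handling are omitted for brevity.
    $server = stream_socket_server('tcp://127.0.0.1:11211', $errno, $errstr);
    if ($server === false) {
        die("listen failed: $errstr\n");
    }
    $clients = [];

    while (true) {
        $read = array_merge([$server], $clients);
        $write = $except = [];
        if (stream_select($read, $write, $except, null) === false) {
            break; // select failed
        }
        foreach ($read as $stream) {
            if ($stream === $server) {
                $clients[] = stream_socket_accept($server); // new connection
                continue;
            }
            $line = fgets($stream);
            if ($line === false) { // client disconnected
                fclose($stream);
                $clients = array_values(array_filter($clients, function ($c) use ($stream) {
                    return $c !== $stream;
                }));
                continue;
            }
            // A real server would parse the memcached text command here and
            // talk to DynamoDB; this sketch only ever reports an error.
            fwrite($stream, "ERROR\r\n");
        }
    }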

Of the compiled languages, then:

  1. Rust was known to be incomprehensible.
  2. C and C++ are not safe, and don't have package managers.
  3. Go... doesn't have generics?
  4. Nothing else in the category seems to have critical mass (e.g., an AWS SDK ready to use.)

We have another project in Go that I wrote circa Go 1.4; since then, it required a tiny bit of understandable work for the massive benefit of migrating to modules (and that became devproxy2.  I value stability.)  If you don't want to write a generic library, then Go is fine enough.

Go is still burdened by a self-isolating inner circle, making it faux-open-source at best.  But on the other hand, they have built a safe, concurrent, stable, popular, compiled language, with a standard package manager.  It even has an official AWS SDK.

Friday, August 27, 2021

Our brief use of systemd's TemporaryFileSystem feature

Late last year, we started having problems where services did not appear to pick up updated data after a reload.  I never pinned down the cause, but I have a hypothesis.

We have a monolithic server build, where several services end up running under one Apache instance.  I thought it would be nice to make each one think it was the only service running.  Instead of enumerating badness, I switched to using TemporaryFileSystem= to eliminate all of the directories, then allow access to the one needed by the service.
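
The drop-in looked roughly like the following; the unit name and paths are hypothetical stand-ins rather than our real layout, and BindReadOnlyPaths= could be BindPaths= if the service needs to write there:

    # /etc/systemd/system/apache2.service.d/isolation.conf (hypothetical paths)
    [Service]
    # Mount an empty tmpfs over the shared tree, hiding every service's files...
    TemporaryFileSystem=/srv/www:ro
    # ...then bind just this one service's directory back into view.
    BindReadOnlyPaths=/srv/www/service-a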

The problem is, I don’t think this gets reset or updated during a reload. Deployment renames the working directory aside, then renames an all-new directory into place at the original name.  It’s possible that even though the reload happens, the processes never change their working directory, and are thus still looking at the original one.

Things that had never failed before were failing; in particular, Template Toolkit was still using stale template files.  I did some digging and found that it does have a cache by default, but it only holds entries for up to a second. Furthermore, because I was very strict about not having unexpected global state, we create a new Template object per request, and shouldn't be sharing the cache.

Anyway, systemd does not actually document what it does with TemporaryFileSystem= on reload, so I can’t be sure.  I ended up abandoning TemporaryFileSystem= entirely.  I only have so much time to mess around, and there’s only so much “risk” that would be mitigated by improved isolation. It’s almost guaranteed that the individual applications have defects, and don’t need a multi-app exploit chain.

(Not in the sense that I know of any specific defects; in the sense that all software has defects, and some proportion of those can be leveraged into a security breach.)

Friday, August 20, 2021

More Fun with Linux I/O Stats

I did not disclose boot times or time spent reading in my previous post. This is because the Linux drive is, in fact, on SATA 2 and not SATA 3.  My motherboard only has a single SATA 3 (6 Gbps) port, and Windows was already on it.  I installed a second drive for Linux, which means that the new SSD went on SATA 2 instead.

However, I did temporarily swap the cables, and take some numbers for Linux booting with SATA 3 instead.  This occurred on Ubuntu Studio 21.04, which is the version that I have installed on the bare metal, rather than inside a virtual machine.

The boot time for Ubuntu Studio from Grub-to-desktop was at most 1 second faster on SATA 3, although Linux did report half the time spent reading in /proc/diskstats.
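
For reference, those figures come from the per-device line in /proc/diskstats; something like this pulls out the read-side fields (the device name is whatever the drive is called on a given system):

    # Fields 4, 6, and 7 are reads completed, sectors read, and milliseconds
    # spent reading (see the kernel's Documentation/admin-guide/iostats.rst).
    awk '$3 == "sda" { print "reads:", $4, "sectors:", $6, "ms reading:", $7 }' /proc/diskstats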

From this, I have to conclude that the disk is not a limiting factor: whether the disk is spending 5.5 or 11 seconds to return data leads to boot times of 13.6 vs 14.5 seconds.  Cutting the bandwidth in half only raises the overall boot time by 6.6%.

In fact, the last boot before I uninstalled VirtualBox, but on SATA 3, recorded 40914 I/O requests, 1958574 sectors, and 10917 ms waiting for the disk, for a 16.0 second boot.  After merely uninstalling VirtualBox and rebooting, there were 16217 requests (60% fewer), 1256806 sectors (36% fewer), 5698 ms waiting (48% less), and 13.8 seconds wall time (14% less).  For what it’s worth, “used memory” reported just after the /proc/diskstats check dropped from 673 MB to 594 MB (12% less).

In other words, the interface is less relevant than whether VirtualBox is installed.

(The virt-manager + Cinnamon experiment was caused by the removal of VirtualBox, but happened after I/O numbers were finished.  Also undisclosed previously: the drive itself is a Samsung 870 EVO 250 GB, so its performance should be fairly close to the limits of SATA 3.)

The conclusion remains the same; more speed on the disk is not going to help my current PC.  My next PC (built whenever Windows 10 or the hardware gives way first) will support NVMe, but I probably won't use it unless there are a couple of M.2 slots.  Being able to clone a drive has been incredibly useful to me.

Friday, August 13, 2021

Making Sense of PHP HTTP packages

There are, at the moment, several PSRs (PHP Standard Recommendations) involving HTTP:

  1. PSR-7: HTTP messages.  These are the Request, Response, and Uri interfaces.
  2. PSR-17: HTTP message factories.  These are the interfaces for “a thing that creates PSR-7 messages.”
  3. PSR-18: HTTP clients.  These are the interfaces for sending out a PSR-7 request to an HTTP server, and getting a PSR-7 response in return.
  4. PSR-15: HTTP handlers.  These are interfaces for adding “middleware” layers in a server-side framework. The middleware sits “between” application code and the actual Web server, and can process, forward, and/or modify requests and responses.

I will avoid PSR-15 from here on; the inspiration for this post was dealing with the client side of the stack, not the server.

As for the rest, the two ending in 7 are related: PSR-17 is about creating PSR-7 objects, without the caller knowing the specific PSR-7 implementation. However, it’s a kind of recursive problem: how does the caller know the specific PSR-17 implementation to use?

There's also some confusion caused by having “Guzzle” potentially refer to multiple packages.  There’s the guzzlehttp/guzzle HTTP client, and there’s a separate guzzlehttp/psr7 HTTP message implementation.  Other packages may use “guzzle” in their names without being immediately clear about which exact package they work with.

  • php-http/guzzle7-adapter is related to the client.
  • http-interop/http-factory-guzzle provides PSR-17 factories for the Guzzle PSR-7 classes, and has nothing to do with the client.

Additionally, whether these packages are needed has changed somewhat over time.  guzzlehttp/psr7 version 2 has PSR-17 support built in, and guzzlehttp/guzzle version 7 is compatible with PSR-18.  Previous major versions of these packages lacked those features.

Discovery

The problem of finding PSR-compatible objects is solved by an important non-PSR library from the HTTPlug (HTTP plug) project: php-http/discovery (and its documentation.)

  • It lets code ask for a concrete class by interface, and Discovery takes care of finding which class is available, constructing it, and returning it.
  • It includes its own list of PSR-17 factories and PSR-18 clients, and can return those directly, where applicable. When Guzzle 7 is installed, Discovery (of a recent enough version) can return the actual \GuzzleHttp\Client when asked to find a PSR-18 client.
  • It has additional interfaces for defining and finding asynchronous HTTP clients, where code is not required to wait for the response before processing continues.

At its most basic, Discovery can find PSR-17 factories and PSR-18 HTTP clients.  These would be loaded through the Psr17FactoryDiscovery and Psr18ClientDiscovery classes.  For the more advanced features like asynchronous clients, the additional adapter packages are required.
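
In code, the basic case looks something like this sketch; the discovery classes come from php-http/discovery, the rest are PSR interfaces, and the URL is just a placeholder:

    <?php
    use Http\Discovery\Psr17FactoryDiscovery;
    use Http\Discovery\Psr18ClientDiscovery;

    // Ask Discovery for whatever PSR-17 factory and PSR-18 client are installed.
    $requestFactory = Psr17FactoryDiscovery::findRequestFactory();
    $client = Psr18ClientDiscovery::find();

    // Build and send a request without naming a concrete implementation anywhere.
    $request = $requestFactory->createRequest('GET', 'https://example.com/');
    $response = $client->sendRequest($request);
    echo $response->getStatusCode(), "\n";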

For example, to use Guzzle 7 asynchronously, php-http/guzzle7-adapter is required.  At that point, it can be loaded using HttpAsyncClientDiscovery::find().  This method then returns an adapter class, which implements php-http's asynchronous client interface, and passes the actual work to Guzzle.
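
The asynchronous flavor is similar; a sketch, again with a placeholder URL, assuming php-http/guzzle7-adapter is installed:

    <?php
    use Http\Discovery\HttpAsyncClientDiscovery;
    use Http\Discovery\Psr17FactoryDiscovery;

    // With the adapter installed, this returns a wrapper around Guzzle 7.
    $asyncClient = HttpAsyncClientDiscovery::find();
    $request = Psr17FactoryDiscovery::findRequestFactory()
        ->createRequest('GET', 'https://example.com/');

    // The request goes out without blocking; wait() collects the response later.
    $promise = $asyncClient->sendAsyncRequest($request);
    // ...other work could happen here...
    $response = $promise->wait();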

In any case, library code itself would only require php-http/discovery in its composer.json file; a project making use of the library would need to choose the concrete implementation to use in its composer.json file.

An Important Caveat

Discovery happens at run time.  Since Discovery supports a lot of ways to find a number of packages, it doesn't depend on them all, and it doesn't even have a hard dependency on, say, the PSR-17 interfaces themselves.  This means that Composer MAY install packages, even though requirements aren't fully met to make all of them usable.

To be sure the whole thing will actually work in practice, it's important to make some simple, safe HTTP request.  In my case, I use the Mailgun client to fetch recent log events.

When code using Discovery fails, the error message may suggest installing “one of the packages” from a list, and providing a link to Packagist.  That link may include the name of a package that is installed.  Why doesn’t it work? It’s probably the version that is the culprit.  If guzzlehttp/psr7 version 1 is installed, but not http-interop/http-factory-guzzle, then the error is raised because there is genuinely no PSR-17 implementation available with the installed versions. However, the guzzlehttp/psr7 package will be shown on Packagist as providing the necessary support, because the latest version does, indeed, support PSR-17.

Things Have Changed a Bit Over Time

As noted above, prior to widespread support for PSR-17 and PSR-18, using the php-http adapters was crucial for having a functioning stack.  So was installing http-interop/http-factory-guzzle to get a PSR-17 adapter for the guzzlehttp/psr7 version 1 code.

For code relying only on PSR-17 and PSR-18, and using the specific Discovery classes for those PSRs, the latest Guzzle components should not need any other packages installed to work.

However, things can be different, if there is another library in use that causes the older version of guzzlehttp/psr7 to be used.  This happens for me: the AWS SDK for PHP specifically depends on guzzlehttp/psr7 version 1, so I need to include http-interop/http-factory-guzzle as well, for Mailgun to coexist with it.

One Last Time

If you’re writing a library, use and depend on php-http/discovery.

If you’re writing an application, you must also depend on specific components for Discovery to find, such as guzzlehttp/guzzle.  Depending on how the libraries you are using fit together, you may also need http-interop/http-factory-guzzle for an older guzzlehttp/psr7 version, or a package like php-http/guzzle7-adapter if a simple PSR-18 client isn’t suitable.
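
For my Mailgun-plus-AWS case, the application side ends up looking roughly like this in composer.json (version constraints are illustrative, not a recommendation):

    {
        "require": {
            "aws/aws-sdk-php": "^3.0",
            "guzzlehttp/guzzle": "^7.0",
            "http-interop/http-factory-guzzle": "^1.0",
            "mailgun/mailgun-php": "^3.0",
            "php-http/discovery": "^1.0"
        }
    }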

There are alternatives to Guzzle, but since my projects are frequently AWS-based, and their SDK depends on Guzzle directly, that’s the one I end up having experience with.  I want all of the libraries to share dependencies, where possible.

Friday, August 6, 2021

Cinnamon with Accelerated Graphics in libvirt

While exploring virt-manager and Linux Mint Cinnamon Edition, I kept getting the warning that software rendering was in use (instead of hardware acceleration.) The warning offered to open the Driver Manager, which then said no drivers were needed.

The open-source stack for doing accelerated graphics in this case is SPICE, but virt-manager had configured a different graphics stack by default for the “Ubuntu 20.04” OS setting I had used.

For the guest to use SPICE, it needed the spice-vdagent package to be installed.  After that, the guest was shut down, so that its hardware could be reconfigured.

First, the “video” element needed to be converted from the QXL driver to virtio, which includes a 3D acceleration option.  Once that option’s checkbox was ticked, I clicked Apply to save the configuration.

Next, the “display” needed to be changed to support direct OpenGL rendering. I needed to change the Listen to none, and to tick the OpenGL checkbox. (When I did this in the opposite order, an “invalid settings” message appeared, saying that OpenGL is not supported unless Listen is none.) This revealed an option to select a graphics device, but I only have one, so I left it alone. I then applied the display changes.
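
For reference, the end result of those two changes in the guest’s libvirt XML looks roughly like this (trimmed down; exact attributes may vary by virt-manager version):

    <!-- Video: virtio model with 3D acceleration enabled -->
    <video>
      <model type="virtio">
        <acceleration accel3d="yes"/>
      </model>
    </video>

    <!-- Display: SPICE with no listening socket and OpenGL turned on -->
    <graphics type="spice">
      <listen type="none"/>
      <gl enable="yes"/>
    </graphics>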

Finally, I booted the guest again.  Everything seems to have worked, and the warning did not appear.