Wednesday, March 23, 2022

Are containers lightweight?

I read a thing claiming that containers are “lightweight.”  But that’s only in comparison to a full hardware virtual machine!  Containers seem light only through the historical accident of their path to popularity; they sit near the heavy end of the spectrum of software distribution methods.

Once upon a time, we programmers were able to handle a bit of version skew.  We’d use libraries like GTK+, which maintained backward compatibility—at the ABI level, even—so that code compiled against 2.4.x would run, without changes, against 2.4.x or any later 2.x release.  We’d install something like Smarty into the global PHP include path and use a single copy of it from all our projects, for space efficiency.  Nothing was vendored!

(We could semi-vendor things in scripting languages by playing with the include path.  Install the major upgrade into a separate directory, say “lib-v4”; then, in the v4-aware application, prepend the “lib-v4” directory to the include path at runtime.  When all the applications were converted, remove the old version from the global path, move the v4 code there, and remove the include-path code from the apps again.  It’s a lot like gradually updating a database column.  It wasn’t a great approach for C code, though.)
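
The same trick works in most scripting languages.  As a rough sketch of the idea (in Python rather than PHP, with a made-up library name and install path), a v4-aware application might have looked like this:

    # Hypothetical v4-aware application: prefer a side-by-side "lib-v4"
    # install over whatever older version sits on the global path.
    import os
    import sys

    LIB_V4_DIR = "/usr/local/share/mylib-v4"  # assumed install location

    if os.path.isdir(LIB_V4_DIR):
        # Prepending wins: this copy shadows the globally installed one.
        sys.path.insert(0, LIB_V4_DIR)

    import mylib  # resolves to v4 when present, to the global copy otherwise

PHP's set_include_path() serves the same purpose: prepend the new directory, and the older copy further down the path simply never gets loaded.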

Portability across operating systems, even in “POSIX” land, was a mess, but we all learned how to do it.  Virtually all open-source code dealt with it, so we had plenty of examples, and we largely respected the user’s choice of platform to run on.  Even if it were Solaris…

This also produced pressure for minimal dependencies: the less we required of a user, the more likely they were to run our code.  I still think that Java largely failed on Linux because every user had to fetch the JRE from Sun’s atrocious website themselves.  (Blackdown and later OpenJDK would change this, long after the ship had sailed.  The Apache Foundation’s Java-based projects are a notable exception to the general attitude, but they are also not desktop software.)

Today’s environment is the complete antithesis.  We pack an entire OS distribution, possibly a language interpreter, all of our libraries, and our application code into a gigabyte-plus image (layers may be partially shared, but that is still roughly the floor).  Then we call it “lightweight” because it doesn’t have a guest kernel in there.

The old times weren’t perfect; it was an incredibly painful experience to make Linux binaries that worked across distributions, because of variance in filesystem layout and the need to build against old libraries to cover the older systems people might run the binary on.  Sometimes there was no choice but to make multiple builds, because a distribution might package only one of several incompatible library versions.  But largely, to support an app of a few megs, we shipped a few megs, not a few hundred, and we certainly didn’t call “a near-complete disk image” lightweight.

Saturday, March 19, 2022

Elliptic-curve SSH key types

I meant to write a detailed post about the differences between ECDSA and Ed25519, but it became rather short:

Don’t use ECDSA.

It’s not fast, especially when implemented securely, and the elliptic curves it uses (the NIST curves) were supplied by the NSA.  Standardized in 2000, ECDSA is basically 1990s crypto, which we should be aggressively replacing or upgrading.

[Updated 2023-01-28: I believe there are now improved functions for these curves that don't have "points at infinity," which had been the major cause of performance/safety problems.  However, with Ed25519, there's no need to dive so deep into the implementation to determine its characteristics.]

Ed25519 is a later elliptic-curve signature algorithm, designed to avoid all known or suspected problems of ECDSA and the NIST curves.  It was published in 2011.  As far as I know—which, admittedly, comes primarily from the author’s own site on ECC security—there are no new attacks that weaken Ed25519 or the underlying Curve25519 specifically.
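
For SSH itself, the switch is a one-liner: ssh-keygen -t ed25519 has been supported since OpenSSH 6.5.  If you ever need to generate keys programmatically, here is a minimal sketch using Python's cryptography package (3.0 or later, which added OpenSSH private-key serialization); the file names are placeholders, and a real key should get a passphrase:

    # Minimal sketch: generate an Ed25519 key pair and write it out in
    # OpenSSH format.  File names are placeholders; use
    # BestAvailableEncryption instead of NoEncryption to set a passphrase.
    from cryptography.hazmat.primitives import serialization
    from cryptography.hazmat.primitives.asymmetric import ed25519

    key = ed25519.Ed25519PrivateKey.generate()

    private_openssh = key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.OpenSSH,
        encryption_algorithm=serialization.NoEncryption(),
    )
    public_openssh = key.public_key().public_bytes(
        encoding=serialization.Encoding.OpenSSH,
        format=serialization.PublicFormat.OpenSSH,
    )

    with open("id_ed25519_example", "wb") as f:
        f.write(private_openssh)
    with open("id_ed25519_example.pub", "wb") as f:
        f.write(public_openssh + b"\n")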

Friday, March 18, 2022

The Unexpected Thoroughness of `tasksel remove`

I decided to cut down a little on the number of packages installed on my Ubuntu Studio 21.10 system, so I tried running a command.  I'm pretty sure I asked for removal of ubuntustudio-video only, but I don't really know... whatever I ran proceeded to thoroughly remove most of the ubuntustudio metapackages, including ubuntustudio-audio—the prime reason for the system to exist—and even ubuntustudio-desktop itself.  Other notable packages caught in the crossfire were sddm and dkms.

Of course, as soon as I saw ardour6 being uninstalled early on, I mashed Ctrl+C, but it had no effect on tasksel.  Rude.

The damage to sddm was fairly simple and obvious: the display went black mid-process.  That took down the desktop session and Konsole with it, but tasksel continued on its path of complete destruction.  The loss of dkms was much more concerning; had I not noticed, I would eventually have rebooted into a new kernel and found myself without WiFi, with no idea why.

I had carefully requested a test run from tasksel first, but it didn't actually list any packages, only a vague apt command in a format I'd never seen before.  That failed to adequately prepare me for what really happened.

(I got everything back using the text console.  Except I didn't bother with ubuntustudio-video.  Mission successful?)

Wednesday, February 9, 2022

The Pace of Change

I’m not the first, nor the only, person to complain about the pace of technical change.  But what are the actual problems?

We risk losing perspective.  We will forget that the fad of today is just another fad; blockchains and containers are destined to become the next XML, relatively early in their lives, then be carried forward for thirty years because changing infrastructure is too risky for the business.

We risk losing the wisdom of the past, assuming even our own younger selves were but naked savages, coding in Perl or PHP. We will not know what made Perl, Perl; we will not bring any of the good ideas forward.

Truly, we risk losing experts. It took me a good 10 or 15 years to really appreciate the sheer amount of knowledge that makes an expert, an expert; if we burn our world down every five years, then we will never come to know anything deeply.  We will have no experts.

Where I used to worry about becoming “a dinosaur,” it now seems that the dinosaurs going extinct is the larger problem.

But what is the actual problem?

Pride, perhaps?  Are we too snobby to learn about what came before, to understand our place in history, and to meet the present where it’s at?  Do we think there is nothing to learn about a system in reading its code, in making improvements to it, that we must replace it outright?

Is it ignorance?  Or is it the deep, white-guy need to fall into the pit himself, before he believes it to be there?  Do we really believe that it was jQuery that created the spaghetti, and not ourselves?  Will abandoning one library for another genuinely improve our own capabilities… or is it a convenient deflection?

I am inclined to shout, “just deal with it!” at people.  They probably want to shout it back to me.

Wednesday, January 26, 2022

Amazon CloudSearch Security Policies

I have been looking into CloudSearch to introduce a "search everything" feature to my employer's core app.  The interaction of user and resource policies was a little bit confusing, but I think it works as follows.

A CloudSearch resource policy is needed to allow access outside the owner's account, or to restrict access by IP address.  A blank CloudSearch policy is a perfectly functional option for the owner.  Although the UI says, "this will only allow access through the console," it actually means that only access policies set on the IAM user/role making the request are relevant.  "The console" just happens to be running as the logged-in user, with those IAM permissions.

As I understand it, once CloudSearch is accessed, the permission checks proceed along these lines (with a rough sketch in code after the list):

  1. Does the CloudSearch resource policy allow the account?  If there's no policy, only the owner is permitted; otherwise, the policy is used to determine whether cross-account or anonymous access is permitted.  (Caution: it's not clear to me whether a policy that specifies cross-account access, but doesn't include the owner, will deny the owner.)  If the account is not permitted, then the request is denied.
  2. Is there an IP restriction?  If so, and the caller's IP is not permitted, then the request fails.  If there is no IP restriction, then the connection is permitted.
  3. Does the caller have permission through IAM (user or role) in their account to make this request?  If there's no explicit grant, then the request is, once again, denied.
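
In code form, my mental model of those three checks is roughly the following.  This is only a sketch of how I think the evaluation behaves, not AWS's actual logic, and every name in it is invented:

    # Rough model of how I believe CloudSearch evaluates a request.
    # Not AWS's real code; the function and attribute names are invented.
    def request_allowed(resource_policy, caller_account, owner_account,
                        caller_ip, iam_allows_request):
        # 1. Account check: no resource policy means owner-only access.
        if resource_policy is None:
            if caller_account != owner_account:
                return False
        elif not resource_policy.allows_account(caller_account):
            return False

        # 2. IP check: only applies if the policy carries an IP condition.
        if resource_policy is not None and resource_policy.has_ip_condition():
            if not resource_policy.allows_ip(caller_ip):
                return False

        # 3. IAM check: the caller still needs an explicit grant in their account.
        return iam_allows_request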

Putting my own AWS account ID in as the allowed account, with no IP address restrictions, did not seem to be sufficient to grant access.  When I gave my instance access through an IAM role, no CloudSearch resource policy was necessary to allow access to the domain.

The documentation notes that IP address restrictions must be given in the CloudSearch resource policy.  I believe this arises because the IP can only be checked once a connection is made to CloudSearch.  Prior to that point, the caller's IP address is not available for checking.

Likewise, if I understand the documentation properly, cross-account access needs both a CloudSearch resource policy set to allow access from the caller's AWS account, and the caller's IAM also needs to allow access to the CloudSearch domain.  However, we only have the one account, so I haven't fully tested this scenario.
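
Assuming I'm reading the documentation correctly, a cross-account policy with an IP restriction would look roughly like the sketch below.  The account IDs, domain name, and CIDR block are placeholders, and the boto3 call at the end is just the programmatic way to attach what the console's policy editor accepts:

    # Sketch: attach a CloudSearch resource policy that lets a second account
    # search the domain, but only from a specific address range.  All IDs,
    # names, and the CIDR below are placeholders.
    import json

    import boto3

    access_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": [
                        "arn:aws:iam::111111111111:root",  # domain owner (placeholder)
                        "arn:aws:iam::222222222222:root",  # calling account (placeholder)
                    ]
                },
                "Action": ["cloudsearch:search", "cloudsearch:document"],
                "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
            }
        ],
    }

    cloudsearch = boto3.client("cloudsearch")
    cloudsearch.update_service_access_policies(
        DomainName="my-search-domain",
        AccessPolicies=json.dumps(access_policy),
    )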

Wednesday, January 5, 2022

A New SSH Key Philosophy

I rolled over my SSH keys this morning; but more than that, I generated two additional keys, so now I have keys for four distinct domains:

  1. Local systems
  2. EC2 systems
  3. AWS CodeCommit (read/write)
  4. AWS CodeCommit (read only) – this one already existed

Previously, I would load “the SSH key” into ssh-agent, and do everything with one key.  CodeCommit would always be accessed in read/write mode; even systems that only needed read-only access for testing would access the read/write key through the forwarded agent.

Because there was only one skeleton key, which I needed frequently, it would be available to any executing code, any time I was working. All hosts it would work on were clearly indicated in .ssh/config and .git/config files. Any code on a guest VM would also be able to access it, through the forwarded agent.  The guest’s configuration also included the hosts, because I once developed from the guest environment.  Back then, the host was Windows, making it more comfortable to access EC2 and git from the guest.

The first two keys I generated represent different frequencies of access. I access the local systems much more frequently than EC2, to debug code under development running on them.  Splitting those keys means that the EC2 key will be unlocked and available much less frequently.

As for CodeCommit, I decided to separate “shell” keys from “git” keys. Our threat model generally gives up on security if an attacker has shell access, so if I can keep local malware away from remote shell access, it adds a layer to the defense.  In addition, this key is also accessed more frequently than the EC2 key.

Finally, I quit forwarding the SSH agent by default to test systems.  They already have the (sufficient) read-only key installed when their base image is created, so all I had to do was “stop changing the configuration” in my personal setup scripts for those guest VMs.  This reduces unnecessary trust (and coupling) between host and guest.

Of note, I am already guarding the AWS keys with aws-vault, which I wrote about previously. I was sufficiently pleased with aws-vault that I’m using it on my home Linux installation, as well.

Overall, it took me an hour or two to work out the plan and reconfigure everything, but I’m happy with the improved security posture.

Saturday, January 1, 2022

Why I use ext4

Re: Why I (still) use ext4 for my Linux root filesystems

I use ext4 for all of my Linux filesystems.  It’s safety in numbers: “Files are fraught with peril” notes that it is tricky to support even different journal modes on ext4, let alone different filesystems; btrfs, for example, may reorder directory operations in ways other filesystems don’t.

So, it seems to be safer to make the expected choice.  For me, using Ubuntu, that’s ext4 in ordered mode.

Does it lock me into the decisions of the past?  I don’t know.  The filesystem developers could have “done it wrong,” but ext4 implemented extent-based allocation tracking, reaching feature parity with other common filesystems of the time.  That was probably the whole reason to raise the version number.

The performance story is pretty much a wash.  Whether one filesystem beats another or not, the numbers are typically pretty close.  In what is hopefully closer to real-world usage, I failed to find a difference in VM boot time between ext4 and xfs.  If I’m not going to be running filesystem benchmarks as my primary workload, and the benchmark performance doesn’t translate to a real-world effect, then why bother?

I also don’t take snapshots for backups; I’m interested in surviving a complete disk failure. I’ve lost two disks that way, although one had the decency to limp along just long enough to get the important parts of /home copied from it.  Backups are now “I copy important data to a USB disk.”  One of those disks is only rarely connected, for ransomware resistance.