Sunday, August 6, 2023

Sound for Firefox Flatpak on Ubuntu 23.04 with PipeWire

I reinstalled Ubuntu Studio recently, excised all Snaps, and installed Firefox from Flatpak.  Afterward, I didn’t have any audio output in Firefox.  Videos would just freeze.

I don’t know how much of this is fully necessary, but I quit Firefox, installed more PipeWire pieces, and maybe signed out and/or rebooted.

sudo apt install pipewire-audio pipewire-alsa

The pipewire-jack and pipewire-pulse packages were already installed.

AIUI, this means that “PipeWire exclusively owns the audio hardware” and provides ALSA, JACK, Pulse, and PipeWire interfaces into it.
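
A quick way to confirm that the Pulse side really is PipeWire (assuming pactl from the pulseaudio-utils package is installed):

pactl info | grep -i "server name"
# expect something like: Server Name: PulseAudio (on PipeWire 0.3.x)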

It’s not perfect.  Thunderbird (also flatpak; I’d rather have “some” sandbox than “none”) makes a bit of cacophony when emails come in, but at least there’s sound for Firefox.

Tuesday, July 4, 2023

Boring Code Survives

Over on Wandering Thoughts, Chris writes about some fileserver management tools being fairly unchanged over time by changes to the environment.  There is a Python 2 to 3 conversion, and some changes when the disks being managed are no longer on iSCSI, “but in practice a lot of code really has carried on basically as-is.”

This is completely different from my experience with async/await in Python.  Async was new, so the library I used with it was still at 0.x, and in 1.0, the authors inverted the entire control structure. Instead of being able to create an AWS client deep in the stack and return it upwards, clients could only be used as context managers.  It was quite a nasty surprise.

To allow testing for free, my code dynamically instantiated a module to “manage storage,” and whether that was AWS or in-memory was an implementation detail.  Suddenly, one of the clients couldn’t write self.client = c; return anymore.  The top-level had to know about the change.  Other storage clients would have to know about the change, to become context managers themselves, for no reason.

I held onto the 0.x version for a while, until the Python core team felt like “explicit event loop” was a mistake big enough that everyone’s code had to be broken.

Async had been hard to write in the first place, because so much example code out there was for the asyncio module’s decorators, which had preceded the actual async/await syntax.  What the difference between tasks and coroutines even was, and why one should choose one over the other, was never clear.  Why an explicit loop parameter should exist was especially unclear, but it was “best practice” to include it everywhere, so everyone did.  Then Python set it on fire.

(I never liked the Python packaging story, and pipenv didn’t solve it. To pipenv, every Python minor version is an incompatible version?)

I had a rewrite on my hands either way, so I went looking for something else to rewrite in, and v3 is in Go.  The other Python I was using in my VM build pipeline was replaced with a half-dozen lines of shell script.  It’s much less flexible, perhaps, but it’s clear and concise now.

In the end, it seems that boring code survives the changing seasons.  If you’re just making function calls and doing some regular expression work… there’s little that’s likely to change in that space.  If you’re coloring functions and people are inventing brand-new libraries in the space you’re working in, your code will find its environment altered much sooner.  The newer, fancier stuff is inherently closer to the fault-line of future shifts in the language semantics.

Sunday, June 25, 2023

Installing Debian 12 and morrownr's Realtek driver (2023-07-03 edit)

Due to deepening dissatisfaction with Canonical, I replaced my Ubuntu Studio installation with Debian 12 “bookworm” recently.

tl;dr:

  1. My backups, including the driver source, were compressed with lzip for reasons, and the fresh install had nothing to unpack them with, so I fell back on a previously-built rescue partition to get the system online.
  2. I ended up with an improper grub installation that couldn’t find /boot/grub/i386-pc/normal.mod. I rebooted onto the install media in rescue mode, got a shell in the installed environment, verified the disk structure with fdisk -l, and then ran grub-install /dev/___ to fix it. Replace the blank with your device, but beware: using the wrong device may make the OS on it unbootable.
  3. The USB stick doesn’t work directly with apt-cdrom for installing more packages offline. I “got the ISO back” with dd if=/dev/sdd of=./bookworm.iso bs=1M count=4096 status=progress conv=fsync (1M * 4096 = 4G total, which is big enough for the 3.7G bookworm image; you may need to adjust to suit), then made it available with mount -o loop,ro ~+/bookworm.iso /media/cdrom0 (the mount point is the target of the /media/cdrom symlink).
  4. Once finished, I found out the DVD had plzip; if I’d searched the package list for lzip, I would have found it, and plzip could have unpacked my archives. I didn’t actually need the rescue partition.
  5. Once finished, I realized I hadn’t needed to dd the ISO back from the USB stick. The downloaded ISO was on my external drive all along, and I could have loop-mounted that.
  6. [Added 2023-07-02]: Letting the swap partition get formatted gave it a new UUID. Ultimately, I would need to update the recovery partition’s /etc/fstab with the new UUID, and follow up with update-initramfs -u to get the recovery partition working smoothly again (see the sketch just after this list).
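
For item 6, the repair looked roughly like this from inside the recovery partition; the device name is only an example, so substitute your own:

sudo blkid /dev/sda3          # find the swap partition’s new UUID
sudoedit /etc/fstab           # replace the old swap UUID with the new one
sudo update-initramfs -u      # refresh the initramfs after the fstab change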

Full, detailed rambling with too much context (as usual) below.

Friday, March 31, 2023

Passing data from AWS EventBridge Scheduler to Lambda

The documentation was lacking images, or even descriptions of some screens ("Choose Next. Choose Next.").  So, I ran a little experiment to test things out.

When creating a new scheduled event in AWS EventBridge Scheduler, then choosing AWS Lambda: Invoke, a field called "Input" will be available.  It's pre-filled with the value of {}, that is, an empty JSON object. This is the value that is passed to the event argument of the Lambda handler:

export async function handler(event, context) {
  // `event` is the parsed "Input" JSON from the schedule
  // handle the event
}

With an event JSON of {"example":{"one":1,"two":2}}, the handler could read event.example.two to get its value, 2.
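
The same Input can be supplied when creating the schedule from the AWS CLI; this is only a sketch with placeholder names and ARNs (note that Input is a JSON-encoded string inside the target):

aws scheduler create-schedule \
  --name my-example-schedule \
  --schedule-expression "rate(1 hour)" \
  --flexible-time-window Mode=OFF \
  --target '{"Arn":"arn:aws:lambda:us-east-1:123456789012:function:my-function","RoleArn":"arn:aws:iam::123456789012:role/my-scheduler-role","Input":"{\"example\":{\"one\":1,\"two\":2}}"}'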

It appears that EventBridge Scheduler allows one complete control over this data, and the context argument is only filled with Lambda-related information.  Therefore, AWS provides the ability to include the <aws.scheduler.*> values in this JSON data, to be passed to Lambda (or ignored) as one sees fit, rather than imposing any constraints of its own on the data format.  (Sorry, no examples; I was only testing the basic features.)

Note that the handler example above is written with ES Modules.  This requires the Node 18.x runtime in Lambda, along with a filename of "index.mjs".

Monday, March 27, 2023

DejaDup and PikaBackup, early impressions (Update 2)

I tried a couple of backup programs:

  • DejaDup
  • PikaBackup

I installed both of them as Flatpaks, although deja-dup also has a version in the Pop!_OS 22.04 repository.  I have been using DejaDup for four months, and PikaBackup for one month.  This has been long enough for DejaDup to make a second full backup, but not so long for Pika to do anything special.

Speed:

For a weekly incremental backup of my data set…

  • DejaDup: about 5 minutes, lots of fan speed changes
  • PikaBackup: about 1 minute, fans up the whole time

Part of Pika’s speed is probably the better exclusion rules; I can use patterns like **/node_modules and **/vendor to exclude those folders wherever they are in the tree.  With DejaDup, I would apparently have to add each directory individually, and I did not want to bother, nor keep the list up to date over time.

Part of DejaDup’s slowness might be that it executes thousands of gpg calls as it works.  Watching with top, DejaDup is frequently running, and sometimes there’s a gpg process running with it.  Often, DejaDup is credited with much less than 100% of a single CPU core.

Features:

PikaBackup offers multiple backup configurations.  I keep my main backup as a weekly backup, on an external drive that’s only plugged in for the occasion. I was able to configure an additional hourly backup of my most-often-changed files in Pika.  (This goes into ~/.borg-fast, which I excluded from the weekly backups.) The hourly backups, covering about 2 GB of files, aren’t noticeable at all when using the system.

As noted under “Speed,” PikaBackup offers better control of exclusions.  It also tracks how long operations take, so I know the incremental weekly backups have taken 53–57 seconds each.

On the other hand, Pika appears to always save the backup password.  DejaDup gives the user the option of whether it should be remembered.

There is a DejaDup plugin for Caja (the MATE file manager) in the OS repo, which may be interesting to MATE users.

Space Usage:

PikaBackup did the weekly backup on 2023-04-24 in 46 seconds; it reports a total backup size of 28 GB, with 982 MB (0.959 GB, about 3.4% of the total) written out.

With scheduled backups, Pika offers control of the number of copies kept.  One can choose from a couple of presets, or provide custom settings.  Of note, these are count-based rather than time-based; if a laptop is only running for 8-9 hours a day, then 24 hourly backups will be able to provide up to 3 days back in time.
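
Pika is a front-end for BorgBackup, so the count-based presets correspond to borg’s prune options; purely as an illustration (not necessarily the exact command Pika runs):

borg prune --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /path/to/repo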

For unscheduled backups, it’s not clear that Pika offers any ‘cleanup’ options, because the cleanup is tied to the schedule in the UI.

I do not remember being given many options to control space usage in DejaDup.

Disaster Simulation:

To ensure backups were really encrypted, I rebooted into the OS Recovery environment and tried to access them.  Both programs’ CLI tools (duplicity and borgbackup) from the OS repository were able to verify the data sets. I don’t know what the stability guarantees are, but it’s nice that this worked in practice.

  • duplicity verified the DejaDup backup in about 9m40s
  • borgbackup verified the PikaBackup backup in 3m23s

This isn’t a benchmark at all; after a while, I got bored of duplicity being credited with 30% of 1 core CPU usage, and started the borgbackup task in parallel.

Both programs required the password to unlock the backup, because my login keychain isn’t available in this environment.
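
For reference, the checks were along these lines; the repository paths are placeholders, and both commands prompt for the passphrase when the keyring isn’t available:

duplicity verify file:///media/backup/deja-dup /home/me     # DejaDup’s backend
borg check /media/backup/pika-repo                          # PikaBackup’s backend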

Curiously, borgbackup changed the ownership of a couple of files in the backup repository during the verification: the config and index files became owned by root.  This made it impossible to access the backups as my normal user, including to take a new one.  I needed to return to my admin user and set the ownership back to my limited account.  The error message made it clear an unexpected exception occurred, but wasn’t very useful beyond that.
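
The fix, once I was back under my admin account, was a one-liner along these lines (the path and user name are placeholders for my setup):

sudo chown myuser:myuser /media/backup/pika-repo/config /media/backup/pika-repo/index.*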

Major limitations of this post:

My data set is a few GB, consisting mainly of git repos and related office documents.  The performance of other data sets is likely to vary.

I started running Pika about the same time that DejaDup wanted to make a second backup, so the full-backup date and number of incremental snapshots since should be fairly close to each other.  I expect this to make the verification times comparable.

I haven’t actually done any restores yet.

Final words:

Pika has become my primary backup method.  Together, its speed and its support for multiple configurations made hourly backups reasonable, without compromising the offline weekly backup.

Update History:

This post was updated on 2023-03-31, to add information about multiple backups to “Features,” and about BorgBackup’s file permission change during the verification test.  Links were added to the list above, and a new “Final Words” concluding section was written.

It was updated again on 2023-04-26, to add the “Space Usage” section, and to reduce “I will probably…” statements to reflect the final decisions made.

Thursday, March 16, 2023

Using sshuttle with ufw outbound filtering on Linux (Pop!_OS 22.04)

I am using sshuttle and UFW on my Linux system, and I recently set up outbound traffic filtering (instead of default-allow) in ufw.  Immediately, I noticed I couldn’t make connections via sshuttle anymore.

The solution was to add another rule to ufw:

allow out from anywhere to IP 127.0.0.1, TCP port 12300
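
If you drive ufw from the command line, the equivalent rule should be something like:

sudo ufw allow out proto tcp to 127.0.0.1 port 12300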

Note that this is “all interfaces,” not tied to the loopback interface, lo.

Now… why does this work?  Why doesn’t this traffic already match one of the “accept all on loopback” rules?

To receive the traffic it is responsible for, sshuttle listens at 127.0.0.1:12300 (by default) and creates some NAT rules to redirect traffic for its subnets to that IP and port.  That is, running sshuttle -r example.com 192.168.99.0/24 creates a NAT rule to catch traffic to any host within 192.168.99.0/24.  This is done in netfilter’s nat tables.
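
You can watch these rules appear while sshuttle is running, assuming it picked the iptables nat method (newer setups may use nftables instead):

sudo iptables -t nat -L -n -v | grep 12300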

UFW has its rules in the filter tables, and the nat tables run first. Therefore, UFW sees a packet that has already been redirected, and this redirection changes the packet’s destination while its interface and source remain the same!

That’s the key to answering the second question: the “allow traffic on loopback” rules are written to allow traffic on interface lo, and these redirected packets have a different interface (Ethernet or Wi-Fi). The public interfaces are not expected to have traffic for local addresses on them… but if they do, they don’t get to take a shortcut through the firewall.

With this understanding, we can also see what’s going wrong in the filtering rules.  Without a specific rule to allow port 12300 outbound, the packet reaches the default policy, and if that’s “reject” or “deny,” then the traffic is blocked.  sshuttle never receives it.

Now we can construct the proper match rule: we need to allow traffic to IP 127.0.0.1 on TCP port 12300, and use either “all interfaces” or our public (Ethernet/Wi-Fi) interface.  I left mine at “all interfaces,” in case I should ever plug in the Ethernet.

(I admit to a couple of dead-ends along the way.  One, allowing port 3306 out didn’t help; that was the destination port I was actually trying to reach, but due to the NAT redirection, the firewall never sees a packet with port 3306 on it.  This also means that traffic being forwarded by sshuttle can’t be usefully firewalled on the client side.  The other problem was that I accidentally created the rule to allow UDP instead of TCP the first time.  Haha, oops.)

Thursday, February 2, 2023

On Handling Web Forms [2012]

Editor’s Note: I found this in my drafts from 2012.  The login form works as described, but few of the other forms do.  However, the login form has not had glitches, even with 2FA being added to it recently.  The site as a whole retains its multi-page architecture.  Without further ado, the original post follows…

I’ve been working on some fresher development at work, which is generally an opportunity for a lot of design and reflection on that design.

Back in 2006 or so, I did some testing and tediously developed the standard “302 redirect on POST responses” technique for preventing pages that handled inbound form data from showing up in the browser history.  Thus, sites would be Back- and Reload-friendly, as they’d never show that “About to re-submit a form, do you really want to?” box.  (I would bet it was a well-known technique at the time, but NIH.)

That’s pretty much how I’ve written my sites since, but a necessary consequence of the design is that on submission failure, data for the form must be stored “somewhere,” so it can be retrieved for pre-filling the form after the redirection completes.

My recent app built in this style spends a lot of effort on all that, and then throws in extra complexity: when it detects you’re logged out, it decides to minimize the session storage size, so it cleans up all the keys.  Except for the flash, the saved form data, and the redirection URL.

The latter gets stored because I don’t trust Referer as a rule, and if login fails and redirects to the login page for a retry, the Referer won’t be accurate by the time a later attempt succeeds.  So every page that checks login also stores the form URL to return to after a login.

There’s even an extra layer of keys in the form area, so that each form’s data is stored independently in the session, although I don’t think people actually browse multiple tabs concurrently.  All that is serving to do is bloat up the session when a form gets abandoned somehow.

Even then, it still doesn't work if the form was inside a jQueryUI dialog, because I punted on that one.  The backend and page don’t know the dialog was open, and end up losing the user’s data.

Simplify, Young One

That's a whole lot of complexity just to handle a form submission.  Since the advent of GMail, YouTube, and many other sites which are only useful with JavaScript enabled, and since this latest project is a strictly internal app, I've decided to throw all that away and try again.

Now, a form submits as an AJAX POST and the response comes back.  If there was an error, the client-side code can inform the user, and no state needs to be saved or restored because the page never reloaded.  All the state it had is still there.

But while liberating, that much is old news.  “Everyone” builds single-page apps that way, right?

Here’s the thing, though: if I build out separate pages for each form, then I’m effectively building private sections of code and state from the point of view of “the app” or “the whole site.”  No page is visible from any other, so each one only has to worry about a couple of things going on in the global navbar.

This means changes roll out more quickly, as well, since users do reload when moving from area to area of the app.