Decoded Node: 03/2023

Friday, March 31, 2023

Passing data from AWS EventBridge Scheduler to Lambda

The documentation was lacking images, or even descriptions of some screens ("Choose Next. Choose Next."), so I ran a little experiment to test things out.

When creating a new scheduled event in AWS EventBridge Scheduler and choosing AWS Lambda: Invoke, a field called "Input" becomes available. It's pre-filled with the value of {}, that is, an empty JSON object. This is the value that is passed to the event argument of the Lambda handler:

export async function handler(event, context) {
  // handle the event
}

With an event JSON of {"example":{"one":1,"two":2}}, the handler could read event.example.two to get its value, 2.
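As a minimal sketch of that lookup, the snippet below feeds the example Input to a handler and reads the nested value. The export keyword is left off so that it can be run directly with node; in Lambda, the handler would be exported as shown above. The returned object and the local "invocation" are only there for illustration.

```javascript
// Sketch only: same shape as the handler above, minus `export`,
// so it can be run locally with node.
async function handler(event, context) {
  // `event` is the parsed Input JSON from the schedule.
  // Optional chaining guards against the default Input of {}.
  const two = event.example?.two;
  return { two };
}

// Local stand-in for an invocation, using the example Input from this post.
const sampleEvent = JSON.parse('{"example":{"one":1,"two":2}}');
const resultPromise = handler(sampleEvent, {});
resultPromise.then((result) => console.log(result.two));
```

With the default Input of {}, the same handler would return an undefined value instead of throwing.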

It appears that EventBridge Scheduler allows one complete control over this data, and the context argument is only filled with Lambda-related information. Therefore, AWS provides the ability to include the <aws.scheduler.*> values in this JSON data, to be passed to Lambda (or ignored) as one sees fit, rather than imposing any constraints of its own on the data format. (Sorry, no examples; I was only testing the basic features.)
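For what it's worth, here is a rough, untested sketch of what that could look like. The key names (scheduledTime, attempt) are made up for this illustration. The two placeholders are context attributes as I understand the Scheduler documentation to list them, and the sample values after substitution are assumptions rather than captured output.

```javascript
// Hypothetical Input configured on the schedule (key names are made up):
//   {
//     "scheduledTime": "<aws.scheduler.scheduled-time>",
//     "attempt": "<aws.scheduler.attempt-number>",
//     "example": {"one": 1, "two": 2}
//   }
// Scheduler is expected to replace the <aws.scheduler.*> placeholders
// before invoking Lambda, so the handler only ever sees plain strings.

async function scheduledHandler(event, context) {
  return {
    // Assumed to arrive as an ISO 8601 timestamp string.
    scheduledFor: new Date(event.scheduledTime),
    // The placeholder sits inside a JSON string, so convert explicitly.
    attempt: Number(event.attempt),
    two: event.example.two,
  };
}

// Simulated event, as it might look after substitution (sample values).
const substitutedEvent = {
  scheduledTime: "2023-03-31T12:00:00Z",
  attempt: "1",
  example: { one: 1, two: 2 },
};
const scheduledResult = scheduledHandler(substitutedEvent, {});
scheduledResult.then((r) => console.log(r.scheduledFor.toISOString(), r.attempt));
```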

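Note that the handler example above is written with ES Modules. I used the Node 18.x runtime in Lambda, along with a filename of "index.mjs". As far as I know, Lambda's Node.js runtimes have supported ES Modules since 14.x, so 18.x itself is not strictly required.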

Monday, March 27, 2023

DejaDup and PikaBackup, early impressions (Update 2)

I tried a couple of backup programs:

DejaDup (the one I had heard of), a front-end for Duplicity
PikaBackup, a front-end for BorgBackup

I installed both of them as Flatpaks, although deja-dup also has a version in the Pop!_OS 22.04 repository. I have been using DejaDup for four months, and PikaBackup for one month. This has been long enough for DejaDup to make a second full backup, but not so long for Pika to do anything special.

Speed:

For a weekly incremental backup of my data set…

DejaDup: about 5 minutes, lots of fan speed changes
PikaBackup: about 1 minute, fans up the whole time

Part of Pika’s speed is probably the better exclusion rules; I can use the patterns **/node_modules and **/vendor to exclude those folders, wherever they are in the tree. With DejaDup, I would apparently have to add each one individually, and I did not want to bother, nor keep the list up-to-date over time.
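To illustrate the effect of those two patterns (not how Borg actually matches them), here is a toy check that treats a path as excluded when any of its segments is exactly node_modules or vendor. The sample paths are hypothetical.

```javascript
// Toy check, not Borg's real pattern matcher: a path is excluded when
// any path segment is exactly "node_modules" or "vendor".
const excludedNames = new Set(["node_modules", "vendor"]);

function isExcluded(relativePath) {
  return relativePath.split("/").some((segment) => excludedNames.has(segment));
}

// Hypothetical paths, for illustration only.
const samplePaths = [
  "projects/site/node_modules/left-pad/index.js",
  "projects/api/vendor/autoload.php",
  "projects/api/src/vendor-list.txt",
  "documents/notes.txt",
];
const keptPaths = samplePaths.filter((p) => !isExcluded(p));
console.log(keptPaths);
```

Only the last two paths survive the filter: the match is on whole segments, so a file that merely starts with "vendor" is kept.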

Part of DejaDup’s slowness might be that it executes thousands of gpg calls as it works. Watching with top, DejaDup is frequently running, and sometimes there’s a gpg process running with it. Often, DejaDup is credited with much less than 100% of a single CPU core.

Features:

PikaBackup offers multiple backup configurations. I keep my main backup as a weekly backup, on an external drive that’s only plugged in for the occasion. I was able to configure an additional hourly backup of my most-often-changed files in Pika. (This goes into ~/.borg-fast, which I excluded from the weekly backups.) The hourly backups, covering about 2 GB of files, aren’t noticeable at all when using the system.

As noted under “Speed,” PikaBackup offers better control of exclusions. It also tracks how long operations take, so I know that the incremental weekly backups have taken 53–57 seconds.

On the other hand, Pika appears to always save the backup password. DejaDup gives the user the option of whether it should be remembered.

There is a DejaDup plugin for Caja (the MATE file manager) in the OS repo, which may be interesting to MATE users.

Space Usage:

PikaBackup did the weekly backup on 2023-04-24 in 46 seconds; it reports a total backup size of 28 GB, with 982 MB (0.959 GB, or 3.4% of the total) written out.

With scheduled backups, Pika offers control of the number of copies kept. One can choose from a couple of presets, or provide custom settings. Of note, these are count-based rather than time-based; if a laptop is only running for 8–9 hours a day, then 24 hourly backups will reach back up to 3 days.
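The arithmetic behind that estimate, as a small sketch. It assumes exactly one backup per hour of running time, and the function name is just for this example.

```javascript
// Count-based retention: how far back do N kept backups reach,
// if one backup is made per hour that the machine is running?
function daysCovered(keptBackups, runningHoursPerDay) {
  return keptBackups / runningHoursPerDay;
}

console.log(daysCovered(24, 8));  // 3 (laptop running 8 hours a day)
console.log(daysCovered(24, 24)); // 1 (machine that never sleeps)
```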

For unscheduled backups, it’s not clear that Pika offers any ‘cleanup’ options, because the cleanup is tied to the schedule in the UI.

I do not remember being given many options to control space usage in DejaDup.

Disaster Simulation:

To ensure backups were really encrypted, I rebooted into the OS Recovery environment and tried to access them. Both programs’ CLI tools (duplicity and borgbackup) from the OS repository were able to verify the data sets. I don’t know what the stability guarantees are, but it’s nice that this worked in practice.

duplicity verified the DejaDup backup in about 9m40s
borgbackup verified the PikaBackup backup in 3m23s

This isn’t a benchmark at all; after a while, I got bored of duplicity being credited with 30% of one CPU core, and started the borgbackup task in parallel.

Both programs required the password to unlock the backup, because my login keychain isn’t available in this environment.

Curiously, borgbackup changed the permissions on a couple of files on the backup during the verification: the config and index files became owned by root. This made it impossible to access the backups as my normal user, including to take a new one. I needed to return to my admin user and set the ownership back to my limited account. The error message made it clear an unexpected exception occurred, but wasn’t very useful beyond that.

Major limitations of this post:

My data set is a few GB, consisting mainly of git repos and related office documents. The performance of other data sets is likely to vary.

I started running Pika about the same time that DejaDup wanted to make a second backup, so the full-backup date and number of incremental snapshots since should be fairly close to each other. I expect this to make the verification times comparable.

I haven’t actually done any restores yet.

Final words:

Pika has become my primary backup method. Together, its speed and its support for multiple configurations made hourly backups reasonable, without compromising the offline weekly backup.

Update History:

This post was updated on 2023-03-31, to add information about multiple backups to “Features,” and about BorgBackup’s file permission change during the verification test. Links were added to the list above, and a new “Final Words” concluding section was written.

It was updated again on 2023-04-26, to add the “Space Usage” section, and to reduce “I will probably…” statements to reflect the final decisions made.

Thursday, March 16, 2023

Using sshuttle with ufw outbound filtering on Linux (Pop!_OS 22.04)

I am using sshuttle and UFW on my Linux system, and I recently set up outbound traffic filtering (instead of default-allow) in ufw. Immediately, I noticed I couldn’t make connections via sshuttle anymore.

The solution was to add another rule to ufw:

allow out from anywhere to IP 127.0.0.1, TCP port 12300

Note that this is “all interfaces,” not tied to the loopback interface, lo.

Now… why does this work? Why doesn’t this traffic already match one of the “accept all on loopback” rules?

To receive the traffic that sshuttle is responsible for, sshuttle listens at 127.0.0.1:12300 (by default) and creates some NAT rules to redirect traffic for its subnet to that IP and port. That is, running sshuttle -r example.com 192.168.99.0/24 creates a NAT rule to catch traffic to any host within 192.168.99.0/24. This is done in netfilter’s nat tables.

UFW has its rules in the filter tables, and the nat tables run first. Therefore, UFW sees a packet that has already been redirected, and this redirection changes the packet’s destination while its interface and source remain the same!

That’s the key to answering the second question: the “allow traffic on loopback” rules are written to allow traffic on interface lo, and these redirected packets have a different interface (Ethernet or Wi-Fi). The public interfaces are not expected to have traffic for local addresses on them… but if they do, they don’t get to take a shortcut through the firewall.

With this understanding, we can also see what’s going wrong in the filtering rules. Without a specific rule to allow port 12300 outbound, the packet reaches the default policy, and if that’s “reject” or “deny,” then the traffic is blocked. sshuttle never receives it.

Now we can construct the proper match rule: we need to allow traffic to IP 127.0.0.1 on TCP port 12300, and use either “all interfaces” or our public (Ethernet/Wi-Fi) interface. I left mine at “all interfaces,” in case I should ever plug in the Ethernet.
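The whole path can be sketched as a toy model. This is not real netfilter code: the interface name, the target host, and the rule shapes are assumptions made up for the illustration, and only the order of events follows the explanation above.

```javascript
// Toy model of the packet path described above; not real netfilter code.
// Interface name, addresses and rule shapes are illustrative assumptions.

// nat step: sshuttle's redirect rewrites the destination only.
function natRedirect(packet) {
  const inSubnet = packet.dstIp.startsWith("192.168.99.");
  return inSubnet ? { ...packet, dstIp: "127.0.0.1", dstPort: 12300 } : packet;
}

// filter step: first matching rule wins, otherwise the default policy.
function filterOutput(packet, rules, defaultPolicy) {
  const rule = rules.find((r) => r.matches(packet));
  return rule ? rule.action : defaultPolicy;
}

const loopbackRule = { action: "allow", matches: (p) => p.outIface === "lo" };
const sshuttleRule = {
  action: "allow",
  matches: (p) => p.proto === "tcp" && p.dstIp === "127.0.0.1" && p.dstPort === 12300,
};

// A TCP connection to a host behind the tunnel, leaving via Wi-Fi.
const original = { outIface: "wlan0", proto: "tcp", dstIp: "192.168.99.5", dstPort: 3306 };
const redirected = natRedirect(original);

const withoutRule = filterOutput(redirected, [loopbackRule], "deny");
const withRule = filterOutput(redirected, [loopbackRule, sshuttleRule], "deny");
console.log(withoutRule, withRule); // deny allow
```

In this model the loopback rule never matches, because the redirect changes the destination but leaves the interface as wlan0. Only a rule keyed on the new destination lets the packet through.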

(I admit to a couple of dead-ends along the way. One, allowing port 3306 out didn’t help. Due to the NAT redirection, the firewall never sees a packet with port 3306 itself. This also means that traffic being forwarded by sshuttle can’t be usefully firewalled on the client side. The other problem was that I accidentally created the rule to allow UDP instead of TCP the first time. Haha, oops.)