Wednesday, December 28, 2022

Linux Behavior Without Swap

We had a runaway script eat nearly all of the memory on a micro EC2 Ubuntu instance.  It didn’t use enough for the kernel OOM killer to step in, nor enough for the script itself to hit the PHP memory limit, but it was enough to make the instance unresponsive for 45 minutes.

I have sent Linux into thrashing before, back in the old days when typical desktop RAM sizes were less than 1 GB and SSDs weren’t available yet.  What surprised me was just how similar “running out of RAM” still is in modern times, even with no swap and with the OOM killer around.  The kernel let the system bog down instead of killing a process!

We chose to mitigate the issue at work by expanding the instance, so that it has more RAM than memory_limit now.  It will take more than one simultaneous runaway script to bring it down in the future.  (We also fixed the script.  I don’t like throwing resources at problems, in general.)

Then one day, via pure serendipity, I found out about earlyoom.  I have added it to our pet instances, and I’m considering it for the cattle template, but it hasn’t been well-tested due to our previous mitigations.  The instance simply doesn’t run out of RAM anymore.
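
For the curious, here is a rough Python sketch of the kind of check a userspace tool like earlyoom performs: read the memory counters the kernel exposes and decide whether both available RAM and free swap have dropped below a threshold.  This is not earlyoom’s actual code.  The function names and the 10% thresholds are my own assumptions for illustration, and the sample text is made up rather than taken from our instance.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kiB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def percent(part, whole):
    """Return part as a percentage of whole, or 0.0 if whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def under_pressure(info, mem_min_pct=10.0, swap_min_pct=10.0):
    """True when available RAM and free swap are both at or below their thresholds.

    The thresholds are illustrative assumptions. With no swap configured,
    swap counts as exhausted, so only the RAM figure matters.
    """
    mem_pct = percent(info.get("MemAvailable", 0), info.get("MemTotal", 0))
    swap_pct = percent(info.get("SwapFree", 0), info.get("SwapTotal", 0))
    return mem_pct <= mem_min_pct and swap_pct <= swap_min_pct


# Made-up sample resembling a small, swapless instance that is nearly full.
SAMPLE_MEMINFO = """\
MemTotal:        1000000 kB
MemFree:           20000 kB
MemAvailable:      40000 kB
SwapTotal:             0 kB
SwapFree:              0 kB
"""

print(under_pressure(parse_meminfo(SAMPLE_MEMINFO)))
```

On a real machine, the text would come from reading /proc/meminfo, and a real tool would run the check in a loop and then pick a process to kill; the sketch stops at the decision.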

At home, I first set up swap on zram so that Ubuntu Studio would have a place to “swap out” 2+ GB (out of 12 GB installed), and then recently added a swap partition while I was restructuring things anyway.  It’s not great for realtime audio to swap, but “not having swap” doesn’t appear to change the consequences of memory pressure, so I put some swap in.  With a dedicated swap partition added, I reduced the zram area to 512 MB.  I kept it because I still want to spare the SSD when there’s only a small-to-moderate amount of swap usage.
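
Sparing the SSD this way depends on the kernel filling the zram area before the partition, and it uses swap areas with a higher priority first.  As a minimal sketch, this Python snippet parses text in the format of /proc/swaps and lists the areas in the order they would be used.  The device names, sizes, and priorities in the sample are hypothetical, not copied from my machine.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of dicts, one per swap area."""
    areas = []
    for line in text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 5:
            continue
        areas.append({
            "name": fields[0],
            "size_kib": int(fields[2]),
            "used_kib": int(fields[3]),
            "priority": int(fields[4]),
        })
    return areas


def fill_order(areas):
    """Return swap area names, highest priority first."""
    ranked = sorted(areas, key=lambda area: area["priority"], reverse=True)
    return [area["name"] for area in ranked]


# Hypothetical sample: a small zram area with a higher priority than the partition.
SAMPLE_SWAPS = """\
Filename                Type        Size      Used   Priority
/dev/nvme0n1p3          partition   8388604   0      -2
/dev/zram0              partition   524284    1024   100
"""

print(fill_order(parse_swaps(SAMPLE_SWAPS)))
# → ['/dev/zram0', '/dev/nvme0n1p3']
```

If the partition came out first in that list, small amounts of swap would land on the SSD and the zram area would sit idle.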

UPDATE: This was imperfect, as it turns out; if you have a swap partition, you should remove zram, and use zswap instead.
