Friday, November 23, 2018

A Debugging Session

With the company-wide deprecation of Perl, I’ve been rewriting memcache-dynamo in Python 3.  The decision has been made to stick with a proxy for good: all of the other options require effort from application programmers, and worse, their failures would be inconsequential in dev but manifest in production.

I’d like to take a moment to walk through the process of debugging this new system, because I built it, I added tests, I felt good about it, and it simply didn’t work with either PHP or Perl clients.

Monday, November 19, 2018

asyncio, new_event_loop, and child watchers

My test suite for memcache-dynamo blocks usage of the global event loop, which was fine, until now. Because aiomcache doesn’t have the “quit” command, and I’m not sure I can duct-tape one in there, I decided to spawn a PHP process (as we’re primarily a PHP shop) to connect-and-quit, exiting 0 on success.

This immediately crashed with an error:

RuntimeError: Cannot add child handler, the child watcher does not have a loop attached

The reason was that my new loop had never been attached to the child watcher.  Only the subprocess API actually cares; everything else just doesn’t run subprocesses, and therefore never touches the child watcher, broken or otherwise.

Anyway, the correct way to do things is:

import asyncio

def create_loop():
    # Keep the global event loop unset, so tests can't sneak through it.
    asyncio.set_event_loop(None)
    loop = asyncio.new_event_loop()
    # Attach the global child watcher, or subprocesses won't work on this loop.
    asyncio.get_child_watcher().attach_loop(loop)
    return loop

asyncio requires exactly one active, global child watcher, so we don’t jump through any hoops to create a new one per test; it wouldn’t meaningfully isolate our tests from the rest of the system anyway.

(Incidentally, the PHP memcached client doesn’t connect to any servers until it must, so the PHP script is really setup + getVersion() + quit(). Without getVersion() to ask for server data, the connection was never made.)
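
Putting it all together, the smoke test ends up looking roughly like the sketch below.  The script path and function names are made up for illustration, and I’m eliding the part where the proxy itself has to be running; the point is just the loop from create_loop() plus an asyncio subprocess.

import asyncio
import subprocess

async def php_connects_cleanly():
    # Spawn the PHP connect-and-quit script; it exits 0 on success.
    proc = await asyncio.create_subprocess_exec(
        'php', 'tests/connect_and_quit.php',
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return await proc.wait() == 0

def test_php_client():
    loop = create_loop()  # the helper above, with the child watcher attached
    try:
        assert loop.run_until_complete(php_connects_cleanly())
    finally:
        loop.close()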

Saturday, November 17, 2018

systemd: the house built on sand

Once upon a time, supervisord got the service management done, but I never got the logs stored anywhere sensible.  Eventually, I got tired of being tapped to debug failures whose error message was perfectly obvious, but captured only in supervisord’s log.

Thus began a quest for running things through the distribution’s init system, which has given me some experience with upstart and a lot of experience with systemd.  Like most software that reaches success, systemd has not been carefully designed and implemented.  It has only accumulated, organically.

This is nowhere more obvious than in the configuration system.  I can’t just read the documentation online, write a .service file, and expect it to work; I have to search online to find which man page they hid the relevant directives in, and spin up a VM to read it.  Once I find the directives that apply, it’s obvious that we have an INI system crying out to be a more imperative, stateful, and/or macro-driven language.

Those complaints are related: because the configuration language is underpowered, every new capability requires a new ad-hoc tweak.  Consider the number of “boolean, or special string value” options like ProtectHome and ProtectSystem: these were clearly designed as booleans and then extended later.
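
For instance, a hardened unit might carry lines like these (the service path here is made up); note how the values have outgrown plain yes/no:

[Service]
ExecStart=/usr/local/bin/memcache-dynamo
# Both of these started life as booleans and later grew special string values.
ProtectSystem=strict
ProtectHome=read-only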

Because the website doesn’t keep a changelog (everything changes so fast that systemd has only a major version number, and every release is a breaking one), it’s not easy to build a service definition file that works across distributions while taking advantage of the features systemd offers.  You know, the things that set it apart from other init systems.  The things that were supposed to be selling points.

It’s because everything changes at the whim of the developers.  Stable? Backwards-compatible, at least?  In a fundamental system component?

Big nope from the systemd team.  There are at least a few directives that have since been superseded, so it’s impossible to write one portable service description even for a daemon that is itself portable.  And the lack of past-proofing tells us what to expect from future-proofing: what you write today may simply not run tomorrow.

systemd was supposed to be the obvious, easy choice: in theory, it embraced Linux and cgroups so that administrators could get isolation without a separate containerization layer.  But in practice, that separate layer is looking more and more like the better choice.