Tuesday, May 7, 2019

On Tooling

I used to think that tools didn’t matter.  In creative acts, it’s often the person that makes the difference.  Compare violins and 80-90% of the difference in performance comes down to the player.  The $10,000,000 violin does sound a little better in anyone’s hands, but Benny can still get an excellent performance out of Brett’s violin.  Before I was interested in music, Ken Rockwell said the same thing about cameras—the camera doesn’t matter, the photographer does.

In college, we used Visual Studio as if it were “Notepad with a compile button,” so I really didn’t think that much of IDEs.

I went on to program in Vim and GVim for 20 years.  I finally started using plugins: first Syntastic, and then ALE.  I knew I was missing a proper debugging experience, but I didn’t want to give up everything else for that.

But then, things happened.  Homebrew broke MacVim for a while, back in October or so, and I used VS Code in the meantime.  During that stretch, I began to really enjoy Intelephense.  It was kind of disorienting to be back in vim, and not have those omnipresent hints.

That was the turning point.  That was the key experience that made me think, “I really should try PHPStorm after all.”

Just as VS Code is more productive (code completion and docs!) than vim, PHPStorm is another level beyond my VS Code setup.  It has far more static checks, and it has much more effective refactoring tools.  Oh, and its Xdebug integration actually works, unlike everything else I ever tried.

There was some code I ported from Perl to PHP in vim, and didn’t have a good way to test.  I knew it was risky, so I tried extra hard to make sure it was right, then pushed it to production anyway.  By the time someone tried to use the feature, months later, it crashed before even being able to flag the job as “started”.  I opened the file in PHPStorm and fixed around a half-dozen bugs based on its warnings alone.  Then it ran fine.

There’s another project where I have been using the code navigation features heavily (open by class, go to test/implementation/definition) as well as the rename and “change signature” refactorings.  It’s a massive rewrite of an API implementation; we outsourced development for political reasons, which blew up in our faces as usual.  But I figured I could clean it up when we took delivery.

Let’s just say, it’s a good thing I have PHPStorm for it, and it’s also clear the external team didn’t. I started out by generating a lot of PHPDoc blocks and locking down the types, just to give PHPStorm some traction on finding the next layer of bugs.

And I know editors are religious, and some would say that I could carefully configure VS Code or vim to do more, to be better at PHP or at Symfony or whatever.  The thing is, PHPStorm did it out of the box. (vim is at a special disadvantage here, because it was designed before IDEs, so it doesn’t have a whole lot of shortcuts available for IDE functionality.)

PHPStorm isn’t perfect, of course.  It’s missing a few warnings, the type analysis doesn’t always work, and it doesn’t seem to handle reworking the namespace if a file is moved around a PSR-4 root.  It’s not very good at understanding a collection of independent CLI scripts—definitions leak across files that don’t include each other.  But all in all, I can’t really imagine taking on the API project in vim or VS Code.  Even with the test suite, it would be slower going, or buggier, or both.

Sunday, February 3, 2019

Deployment May Be Stateful

Our deployment process can technically accept a commit hash or an alternate branch to deploy, but by default, it updates to the currently checked-out branch tip. This default also applies to the auto-update code that brings our pre-baked AMI up-to-date when it launches.

For the most part, this is fine.  We keep master in a deployable state, and that’s always the desired version to deploy.  Thus, the whole system is stateless…

But, it also means that we can’t use our fancy “change branch” or “deploy commit” operations very much.  If we do, then the desired version is no longer what the AMI will auto-deploy when new instances launch from it.  We have to either build a new AMI (for the branch) or restore the deployability of master before any new instances launch.
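To make the mismatch concrete, here is a minimal sketch of how the target might be chosen.  The function name, its arguments, the commit hash, and the "commit beats branch" precedence are all assumptions for illustration; the real deployment scripts are not shown in this post.

```python
def resolve_target(checked_out_branch, branch=None, commit=None):
    """Pick what to deploy (hypothetical precedence, for illustration only).

    An explicit commit wins, then an explicit branch; with no override,
    fall back to the tip of whatever branch is currently checked out.
    """
    if commit is not None:
        return ("commit", commit)
    if branch is not None:
        return ("branch", branch)
    return ("branch", checked_out_branch)


# An operator deploys a specific commit to the running fleet...
fleet_target = resolve_target("master", commit="abc1234")

# ...but an instance launched later from the AMI auto-updates with no overrides.
new_instance_target = resolve_target("master")

# The override lived only in the operator's command, so the two disagree.
versions_agree = fleet_target == new_instance_target
```

The override is state that exists nowhere except in the one command that used it, which is why the default path cannot reproduce it.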

If we reached the “deploy from tarball” goal, then life would be easier.  Builds could happen from any branch or commit naturally, and we could prevent a broken tarball from auto-deploying by simply deleting it.

Saturday, December 22, 2018

Adventures with kASLR

Things I discovered recently about kASLR:

  • Linux added kaslr a while ago, in 3.14.
  • Originally, it had to be enabled by passing kaslr on the kernel command line.
  • kaslr wasn’t compatible with hibernation, originally.  This appears to have changed in 4.8.
  • It was enabled by default in 4.12, with nokaslr to disable it.
  • kaslr support is compiled in by CONFIG_RANDOMIZE_BASE=y.
  • Ubuntu 18.04 LTS (bionic) has that setting enabled in the generic and aws kernels, which are based on 4.15.
  • /etc/default/grub may be useless on Ubuntu for setting command line flags.

Under the hood, there are “up to” 512 base addresses for the kernel (depending on the specific machine’s memory map), and kaslr decompresses the image into a random available one during bootup.  This puts the base kernel “somewhere” in a 1 GB-sized area, aligned at 2 MB.
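The “up to 512” figure falls straight out of those two numbers.  A quick check, assuming the sizes are binary units (GiB and MiB):

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

region_size = 1 * GIB   # area the kernel base may land in
alignment = 2 * MIB     # required alignment of the base address

# Each aligned address in the region is one candidate slot.
candidate_slots = region_size // alignment
```

Here candidate_slots works out to 512.  That is only the upper bound; the memory map of a given machine can rule some slots out.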

The kernel command line is available in /proc/cmdline.  However, it didn’t have my kaslr customization, which sent me on a quest.  I discovered that Debian/Ubuntu configure a bunch of scripts to produce the final grub configuration file, using /etc/default/grub.d/*.cfg.  These are processed after /etc/default/grub.  There turned out to be a “cfg” file that unconditionally replaced GRUB_CMDLINE_LINUX_DEFAULT, which is where I had put our kaslr flag.  This affected both of our instance types: VirtualBox appeared to have one unintentionally left over from the installer, while AWS had one placed there as part of the cloud image build.
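The check I was doing by hand can be roughly sketched as follows.  This is only an approximation: the real files are shell fragments, and this treats them as simple assignments, handling just the "$GRUB_CMDLINE_LINUX_DEFAULT" form of self-reference.  The function names and the 50-example.cfg file name are made up for illustration.

```python
import re

ASSIGN = re.compile(r'^\s*GRUB_CMDLINE_LINUX_DEFAULT=(.*)$', re.MULTILINE)


def effective_cmdline(files):
    """Return (final value, file that last replaced it outright).

    `files` is a list of (name, text) pairs, in processing order:
    /etc/default/grub first, then each /etc/default/grub.d/*.cfg.
    """
    value, set_by = "", None
    for name, text in files:
        for match in ASSIGN.finditer(text):
            raw = match.group(1).strip().strip('"')
            if "$GRUB_CMDLINE_LINUX_DEFAULT" in raw:
                # The file extends the existing value.
                value = raw.replace("$GRUB_CMDLINE_LINUX_DEFAULT", value)
            else:
                # The file unconditionally replaces the existing value.
                value, set_by = raw, name
    return value, set_by


def has_flag(cmdline, flag):
    """Check a /proc/cmdline-style string for an exact flag."""
    return flag in cmdline.split()


files = [
    ("/etc/default/grub", 'GRUB_CMDLINE_LINUX_DEFAULT="quiet kaslr"\n'),
    ("/etc/default/grub.d/50-example.cfg",
     'GRUB_CMDLINE_LINUX_DEFAULT="console=tty1 console=ttyS0"\n'),
]
final_value, culprit = effective_cmdline(files)
flag_survived = has_flag(final_value, "kaslr")
```

With these sample contents, the later file wins and the kaslr flag from /etc/default/grub is gone.  On a real machine, has_flag() could be pointed at the contents of /proc/cmdline instead.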

But given that kaslr appears to be enabled by default, instead of setting up a local configuration file, I ended up removing the code that was trying in vain to set kaslr.

Friday, November 23, 2018

A Debugging Session

With the company-wide deprecation of Perl, I’ve been rewriting memcache-dynamo in Python 3.  The decision has been made to use a proxy forever.  All of the other options require effort from programmers.  Worse, failures would be inconsequential in dev, but manifest in production.

I’d like to take a moment to walk through the process of debugging this new system, because I built it, I added tests, I felt good about it, and it simply didn’t work with either PHP or Perl clients.

Monday, November 19, 2018

asyncio, new_event_loop, and child watchers

My test suite for memcache-dynamo blocks usage of the global event loop, which was fine, until now. Because aiomcache doesn’t have the “quit” command, and I’m not sure I can duct-tape one in there, I decided to spawn a PHP process (as we’re primarily a PHP shop) to connect-and-quit, exiting 0 on success.

This immediately crashed with an error:

RuntimeError: Cannot add child handler, the child watcher does not have a loop attached

The reason was that the global child watcher didn’t have my new loop attached, just as the message says.  Only the subprocess API really cares; everything else just doesn’t run subprocesses, and therefore doesn’t interact with the child watcher, broken or otherwise.

Anyway, the correct way to do things is:

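import asyncio

def create_loop():
    # Keep tests off the global event loop.
    asyncio.set_event_loop(None)
    loop = asyncio.new_event_loop()
    # Attach the new loop to the global child watcher, so subprocesses work.
    asyncio.get_child_watcher().attach_loop(loop)
    return loop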

asyncio requires exactly one active/global child watcher, so we don’t jump through any hoops to create a new one.  It wouldn’t meaningfully isolate our tests from the rest of the system.

(Incidentally, the PHP memcached client doesn’t connect to any servers until it must, so the PHP script is really setup + getVersion() + quit(). Without getVersion() to ask for server data, the connection was never made.)
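For reference, here is a self-contained sketch of a connect-and-quit style check built on such a loop.  It swaps in a Python child (via sys.executable) for the PHP script so that it runs anywhere, and the run_child helper is made up for illustration.  It also guards the child watcher call, on the assumption that it may run on newer Python releases: the child watcher API was deprecated in Python 3.12 and removed in 3.14.

```python
import asyncio
import sys
import warnings


def create_loop():
    asyncio.set_event_loop(None)
    loop = asyncio.new_event_loop()
    # Newer Pythons no longer have child watchers; attach only if present.
    get_watcher = getattr(asyncio, "get_child_watcher", None)
    if get_watcher is not None:
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            get_watcher().attach_loop(loop)
    return loop


async def run_child(code):
    """Spawn a child interpreter and return its exit status."""
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", code)
    return await proc.wait()


loop = create_loop()
try:
    # Stand-in for the PHP connect-and-quit script: exit 0 on success.
    exit_code = loop.run_until_complete(run_child("raise SystemExit(0)"))
finally:
    loop.close()
```

The test then only has to assert that exit_code is 0.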

Saturday, November 17, 2018

systemd: the house built on sand

Once upon a time, supervisord got the service management done, but I never got the logs stored anywhere sensible.  Eventually, I got tired of being tapped to debug anything that had an obvious error, but where the message was only logged by supervisord.

Thus began a quest for running things through the distribution’s init system, which has given me some experience with upstart and a lot of experience with systemd.  Like most software that reaches success, systemd has not been carefully designed and implemented.  It has only accumulated, organically.

This is nowhere more obvious than in the configuration system.  I can’t just read documentation online, write a .service file, and expect it to work; I have to use the online search to find which man page they hid the relevant directives in, and spin up a VM to read it.  Once I find the directives that apply, it’s obvious that we have an INI system crying out to be a more imperative, stateful, and/or macro-driven language.

Those are related; because the configuration is underpowered, new capabilities require new tweaks.  Consider the number of “boolean, or special string value” options like ProtectHome and ProtectSystem: these were clearly designed as booleans and then extended later.

Because the website doesn’t keep a changelog (everything changes so fast that systemd just has a major version, and every release is breaking), it’s not easy to build a cross-platform service definition file that takes advantage of the features systemd offers.  You know, the things that distinguish it from other init systems.  The things that were supposed to be selling points.

It’s because everything changes at the whim of the developers.  Stable? Backwards-compatible, at least?  In a fundamental system component?

Big nope from the systemd team.  There are at least a few directives that were superseded, and so it’s impossible to make a portable service description for a service that is otherwise portable. And the lack of past-proofing tells us about future-proofing.  What you write today may simply not run tomorrow.

systemd was supposed to be the obvious, easy choice: in theory, it embraced Linux and cgroups so that administrators could use cgroups to provide isolation without a separate containerization layer.  But in practice, the separate layer is looking ever more like a better choice.

Saturday, October 13, 2018

Idea: Type Propagation for Gradual Typing

Regarding this paper recently featured on Reddit, I got to thinking.

Perhaps it’s best to add type information starting in high-level modules; intuitively, having a low-level leaf function (especially one that is frequently called) checking and re-checking the types of its arguments on every call would certainly be slower than doing the same checks in a higher-level function that gets called only a few times throughout the course of a run.

For instance, for a program that does data processing, added type checks in “once per file” functions would have less effect on the execution time than type checks in “once per line” functions.
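As a toy illustration of that intuition (not the paper’s methodology), the sketch below counts how many runtime checks each level performs.  The checked decorator and the sample data are made up for the example, and isinstance checks stand in for whatever a gradually typed runtime would actually insert.

```python
import functools

check_count = {"per_file": 0, "per_line": 0}


def checked(label, *types):
    """Wrap a function with runtime argument type checks, counting each call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args):
            check_count[label] += 1
            for arg, expected in zip(args, types):
                if not isinstance(arg, expected):
                    raise TypeError(f"{fn.__name__}: expected {expected.__name__}")
            return fn(*args)
        return wrapper
    return decorate


@checked("per_line", str)
def parse_line(line):
    # Leaf function: called once for every line of every file.
    return len(line.split())


@checked("per_file", list)
def process_file(lines):
    # Higher-level function: called once per file.
    return sum(parse_line(line) for line in lines)


files = [["a b c", "d e"], ["f"], ["g h", "i j k l", "m"]]
word_total = sum(process_file(lines) for lines in files)
```

With three files holding six lines in total, the per-file check runs 3 times and the per-line check runs 6 times; on real inputs, where files have thousands of lines, the gap grows accordingly.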

But maybe we’re missing something, here.  The paper adds complete type information to one module at a time, but does nothing about inter-module calls at each step.  That is, a module may define that it accepts a string argument, but callers in other modules won’t be declaring that they are passing strings until that module has types added.