Thursday, June 14, 2018

Python, virtualenv, pipenv

I heard (via LWN) about some discussion about Python and virtualenvs.  I'm bad at compressing thoughts enough to both fit Twitter and make sense at the same time, so I want to cover a bit about my recent experiences here.

I'm writing some in-house software (a new version of memcache-dynamo, targeting Python 3.6+ instead of Perl) and I would like to deploy this as, essentially, a tarball.  I want to build in advance and publish an artifact with a minimum amount of surrounding scripts at deploy time.

The thing is, the Python community seems to have drifted away from being able to run software without a complex installation system that involves running arbitrary Python code.  I can see the value in tools like tox and pipenv—for people who want to distribute code to others.  But that's not what I want to do; I want to distribute pre-built code to myself, and as such, "execute from source" has always been my approach.


Thursday, June 7, 2018

Stream Programming, Without Streams

Editor’s note: Following is a brief draft from two years ago.  I’m cleaning out the backlog and faking HUGE POST COUNTS!! for 2018, I guess.

I wrote before that I don't “use” stream programming, but I've come to realize that it's still how I think of algorithms and for loops.  Input enters at the top, gets transformed in the middle, and yields output at the bottom.

It's like I look at a foreach loop as an in-line lambda function.  The concept may not be explicitly named, and the composition of steps built into lower-level control flow… but inside my head, it's still a “sequence of operations” on a “stream of data.”

There doesn't seem to be much benefit to building up “real” streams in a language that doesn't have them either built-in or in the standard library. It creates another layer where the things that have been built can only interoperate with themselves, and a series of transformations can no longer share state.  And, a PHP array can be passed to any array_* function, where (last I even checked) our handmade streams or Iterators cannot.

Thursday, May 31, 2018

Get and Set

Editor’s note: I found this two-year-old draft hanging around with all my other posts on my PC.  What follows is the original post, since it appears to have aged fairly well, although it could use a conclusion…

It happens that having to call methods of a class in a particular order—unless there’s an obvious, hard dependency like “don’t invoke Email::send() until all the headers and body have been added”—leads to fragility.  There’s no hint from the IDE about the dependencies or ordering.

Worse, if a class has a lot of methods that must be called after setting some crucial data, every one of those methods has to check that the necessary data is set.  It’s better to pass that into the constructor, and have the created instance always contain that data.

Given a class that wants to maintain a valid state for all its methods at any given time, how do setters fit into that class?  It needs to provide not individual setters, but higher-level methods that change “all” relevant state at once.

So, I’m starting to see get and set methods as an anti-pattern.

Thursday, May 24, 2018

Is Elegance Achievable?

Editor’s note: This is a three-year-old draft that I found.  I’m bad at coming back to things. Regardless, what follows is the post in its near-original form; I reworked some “recently” wording because it is certainly not recent anymore.

The best solutions are “clean,” “elegant,” and “simple,” right?  But what does that mean, and are those things even achievable?

Friday, May 18, 2018

Why we have a memcached-dynamo proxy

Previously, I mentioned that we have a proxy that speaks the memcached protocol to clients and stores data in dynamodb, “for reasons.”  Today, it’s time to talk about the architecture, and those reasons.

Wednesday, May 16, 2018

Ubuntu Bionic OVA size optimization (reduction)

For Ubuntu 18.04 LTS “Bionic Beaver,” Canonical has released a new installer on the server image, the one with live in the ISO’s filename.  (The old installer, with full feature support, is available as an alternative download.) The new installer is named Subiquity.

We have a set of scripts that take “an Ubuntu install” and turn it into a VM image pre-installed with PHP, Perl, and dependencies for our code.  This creates a “lite” VM build.  (We used to have a “full” build that included all our git repos and their vendored code, but that grew large.  We’ve moved to mounting the code to be run into the guest.)

Our standard build scripts produced an 850 MB OVA file when starting from the new Subiquity-based installer, compared to around 500 MB on previous releases.  The difference?  Previous releases used the “Install a minimal virtual machine” mode option in the old installer.  In fact, after checking I had used zerofree correctly, using the minimal VM mode in the alternate installer for 18.04 produced a 506 MB OVA.

But maybe that alternate install won’t be around forever, and I should fix the bloat sooner rather than later.  Where do those 344 MB come from, and how do we find them?

Tuesday, May 15, 2018

MySQL data copying optimizations

Recently at work, I optimized a script.  It copies data from a live system to a demo system, anonymizing it along the way.  We used GUIDs as contract and customer identifiers, but in the pursuit of something shorter and more URL-friendly, we created a “V2 format.”

The V2 format secretly hides a client ID inside; the demo importer used to copy the IDs and not worry too much, but now it has to change them all. Making that change took it from running in about four minutes to 15 minutes against a local MySQL instance hosted in the same virtual machine.

Unfortunately, it took 75 minutes in production, and the ID rewriting it was doing was causing our infrastructure to raise alerts about data stability, response times, and IOPS rate limiting while a rebuild was running.

Update 2018-05-18: the production run takes around 9 minutes and 3 seconds after optimization, with far less disruption. I got the local instance rebuild down to 2 minutes and 15 seconds.  I want to share a bit about how I did it.