Thursday, May 24, 2018

Is Elegance Achievable?

Editor’s note: This is a three-year-old draft that I found.  I’m bad at coming back to things. Regardless, what follows is the post in its near-original form; I reworked some “recently” wording because it is certainly not recent anymore.

The best solutions are “clean,” “elegant,” and “simple,” right?  But what does that mean, and are those things even achievable?

Friday, May 18, 2018

Why we have a memcached-dynamo proxy

Previously, I mentioned that we have a proxy that speaks the memcached protocol to clients and stores data in dynamodb, “for reasons.”  Today, it’s time to talk about the architecture, and those reasons.

Wednesday, May 16, 2018

Ubuntu Bionic OVA size optimization (reduction)

For Ubuntu 18.04 LTS “Bionic Beaver,” Canonical has released a new installer on the server image, the one with live in the ISO’s filename.  (The old installer, with full feature support, is available as an alternative download.) The new installer is named Subiquity.

We have a set of scripts that take “an Ubuntu install” and turn it into a VM image pre-installed with PHP, Perl, and dependencies for our code.  This creates a “lite” VM build.  (We used to have a “full” build that included all our git repos and their vendored code, but that grew large.  We’ve moved to mounting the code to be run into the guest.)

Our standard build scripts produced an 850 MB OVA file when starting from the new Subiquity-based installer, compared to around 500 MB on previous releases.  The difference?  Previous releases used the “Install a minimal virtual machine” mode option in the old installer.  In fact, after checking I had used zerofree correctly, using the minimal VM mode in the alternate installer for 18.04 produced a 506 MB OVA.

But maybe that alternate install won’t be around forever, and I should fix the bloat sooner rather than later.  Where do those 344 MB come from, and how do we find them?

Tuesday, May 15, 2018

MySQL data copying optimizations

Recently at work, I optimized a script.  It copies data from a live system to a demo system, anonymizing it along the way.  We used GUIDs as contract and customer identifiers, but in the pursuit of something shorter and more URL-friendly, we created a “V2 format.”

The V2 format secretly hides a client ID inside; the demo importer used to copy the IDs and not worry too much, but now it has to change them all. Making that change took it from running in about four minutes to 15 minutes against a local MySQL instance hosted in the same virtual machine.

Unfortunately, it took 75 minutes in production, and the ID rewriting it was doing was causing our infrastructure to raise alerts about data stability, response times, and IOPS rate limiting while a rebuild was running.

Update 2018-05-18: the production run takes around 9 minutes and 3 seconds after optimization, with far less disruption. I got the local instance rebuild down to 2 minutes and 15 seconds.  I want to share a bit about how I did it.

Saturday, April 14, 2018

Effectively Using Future in Perl

I’ve been working a lot with the Future library and IO::Async in Perl recently.  There was a bug in our memcache/dynamo proxy, so I ended up doing a lot of investigation about Futures in order to simulate the bug and verify the fix.  So, I want to talk about how Futures work and why it was so difficult for me.

Saturday, April 7, 2018

PHP 7.2 exposed our bad code

We upgraded to PHP 7.2, and then something horrible happened.

A “thanks for submitting client $x!” message was distributed not only to the agent who did the submission, but to all agents registered in the system.  A lot of drama ensued as confused messages started coming in, and the account rep forwarded the message to basically everyone in the company saying “what happened!?”

We stared at the code, but it was so simple that it couldn’t possibly be wrong, and besides, it hadn’t changed in months.  But the switch over from PHP 7.1 to 7.2 had happened mere hours before this event.

Friday, March 30, 2018

Commit Logs vs. External Systems

There’s a couple of schools of thought on commit logs: should you make them detailed, or should they be little more than a pointer to some sort of external pull request, ticketing, or ALM system?

When people have external systems, the basic problem they face is that those systems seem to be redundant with the information in the commit log.  Why write the same thing twice, when the commit log can just point to the ticket, which undoubtedly has a lot of rationale and reasoning already written into it?

But in my experience, it’s far more useful to have good commit log messages. When hunting in the code for a problem, the commit log is right there and available to tools like git log --grep, which can also print the diff alongside the log messages.

And while some tools like GitHub pull requests offer line-by-line commentary on diffs, the other interesting thing about commit logs is that they’ve proven much more resilient over time.  Our ticket tracking has migrated from my boss’s inbox, to Lighthouse, to a bespoke ticketing system… that we integrated into our support team’s workflow as well, which has become something we want to split back out.  And then we might replace the “remaining half” of the bespoke system with some off-the-shelf solution.

Meanwhile, our commit logs have been preserved, even across a move from subversion to git, so those go back to the point in time where the founders realized they should set it up.  But the references to “Lighthouse” have been broken for years, and if we succeed in killing a huge pile of custom code nobody wants to maintain, all those “ticket #16384” references are also going to be useless.

But the commit messages will live on, in a way that ticketing systems have not.  And so, I don’t really trust that we’d stick to GitHub and have their issue and pull request systems available for every change, forever.

Aside from that, I think a succinct summary of the ticket makes a good commit message.  I try to avoid repeating all the ticket babble, status updates, and dead ends in the commit:

Contoso requested that their cancellations be calculated on a 90-day cliff instead of days remaining in term.  Add cancel_type=cliff and cancel_cliff_days=90 to the client settings.  Ticket 18444.

This gives the big-picture outlook on what happened and why, and lets the diff speak for itself on whether the change suits the intention.  If there are questions about whether the true intention was understood, then the ticket is still linked, so it can be examined in further detail.