Thursday, November 27, 2014

Versioning

Dan Tao asks, “In all seriousness, what’s the point of the MINOR version in semver? If the goal is dependency stability it should just be MAJOR and PATCH.”

I think I finally remember the answer. It’s set up to flag three levels: no compatibility (major), source compatibility (minor), and binary compatibility (patch). The “thing” that dictates bumping the minor shouldn’t be “features” so much as binary-incompatible changes. For example, GTK 2: if code was written against 2.4.0 using the shiny new file chooser, it would still compile against 2.12. But once it had been compiled against 2.12, the resulting binary wouldn’t run if 2.4 was in the library path. A binary compiled against 2.4.2 would run on 2.4.0, though, because every 2.4.x release was strictly ABI compatible.

IIRC, they had a policy of forward compatibility for their ABI, so that a binary compiled against 2.4 would still run on 2.12, but I don’t remember whether SemVer actually requires that. Another way to look at it: “If we bump the minor, you’ll need to recompile your own software,” whether that software is a program using the library or a plugin for a larger program.
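To make the GTK 2 rule concrete, here is a minimal sketch of the runtime check it implies. It assumes the policy as I remember it (patch releases strictly ABI compatible, newer minors forward compatible, a major bump breaks everything); `Version` and `binary_runs_on` are hypothetical names for illustration, not anything GTK or SemVer actually ships.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        # Accept "2.12" as shorthand for "2.12.0".
        parts = [int(p) for p in text.split(".")]
        if len(parts) == 2:
            parts.append(0)
        major, minor, patch = parts  # raises ValueError on any other shape
        return cls(major, minor, patch)


def binary_runs_on(built_against: str, installed: str) -> bool:
    """Can a binary compiled against `built_against` run with `installed`
    in the library path? Assumes the GTK 2-style ABI policy described above."""
    built = Version.parse(built_against)
    have = Version.parse(installed)
    if built.major != have.major:
        # Hypothetical assumption: a major bump breaks the ABI outright.
        return False
    # The patch level is ignored: every x.y.z in a minor series shares one ABI.
    # An older minor may lack symbols the binary was linked against; a newer
    # minor keeps the old ABI (the forward-compatibility policy).
    return have.minor >= built.minor
```

With that rule, a binary built against 2.4.2 runs on 2.4.0, one built against 2.12 does not run on 2.4.0, and one built against 2.4 still runs on 2.12. Note that the patch number never enters the decision, which is exactly why it carries no weight once there’s no binary involved.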

I believe that’s the motivation for SemVer to include minor, but upon reflection, it doesn’t really make sense in a non-binary world. If there is no binary with baked-in assumptions about its environment, then that layer isn’t meaningful.

Also upon reflection, most of my currently-tracked projects don’t use SemVer. The AWS SDK for PHP is (in 1.x/2.x) organized as “paradigm.major.minor”, where 2.7 indicates a breaking change vs. 2.6 (DynamoDB got a new data model) but e.g. 2.6.2 added support for loading credentials from ~/.aws/credentials. PHP itself has done things like add the charset option to the PDO MySQL DSN in a patch release, 5.3.6. When PHP added the DateTime class, it wasn’t a compatible change, but it didn’t kick the version to 6.0.0. (They were going to add it as Date, but many, many people had classes named that in the wild. They changed to DateTime so there would be less, but not zero, breakage.)

So I’ve actually come to like the AWS 2.x line, where the paradigm represents a major update to fundamental dependencies (like moving from straight cURL and calling new on global classes to Guzzle 3, namespaces, and factory methods) and the major and minor convey actual, useful levels of information. It makes me a bit disappointed to know they’re switching to SemVer for 3.x, now that I’ve come to understand their existing versioning scheme. If they follow on exactly as before, we’ll have SDK 4 before we know it, and the patch level is probably going to be useless.
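Here’s a rough sketch of what “paradigm.major.minor” tells a consumer about an upgrade, as I read the AWS SDK 2.x scheme. This is my interpretation, not anything Amazon publishes as code; `describe_upgrade` and its messages are hypothetical.

```python
def describe_upgrade(current: str, candidate: str) -> str:
    """Classify a move between two paradigm.major.minor versions
    (hypothetical helper illustrating the AWS SDK for PHP 2.x reading)."""
    old = [int(p) for p in current.split(".")]
    new = [int(p) for p in candidate.split(".")]
    if len(old) != 3 or len(new) != 3:
        raise ValueError("expected paradigm.major.minor, e.g. 2.6.2")
    # Lists compare element by element, so [2, 7, 0] > [2, 6, 2].
    if new <= old:
        return "not an upgrade"
    if new[0] != old[0]:
        return "paradigm: fundamental dependencies changed; expect a rewrite"
    if new[1] != old[1]:
        return "major: breaking changes (e.g. a new data model); review first"
    return "minor: compatible additions or fixes; safe to take"
```

Under this reading, 2.6.1 to 2.6.2 (the credentials-file support) comes out as a safe minor update, 2.6.2 to 2.7.0 (DynamoDB’s new data model) as a breaking major update, and 1.x to 2.0.0 as a paradigm shift. Every level answers a question someone actually asks before upgrading.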

I think for systems level code, SemVer is a useful goal to strive for. But the meta point is that a project’s version should always be useful; if minor doesn’t make sense in a language where the engine parses all the source every time it starts, then maybe that level should be dropped.

At the same time, the people that SemVer might be most helpful for don’t really use it. It doesn’t matter that libcool 1.3.18 is binary compatible with libcool 1.3.12 that shipped with the distro, because the average distro (as defined by popular usage) won’t ship the newer libcool; they’ll backport security patches that affect their active platform/configuration. Even if that means they have effectively published 1.3.18, it’ll still be named something like 1.3.12-4ubuntu3.3 in the package manager. Even a high-impact bug fix like “makes pressure sensitivity work again in all KDE/Qt apps” won’t get backported.

Distros don’t roll new updates or releases based on versions, they snapshot the ecosystem as a whole and then smash the bugs out of whatever they got. They don’t seem to use versions to fast-track “minor” updates, nor to schedule in major merges.

One last bit of versioning awkwardness, and then I’m done: versions tend to be kind of fuzzy as it is. Although Net::Amazon::DynamoDB’s git repo has some heavy updates (notably, blob type support) since its last CPAN release, the repo and the CPAN release have the same version number stored in them. When considering development packages in an open-source world, “a version” becomes a whole list of possible builds, all carrying that version number, and each potentially subject to local changes.

Given that, there’s little hope for a One True Versioning scheme, even if everyone would follow it once it existed. I suspect there’s some popularity around SemVer simply because it’s there, and it’s not so obviously broken/inappropriate that developers reject it at first glance. It probably helps that three-part versions are quite common, from the Linux kernel all the way up to packages for interpreted languages (gems or equivalents).
