Thursday, September 6, 2018

I still don't understand Python packaging

Since we last talked about this subject, I've tried to use pipenv with PIPENV_VENV_IN_PROJECT=1 for the project in question. Everything was going pretty well, and then… updates!

I'm using a Homebrew-installed version of Python to test, because it's easier and faster on localhost, and the available Python version was upgraded from 3.6 to 3.7. As usual, I ran brew cleanup -s so the Python 3.6 installation is gone.

It turns out that my python_version = "3.6" line doesn't do what I want—pipenv will be unable to do anything because that binary no longer exists—and I haven't been able to figure out a way to ask Pipenv to use "3.6 or above" to both:
  1. Express the "minimum version: 3.6" requirement
  2. Allow 3.7 to satisfy the requirement
pipenv seems pretty happy to use the system Python when given a version requirement of ">=3.6" but it's also acting like that's a warning only. pipenv check doesn't like this solution, and it's not clear that a system Python 3.5 would cause it to fail as desired.

In PHP, this is just not that hard. We put "php": "^7.1.3" in our composer.json file, and it will install on PHP >=7.1.3,<8.*. It will fail on <7.1.3 or on 8.x or on an 8.0 development version. It's all understood and supported by the tool.

So anyway: right now, we have a deployment process which is more or less "read the internet; build in place for production; swap symlink to make updated code available to the web server."

The end goal is to move the production deployment process to "extract a tarball; swap symlink." To do this, we need to create the tarball with "read the internet; build in place; roll into tarball" prior. And AFAICT, building a virtualenv into a tarball will package everything successfully, similar to Composer, but it will also bake in all the absolute paths to the build process's Python installation.

Pipfile and Pipfile.lock look like what I want (deterministic dependency selection in the build stage, and with the environment variable, in-project vendoring of those dependencies) but it seems like it's fundamentally built on virtualenv, which seems to be a thing that I don't want. I obviously want dependencies like aiobotocore vendored, but I don't necessarily want "the python binary" and everything at that layer. I especially don't want any symlinks pointing outside the build root to be put into the tarball.

Overall, I think pipenv is trying to solve my problem? But it has dragged in virtualenv to do it, which "vendors too much stuff," and it has never been clear to me what benefit I'm supposed to get from having a bloated virtualenv. And also, virtualenv doesn't fully support relocatable environments, which is another problem to overcome. In the past, it has been fairly benign, but now it has turned adversarial.

(We have the technical capability to make the build server and the production server match, file path for file path, exactly. But my devops-senses tell me that tightly coupling these things is a poor management decision, which seems to imply poor design on the part of virtualenv at least. And that contaminates everything built on top of it.)