Thursday, June 14, 2018

Python, virtualenv, pipenv

I heard (via LWN) about some discussion of Python and virtualenvs.  I'm bad at compressing thoughts enough to fit Twitter and still make sense, so I want to cover a bit about my recent experiences here.

I'm writing some in-house software (a new version of memcache-dynamo, targeting Python 3.6+ instead of Perl), and I would like to deploy it as, essentially, a tarball.  I want to build in advance and publish an artifact, with a minimum of surrounding scripting at deploy time.

The thing is, the Python community seems to have drifted away from being able to run software without a complex installation system that involves running arbitrary Python code.  I can see the value in tools like tox and pipenv—for people who want to distribute code to others.  But that's not what I want to do; I want to distribute pre-built code to myself, and as such, "execute from source" has always been my approach.

[Update 2018-09-06: I published another post with further thoughts on this problem.]



Ideally, there would be some infrastructure so that github and pypi (or packagist or metacpan) don't have to be online for us to do deployments.  (I can see the value there; it's just something I haven't gotten around to.)  A deployment would hit our own cache for artifacts and there would be no further internet access required.

I know it's kind-of possible in Python, because one installation option for the AWS CLI is a bundle that packages a set of tarballs and points pip to them to install in offline mode… somehow.  It seems like a nice enough approach, but I haven't found a coherent guide to how they make that happen.  Did they write their own install shell script, or is that autogenerated?  Does distutils do it?  Can pip do it?  IDK.
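
For what it's worth, plain pip can apparently do most of this on its own.  Here's a minimal sketch of the "vendor everything, install offline" flow I have in mind, assuming a requirements.txt and a ./vendor directory of my own choosing:

```
# With internet access: download sdists/wheels for everything into ./vendor
pip download -r requirements.txt -d ./vendor

# At deploy time: install only from that directory, never touching PyPI
pip install --no-index --find-links=./vendor -r requirements.txt
```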

Instead, I found and tried `pipenv`, which creates a directory out in the middle of nowhere to hold an otherwise ordinary virtualenv.  Or, given an environment variable, it will create the virtualenv in a dot-dir inside the project itself, though why one would use pipenv at all in that case, instead of virtualenv, is completely beyond me.  Sure, pipenv has Pipfile and Pipfile.lock, and presumably they're different from `pip freeze >requirements.txt`, but I haven't figured out whether that difference is relevant or interesting.
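
For reference, the environment variable in question is PIPENV_VENV_IN_PROJECT, and the basic flow looks roughly like this (as of the mid-2018 pipenv I was using; app.py is a stand-in).  The practical difference from `pip freeze` seems to be that Pipfile records what you asked for, while Pipfile.lock pins and hashes the full dependency set:

```
# Keep the virtualenv in ./.venv instead of the out-of-the-way default
export PIPENV_VENV_IN_PROJECT=1

pipenv install requests   # adds "requests" to Pipfile, pins everything in Pipfile.lock
pipenv --venv             # prints where the virtualenv actually ended up
pipenv run python app.py  # runs inside that virtualenv
```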

Pipenv also updated my production packages when I asked it to add a dev package.  I do not like that.  Did I ask you for `pipenv update`?  No?  Then don't do it.  You mixed semantically unrelated updates into one git commit, because I didn't notice it had happened.
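
For concreteness, all I ran was an ordinary dev install along these lines.  Pipenv seems to have a --keep-outdated flag that's meant to leave unrelated pins alone, though I haven't verified it:

```
# This was the whole request...
pipenv install --dev pytest

# ...yet the resulting Pipfile.lock diff also bumped production packages.
# Supposedly this keeps unrelated pins where they were (untested by me):
pipenv install --dev --keep-outdated pytest
```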

I ended up stripping pipenv out and going the `virtualenv env` route for the moment, but I still don't have a solid deployment story worked out.  (It's okay.  The code isn't finished yet.)  Ultimately, the goal is to place some command in an ExecStart line in a systemd memcache-dynamo.service file, and that one command has to start the whole stack.
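
The shape I'm after is roughly the following.  The paths and module name are placeholders, since the code isn't finished, but the key trick is that running the virtualenv's own python binary makes "activation" unnecessary:

```
# memcache-dynamo.service (sketch; paths and module name are hypothetical)
[Unit]
Description=memcache-dynamo proxy
After=network.target

[Service]
# The venv's python carries its own sys.path, so no activate step is needed.
ExecStart=/opt/memcache-dynamo/env/bin/python -m memcache_dynamo
Restart=on-failure

[Install]
WantedBy=multi-user.target
```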

As an aside: virtualenv needs to make a decision about relocatable environments, and either delete the option or make certain it works.  The current "we will try to make it work, but maybe it won't" approach is fundamentally unsound, and it makes me worry about the entire project if the developers don't even know whether their own feature works.
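
For reference, the option in question is applied to an existing environment, like so:

```
# Attempt to make an existing environment relocatable; the docs themselves
# carry the "this may not fully work" caveat
virtualenv --relocatable env
```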

I liked pipenv better than virtualenv because I want an all-in-one "activate" script of my own devising, one that sets up the environment and then jumps into git-sh for development (such that one Ctrl+D equals one deactivation).  Pipenv made that rather easier than virtualenv, which requires its activate script to be sourced.  For virtualenv, I ended up with an extra rcfile that sources ~/.bashrc and then env/bin/activate; I then call bash with that extra rcfile, because there's no way to pass bash multiple rcfiles.  I haven't yet worked "and run git-sh if available" into this sequence.
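
Concretely, the workaround looks something like this; the rcfile name is my own invention, and the git-sh step is still missing:

```
# Write the "extra rcfile" (the name env.rc is just what I picked)
cat > env.rc <<'EOF'
source ~/.bashrc
source env/bin/activate
EOF

# Start the development shell with it; a single Ctrl+D then drops everything
bash --rcfile env.rc
```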

Although I liked that aspect better, pipenv didn't appear to solve any deployment issue by default, while layering in more complexity and magic, which is what drove me to virtualenv instead.

I'm actually kind of confused as to how other communities have built things like Carton, and yet Python does not seem to have any awareness of them, despite having run into enough problems with "run arbitrary Python to install" that they went and invented wheels.  (And yet, if wheels are the answer to my problems, i.e. if I can build "a wheel" and install it with something like `pip install --wheel=./mcd.whl`, that might be enough?  Maybe even through `pipenv run pip install …` after all?  It's all very fragmented.  I'm definitely thrashing around with this.)
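
Having poked at it a little more, the real spelling seems to be closer to the following; whether it actually answers the deployment question, I still don't know.  (The wheel filename below is hypothetical, and `pip wheel` wants a setup.py plus the `wheel` package installed.)

```
# Build wheels for the project and its dependencies into ./dist
pip wheel . -w ./dist

# Installing a wheel is just handing pip the file; there's no --wheel flag
pip install ./dist/memcache_dynamo-0.1-py3-none-any.whl
```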

Really, I've been writing a lot of PHP and I'm most familiar with Composer.  The main difference from Carton is that the latter requires `carton exec` (like `pipenv run`), while Composer is much more oriented toward making library code available through an autoloader.  Composer is, AFAIK, relocatable by default, since PHP has __DIR__.  Carton installs dependencies locally (similar to Composer) and also has an optional local cache for the sources, and all of this is clearly explained in its main documentation.

In other words, Composer is similar to virtualenv, except it asks user code to explicitly connect the autoloader, instead of relying on magic environment being passed to the interpreter.  Carton is closest to pipenv, except it also has a trivial-to-use "cache all sources, too" command.
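
To make the Carton side concrete, my (possibly rusty) recollection of the flow is roughly:

```
carton install            # read cpanfile, install deps into ./local
carton bundle             # cache the dependency tarballs under ./vendor/cache
carton install --cached   # later, elsewhere: install from that cache, offline
carton exec perl app.pl   # run with ./local wired into @INC (app.pl is a stand-in)
```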

I don't actually use Carton with Perl in practice: there are only two Perl apps, I maintain them in sync as the local Perl Guru, and their dependencies don't conflict, so I actually use cpanminus with a cpanfile to install all the dependencies globally.  Faced with an irreconcilable conflict, or with separate teams working on multiple Perl codebases, I would certainly use Carton, or else tolerate containerization (which would provide the necessary isolation, at the cost of more operational complexity and bloat).

The Python knowledge out there is such a mess that I haven't found a good, clear approach that fits my needs.  I always head to docs.python.org first, but it doesn't want to acknowledge anything outside the stdlib (not even pip itself, it seems), and pipenv/virtualenv seem to be focused on development, without saying much about deployment.  Deployment still seems to be assumed to mean "running a distutils-based installer."

Tangentially, I haven't found a Python equivalent of Server::Starter.  Python deployment practice appears to be of the "roll out a new base image with updated code, then switch the load balancer to it" variety.  Thanks, but that's probably a multi-minute process, blocked on waiting for an EBS snapshot, compared to a handful of seconds for our current system.

I say all this not to disparage Python.  I say it because I want to get my point of view, my assumptions, my history, and my goals across.  If someone reads this and shouts, "WHY ARE YOU DOING THAT?" then it may point to some sort of missing link between docs.python.org or web search, and the right way.  Or it may point to a misunderstanding between us, or between me and pipenv.  Certainly, as I've been writing this post, pipenv has begun to look like it could be better than plain virtualenv, but I still want to communicate the feeling of stumbling around, bumping into pip, pipenv, and virtualenv—all separately—and trying to make sense of the result.
