Sunday, November 29, 2020

Discontinuous Complexity (2007)

Editor's Note: this is a post from my old Serendipity weblog, "Pawprints of the Mind," which was originally posted over 13 years ago, on 2007-10-24. Notably, this was written before I knew about jQuery, which improved on Prototype, eclipsed it entirely, and has since fallen out of fashion. The text below is reproduced verbatim from the original post.

When a system gets sufficiently large, changes become surprisingly difficult. It's as if a gremlin gets into the system, and then perfectly reasonable estimates end up being only halfway to the mark. Or less. This annoys managers, who made a bad decision because the estimate turned out bad, and it annoys engineers because they know their estimate was sensible. Why does this happen?

Slowing Down

I think the key factor is that the system exceeds the size of an individual developer's working memory. In the same way that virtual memory is a lot slower than real memory, development slows down considerably when it exceeds the mind. Tracking work on paper is not instantaneous, and the average developer's choice is to just forget stuff instead. Not that anyone can know when they've forgotten something, or else it wouldn't be forgotten.

The problem with the just-forget method is that it makes coding a lot more time-consuming. You end up writing the same thing, several times, each time accounting for a new layer of information which was forgotten, but later rediscovered. After so much work, you think the code must be beautiful and perfect, until you run it. Another layer of forgotten information becomes apparent, but this time, it has to be painstakingly rediscovered through debugging. There could be several more debug cycles, and if you're unlucky, they can trigger another redesign.

Paper is no panacea either; besides its slowness, it seems to be impossible to track all your thoughts, or sort the relevant from irrelevant. There's nothing like getting halfway through a paper design and then realizing one key detail was missing, and a fresh design cycle must begin. If you're unlucky, there's still a key detail missing.

This overflow is what makes the change so abrupt. There's a sudden, discontinuous jump downward in speed because the system passes a critical point where it's too big to track. Normal development activity has to be rerouted to deal with all the work that has to be done to make "small" changes to the code, and it becomes a huge drain on speed, productivity, and morale. It's no fun to work obviously beyond our capabilities, and the loss of productivity means the speed of accomplishments (and their associated rewards) diminishes as well.

Anticipation

If development must slow when an application reaches a certain size, is there something we can do to stop it from becoming so large in the first place? Could we see this complexity barrier coming, and try to avoid it?

I'm not sure such a thing is possible. Stopping development to go through a shrink phase when the critical point is approaching would require us to be able to see that point before it arrives. The problem is that complexity is easier to manage as it builds up slowly. It's not until some amount of forgetting has happened that we are confronted with the full complexity.

Also, the tendency to break a system down into components or subsystems, and assign separate people to those systems, allows the complexity of the system as a whole to run far ahead of the individual subsystems. By the time we realize the subsystems are out of hand, the whole is practically tangled beyond repair. Besides, your manager probably doesn't want to spend time repairing it even if it wasn't that big.

Familiar Conclusions

No matter what angle I try to approach improving my programming skill from, I keep arriving at the same basic conclusion: that the best systems involve some sort of core, controlled by a separate scripting or extension language. The oldest success of this approach that I know of in common use today is Emacs, which embeds a Lisp dialect for scripting. Having actually made a script for Vim, I have to say that using a ready-made language beats hacking together your own across a handful of major revisions of your program.

I've really begun to see the wisdom in Steve Yegge's viewpoint that HTML is the assembly language of the Web. With SGML languages, document and markup are mixed together, and most of the HTML template you coded up is basically structural support for the actual template data. Even in a template-oriented language like PHP or Smarty™ built on top of HTML, you're forever writing looping code and individual table cells. With a higher-level markup language, you could conceivably just ask for a table, and have it worry about all the exact row/cell markup.

The other major option to reduce the complexity of Web applications, which has apparently been enthusiastically received already, is to put libraries into the languages we have to make them more concise for the actual programming we do. One obvious effort on that front is Prototype, which smooths over (most) browser incompatibilities and adds a number of convenient ways to interact with the Javascript environment. Prototype-based scripts are barely recognizable to the average JS programmer. At what point does a library become an embedded language, if that's even a useful distinction?

In the end, understandable programs come from small, isolated parts. Pushing all variables in the known universe into a template one-by-one is not as simple as providing an interface that lets the template find out whatever it needs to know. Laying out HTML by hand is not as concise as sending a cleaner language through an HTML generator. (And no, XML is not the answer.) Libraries can help, but sometimes, nothing but a language will do.

Wednesday, November 25, 2020

Where is 'localhost'? (Docker networking)

Building up to the devproxy2 release, I set things up to test in Docker, using a known version of Go.  Nothing worked for quite some time.

Core problem: localhost inside the container is not the host network.  I had to configure it to listen on all addresses (by asking Go to listen on ":8080" without a host address), per this other Stack Overflow answer. If I had thought to check the exact error message client side (curl: (52) Empty reply from server), I could have solved this one first, not last.
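To make the distinction concrete, here is a minimal sketch using Python's socket module rather than Go; it is an illustration of the bind-address behavior, not the devproxy2 code. The helper names make_listener and bound_address are made up for this example. An empty host plays the same role as the missing host in Go's ":8080": the socket is bound to every interface instead of only the loopback address.

```python
import socket


def make_listener(host: str, port: int = 0) -> socket.socket:
    """Open a TCP listening socket on (host, port).

    Port 0 asks the OS for any free port, which keeps the sketch from
    colliding with a real service on 8080.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


def bound_address(sock: socket.socket) -> str:
    """Return the local IP address the socket is bound to."""
    return sock.getsockname()[0]


if __name__ == "__main__":
    # Loopback only: reachable from inside the container, nowhere else.
    loopback = make_listener("127.0.0.1")
    print("loopback listener:", bound_address(loopback))
    loopback.close()

    # Empty host: bound to all IPv4 interfaces (the wildcard address).
    everywhere = make_listener("")
    print("wildcard listener:", bound_address(everywhere))
    everywhere.close()
```

A listener bound only to the loopback address accepts connections that originate inside the same network namespace. Traffic forwarded from a published Docker port arrives on a different interface, which is consistent with the "empty reply" seen from curl on the host.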

Once I fixed the core problem, everything started working, and I didn’t go back and test the other things I had tried.  So everything else is only a potential problem.

Potential problem: localhost inside the container is not the host network.  I configured it to connect to host.docker.internal instead, per this Stack Overflow answer.  This is the same as the core problem, but in the other direction.

Potential problem: according to my notes, the default network bridge in Docker doesn’t have Internet access. Creating a user-defined bridge solves this.  Furthermore, using Docker Compose automatically creates a user-defined bridge for the stack.  By the time I solved the core problem, I was already using Docker Compose for the networking feature.

Sunday, November 22, 2020

Obvious (2009)

Editor's Note: this is a post from my old Serendipity weblog, "Pawprints of the Mind," which was originally posted almost 12 years ago, on 2009-04-28. The text below is reproduced verbatim from the original post.

When things like Amazon's 1-click patent come to light, suddenly there's a mob of Slashdotters decrying the idea as 'obvious'. My thoughts on the rsync protocol were the same: knowing that there was a program to transfer changed parts of files over the network, I turned my mind to figuring out how the changed parts could be detected. Later I came across a description of the actual rsync protocol, and it was indeed fairly close to my idea for it. Therefore rsync is obvious.

Or is it? The solution may be relatively obvious, but the problem itself was not something that ever crossed my mind before running into the program that solves it. The invisibility of the problem ends up hiding the solution as well.

Apple seems to be actively searching for non-obvious problems: the ones people don't even think of as problems until the iWhatever comes out and redefines the entire market for whatevers. The iPod's clickwheel seems innovative until you realize it's basically a mousewheel. An analog interface to digital systems. Apple only put that kind of interface on because they happened to see that purely binary controls on existing MP3 players were fairly primitive. Once the iPod was released, nobody wanted to admit to being blind to the problems Apple tackled, so their chosen solution (undoubtedly one of many obvious possibilities) was hailed as pure genius.

It seems that the magic is not in who you are, it's in what you choose to think about. If you've never asked, "How quick can we make it for customers to order?" then you'll never end up with 1-click shopping.

Wednesday, November 18, 2020

Using zerofree to reduce Ubuntu/Debian OVA file size

Editor's note [2020-11-21]: this is an update to a 2014 post on the same topic.  Since then, the mount command has also become required.

When exporting an OVA file from VirtualBox's "Export Appliance" feature, any disk block that isn't filled with 0 has to be included in the resulting file.  Normally, an OS doesn't zero out blocks as it works; when a file is deleted, the space is marked as free, but retains its original contents until it is re-allocated for another file.  If there are a lot of files deleted, a significant amount of non-empty "empty" space can accumulate.

Running the zerofree program writes data of value 0 over all space marked as free, allowing that free space to be excluded from the OVA file again.
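The effect can be shown with a toy model in Python. This is only a sketch of the idea, not how zerofree or VirtualBox are implemented: the "disk" is an in-memory byte array, the block size of 4096 is an assumption, and the list of free blocks is passed in by hand, where the real tool works from the filesystem's own record of free space. The function names are invented for the example.

```python
BLOCK_SIZE = 4096  # assumed block size for the toy model


def count_nonzero_blocks(image, block_size: int = BLOCK_SIZE) -> int:
    """Count blocks that contain at least one non-zero byte.

    These are the blocks an exporter could not skip.
    """
    count = 0
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        if any(block):
            count += 1
    return count


def zero_free_blocks(image: bytearray, free_blocks, block_size: int = BLOCK_SIZE) -> None:
    """Overwrite every block listed as free with zeros, in place."""
    for index in free_blocks:
        start = index * block_size
        end = min(start + block_size, len(image))
        image[start:end] = bytes(end - start)


if __name__ == "__main__":
    disk = bytearray(4 * BLOCK_SIZE)

    # Two "files", one block each, in blocks 1 and 2.
    disk[1 * BLOCK_SIZE:2 * BLOCK_SIZE] = b"A" * BLOCK_SIZE
    disk[2 * BLOCK_SIZE:3 * BLOCK_SIZE] = b"B" * BLOCK_SIZE

    # "Delete" the second file: block 2 is now free, but its bytes remain.
    free = [0, 2, 3]
    print("before:", count_nonzero_blocks(disk))

    zero_free_blocks(disk, free)
    print("after:", count_nonzero_blocks(disk))
```

In this model the deleted file's block still counts as data until the free blocks are zeroed, which mirrors the non-empty "empty" space described above.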

The following applies to Ubuntu 18.04 (bionic) and 20.04 (focal) at least; I forget exactly how far back it goes, but the way to get at a command line changed at some point, and there was a period where poweroff didn't actually work in the pre-boot environment.

Regardless, here's how to run zerofree on Debian, Ubuntu, or derivatives.

Sunday, November 15, 2020

State of the Linux Desktop (2009)

Editor's Note: this is a post from my old Serendipity weblog, "Pawprints of the Mind," which was originally posted nearly 12 years ago, on 2009-05-18.  History since then has unfolded differently in some details, but the broad strokes seem to be accurate.  The text below is reproduced verbatim from the original post.

The Linux desktop is a curious thing. Years ago, the annual prediction was that that year was finally going to be the Year of the Linux Desktop, and Windows would be kicked out of market dominance for good. World domination by open source, and all that.

Linux has come a long way since then. Monitors can be largely auto-configured, fonts can be anti-aliased, CDs can be burned without fiddling with ide-scsi, USB works without having to care about your kernel version, and even media keyboards and mice with tilt-wheels seem to work these days.

But the "desktop" has come a long way since then, too. Various OS X versions introduced technologies like Display PDF, Quartz Extreme, Exposé, and other things that Windows Vista copied with WPF and Aero. Linux eventually duplicated them in Compiz, Beryl, Compiz-Fusion, or something. Somewhere in there. It kinda-sorta works now, if you don't want thumbnails of minimized windows.

Now that Linux has joined the current decade, Microsoft is readying Windows 7, with features like thumbnails of windows on the taskbar (with Aero), the 'recent documents' of an application available from the taskbar button, and so forth. Apple certainly has their own plans. And both of them have decent sound and MIDI support, where starting up one audio application is actually not likely to silence another's sound.

So, the "Linux Desktop" is still behind, and always has been. One problem here seems to be that it's not about innovation. It's about copying everyone else's modern technology with ancient components. It's about solving problems again that already have better solutions in other systems, just to see how closely Linux can approximate them with its decades-old core design.

And then someone, or maybe a few people, whine about the solution and come up with an alternative. The alternatives vie for dominance, and the rest of the desktop world moves on while the Linux landscape settles into one predominant choice, if they're lucky. If they're unlucky, you get things like the KDE/Gnome situation.

Due to Linux's philosophy about neither inventing wheels nor including batteries, there's a large number of components with their own maintenance and release schedules, and most of them target "current" versions of other components. For any given complex program, there's a relatively small window of time between the components it uses being released, and becoming obsolete with future releases. Every time components break compatibility, a new version of every program dependent on them has to be released to deal with the changes. The more changes dealt with, the more complex the program becomes, so compatibility with older versions also gets cut out after a time.

And sometimes, a sweeping change comes through, and no compatibility is maintained between the two versions: App 2.0 will run on Foo 3.0 and newer, and App 1.4 will only run on Foo 2.5 and below.

From a software developer's point of view, then, supporting Linux is a risky proposition. You have to either watch the work in progress on the entire stack, and guess what will be completed and be important to your application, or you end up being taken by surprise when one of your components updates and your application doesn't run on an ever-growing range of Linux systems until you have time to update it.

Apple and Microsoft may have their own problems, but they have the clout and the guts to say, "We need to migrate to doing things this way, and it'll be good for at least 5 years." Look at Windows 95–Me, or NT4–XP. Whereas the rate of change in Linux effectively limits compatibility to about a year at the most. So most serious software gets developed for other platforms, and most serious users are on those platforms as well. Then familiarity begets desire for those platforms over Linux.

In short, Linux is sort of the Perl of operating systems: messy and aging. People working on Linux desktop technologies seem to be doing it to prove that they can. And secretly, they hope to get horses fast enough to put that Ford guy out of business.

Saturday, October 24, 2020

Defense in Depth: Read-Only Document Root

One security rule I live by as a sysadmin, which admittedly causes me a lot of trouble with PHP apps, is that the running web server user is never able to write to files within the document root.

Any user-file upload/download mechanism stores its files somewhere else, so that a read request can go through a permission check and then be served through readfile() or a redirect to a CDN, rather than being passed through a complex mechanism that may decide to execute the file's contents.
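As a rough sketch of that read path, written in Python rather than PHP: the function below resolves a requested name inside a storage directory that lives outside the document root, refuses anything that escapes that directory, and runs a permission check before the file is served. The names resolve_download, AccessDenied, and the may_read callback are hypothetical, and the actual delivery step (readfile() or a CDN redirect in the setup described above) is left out.

```python
from pathlib import Path


class AccessDenied(Exception):
    """Raised when a download request must not be served."""


def resolve_download(storage_root: Path, requested: str, may_read) -> Path:
    """Map a requested name to a file under storage_root, or raise.

    may_read is a caller-supplied function that takes the requested name
    and returns True if the current user is allowed to read it.
    """
    root = storage_root.resolve()
    candidate = (root / requested).resolve()

    # Reject "../" tricks and absolute paths: the resolved file must
    # live strictly inside the storage directory.
    if root not in candidate.parents:
        raise AccessDenied(requested)

    # The permission check happens before the file is ever touched.
    if not may_read(requested):
        raise AccessDenied(requested)

    if not candidate.is_file():
        raise FileNotFoundError(requested)

    return candidate
```

The caller only ever reads the returned path as bytes; nothing in this path hands the file to an interpreter.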

Configuration files are similarly locked down.  A project that configures itself with PHP code is a project that offers arbitrary code execution to anyone who can change that configuration file.
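For contrast, here is a small Python sketch of configuration treated purely as data. It is not something this setup prescribes, only an illustration of the difference: a JSON file is parsed, never executed, so whoever can edit it can change settings but cannot run code through it. The setting names in ALLOWED_KEYS are hypothetical.

```python
import json

# Hypothetical setting names, for illustration only.
ALLOWED_KEYS = {"db_host", "db_name", "cache_dir"}


def load_config(text: str) -> dict:
    """Parse configuration as JSON data; nothing in it is executed."""
    data = json.loads(text)  # raises ValueError on anything that is not JSON
    if not isinstance(data, dict):
        raise ValueError("configuration must be a JSON object")
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    return data
```

A file containing code instead of JSON fails to parse and is rejected, where a PHP configuration file would simply run.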

I don't let the web user install new code to /bin; we shouldn't let the web user install new code to /srv/www. It's the server, not a system administrator.

Not coincidentally, our document root is never the root of a git repository.  It's typically placed in a public folder, allowing us to have a special writable folder like cache both within the project/deployment, and outside of the document root.

Saturday, October 17, 2020

MySQL on MacOS: Discoveries

For a long time, I’ve installed MySQL within my development virtual machine (VM), loaded it with a minimal subset of production data, and pointed the database hostname to 127.0.0.1 through the VM’s /etc/hosts file.

However, working from home, I realized that my slower DSL connection did not affect the amount of time it took to pull a fresh subset of production data into the VM.

I finally went to the trouble of installing MySQL on the host side, and configuring everything to work with that, so that filling it does not require the data to cross any hypervisor boundaries at all.  And I learned some things!

  1. The installer sets it up to launch as the _mysql user
  2. Using the wrong CA certificate causes padding errors
  3. MySQL 8.0.14+ allows a separate "administrative management" port
  4. MySQL’s AES_ENCRYPT() function makes the worst possible choice by default
  5. The speedup is dominated by MySQL's disk access