Friday, February 5, 2021

What matters? Experimenting with Ubuntu Desktop VM boot time

It seemed slow to me that the "minimal" installation of Ubuntu Desktop 20.10 in a VirtualBox VM takes 37 seconds to boot up, so I ran some experiments.

The host system is a Late 2012 iMac running macOS Catalina.  It has 24 GB of RAM (2x4 GB + 2x8 GB) installed, and the CPU is an Intel Core i5-3470 (Ivy Bridge, 4 cores, 4 threads, 3.2 GHz base and 3.6 GHz max turbo).  In all cases, the guests run with VirtualBox's Ubuntu Linux 64-bit defaults, which is to say, one AHCI SATA drive that is dynamically allocated.  The backing store is on an APFS-formatted Fusion Drive.

The basic "37 seconds" number was timed with a stopwatch, from the point where the Oracle VM splash screen is replaced with black to the point where the wallpaper is displayed.  Auto-login is enabled for consistency in this test.  This number should be considered ±1 second, since I took only a few samples of each configuration; I'm looking for huge improvements, not marginal ones.

So what are the results?

  • Baseline: 1 core, ext4, linux-generic, elevator=mq-deadline.  37 seconds.
  • XFS: 1 core, xfs root, linux-generic, elevator=mq-deadline.  37 seconds.
  • linux-virtual: 1 core, xfs root, linux-virtual, elevator=mq-deadline. 37 seconds.
  • No-op scheduler: 1 core, xfs root, linux-virtual, elevator=noop. 37 seconds.
  • Quad core: 4 cores, xfs root, linux-virtual, elevator=noop. 27 seconds!
The GUI status icons showed mostly disk access, with some heavy CPU usage for the last part of the boot, but only adding CPU resources made an appreciable difference.  I also tried the linux-kvm kernel, but it doesn't display any graphics or even any text on the console, so that was a complete failure.
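
For anyone who wants to repeat the experiment, these are roughly the knobs involved.  The VM name and the disk device below are placeholders, and note that recent multi-queue kernels largely ignore the old elevator= parameter and expect the scheduler to be set through sysfs instead:

    # guest: install the stripped-down virtual-machine kernel flavor
    sudo apt install linux-virtual

    # guest: add "elevator=noop" to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
    sudo update-grub

    # guest: or switch the scheduler per-device at runtime
    cat /sys/block/sda/queue/scheduler
    echo none | sudo tee /sys/block/sda/queue/scheduler

    # host: give the powered-off VM more cores
    VBoxManage modifyvm "Ubuntu 20.10" --cpus 4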

(I've tried optimizing boot times in the past with systemd-analyze critical-chain and systemd-analyze blame, but they mostly tell me which random thing was starved for resources on that particular boot, instead of consistently pointing to one thing causing a lot of delay.  Also, systemd-analyze has a habit of declaring "boot complete in seven seconds" or whatever as soon as it has decided to start GDM, leaving a lot of time unaccounted for.  So I didn't bother with it for these tests.)
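
For completeness, the commands in question look like this on any systemd box:

    systemd-analyze time            # overall figure, split into kernel vs. userspace time
    systemd-analyze blame           # per-unit startup time, slowest first
    systemd-analyze critical-chain  # the chain of units the default target waited on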

Correction: it was Ubuntu 20.10, not 20.04 LTS.  The post has been updated accordingly.

Wednesday, February 3, 2021

Stability vs Churn Culture

I’m working on rewriting memcache-dynamo in Go.  Why?  What was wrong with Python?

The problem is that the development community has diverged from my goals.  I’m looking for a language that’s stable, since this isn’t the company’s primary language.  (memcache-dynamo is utility code.) I want to write the code and then forget about it, more or less.  Python has made that impossible.

A 1-year release cycle, with a policy of “deprecated for 1 cycle before removal,” means users on Ubuntu LTS can end up in a situation where their previous LTS, two Python versions behind, doesn’t yet provide an alternative for something that has already been removed in the next LTS / current Python.
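
About the best a “write it and forget it” codebase can do is make those deprecations loud while the old version still runs.  The invocation below is the generic form; app.py is just a stand-in for whatever the entry point happens to be:

    # turn every DeprecationWarning into a hard error instead of silent noise
    python3 -W error::DeprecationWarning app.py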

But looking closer to home, it’s a trend that’s sweeping the industry as a whole, and fracturing communities in its wake.  Perl wants to do the same thing, if “Perl 7” ever lands.

Also, PHP 5.6 has been unsupported for two years, yet there are still code bases out there that support PHP 5, and people are (presumably) still running 5.6 in production.  With Enterprise Linux distributions, we will see this continue for years; RHEL 7 shipped PHP 5.4, with support through June 2024.

There’s a separate group of people moving ahead with the yearly updates.  PHPUnit, for example: every year a new major version comes out, dropping support for the now-unsupported PHP versions, along with whatever PHPUnit functionality has been arbitrarily renamed in the meantime.  The people still writing for PHP 5.x are stuck on PHPUnit 4 or 5, which don’t run on PHP 8.0; it’s not until PHPUnit 8.5.12 that installation on PHP 8 is even allowed, and that version still doesn’t support PHP 7.0 or 7.1.

This is creating two ecosystems, and it’s putting pressure on the projects that value stability to stop doing that.  People will make a pull request, and write, “but actually, can you drop php 5 support? i had to work extra because of that.”

The instability of Linux libraries, from glibc on up, made shipping prebuilt binaries for Linux excessively complex, and AFAICT few people bother.  Go decided to skip glibc by default when building binaries, apparently because that was the easier path?
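
The mechanism itself is simple enough: a pure-Go build with cgo disabled has no glibc dependency at all (memcache-dynamo below only because it’s the project at hand):

    CGO_ENABLED=0 go build -o memcache-dynamo .
    ldd memcache-dynamo    # typically reports "not a dynamic executable"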

Now everyone seems to think we should repeat that instability in most of our other languages.

Friday, January 1, 2021

Daisywriter

When I was growing up, my dad had an old daisy wheel printer hooked up to our computers.  (We had various Commodore hardware; we did not get a PC until 1996, and I'm not sure the Amiga 500 was really decommissioned until the early 2000s.)

The daisy wheel was like a typewriter's type, arranged around a circle.  There were cast metal heads with the letters, numbers, and punctuation on them.  The wheel was spun into position, and then what was essentially an electric hammer punched the letter against the ribbon and the paper.  Then the whole wheel/ribbon/hammer carriage moved to the next position, and the cycle repeated.

This was loud: like a typewriter, but amplified by the casing, with low-frequency vibrations carried through the furniture.  No printing could be done if anyone in the house had gone to bed or was napping.

It was also huge. It could have printed on legal paper in landscape mode.

Because of the mechanical similarity to typewriters, the actual print output looked like it was really typewritten.  Teachers would compliment me for my effort on that, and I'd say, "Oh, my dad has a printer that does that."

Nowadays, people send a command language like PostScript, PCL, or even actual PDF to the printer, and it draws on the page.  Everything is graphics; text becomes curves.  But the Daisywriter simply took ASCII in from the parallel port, and printed it out.

Wednesday, December 16, 2020

Containers over systemd

“Systemd will solve all your problems,” they said.

Having used a number of systemd’s security features to configure a service, I am beginning to suspect everyone uses containers because container runtimes are trying to be secure already.

It's possible to improve the security of a service with systemd, of course.  I’ve worked hard at it.  But in the end, over half the *.service file is consumed with “trying to build my own container out of systemd directives.”  ProtectHome, ProtectSystem, ProtectKernelTunables, Protect This, Protect That, blah blah blah.  The process starts from insecure by default, and then asks me to layer on every individual protection.  This is exactly the sort of thing Linux zealots used to yell at Microsoft about.  ¯\_(ツ)_/¯

But I digress.  I ended up with an excessively long systemd service configuration file, and to apply that to any other service, there’s no option besides copying and pasting those directives.  With every release of systemd, I have to comb the man pages again to see what else is available now, and carefully apply that to every service file.  It’s not easy to tell whether the security posture is up-to-date when the policy is so verbose.
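
For flavor, this is the sort of pile-up that ends up in the [Service] section.  It’s a hand-written sketch, not the actual unit, and it’s nowhere near exhaustive:

    [Service]
    # hide home directories, mount nearly everything else read-only
    ProtectHome=yes
    ProtectSystem=strict
    # private /tmp and a minimal /dev
    PrivateTmp=yes
    PrivateDevices=yes
    # no new privileges, no capabilities, no poking at the kernel
    NoNewPrivileges=yes
    CapabilityBoundingSet=
    ProtectKernelTunables=yes
    ProtectKernelModules=yes
    ProtectControlGroups=yes
    RestrictNamespaces=yes
    LockPersonality=yes
    MemoryDenyWriteExecute=yes
    SystemCallFilter=@system-service

(systemd-analyze security <unit> will at least score the result, but it doesn’t make any of this shorter, and the list of available directives keeps growing.)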

A container, by contrast, already has an isolated filesystem (its image), so whole classes of configuration (ProtectHome, ProtectSystem, TemporaryFileSystem) become irrelevant.  On top of that, container runtimes start with a more limited set of privileges by default, instead of handing out CAP_SYS_ADMIN and leaving it up to the administrator to carefully disable it.  Escaping from the container runtime is considered a vulnerability; escaping from a poorly-secured systemd service is considered user error.

This is all orthogonal to “containers are interop”, but I think both forces are feeding the containerization craze.  I’m left with the feeling again that systemd should have been the “obvious correct choice,” except they decided usability didn’t matter.

Sunday, December 13, 2020

The world is turning upside-down (2006)

Editor's Note: this is a post from my old Serendipity weblog, "Pawprints of the Mind," which was originally posted nearly 14 years ago, on 2006-12-29. Since then, I can't help but notice that social media created itself on a Pull model—follow who you want—and then replaced it with algorithms to Push into that stream. The text below is reproduced verbatim from the original post.

In the beginning, Push dominated.

A company built a product, and it was Pushed to market. Long ago, newspapers pushed by paper boys on the street carried advertisements. Direct-mail catalogs, pushed through the postal service, were nothing but one long and comprehensive advertisement for the company that created them. The rise of radio and television, both one-way broadcast media, allowed advertisements to be pushed to millions at a time, quickly and easily. Movies and music are produced and pushed, and the producers hope they break even.

After a company pushed their product, they spent plenty of money watching what happened to the sales. Was it going to explode or tank? Was the initial 10,000 piece production run going to be liquidated, or was another run ten times the size waiting in the wings? There was no way to sense demand except to Push supply and watch what happened.

Inevitably, Push was brought to the new media: the Internet. Building on the ideas of direct mail, email lists formed. Spam happened. Spam filters happened. Better spam filters happened. But those may be only a temporary solution.

As Pushing got cheaper, more was pushed, until users fled the deluge. Too much yang leaves people wanting yin. Onto this stage stepped Pull.

Nobody knew it was Pull yet. It called itself RSS or reddit, and it was about users coming and getting it. No more HTML-heavy, graphic-packed email stuffed into their Inbox every other day to make up for the poor quality of the site's search capabilities. No more "Insert-Brand-Here Loyalty Updates". No more subscriptions, passwords, bounce processing, and unsubscribing. No more spam, because the provider no longer needs to know where to send anything; they just wait for users to come and get it.

Pull is about user control. Pull is about saying "I want that" and not having some gatekeeper in the way, trying to extract monopoly rents. This is what scares the recording industry; their value as gatekeepers is plunging as alternative ways of connecting bands and fans arise.

Sunday, December 6, 2020

Can Micropayments Even Work? (2007)

Editor's Note: this is a post from my old Serendipity weblog, "Pawprints of the Mind," which was originally posted over 13 years ago, on 2007-05-13. The text below is reproduced verbatim from the original post.

(This is not entirely academic, as a current goal of my day job essentially amounts to implementing a micropayment system.)

I am beginning to believe that the fundamental problem behind micropayments as a viable option for widespread payment is that credit cards are effectively already micropayments. We're just spoiled by cash. Physical currency is limited by reality. There's not a limitless supply to steal, nor can it be readily created. Undetectable counterfeits cannot be manufactured by poking a few bits inside a computer. Rather, it's difficult to produce high-enough quality counterfeits, which is why good counterfeits only come in $100 notes. The run-of-the-mill counterfeiters are stuck trying to figure out how to make a passable $20 out of card stock or ordinary paper, because anything bigger is subject to too much scrutiny for their materials. (Even then, a local supermarket tests all those, pushing the bar down to $10.)

In essence, I suspect the cost of doing business with a credit card company is mostly the cost of implementing imaginary money securely. The more credit processing costs, the fewer shops join in, and the less of the currency marketshare the creditor ends up with. On the other hand, the services can't be priced so low as to be unprofitable. Not to mention, the more that people use their cards, the more interest the creditor can collect at little extra cost, as all the billing and accounting framework was already in place for that cardholder anyway. Charging less makes economic sense for them, even if they were a monopoly.

A credit card company ends up shaving a few percent off transactions made through them. Micropayments want to be the same thing, only smaller: shave a few percent off penny-sized transactions and make up for it by volume. But the micropayment competes with the credit gateways, if one of the main ways of getting money into the system is to purchase microcurrency on a credit card. Inside the system, the shaving has to be high enough to make up for the transaction cost of the microcurrency being bought and sold, as well as the real costs of doing the transaction and turning a profit.

And if micropayments are essentially equivalent to real currency, then they're also equivalently desirable for fraud, stealing, and counterfeiting: something the large creditors are spending plenty of money on for the best and brightest to counteract. This brings up another point: micropayments probably won't have the same amount of consumer trust as credit cards, because personal liability is legally limited to $50 on the cards. This is not the case for micropayments, which is going to make people not want to have too many of them at one time. That in turn limits the total amount of microcurrency that can be circulated, and restricts the market for higher-priced microsales.

Is it possible to best Visa and MasterCard at their own game?

Sunday, November 29, 2020

Discontinuous Complexity (2007)

Editor's Note: this is a post from my old Serendipity weblog, "Pawprints of the Mind," which was originally posted over 13 years ago, on 2007-10-24. Notably, this was written before I knew about jQuery, which improved on Prototype, eclipsed it entirely, and has since fallen out of fashion. The text below is reproduced verbatim from the original post.

When a system gets sufficiently large, changes become surprisingly difficult. It's as if a gremlin gets into the system, and then perfectly reasonable estimates end up being only halfway to the mark. Or less. This annoys managers, who made a bad decision because the estimate turned out bad, and it annoys engineers because they know their estimate was sensible. Why does this happen?

Slowing Down

I think the key factor is that the system exceeds the size of an individual developer's working memory. In the same way that virtual memory is a lot slower than real memory, development slows down considerably when it exceeds the mind. Tracking work on paper is not instantaneous, and the average developer's choice is to just forget stuff instead. Not that anyone can know when they've forgotten something, or else it wouldn't be forgotten.

The problem with the just-forget method is that it makes coding a lot more time-consuming. You end up writing the same thing, several times, each time accounting for a new layer of information which was forgotten, but later rediscovered. After so much work, you think the code must be beautiful and perfect, until you run it. Another layer of forgotten information becomes apparent, but this time, it has to be painstakingly rediscovered through debugging. There could be several more debug cycles, and if you're unlucky, they can trigger another redesign.

Paper is no panacea either; besides its slowness, it seems to be impossible to track all your thoughts, or sort the relevant from irrelevant. There's nothing like getting halfway through a paper design and then realizing one key detail was missing, and a fresh design cycle must begin. If you're unlucky, there's still a key detail missing.

This overflow is what makes the change so abrupt. There's a sudden, discontinuous jump downward in speed because the system passes a critical point where it's too big to track. Normal development activity has to be rerouted to deal with all the work that has to be done to make "small" changes to the code, and it becomes a huge drain on speed, productivity, and morale. It's no fun to work obviously beyond our capabilities, and the loss of productivity means the speed of accomplishments (and their associated rewards) diminishes as well.

Anticipation

If development must slow when an application reaches a certain size, is there something we can do to stop it from becoming so large in the first place? Could we see this complexity barrier coming, and try to avoid it?

I'm not sure such a thing is possible. Stopping development to go through a shrink phase when the critical point is approaching would require us to be able to see that point before it arrives. The problem is that complexity is easier to manage as it builds up slowly. It's not until some amount of forgetting has happened that we are confronted with the full complexity.

Also, the tendency to break a system down into components or subsystems, and assign separate people to those systems, allows the complexity of the system as a whole to run far ahead of the individual subsystems. By the time we realize the subsystems are out of hand, the whole is practically tangled beyond repair. Besides, your manager probably doesn't want to spend time repairing it even if it wasn't that big.

Familiar Conclusions

No matter what angle I try to approach improving my programming skill from, I keep arriving at the same basic conclusion: that the best systems involve some sort of core, controlled by a separate scripting or extension language. The oldest success of this approach that I know of in common use today is Emacs, which embeds a Lisp dialect for scripting. Having actually made a script for Vim, I have to say that using a ready-made language beats hacking together your own across a handful of major revisions of your program.

I've really begun to see the wisdom in Steve Yegge's viewpoint that HTML is the assembly language of the Web. With SGML languages, document and markup are mixed together, and most of the HTML template you coded up is basically structural support for the actual template data. Even in a template-oriented language like PHP or Smarty™ built on top of HTML, you're forever writing looping code and individual table cells. With a higher-level markup language, you could conceivably just ask for a table, and have it worry about all the exact row/cell markup.

The other major option to reduce the complexity of Web applications, which has apparently been enthusiastically received already, is to put libraries into the languages we have to make them more concise for the actual programming we do. One obvious effort on that front is Prototype, which smooths over (most) browser incompatibilities and adds a number of convenient ways to interact with the Javascript environment. Prototype-based scripts are barely recognizable to the average JS programmer. At what point does a library become an embedded language, if that's even a useful distinction?

In the end, understandable programs come from small, isolated parts. Pushing all variables in the known universe into a template one-by-one is not as simple as providing an interface that lets the template find out whatever it needs to know. Laying out HTML by hand is not as concise as sending a cleaner language through an HTML generator. (And no, XML is not the answer.) Libraries can help, but sometimes, nothing but a language will do.