Monday, December 14, 2015

Re: Pomodoro technique

Every once in a while, I hear about this “great life hack” called the Pomodoro Technique. The basic principle is that one works on a task in short, focused bursts.

There’s a major problem, though.  It doesn’t work for me.  It might take me the entire “focused burst” time to really get focused on the task at hand, and then, boom! A forced distraction takes me away, just as I get warmed up. Getting back into the next cycle then takes at least as long, or longer, and pretty soon there’s zero forward progress as my attention is sliced to ribbons by a timer.

(I observe the same problem with the average ‘typing break’ software that doesn’t bother monitoring actual typing.  I am too frequently just getting warmed up when a “scheduled break” arrives.)

tl;dr: the “pomodoro technique” is not a magic cure-all, and you don’t have to feel bad if it doesn’t work for you.  You can just stop using it.

Wednesday, November 11, 2015

Warts: one more Go rant

In response to this reddit thread

I just want to agree with one of the commenters, that go get ignoring versioning is also a massive pain with the language.  The official justification is that the Go developers did not know what the community would want, so they “didn’t build anything.”

But they did, and it’s called go get… and they were right, it is not what anyone wants.

Unfortunately, since it’s also the only thing included by default—as well as being a great party trick, “just include this package and go get knows how to find it”—people will try to use it, and it will burn them.

We can express that a repository has code compatible with Go 1.0 by giving it a go1 branch, but there’s nothing to indicate any higher minor versions. Libraries wanting to be friendly to go get must be written to Go 1.0 for ever and ever, no matter how much the language improves afterward.

After this became painfully apparent, the official story switched to “vendor all your dependencies.”  As in, use some extra tool to get a copy into your project and completely ignore $GOPATH.  Worse, it wasn’t just “some” extra tool; there was a whole page listing vendoring tools that worked in different ways with different feature sets.  You were expected to “pick one” that suited you.

Then they decided that was all wrong, and that vendoring would be understood by the language itself, someday.  Which will be great, once that version makes it everywhere, and the feature can be relied upon.  We might be waiting a long time for it to land in RHEL and Debian stable.

I’m happy they’re fixing it, eventually.  I really am.  This is a major complaint, and they’re getting rid of it.

Now if only they could do the same for generics.  It’s really frustrating having a first-class function that has no-class data.

Monday, September 28, 2015

The Value of Open

(I found this ancient draft, last edited in January 2012. It looks pretty solid for its time, and potentially still relevant, so with that in mind, here it is...)

I met the internet over a decade ago, as an idealistic teen.  Linux was a rising star, so when I got a computer for college, I found a builder who was willing to sell systems with Windows 98 and Red Hat Linux side-by-side.  In those bygone days, the 2.2 kernel was near the end of its life, but USB support was backported while we waited for 2.4.  (I'm sure other things were, as well, but that is the one that had the most impact on my USB scanner.)  I connected my DIN hand-me-down-and-down-and-down keyboard from the old 486 I had been using to a PS/2 converter and joined the Future with my new machine.

I believed in everything about open source back then.  We even had proof: Linux! and Red Hat! We were totally not chasing taillights!  What could possibly be wrong with our utopia?

Friday, September 25, 2015

Apache Config Merging

(I found this post I drafted a year ago. I don't know why it isn't posted, so here it is...)

The Apache documentation tells you how to order directives in the same scopes, but neglects to remind you about the way different scopes merge.  So, as of 2.4, here's an overview of how it all works.

When the server first receives a URL, it searches for matching containers to determine the active configuration.  The following containers are checked at this stage, highest priority first ("last to first" or "first to last" refers to the order that the sections or directives are listed in the configuration file):
  1. <If>, from last to first.
  2. <Location> and <LocationMatch>, last to first.

Once the server has a physical path, directives are applied, highest priority first:
  1. <Files> and <FilesMatch>, last to first.
  2. <DirectoryMatch> sections matching the directory, last to first.
  3. From most- to least-specific directory path (e.g. /foo/bar before /), regardless of order:
    1. Directives in .htaccess files, if allowed and present.
    2. <Directory> section.  I suspect, but haven't verified, that multiple Directory sections on the same path (e.g. <Directory /foo></Directory> ... <Directory /foo></Directory>) will apply last to first.
Directives are also merged with VirtualHost and server-wide (outside of any other section) contexts, with priority again being given to VirtualHost over the server-wide directives.  That is, a ProxyPass in the server-wide section will apply to all virtual hosts by default, but a ProxyPass within a <VirtualHost> section will be able to replace the server-wide rule.

The ordering of directives within sections, and what happens with duplicates (at multiple priority levels) in general, is defined by each individual module.
  1. RewriteRules execute first-to-last when the section containing them is processed.  Stopping is subject to flags and things (mod_rewrite is a powerful beast): the [L] flag and any that imply it end processing of rewrite rules in that section; for example, rules in <VirtualHost> cannot stop rules in <Directory> from applying.
  2. ProxyPass and ProxyPassMatch rules execute first-to-last, stopping when any match is found.  Thus the "longest match first" rule given in mod_proxy's documentation.
  3. Alias, AliasMatch, Redirect, and RedirectMatch rules execute first-to-last, stopping when any match is found.  Likewise, this produces a "longest match first" rule that is given in the mod_alias documentation.
  4. Whether a URL is tried with other modules (like mod_proxy or mod_alias) after RewriteRules have taken effect depends on how the RewriteRule is written and where it is placed.  I am not sure I understand the finer points of this, but the pass-through flag [PT] exists to force mod_rewrite to treat it as a URL and let other modules have a chance to handle it.
The main takeaway here is to remember what level you're working at, all the time.  If you place two ProxyPassMatch directives each in a separate <LocationMatch> block, then their order of application is defined by the rules for <LocationMatch>.  It is only when ProxyPass/Match directives are sharing the same section (say, both within <VirtualHost>) that they use the longest-match-first rule from mod_proxy.
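For a made-up illustration of the difference (backends and paths invented):

# Same section: mod_proxy checks these first to last and stops at the first
# match, so the longer (more specific) path has to be listed first.
ProxyPass "/app/api" "http://127.0.0.1:8081/"
ProxyPass "/app"     "http://127.0.0.1:8080/"

# Separate sections: now <Location> merge order (last to first priority)
# decides which configuration wins, not mod_proxy's own first-match rule.
<Location "/app">
    ProxyPass "http://127.0.0.1:8080/"
</Location>
<Location "/app/api">
    ProxyPass "http://127.0.0.1:8081/"
</Location>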

One other quirk of laying out mod_proxy directives is that they're not actually valid within <Directory> or <Files> sections.  mod_proxy itself deals only with the URL space, and if another module has decided on a file-system path, then Apache is implicitly serving the content directly, as origin server.  It is then too late for pure URL-to-URL manipulations, which is the level that mod_proxy works at.

Saturday, September 5, 2015

Templates and DOMs

In my last post, I mentioned my ideal of keeping “HTML” generation as operations on a DOM tree, instead of assigning variables to templates and using string substitution. Parse the initial template file, fill it with data, then render it once with safe encoding (where relevant) at the end.
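To make that concrete, here is a rough sketch of the idea in PHP (the template file and element IDs are invented for illustration):

<?php
// Load the template once; it is plain HTML, not a string full of placeholders.
$doc = new DOMDocument();
$doc->loadHTMLFile('template.html');
$xpath = new DOMXPath($doc);

// Fill in data by manipulating nodes instead of concatenating strings.
$title = $xpath->query('//*[@id="page-title"]')->item(0);
$title->appendChild($doc->createTextNode('Hello & welcome'));

$list = $xpath->query('//*[@id="item-list"]')->item(0);
foreach (['first', 'second <item>'] as $text) {
    $li = $doc->createElement('li');
    $li->appendChild($doc->createTextNode($text)); // no manual escaping here
    $list->appendChild($li);
}

// Render exactly once; the serializer encodes & and < on the way out.
echo $doc->saveHTML();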

I also know why this approach isn’t as popular: everyone hates the DOM.

Friday, September 4, 2015

Velocity: NIH vs. Frameworks

I really hate frameworks. I also hate framework benchmarks that are done as, “let’s see how fast we can template static data into a tiny file! Who cares about branding, databases, proper output escaping, realistic access patterns, or accurate entity-body sizes?”

I hate frameworks mostly because I always feel like I can write a faster script in pure PHP.

It doesn’t really help that I have certain ways I’d like to do things, and most frameworks actually don’t do it that way. Case in point: “HTML templates” should be built by DOM manipulation, just like SQL queries should be prepared (or better.) Pasting strings into strings for another interpreter is the deepest of follies… and the most popular/frequent approach.

Also not helping: the fact that I learn about whole new classes of vulnerability when someone writes up a report showing that, by carefully crafting a string that goes through unserialize($_COOKIE['foo']) or even $cls = $_GET['bar']; new $cls(), they can get arbitrary PHP execution on framework X. No need to install their own webshell!

Unfortunately, I’ve also gotten tired of writing raw HTTP handling and dispatch code. (This has gotten especially tiresome as features like “get the correct client IP from X-Forwarded-For instead of using REMOTE_ADDR blindly” have become necessary, since ELB became part of production.)
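The kind of handling I mean is roughly this; a sketch assuming exactly one trusted proxy (the ELB) appends to the header, with a made-up helper name:

<?php
function client_ip(array $server) {
    // Behind a single trusted load balancer, REMOTE_ADDR is the ELB itself;
    // the real client is the last address the ELB appended to the header.
    if (empty($server['HTTP_X_FORWARDED_FOR'])) {
        return $server['REMOTE_ADDR'];
    }
    $hops = array_map('trim', explode(',', $server['HTTP_X_FORWARDED_FOR']));
    return end($hops); // rightmost entry: the one our own proxy added
}

// Usage: $ip = client_ip($_SERVER);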

The other downside is that writing my own ‘nano-framework’ means that everyone else on the team gets stuck learning my extremely non-portable approach when they want to work on a site I built. Or they can just blatantly ignore the Architecture, because after all, writing code is more fun than reading it. (I’d be more angry about this, but… guilty as charged. See also: frameworks.)

Two really interesting things have happened, though.
  1. TechEmpower has been doing some amazingly awesome, in-depth, serious framework benchmarks for years now, and posting the results.
  2. With the rise of micro-frameworks, some handy reusable libraries like the Symfony HttpFoundation have been published.

It’s clear that I can be more productive by leveraging code that’s already written. (And debugged.) It’s also clear that I don’t want to carry this to the extreme and start using Symfony2—just check out the “framework overhead” tab, or maybe the “Errors” column, at TechEmpower. I don’t know where the happy medium is yet, but writing my own thing is not it.

tl;dr: the moral of this post is, don’t keep rewriting your own code when you can find someone else’s that works. Keep looking for light weight though, because it’s usually a good proxy for other qualities like clarity, speed, size, and API stability. OTOH, be able to recognize when something isn’t serving your needs. I guess it’s all a hard balance to strike, but NIH will slowly crush you. And you won’t notice until you’re dead.

Thursday, September 3, 2015

SNS Deployment update

As an update to this old post from 2013…

We have moved to using php-fpm, so naturally suEXEC has been replaced by php-fpm’s configuration. That allows for running multiple pools, each running PHP scripts under their own user.

We have the “main” pool still using the same unprivileged user as Apache, and then there’s a “privileged” pool that uses the privileged user. Only the traffic on the port receiving SNS notifications is directed to this privileged pool. The main pool still has the same permission it would if it were running under mod_php, to ease the transition.
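Roughly, the pool split looks like this (the pool names, users, and ports here are illustrative, not our actual config):

; main pool: same unprivileged user as Apache
[main]
user = apache
group = apache
listen = 127.0.0.1:9000
pm = dynamic
pm.max_children = 20
pm.start_servers = 4
pm.min_spare_servers = 2
pm.max_spare_servers = 6

; privileged pool: only the SNS listener's traffic is routed here
[privileged]
user = deployer
group = deployer
listen = 127.0.0.1:9001
pm = ondemand
pm.max_children = 2

Apache then hands requests arriving on the SNS port to 127.0.0.1:9001, and everything else to 9000.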

The transition was relatively painless, but only because I’d already converted per-dir .htaccess files into rules in the server config. It was part micro-optimization, part “I will probably want nginx someday.” Although Apache is still serving our needs admirably.

Wednesday, September 2, 2015

Clean up when closing a terminal

I've taken to clearing my ssh-agent identities and sudo timestamps when I close my shell, by putting in my ~/.bash_logout file:

if [[ -n $SSH_AUTH_SOCK && -z $SSH_CONNECTION && $SHLVL = 1 ]] ; then
 ssh-add -D
 sudo -K
fi

One caveat: the above only works using the shell's exit (or logout or Ctrl + D), not with iTerm2's close button.  However, that can be fixed by using an exit trap in .bash_profile instead, like so:

clearKeys() {
 ssh-add -D
 sudo -K
}
if [[ -n $SSH_AUTH_SOCK && -z $SSH_CONNECTION && $SHLVL = 1 ]] ; then
 trap clearKeys EXIT
fi

To be clear, the latter version requires no changes/additions to .bash_logout.

Rationale: I usually work on a desktop, and keep a copy of my work in sync on a laptop using unison.  Making the SSH connection from the laptop adds the key to the session's ssh-agent, but I don't want that to persist after sync is finished.  I don't want keys to stay active while I'm not planning on using them soon.

Monday, August 17, 2015

Rebuilding Dependencies Frequently

Reddit asked:
how often do you rebuild your OS level dependencies?

In practice, around twice a month, due to accumulation of security updates. Sometimes bit-rot plays a role as well, but mostly, it’s the endless stream of updates. We want images to quickly launch into a secure state.

Waiting until the instance comes up to apply updates has two drawbacks: one, each new instance duplicates the work. Two, if there’s an update that requires a reboot, we’ve found through repeated, painful experience that there’s no way to reliably reboot our instances programmatically.

IIRC though, that question was in the context of Docker containers, while our process is to precompile our app and its dependencies into a monolithic AMI that we deploy to instances in AWS. I like to think it’s basically the same, but I haven’t really used Docker much.

Server Cleanup

Some stuff on sapphirepaw.org has been 404’d into the dustbin of history. Pretty much everything from before 2010 is gone now, much of it ‘unlisted.’ Stuff like bug reports on pre-Blink Opera and PHP 5.2, pictures from events I attended and talks I gave, and my wallpaper section. The latter was built as a demo/learning PHP app, and it turned out to be really timely in 2005. It was something I could point at and say, “I built that in PHP/MySQL,” right when I was looking for my first tech job. But it’s not something I’ve been at all interested in updating since—either code or data—so I removed it.

Along with those changes, the remaining pages are static now. I used to have a tiny little framework that wrapped page content with header, navigation, and footer, configured by a little PHP block in the source, then served the page. But there’s no real need to render that on every request, so I turned the framework into a static site generator. All the links get rewritten, too. The PHP URLs still work, but the server simply issues redirections to the static pages.

There was one hitch to the main process: libxml2 (PHP's DOMDocument->loadHTML()) doesn't recognize HTML5, only HTML 4.0. I had to figure out how to use the Masterminds HTML5 parser instead.
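The swap itself was small; roughly this (from memory, so the details may differ):

<?php
require 'vendor/autoload.php';

// The Masterminds parser understands HTML5 but still hands back a plain
// DOMDocument, so the rest of the site builder didn't have to change.
$html5 = new \Masterminds\HTML5();
$dom   = $html5->loadHTML(file_get_contents('page-source.html'));

// ... same DOM manipulation as before ...

file_put_contents('output/page.html', $html5->saveHTML($dom));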

There was one page I wanted to keep (for now) that was dealing with more dynamic content, and that was termitheme’s themes.php page. That had to be converted to SQLite for the site builder to use local data instead of a MySQL connection.

I finally deleted all the old themes and graphics from /css, too, because my sense of design has matured along with the capabilities of CSS over the years. Would I ever resurrect the deep blue theme of years past? It would look dated. I'd have to design a new, modern theme (merely inspired by the old one) anyway.

Friday, August 7, 2015

Linode KVM: not recommended

Looks like KVM has been a de facto downgrade. I never had constant small dropouts on Xen, and I had certainly never had my server go silent for hours. (Linode 2048, btw. It started as a 512.)


It might be time to go looking for a new VPS provider...?

Tuesday, July 14, 2015

TIL: HTTP Upgrade

I’m thinking about creating devproxy in a different language. Tracking down some relevant specs, I found the CONNECT RFC.

This RFC includes not only the definition of CONNECT, but also an alternative: using an Upgrade header to convert a regular HTTP connection to HTTPS, with either optional or mandatory encryption. It was like STARTTLS for HTTP, in a way.

CONNECT won out in the real world, of course, but I find this lost feature kind of fascinating.

Quick comparison:

  • Upgrade is a hop-by-hop header. The browser/proxy and proxy/upstream connection MAY be using different levels of encryption.
  • When using Upgrade, the proxy needs a valid TLS certificate to handle encrypting traffic with its clients.
  • Also, this means the proxy can still view/cache/log the data that was encrypted on the wire.

CONNECT is basically the opposite: once the request is made and the proxy allows it, the proxy reverts to being just as dumb as any router on the Internet. All it can do is shuttle the bytes, so the same bytes that leave the origin end up at the client without any caching or interpretation. Since CONNECT is mainly used for HTTPS, those bytes are most often encrypted, as well.
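Roughly, the two flavors look like this on the wire (paraphrased and trimmed, not exact traces):

Upgrade (RFC 2817), where the proxy terminates TLS itself:

  GET http://example.com/ HTTP/1.1
  Host: example.com
  Upgrade: TLS/1.0
  Connection: Upgrade

  HTTP/1.1 101 Switching Protocols
  Upgrade: TLS/1.0, HTTP/1.1
  Connection: Upgrade
  ...TLS handshake with the proxy, then HTTP continues inside it...

CONNECT, where the proxy becomes a dumb pipe:

  CONNECT example.com:443 HTTP/1.1
  Host: example.com:443

  HTTP/1.1 200 Connection Established
  ...TLS handshake end-to-end with example.com...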

Google may have tried the Upgrade header when first developing SPDY, but they liked neither the extra round-trip nor the ability of intermediate devices on the network to interfere (intentionally or otherwise.) So it didn’t end up getting resurrected from the dustbin of history for that, either.

So maybe I didn’t learn about it today, but only rediscovered it.

Friday, July 10, 2015

Letting Go of Go

The things that originally attracted me to Go were the concurrency model, the interface system, and the speed. I was kind of meh about static typing (and definitely meh about the interface{} escape hatch) but figured the benefits might be worth the price?

But it hasn’t really turned out that way. I still like the concept of having no locks exposed to the user (safely hidden in the channel internals) à la Erlang or Clojure. But I’m not going to pay for it with err everywhere, static types, a profusion of channels, and a lack of generics.

Seriously, all of the synchronization choices of Go seem to come down to, “Use another channel.” Keeping track of so many channels among a few stages of processing is a whole new layer of heavy work. That would be pretty much unnecessary if channels could be “closed with error,” which could then be collected by the UI end.

Then there is the whole problem of generics. The runtime clearly has them: basically, anything creatable through make() is generic. But there’s no way for Go code to define new types that make can create generically. There’s no way for Go code to accept a type name and act on it as a type, either.

You can pretend to hack around it with interface{} and runtime type assertions, but you lose all of the static checking. The compiler itself knows that a map[string]int uses strings as keys and can only store integers, but an interface{} based pseudomap won’t fail until runtime.
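A contrived sketch of that difference:

package main

import "fmt"

func main() {
    // The built-in generic map: the compiler rejects wrong types outright.
    typed := map[string]int{"a": 1}
    typed["b"] = 2
    // typed["c"] = "three" // compile error: cannot use "three" as int

    // A pseudo-map built on interface{}: everything compiles fine,
    // and the mistake only shows up as a panic at runtime.
    loose := map[string]interface{}{"a": 1}
    loose["c"] = "three"
    n := loose["c"].(int) // panics: interface conversion, string is not int
    fmt.Println(n)
}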

To get the purported advantages of static typing, the data has to be fit to the types that are already there.

I’d almost say it doesn’t matter to my code, but it seems to be a big deal for libraries. How they choose their data layout has effects on callers and integration with other libraries. I don’t want to write a tall stack of dull code to transform between one and the other.

The static types thing, I’m kind of ambivalent about. If the compiler can use the types to optimize the generated code, so much the better. But it radically slows down prototyping by forcing decisions earlier. On the balance, it doesn’t seem like a win or a loss.

Especially with all the performance-optimization work centering on dynamic languages: techniques refined in Java (to a certain extent), C#, and JRuby are now flowing into JavaScript. It’s getting crazy out there. I don’t know if static typing is going to hold onto its edge.

I think that brings us back around to err. Everywhere. I really want Lisp’s condition system instead. It seems like a waste to define a new runtime, with new managed stacks, that doesn’t have restarts and handlers. With the approach they’ve chosen, half of Go code is solving the problem, and the other half is checking err values and passing them back up the stack.

Go isn’t supposed to have exceptions, but if you can deal with the limitations of it, recover is a thing. (But it’s still not Lisp’s condition system, and by convention, panic/recover isn’t supposed to leak across module boundaries.)

I forgot about the mess that is vendoring and go get ruining everything, but I guess they’re working on fixing that. It’s a transient pain that’ll be gone in a couple more years, too late for my weary soul.

But am I wrong? What about the “go is the language of the cloud” thing that Docker, Packer, and friends have started? I don’t think Go is “natively cloud” because that’s meaningless. I think a few devs just happened to independently pick Go when they wanted to experiment, and their experiments became popular.

It surely helps that Go makes it easy to cross-compile machine-code binaries that will run anywhere without stupid glibc versioning issues, but you know what else is highly portable amongst systems? Anything that doesn’t compile to machine code. For instance, the AWS CLI is written in Python… while their Go SDK is still in alpha.

tl;dr

I find the limitations more troublesome than the good parts, on the balance. I recently realized I do not care about Go anymore, and haven’t written any serious code in it since 1.1 at the latest. It’s not interesting on all sides in the way Clojure is.

Wednesday, June 24, 2015

Cheating via Lookup Table

One of the assignments in college was to write a binary-to-decimal converter in MIPS assembly, of all things. The goal was, given a 32-bit unsigned number like 0xdecafbad, print out “three seven three seven eight four four six five three”. Or “six four” for 0x40.

So the professor got 150 submissions like

print_digit:
  beq $a0, $zero, print_zero   # digit assumed to be in $a0
... nine more digits ...
print_zero:
  (load address of "zero" string)
  (call library function to actually print)
  jr $ra
... nine more digits ...

I almost coded mine the same way, but I asked myself, “I don't want to write the same code for ten cases. How would a real hacker solve this?” So I made a lookup table of pointers to each string, and my function was more like:

print_digit:
  (load value from "string_addrs + 4*digit")
  (call library function to actually print)
  jr $ra

Of course, I’m just showing snippets of the core function; everyone had to write their own complete program.

The prof had apparently never seen my approach, because he asked if I had even run it to see if it worked. In shock, I said something like, “Yeah, why? Didn’t it work for you?” and he replied, “Um well yes. I just thought you maybe cheated.”

Haha, no. I’m just awesome.

Thursday, May 28, 2015

Apache, FastCGI, and the Authorization header

I couldn’t find much about why Apache (up to and including 2.4.x) doesn’t pass the HTTP Authorization header to FastCGI by searching the Internet, so I fished through their source code.

There’s a function that sets up the default variables to pass to CGI programs, called ap_add_common_vars. This function expects that other users on the server can see these variables with ps -e, so to hide usernames and passwords from appearing in there, it specifically avoids passing Authorization through to CGI.

FastCGI uses the same ap_add_common_vars function, and mod_proxy_fcgi doesn’t take any special care to put the Authorization back in for requests that are sent over a socket. Thus, by default, no FastCGI requests will receive usernames and passwords.

The standard workaround is to use mod_rewrite to set an environment flag if the request has authorization. Because nothing unsets the Authorization—ap_add_common_vars simply skips setting a default—that explicitly-set environment variable will be included in the FastCGI request.

RewriteCond %{HTTP:Authorization} .
RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]

That’s fine if you’re already using mod_rewrite. Can it be done without that module? I haven’t benchmarked the difference, but mod_setenvif can do it via the rule:

SetEnvIf Authorization (.+) HTTP_AUTHORIZATION=$1

tl;dr: Authorization isn’t passed to FastCGI because of the default protection from exposing passwords with slow CGI. It can safely be passed to a FastCGI server (requests are sent over a socket, not through world-readable process attributes) but it must be explicitly passed.

Updated 28 Jul 2015: I finally tested the SetEnvIf rule. I also mentioned what Apache line I'm working with.

Monday, April 20, 2015

Hacking my Habits: screen vim → gvim

I spend much of my time using the screen command inside a VM, opening vim in a new window by running screen vim.  When I'm on bare metal, then, sometimes I accidentally start a whole screen for a quick edit, and I don't realize it until I exit vim and it says, [screen is terminating].

So I built a little function into my ~/.bash_aliases (sourced from the default .bashrc) on the host machine:
# if I run `screen vim` outside of screen, invoke gvim instead.
screen() {
 if [[ $1 = vim && -z $STY ]] ; then
  shift
  gvim ${1+"$@"}
 else
  command screen ${1+"$@"}
 fi
}

Now "screen <anything but vim>" will run screen, and "screen vim" will invoke gvim.  The latter may surprise me, but it'll be what I actually want.

Update: I put similar code inside the VM to convert "gvim" into "screen vim" or regular "vim", as appropriate. Now I abuse this all the time to save three keystrokes ("gvim" vs "scr<TAB>vim").

Monday, March 23, 2015

Timezone Fail

Okay.  Who messed up timezone handling so badly that it's “not safe” to rely on the system?  If it’s actually unsafe, how is date_default_timezone_set() or an equivalent ini_set any safer?  You could just as easily set an Evil Attack™ Timezone with them.

(And who formatted this message so poorly?)

Tuesday, February 24, 2015

Broadband Asymmetry

Someone, somewhere on the intertubes, recently asked: “Why are we seeing 10:1 speed asymmetry?”

I’m pretty sure the speed asymmetry started for good technical reasons: through some signal magic, companies could deliver 48+ Kbps down and 33 Kbps up. ADSL took advantage of the technological landscape of the time (browsing and email were asymmetric) to deliver faster speeds where the customers cared, and indeed, my first DSL services were (if memory serves) in the 1.5/0.38 Mbps area, only a 4:1 asymmetry. My current service is ADSL at 9.3:1 (7.0/0.75), which is both notably closer to 10:1, and hasn’t qualified as broadband since the 4/1 Mbps definition went into effect.

Even though YouTube made video hit the web in a big way—they were there at the crossover point between better codecs and better bandwidth, plus some cleverness* on their part—most traffic was still downstream. The video being delivered was much larger than the return traffic that acknowledged receipt of the video.

The restricted upload rates are thus firmly grounded in historical reality, and they persist today because, I suspect, of two reasons.

One, there’s obviously a chicken-and-egg problem where uploads are less frequent because upload rates are low, and the rates are lower because uploads are less frequent. There’s a natural tendency for uploads to be less frequent anyway (how many funny cat pictures do you look at per picture you upload?) but low upload rates discourage actual upload usage in and of themselves.

Two, I think ISPs are rewarded if they keep upload rates low. Settlement-free peering has traditionally required each side to send “about equal” traffic as the other. If an ISP strongly encourages downloads through 10:1 or more asymmetry, then they will never come close to sending “about equal” traffic out of their network... and they can demand payment from anyone who wants access to their customers, such as Netflix.

I still believe that ISPs should be charging their own customers enough to support their own customers’ data requests including adequate network investment, but that doesn’t invalidate the reality.

As for 10:1 specifically, I can only speculate. It may be that 10:1 is simply the point where sending email and uploading to Facebook “doesn’t seem to take too long” for users. And if more people sent more video to Facebook, then ISPs might reshuffle their plans to provide more upstream “so you can Facebook.” Regardless, in the absence of an obvious technical reason, I must assume it serves a specific marketing purpose.

* At one time, they showed 320-pixel video in a 425-pixel player. Although scaling technically hurts quality, it crossed the gap between “small” and “nicely sized,” looking much better on 1000-pixel browsers.

Monday, February 23, 2015

Perl is Dying

I like Perl. It's been a part of my life for over 14 years, since I had 5.6.0 on Red Hat 7.0. (Funny how Red Hat Enterprise is now 7.0, and the desktop project hasn't borne the name for over twenty releases.)

But, I'm getting the distinct impression that the language is losing its community. It seems to be getting too small to keep up with the nice things.

Case in point: AWS. Trawl metacpan.org, and there's no coherent set of packages for using Amazon services. There are some Net::Amazon modules, some Amazon modules, some AWS modules, and SimpleDB::Client doesn't even have Amazon in the name. UPDATE (2020-08-08): This update has been a long time coming, but I discovered and switched to Paws sometime in 2018. On the other hand, Paws::S3 (S3!!!) issues a warning on load that it's not finished. Yea, unto this very day.

Then there are the duplicate packages like Net::Amazon::DynamoDB and Amazon::DynamoDB, which are worlds apart. The former supports practically the original DynamoDB API, using signature version 2, and accepting numbers, strings, and sets of those types as data types. Not even binary types. Amazon::DynamoDB uses the version 4 signature and an updated data model, along with an async client built on IO::Async. Yes, seriously, in 2014 someone didn't use AnyEvent.

This wouldn't be so much of a problem if I didn't sail straight into RT#93107 while using it. The most insidious thing is that fds only get closed sometimes, so it mostly seems to work. Until it doesn't. But there's a patch... That's been open without comment for many months.

This is not unlike another patch, for AnyEvent::Connection, open even longer. I can understand if the maintainer thinks it's unimportant because it's "only a deprecation" or because an updated common::sense avoids the issue on affected Perl versions. But to not even comment? The package is apparently dead.

I recently ran into a little trouble finding any OAuth 2.0 server modules. I didn’t see anything general, nor anything for Dancer to be an OAuth server. We have plenty of clients and client-integrations to providers, though, and it didn’t take my boss long to find a PHP library that he wanted to use.

But enough about CPAN. Let’s talk Perl core.

Perl has been adding numerous ‘experimental’ features over the years. Since 5.16, the current status of all of these has been gathered in the perlexperiment documentation. Though 5.20.1 has now been released, there are items on that list introduced in 5.10.0.

An experimental feature that is neither accepted nor rejected is dead weight. Authors are discouraged from using them, so they’re not benefitting the majority of users, yet they require ongoing maintenance and care by the perl5-porters. An experimental feature that stays experimental for far, far longer than the deprecation cycle is the worst possible outcome.

Just imagine being excited about Perl and having conversations about, “Can it do «cool thing»?” or “Have they fixed «wart» yet?” and having to respond “Well, it’s experimental, so technically yes, but you shouldn’t use it.”

Eventually, that conversation is going to come down to “Why don’t you use «Ruby|JavaScript|PHP» instead? They have more cool stuff, and only one object system.”

Even if Perl made a major push to accept experimental features in “the next release,” there’s anywhere from months to years before that release is widely deployed as a baseline. Unless one wants to build their own Perl every week for their fresh cloud images, which is a whole other can of worms.

Meanwhile, the more unique parts of Perl—regex syntax, contexts, sigils and their associated namespaces—aren’t exactly banner features anymore. Other languages have been absorbing regex syntax, and the rest of it tends to create bugs, even for authors who understand it all. That’s especially problematic when I change languages frequently, because the Perl-unique stuff takes longer to reload in my head than the rest of the language.

Overall, I’m thankful for what Perl has been for me. However, it seems like it has lost all its momentum, and it doesn’t have anything particularly unique to offer anymore. Certain parts of it like @_ and the choose-your-own-adventure object system feel particularly archaic, and I have to interact with them all the time.

Overall, it seems like basic stuff (IO::Poll working with SSL, please?) and easy fixes aren’t being taken care of, while new code isn’t showing up or isn’t high quality. If CPAN can’t keep up with the web, and the web is eating everything, then Perl will be eaten.

You are in a twisty little maze of event backends, all different.
Most of which can use most of the other backends as their own providers.

It pains me to write this. I don’t want to sound like a “BSD is dying” troll. I know I’m throwing a lot of criticism in the face of people who have put much more into Perl (and making Perl awesome) than I have. Miyagawa is perhaps singlehandedly responsible for why our API server started its life in Perl. But he can’t do everything, so the present signs are that no new Perl will be added to our codebase… because doing so would burden our future.

Sunday, February 8, 2015

Pointless metrics: Uptime

Back in the day when I was on Slashdot, a popular pastime was making fun of Windows. The 9x line had a time counter that rolled over after 30-45 days (I forget and can’t dredge the real number up off the Internet now), which would crash the system. Linux could stay up, like, forever dude, because Unix is just that stable.

So I spent a while thinking that ‘high uptime’ was a good thing. I was annoyed, once upon a time, at a VPS provider that regularly rebooted my system for security upgrades approximately monthly, because they were using Virtuozzo and needed to really, seriously, and truly block privilege escalations.

About monthly. As in 30-45 days…

I thought that was bad, but nowadays, the public-facing servers my employer runs live for less than a week. Maybe a whole week if they get abnormally old before Auto Scaling gets around to culling them. And I’m cool with this!

I try to rebuild an image monthly even if “nothing” has happened, and definitely whenever upstream releases a new base image, and sometimes just because I know “major” updates were made to our repos (e.g. I just did composer update everywhere) and it’ll save some time in the self-update scripts when the image relaunches.

It turns out that the ‘uptime’ that the Slashdot crowd was so proud of was basically counterproductive. I do not want to trade security, agility, or anything else just to make that number larger. There is no benefit from it! Nothing improves solely because the kernel has been running longer, and if it does, then the kernel needs to be fixed to provide that improvement instantly.

And if the business is structured around one physical system that Must Stay Running, then the business on that server is crucial enough to have redundancy, backups, failovers… and regular testing of the failover by switching onto fresh hardware with fresh uptimes.

Thursday, February 5, 2015

Constant-time String Comparison

I mentioned hash_equals last post.  One of the things the documentation notes is that, should the string lengths not match, hash_equals will quickly return false and effectively reveal the length difference.

It seems to be a fairly common perception that this is a problem.  Take this StackOverflow answer:
This function still has a minor problem here:
if(strlen($a) !== strlen($b)) { 
    return false;
}
It lets you use timing attacks to figure out the correct length of the password, which lets you not bother guessing any shorter or longer passwords.
I believe an implementation that doesn’t fail fast on different lengths still leaks information, though.  Most of them (i.e. every one I’ve seen, including ones I’ve written before having this insight) compare all characters through the shorter of the two strings.  If an attacker can time comparisons and control the length of one string, then when the ‘constant time’ algorithm quits taking longer for longer strings, the attacker knows their supplied string is the longer one.
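For reference, the usual shape of such an implementation is something like this (a sketch, not hash_equals itself):

<?php
function slow_equals($known, $user) {
    // Fold the length difference in, then XOR byte by byte; note that the
    // loop only covers the SHORTER of the two strings.
    $diff = strlen($known) ^ strlen($user);
    $len  = min(strlen($known), strlen($user));
    for ($i = 0; $i < $len; $i++) {
        $diff |= ord($known[$i]) ^ ord($user[$i]);
    }
    // Run time grows with min(length), so an attacker who controls one input
    // can still find the point where responses stop getting slower.
    return $diff === 0;
}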

Therefore, I don’t believe “fail fast on different string lengths” is something to be concerned with.  If the threat model is concerned with a timing attack, then simply moving it around the function doesn’t actually form a defense.

Tuesday, February 3, 2015

When Did PHP Get That?

5.6.3

5.6.0
5.5.0
  • Generators
  • finally keyword for exception handling with try/catch
  • Password hashing API: password_hash and related functions
  • list() in foreach: foreach($nest as list($x, $y))
  • PBKDF2 support in hash and openssl (hash_pbkdf2, openssl_pbkdf2)
  • empty() accepts arbitrary expressions, not just lvalues
  • array and string literal dereferencing: "php"[2] or [1,2,10][2]
    • But is that useful?
  • ::class syntax: namespace A\B { class C { }; echo C::class; /* A\B\C */ }
  • OpCache extension added
  • MySQL-nd supports sha256-based authentication, which has been available since MySQL 5.6.6
  • curl share handles via curl_share_init and friends [added to this post on 2015-02-17]
  • DEPRECATED: preg_replace's /e modifier, in favor of preg_replace_callback
5.4.0
Removed:
  • safe_mode
  • register_globals
  • magic_quotes
  • call-time pass-by-reference
  • using TZ when guessing the default timezone
  • Salsa10 and Salsa20 hash algorithms
Added:
  • Traits
  • Class::{expr} syntax
  • Short array syntax: [2,3] for array(2,3)
  • Binary literal syntax: 0b1101
  • Closures can use $this in the body: function some_method () { return function () { return $this->protected_thing(); }; }
  • Using new objects without assigning them: (new Foo)->bar()
  • CLI server
  • session_status function
  • header_register_callback function
  • mysqli_result now implements Traversable: $r=$dbh->query(...); foreach ($r as $res) { ... }
  • htmlspecialchars() and friends default to UTF-8 instead of ISO-8859-1 for character set.
    • This would be changed to use the default_charset INI setting in 5.6.
    • 5.4's documentation pointed out that default_charset could be used by sending an empty string, "", as the character set to htmlspecialchars—and said you didn't really want it.
5.3.0
Added:
  • Namespaces
  • Late static binding and associated 'static::method()' call syntax
  • "Limited" goto support
  • Closures aka anonymous functions, although they cannot make use of $this until 5.4
  • __callStatic() and __invoke() magic methods, to support $obj($arg) call syntax
  • 'nowdoc' and double-quoted 'heredoc' syntax
    • $x=<<<'FOO' consumes text until FOO and does not do any interpolation / variable substitution on it
    • $x=<<<"FOO" acts just like traditional $x=<<<FOO, both consuming text until FOO and doing interpolation / variable substitution as if it were a double-quoted string.
  • Constants definable outside of classes with the const keyword, not just the define function
  • Short ternary operator: a ?: b equivalent to a ? a : b
  • $cls::$foo syntax for accessing static members of a class whose name is stored in a variable ($cls in this example)
  • Chained exceptions: the constructor sprouted an extra parameter for a previous exception (generally should mean "the cause of this exception being constructed")
  • DEPRECATED: many features that would be removed in 5.4.0, including register_globals, magic_quotes, and safe_mode.

Thursday, January 29, 2015

Perl Futures

I’ve been doing a project with IO::Async and Future, because Amazon::DynamoDB depends on them.

One of the weirdest bits of the API that I’ve run into is that the underlying framework only weakly references Future objects, so if the client doesn’t keep its own reference, then the callbacks will never be invoked. I may not care about “the future object” itself, but I do care about the callbacks I set on it.

I ended up writing a function:

sub hold_future {
    my $f = shift;
    $f->on_ready(sub { undef $f });
    return ();
}

The sub being set does nothing but close over $f in order to create a circular reference. That keeps the Future alive until the callbacks are run. To keep the future from persisting forever, though, it destroys that reference once the future resolves.

One may safely release the future while the callbacks are running, since the resolution machinery is holding onto a strong reference ($self) at that point. Therefore, the future won’t be destroyed until all the callbacks finish.

Now, the rest of my code has clearer purpose:

sub md_delete {
    my ($cb, $id) = @_;
    hold_future(
        dynamo_delete($id)
        ->on_done(sub { $cb->(1) })
        ->on_fail(sub { $cb->(0); log_event("delete $id: $_[0]") })
    );
}

Where dynamo_delete invokes a lot of retry/exponential backoff machinery, and md_delete is a command handler for Memcached::Server.

Having worked with jQuery Deferreds, I find it odd that Future doesn’t provide a separate Future::Promise object. It just hopes that the receiver of the future doesn’t try to call creator-related functions like done.

I also find it odd that the future doesn’t just stick around itself, automatically persisting until the callback phase unless actively canceled by the receiver. Having to manually manage them feels surprisingly alien, even though I’m used to managing file handles and lexical scopes.