Monday, September 28, 2015

The Value of Open

(I found this ancient draft, last edited in January 2012. It looks pretty solid for its time, and potentially still relevant, so with that in mind, here it is...)

I met the internet over a decade ago, as an idealistic teen.  Linux was a rising star, so when I got a computer for college, I found a builder who was willing to sell systems with Windows 98 and Red Hat Linux side-by-side.  In those bygone days, the 2.2 kernel was near the end of its life, but USB support was backported while we waited for 2.4.  (I'm sure other things were, as well, but that is the one that had the most impact on my USB scanner.)  I connected my DIN hand-me-down-and-down-and-down keyboard from the old 486 I had been using to a PS/2 converter and joined the Future with my new machine.

I believed in everything about open source back then.  We even had proof: Linux! and Red Hat! We were totally not chasing taillights!  What could possibly be wrong with our utopia?

Friday, September 25, 2015

Apache Config Merging

(I found this post I drafted a year ago. I don't know why it isn't posted, so here it is...)

The Apache documentation tells you how to order directives in the same scopes, but neglects to remind you about the way different scopes merge.  So, as of 2.4, here's an overview of how it all works.

When the server first receives a URL, it searches for matching containers to determine the active configuration.  The following containers are checked at this stage, highest priority first ("last to first" or "first to last" refers to the order that the sections or directives are listed in the configuration file):
  1. <If>, from last to first.
  2. <Location> and <LocationMatch>, last to first.

Once the server has a physical path, directives are applied, highest priority first:
  1. <Files> and <FilesMatch>, last to first.
  2. <DirectoryMatch> sections matching the directory, last to first.
  3. From most- to least-specific directory path (e.g. /foo/bar before /), regardless of order:
    1. Directives in .htaccess files, if allowed and present.
    2. <Directory> section.  I suspect, but haven't verified, that multiple Directory sections on the same path (e.g. <Directory /foo></Directory> ... <Directory /foo></Directory>) will apply last to first.
Directives are also merged with VirtualHost and server-wide (outside of any other section) contexts, with priority again being given to VirtualHost over the server-wide directives.  That is, a ProxyPass in the server-wide section will apply to all virtual hosts by default, but a ProxyPass within a <VirtualHost> section will be able to replace the server-wide rule.

The ordering of directives within sections, and what happens with duplicates (at multiple priority levels) in general, is defined by each individual module.
  1. RewriteRules execute first-to-last when the section containing them is processed.  Stopping is subject to flags and things (mod_rewrite is a powerful beast): the [L] flag and any that imply it end processing of rewrite rules in that section; for example, rules in <VirtualHost> cannot stop rules in <Directory> from applying.
  2. ProxyPass and ProxyPassMatch rules execute first-to-last, stopping when any match is found.  Thus the "longest match first" rule given in mod_proxy's documentation.
  3. Alias, AliasMatch, Redirect, and RedirectMatch rules execute first-to-last, stopping when any match is found.  Likewise, this produces a "longest match first" rule that is given in the mod_alias documentation.
  4. Whether a URL is tried with other modules (like mod_proxy or mod_alias) after RewriteRules have taken effect depends on how the RewriteRule is written and where it is placed.  I am not sure I understand the finer points of this, but the pass-through flag [PT] exists to force mod_rewrite to treat it as a URL and let other modules have a chance to handle it.
The main takeaway here is to remember what level you're working at, all the time.  If you place two ProxyPassMatch directives each in a separate <LocationMatch> block, then their order of application is defined by the rules for <LocationMatch>.  It is only when ProxyPass/Match directives are sharing the same section (say, both within <VirtualHost>) that they use the longest-match-first rule from mod_proxy.

One other quirk of laying out mod_proxy directives is that they're not actually valid within <Directory> or <Files> sections.  mod_proxy itself deals only with the URL space, and if another module has decided on a file-system path, then Apache is implicitly serving the content directly, as origin server.  It is then too late for pure URL-to-URL manipulations, which is the level that mod_proxy works at.

Saturday, September 5, 2015

Templates and DOMs

In my last post, I mentioned my ideal of keeping “HTML” generation as operations on a DOM tree, instead of assigning variables to templates and using string substitution. Parse the initial template file, fill it with data, then render it once with safe encoding (where relevant) at the end.

I also know why this approach isn’t as popular: everyone hates the DOM.

Friday, September 4, 2015

Velocity: NIH vs. Frameworks

I really hate frameworks. I also hate framework benchmarks that are done as, “let’s see how fast we can template static data into a tiny file! Who cares about branding, databases, proper output escaping, realistic access patterns, or accurate entity-body sizes?”

I hate frameworks mostly because I always feel like I can write a faster script in pure PHP.

It doesn’t really help that I have certain ways I’d like to do things, and most frameworks actually don’t do it that way. Case in point: “HTML templates” should be built by DOM manipulation, just like SQL queries should be prepared (or better.) Pasting strings into strings for another interpreter is the deepest of follies… and the most popular/frequent approach.

Also not helping: the fact that I learn about whole new classes of vulnerability when someone writes up a report showing that, by carefully crafting a string that goes through unserialize($_COOKIE['foo']) or even $cls = $_GET['bar']; new $cls(), they can get arbitrary PHP execution on framework X. No need to install their own webshell!

Unfortunately, I’ve also gotten tired of writing raw HTTP handling and dispatch code. (This has gotten especially tiresome as features like “get the correct client IP from X-Forwarded-For instead of using REMOTE_ADDR blindly” have become necessary, since ELB became part of production.)

The other downside is that writing my own ‘nano-framework’ means that everyone else on the team gets stuck learning my extremely non-portable approach when they want to work on a site I built. Or they can just blatantly ignore the Architecture, because after all, writing code is more fun than reading it. (I’d be more angry about this, but… guilty as charged. See also: frameworks.)

Two really interesting things have happened, though.
  1. TechEmpower has been doing some amazingly awesome, in-depth, serious framework benchmarks for years now, and posting the results.
  2. With the rise of micro-frameworks, some handy reusable libraries like the Symfony HttpFoundation have been published.

It’s clear that I can be more productive by leveraging code that’s already written. (And debugged.) It’s also clear that I don’t want to carry this to the extreme and start using Symfony2—just check out the “framework overhead” tab, or maybe the “Errors” column, at TechEmpower. I don’t know where the happy medium is yet, but writing my own thing is not it.

tl;dr: the moral of this post is, don’t keep rewriting your own code when you can find someone else’s that works. Keep looking for light weight though, because it’s usually a good proxy for other qualities like clarity, speed, size, and API stability. OTOH, be able to recognize when something isn’t serving your needs. I guess it’s all a hard balance to strike, but NIH will slowly crush you. And you won’t notice until you’re dead.

Thursday, September 3, 2015

SNS Deployment update

As an update to this old post from 2013…

We have moved to using php-fpm, so naturally suEXEC has been replaced by php-fpm’s configuration. That allows for running multiple pools, each running PHP scripts under their own user.

We have the “main” pool still using the same unprivileged user as Apache, and then there’s a “privileged” pool that uses the privileged user. Only the traffic on the port receiving SNS notifications is directed to this privileged pool. The main pool still has the same permission it would if it were running under mod_php, to ease the transition.

The transition was relatively painless, but only because I’d already converted per-dir .htaccess files into rules in the server config. It was part micro-optimization, part “I will probably want nginx someday.” Although Apache is still serving our needs admirably.

Wednesday, September 2, 2015

Clean up when closing a terminal

I've taken to clearing my ssh-agent identities and sudo timestamps when I close my shell, by putting in my ~/.bash_logout file:

if [[ -n $SSH_AUTH_SOCK && -z $SSH_CONNECTION && $SHLVL = 1 ]] ; then
 ssh-add -D
 sudo -K
fi

One caveat: the above only works using the shell's exit (or logout or Ctrl + D), not with iTerm2's close button.  However, that can be fixed by using an exit trap in .bash_profile instead, like so:

clearKeys() {
 ssh-add -D
 sudo -K
}
if [[ -n $SSH_AUTH_SOCK && -z $SSH_CONNECTION && $SHLVL = 1 ]] ; then
 trap clearKeys EXIT
fi

To be clear, the latter version requires no changes/additions to .bash_logout.

Rationale: I usually work on a desktop, and keep a copy of my work in sync on a laptop using unison.  Making the SSH connection from the laptop adds the key to the session's ssh-agent, but I don't want that to persist after sync is finished.  I don't want keys to stay active while I'm not planning on using them soon.