Thursday, January 31, 2013

Notes on FastCGI and webservers

This post is a distillation of what I've learned over the past couple of months.  It contains both new information and links to everything else I've written on the FastCGI topic so far.

Contents

  1. mod_fcgid
  2. php-fpm (nginx)
  3. other things I tried to use
  4. concluding remarks

Wednesday, January 30, 2013

Minimal, Working Perl FastCGI Example, version 2

This is an update to a previous post.  File layout remains the same: "site" is a placeholder for the actual site name, and /home/site/web is the actual repository of the project.  Static files then appear under public, and Perl modules specific to the site in lib/Site (i.e. visible in Perl as Site::Modname when lib is put in @INC).  I am still using mod_fcgid as the FastCGI process manager.

The major improvement: This version handles FCGI-only scripts which have no corresponding CGI URL.  I discovered that limitation of the previous version when I tried to write some new code, where Apache or mod_fcgid realized that the CGI version didn't exist, and returned a 404 instead of passing it through the wrapper.  As a consequence of solving that problem, FcgidWrapper is no longer necessary, which gives the dispatch.fcgi code a much cleaner environment to work in.

Everything I liked about the previous version is preserved here: I can create Site/Entry/login.pm to transparently handle /login.pl as FastCGI, without requiring every other URL to be available in FastCGI form.  It also stacks properly with earlier RewriteRules that turn pretty URLs into ones ending in ".pl".

Apache configuration:
# Values set via SetEnv will be passed in the request;
# to affect Perl startup, it must be FcgidInitialEnv
FcgidInitialEnv PERL5LIB /home/site/web/lib
RewriteCond /home/site/web/lib/Site/Entry/$1.pm -f
RewriteRule ^/+(.+)\.pl$ /home/site/web/dispatch.fcgi [L,QSA,H=fcgid-script,E=SITE_HANDLER:$1]
<Directory /home/site/web/fcgi>
    Options ExecCGI FollowSymLinks
    # ...
</Directory>
Again, the regular expression of the RewriteRule is matched before RewriteCond is evaluated, so the backreference $1 is available to test whether the file exists.  This time, I also use the environment flag of the RewriteRule to pass the handler to the dispatch.fcgi script.  Since I paid to capture it and strip the leading slashes and extension already, I may as well use it.
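
To see what the rule captures, the pattern can be tried out in plain Perl.  This is only a sketch: mod_rewrite matches with PCRE rather than Perl itself, and both the handler_for helper and the request paths are made up for illustration.  For a pattern this simple the two engines should agree.

```perl
use strict;
use warnings;

# Same pattern as the RewriteRule; the capture is what ends up in
# SITE_HANDLER.  handler_for is a hypothetical helper for this sketch.
sub handler_for {
    my ($path) = @_;
    return $path =~ m{^/+(.+)\.pl$} ? $1 : undef;
}

# Hypothetical request paths, for illustration only.
for my $path ('/login.pl', '//reports/sales.pl', '/style.css') {
    my $handler = handler_for($path);
    if (defined $handler) {
        # The RewriteCond would now test this file with -f.
        print "$path => '$handler', needs lib/Site/Entry/$handler.pm\n";
    }
    else {
        print "$path => no match, the rule is skipped\n";
    }
}
```

Note that the capture keeps interior slashes, which is why dispatch.fcgi still has to translate them into package separators.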

That means the new dispatch.fcgi script doesn't have to do as much cleanup to produce the module name:
#!/home/site/bin/perl
use warnings;
use strict;
use CGI::Fast;
use FindBin qw($Bin);
use Site::Response;
use Site::Preloader ();
while (my $q = CGI::Fast->new) {
    # SITE_HANDLER is set by the E= flag of the RewriteRule
    my $base = defined $ENV{SITE_HANDLER} ? $ENV{SITE_HANDLER} : '';
    $base =~ s#/+#::#g;
    $base =~ s#[^\w:]##g;
    $base ||= 'index';
    my $mod = "Site::Entry::$base";
    my $r = Site::Response->new($base, "$Bin/templates");
    eval {
        # die explicitly: leaving the outer eval normally would clear $@
        eval "require $mod; 1" or die $@;
        $mod->invoke($q, $r);
        1;
    } or warn "$mod => $@";
    $r->send($q);
}
I remembered to include the $r->send call this time.  I pass the CGI query object so the response can call $q->header.  That's not strictly necessary: FCGI children process one request at a time and copy $q to the default CGI object, so header should work fine on its own.  I just didn't know that yet.

I also strip everything except word characters and colons from the inbound request for security, since my site uses URLs like /path/somereport.pl.  You may need to adjust that carefully for your site.
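
The cleanup steps in dispatch.fcgi can be pulled out into a small function to see what they do to a given handler.  This is a sketch for experimenting, not part of the site code: handler_to_module is a name I made up, and the sample inputs are hypothetical.

```perl
use strict;
use warnings;

# Mirrors the substitutions in dispatch.fcgi.
# handler_to_module is a hypothetical helper name.
sub handler_to_module {
    my ($base) = @_;
    $base = '' unless defined $base;
    $base =~ s#/+#::#g;      # path separators become package separators
    $base =~ s#[^\w:]##g;    # drop anything but word characters and colons
    $base ||= 'index';       # an empty handler falls back to the index page
    return "Site::Entry::$base";
}

# Hypothetical handler values, including one hostile-looking input.
for my $h ('login', 'path/somereport', 'x;system("id")', '') {
    printf "%-18s => %s\n", "'$h'", handler_to_module($h);
}
```

Running it on the URLs your own site uses is a quick way to check whether the character whitelist needs adjusting before the result reaches the string eval.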

Site::Response is initialized as a generic error so that if the module dies, the response written to the client is a complete generic error.  Otherwise, the template is selected and data set, so the send call ships the completed page instead.

The only thing left that I'd like to do is make this configuration more portable between web servers instead of dependent on Apache's mod_rewrite and mod_fcgid, but since Apache isn't killing us at work, it probably won't happen very soon.

Monday, January 28, 2013

mod_fcgid and graceful restarts

I see plenty of this in my logs when the server needs to be reloaded to pick up fresh Perl:

(43)Identifier removed: mod_fcgid: can't lock process table in pid 3218

tl;dr: this appears to be harmless in practice.

The leading portion corresponds to EIDRM (see errno(3)) which comes back out of pthread_mutex_lock and cheerfully gets logged as the failure code of proctable_lock_internal.  The proctable is in turn locked during request handling.
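
For decoding other "(NN)message" prefixes in the log, Perl can translate between errno names, numbers, and text.  A minimal sketch, assuming a platform such as Linux where the Errno module exports EIDRM; errno_text is a helper name I made up, and the number and wording it prints depend on the system.

```perl
use strict;
use warnings;
use Errno qw(EIDRM);

# Return the strerror() text for an errno value by assigning the
# number to $! and reading it back as a string.
# errno_text is a hypothetical helper name.
sub errno_text {
    my ($code) = @_;
    local $! = $code;
    return "$!";
}

# Prints the number and message in the same layout Apache uses.
printf "(%d)%s\n", EIDRM, errno_text(EIDRM);
```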

My best guess for the order of events is that the Apache parent receives a graceful restart and then unloads and reloads mod_fcgid, which destroys the mutex as a side effect.  After old-generation children finish up their requests, they try to notify their parent that they're available again, only to discover that the mutex is gone.  The child then exits, but it doesn't hurt any clients because they've already been served at this point.

This problem is not fixable in Apache 2.2 because there aren't any hooks for graceful-restart.  It just unloads DSOs without warning, and their first clue anything happened is that they start receiving config events.  By then, the mutex and process table are gone, so the newly-loaded master can't communicate with old-generation children.  Someone did make an attempt to fix this for 2.4 (along with modifying mod_cgid to test their infrastructure) but AFAICT nobody has made this available in mod_fcgid for 2.4 yet.

Friday, January 11, 2013

Fun, Work, Puzzles, and Programming

Some programming tasks are just more fun than others. The same thing extends to languages—why are Perl and Ruby so much more fun to work with than Python?

I suspect that the answer lies in the scope of the solution space, in a sweet spot between “too straightforward” and “too complex.”

Wednesday, January 9, 2013

PHP's debug_backtrace: a compact guide

Every time I need to use this function, I can't remember how it works.
  1. The array includes all call sites leading up to the current stack frame, but not actually the current one.  (Everything in the current frame is still in scope to you, so you can use __FILE__ and __LINE__ or your current variables directly.)
  2. The array is indexed with 0=innermost to N=outermost frame.
  3. Each array index gives you information related to the call site of the next frame inward / earlier in the array.  That is, $bt[0] describes the call that got you to your current point of execution.  $bt[0]['function'] names the function or method that was invoked there, e.g. if the main code executes foo(1), then inside function foo, $bt[0]['function'] is foo.  The file and line point to the file/line containing the call.
  4. When a 'class' key is present, it is the class whose code is actually being executed by the call, i.e. what __CLASS__ is inside the invoked method, not at the 'file' and 'line' of the call site.
  5. When an 'object' key is present, it has the actual object being used for dispatch; i.e. get_class($bt[$i]['object']) may return either the same value as 'class', or any descendant of that class.
  6. The 'type' key, when present, is either -> or :: for dynamic or static calls, respectively.  The latter means that the 'object' key won't be set.
  7. There is no way in my PHP (5.3.3-14_el6.3 from CentOS updates) to view the invoked class of a static call, e.g. if SubThing::foo is called but Thing::foo is executed because SubThing didn't override foo.  Per the rules above, 'class' will still report Thing.
I needed to know this (this time) because I wanted to write a rough equivalent to Perl's carp in PHP:
<?php
function carp () {
  $msg = func_get_args();
  if (empty($msg)) $msg = array('warned');
  $bt = debug_backtrace();

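  // find nearest site not in our caller's file
  // (frames created by internal callbacks may lack 'file'; skip them)
  $first_file = isset($bt[0]['file']) ? $bt[0]['file'] : null;
  $end = count($bt);
  for ($i = 1; $i < $end; ++$i) {
    if (isset($bt[$i]['file']) && $bt[$i]['file'] != $first_file)
      break;
  }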

  if ($i == $end) {
    // not found; try the caller's caller.
    // otherwise we're stuck with our caller.
    $i = ($end > 1 ? 1 : 0);
  }

  error_log(implode(' ', $msg) .
    " at {$bt[$i]['file']}:{$bt[$i]['line']}");
}
Obviously this is a bare-bones approach, and could be adapted to pick different (or report more) stack frames, etc.  But, it Works For Me.™
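
For comparison, here is the Perl behavior the function is imitating.  It's a small sketch with a made-up module name (Site::Util) and sub name (check_input).  Note that Perl's carp decides whom to blame by package, whereas the PHP version above goes by file.

```perl
use strict;
use warnings;

package Site::Util;    # hypothetical module, for illustration only
use Carp qw(carp);

sub check_input {
    my ($value) = @_;
    # carp warns like warn(), but reports the caller's file and line
    carp "empty input" unless length $value;
    return $value;
}

package main;

# Capture warnings so the message can be inspected as well as printed.
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0]; print STDERR $_[0] };

my $call_line = __LINE__ + 1;
Site::Util::check_input('');    # the warning points at this line
```

The warning names the line in main that called check_input, not the line inside Site::Util where carp itself appears.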