Saturday, September 30, 2017

Replacing `mod_fcgid` with `mod_proxy_fcgi`

This is using somewhat outdated technology, and specific to my environment, but I’ve found nothing of value on the internet other than the actual documentation included with FCGI::ProcManager, so I wanted to share what I did with Perl and FastCGI this week.

Recap: mod_fcgid loses its connection table during a “graceful reload” with Apache 2.4 (we have the version included with Ubuntu 17.04, so it’s Apache 2.4.25).  The current connections get broken, and we’ve reached a size where that can interrupt around 20 connections during one reload event, which we invoke every time we deploy new code.

Therefore, I wanted something more reliable, so I built something to use mod_proxy_fcgi to talk to a listening daemon.

We started with a script (our code) named handler.fcgi, which used CGI::Fast to manage the request loop.  In this setup, Apache’s mod_fcgid worked as process manager, spawning a new perl handler.fcgi instance whenever a request was routed to that script and there wasn’t an idle worker to process it.  Each newly spawned worker makes one user wait for its startup cost, so ramping up to N concurrent workers forces N users to wait.

To move the process management out of Apache’s hands, we need another process manager, and the plumbing to start it up when the system boots.  I adapted handler.fcgi into daemon.fcgi, and then built the daemon management from scratch.  Let’s start with that.


Caveat: this is simplified.  The OpenSocket parameters and the number of processes are configurable through the environment, using code like $ENV{FCGI_SOCKET_PATH} || ':9005' but I wanted to make the code below more concise.  Likewise, I’ve left out all of our actual preloads, because those are boring.
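As a rough sketch of that environment-driven configuration: FCGI_SOCKET_PATH is the variable mentioned above, while FCGI_LISTEN_QUEUE, DAEMON_N_PROCESSES, and the daemon_config helper are names assumed here for illustration, not the ones our code actually uses.

```perl
use 5.014;
use warnings;

# Build the daemon settings from an environment hash, with fallbacks.
# FCGI_SOCKET_PATH comes from the text above; FCGI_LISTEN_QUEUE and
# DAEMON_N_PROCESSES are illustrative names only.
sub daemon_config {
    my ($env) = @_;
    return {
        socket_path  => $env->{FCGI_SOCKET_PATH}   || ':9005',
        listen_queue => $env->{FCGI_LISTEN_QUEUE}  || 100,
        n_processes  => $env->{DAEMON_N_PROCESSES} || 5,
    };
}

my $config = daemon_config(\%ENV);
say "$_ = $config->{$_}" for sort keys %$config;
```

The resulting values would then stand in for the literal ':9005', 100, and 5 used in the simplified code in this post.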

First, we have our basic setup, and loading of the modules the daemon needs to use:

use 5.014;
use warnings;
use CGI ();
use FCGI ();
use FCGI::ProcManager ();

I’m very paranoid about keeping my scopes clean, thus the empty parentheses to forcibly prevent the modules from importing anything.

At this point in the code, there is an opportunity to preload any immutable modules that may be needed.  I say “immutable,” because anything loaded at this point will not be reloaded during SIGHUP.  It is already loaded in the managing parent, which is not restarted, so the new workers that are started will inherit the same code.  Therefore, be very careful not to preload something here that will need to be reloaded gracefully!

Now, we get onto the business of starting the daemon.  First, we open a listening socket, with a listen queue depth of 100 (this is the default in CGI::Fast so I just copied it for myself):

my ($socket, $pm, $req);
$socket = FCGI::OpenSocket(':9005', 100);
$req = FCGI::Request(\*STDIN, \*STDOUT, \*STDERR,
    \%ENV, $socket, FCGI::FAIL_ACCEPT_ON_INTR);

This socket will be shared among children.  Note that all of these arguments (passing the filehandle globs, the environment, and the FAIL_ACCEPT_ON_INTR flag) are important for correct operation.

My working theory about the filehandles is that FCGI sets up the filehandles that are passed as the FCGI streams, and thus, passing \*STDERR sets up the STDERR handle to go to FCGI request’s error stream during requests.  (Where it can end up in the Web server’s error log.) The filehandles don’t have to make sense, and don’t “become” the FCGI stream handles.

All of that finishes preparing the parent, so at this point, we can call the FCGI::ProcManager to fork for us:

$pm = FCGI::ProcManager->new({ n_processes => 5 });
$pm->pm_manage(); # forks, never returns in parent

From this point on, code will only execute in the context of a worker process. The manager gets “stuck” inside pm_manage() unless something goes bad and it has to call exit, but even then, it still doesn’t return.

What now remains is to write the main request loop:

require AppCode::DB; # preload, at run time
while ($req->Accept() >= 0) {
    $pm->pm_pre_dispatch(); # defers signals
    CGI->_clear_globals(); # prevent crosstalk
    my $q = $CGI::Q = CGI->new;
    # --- request processing goes here ---
    $pm->pm_post_dispatch(); # acts on signals
}

Any code loaded between pm_manage() and the start of the while loop will be loaded before any requests, and remain persistent between requests.  It will also be reloaded when the daemon is reloaded via SIGHUP, because the new workers will re-process the require statements.  It’s vital to use require here, and not use, because the latter happens at compile time.  Any use statements here would still be processed in the parent, and those modules would not be reloaded in any workers.
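The compile-time versus run-time difference can be seen in isolation with two core modules. This is only a demonstration, with Text::Wrap and Text::ParseWords chosen arbitrarily as stand-ins for application code; the is_loaded helper is made up for the example.

```perl
use 5.014;
use warnings;

# Report whether a module file has already been recorded in %INC.
sub is_loaded {
    my ($file) = @_;
    return exists $INC{$file} ? 'loaded' : 'not loaded';
}

# This runs at run time, yet the `use` further down has already
# been processed, because `use` executes while the file compiles.
my %before = (
    'Text::Wrap'       => is_loaded('Text/Wrap.pm'),
    'Text::ParseWords' => is_loaded('Text/ParseWords.pm'),
);

use Text::Wrap ();          # compile time: done before line 1 runs
require Text::ParseWords;   # run time: done only when this line runs

my %after = (
    'Text::Wrap'       => is_loaded('Text/Wrap.pm'),
    'Text::ParseWords' => is_loaded('Text/ParseWords.pm'),
);

for my $module (sort keys %before) {
    say "$module: before=$before{$module}, after=$after{$module}";
}
```

Run standalone, the module pulled in with use should already show as loaded in the "before" snapshot, while the one pulled in with require should only show up in the "after" snapshot. In the daemon, "before" corresponds to the manager process and "after" to each freshly forked worker.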

The calls to pm_pre_dispatch and pm_post_dispatch are exactly as instructed in the FCGI::ProcManager documentation.  I looked inside their code, and they make it so that a “please shut down now” signal will be deferred until the request has been processed.
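The deferral idea can be sketched with plain %SIG handlers. This is not FCGI::ProcManager's code: pre_dispatch, post_dispatch, and the flag names are invented for illustration, and the signal is simulated by the script sending SIGTERM to itself in the middle of the second request.

```perl
use 5.014;
use warnings;

my $in_request   = 0;   # true while a request is being handled
my $term_pending = 0;   # set when SIGTERM arrives mid-request
my @log;

$SIG{TERM} = sub {
    if ($in_request) {
        $term_pending = 1;              # remember it, act on it later
        push @log, 'TERM deferred';
    }
    else {
        push @log, 'TERM while idle';   # a real worker would exit here
    }
};

sub pre_dispatch  { $in_request = 1 }

# Returns true when a deferred shutdown should now be honoured.
sub post_dispatch { $in_request = 0; return $term_pending }

for my $request (1 .. 3) {
    pre_dispatch();
    kill 'TERM', $$ if $request == 2;   # simulate a reload mid-request
    push @log, "request $request finished";
    if (post_dispatch()) {
        push @log, 'worker exiting after request';
        last;                           # a real worker would exit here
    }
}

say for @log;
```

The point of the pattern is that the second request still runs to completion, and the third is never started by this worker.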

The CGI->_clear_globals() line and the setting of $CGI::Q (the default CGI object) are borrowed from the code of CGI::Fast.  The globals must be cleared, or else the worker can return the wrong response to the client, and really mess things up.  For example, I started getting nested UI elements—instead of loading search results via AJAX, the search form would come back and be put in the page again.
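The general failure mode, a module-wide default object surviving from one request to the next, can be shown with a toy class. Hypothetical::Query and everything in it is invented for this sketch; it mimics the shape of the problem and is not how CGI.pm is implemented.

```perl
use 5.014;
use warnings;

package Hypothetical::Query {
    our $DEFAULT;   # plays the role of a module-wide default object

    sub new {
        my ($class, %params) = @_;
        return bless {%params}, $class;
    }

    # Function-style access falls back to the cached default object.
    sub param {
        my ($name) = @_;
        return ($DEFAULT //= __PACKAGE__->new)->{$name};
    }

    sub clear_globals { undef $DEFAULT }
}

# Handle two fake requests in one persistent process, optionally
# resetting the module globals at the top of each request.
sub handle_requests {
    my ($reset) = @_;
    Hypothetical::Query::clear_globals();   # start each demo clean
    my @seen;
    for my $request ({ action => 'search_form' }, { action => 'search_results' }) {
        Hypothetical::Query::clear_globals() if $reset;
        $Hypothetical::Query::DEFAULT //= Hypothetical::Query->new(%$request);
        push @seen, Hypothetical::Query::param('action');
    }
    return @seen;
}

say 'without reset: ', join ', ', handle_requests(0);
say 'with reset:    ', join ', ', handle_requests(1);
```

Without the reset, the second request is answered using the first request's parameters, which is the same flavor of bug as the search form coming back in place of the search results.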

Starting the daemon

I wrote a systemd service file to start up the daemon.  I’m not going to cover it here, because there are probably better systemd resources, and there are other init systems, too.

Everything that was passed to handler.fcgi via FcgidInitialEnv—notably PERL5LIB—is now passed as an Environment setting at daemon startup.

As noted above, there’s no special consideration for input/output/error streams, because they will be shadowed by the FastCGI request streams while processing requests.  (The manager will still write a bit to them, about the worker process lifecycles.  systemd just logs those messages.)

Connecting Apache to the Daemon

In our VirtualHost block, we forward interesting URLs to the proxy:

ProxyPassMatch ^/(.*)\.pl$ fcgi://localhost:9005/$1

Other environments most likely need a different regular expression.

Options, such as enablereuse=on, are also not shown here.

Why CGI::Fast isn’t involved anymore

I tried really hard to keep using CGI::Fast because it had been working in the handler.fcgi version.  However, it didn’t quite allow me enough control to get it integrated with FCGI::ProcManager.

If environment variables like FCGI_SOCKET_PATH are set, then CGI::Fast tries to open the socket to listen on.  However, if there are multiple workers, only one of them can “win” this game, and the rest keep getting “socket already in use” errors and exiting.  (Which, as a worker, means the manager tries to replace them, but it’s futile.)

If the environment variables aren’t set, then CGI::Fast seems to think it’s going to receive a CGI request on STDIN, and the whole thing comes tumbling down when the request is actually entirely blank.  (For unimportant reasons, I don’t need to handle / in our app at work, so I don’t.  It turns out that our app just crashes if such a request comes in.)

I wasn’t able to figure out how to open a socket in the parent, start managing, and then have CGI::Fast wait for requests in the workers.
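The “socket already in use” behavior, and the reason the daemon opens its socket once in the parent, can be reproduced with core modules alone. This sketch uses IO::Socket::INET on an ephemeral loopback port instead of FCGI::OpenSocket, so it only illustrates the operating-system behavior and is not the daemon's code.

```perl
use 5.014;
use warnings;
use IO::Socket::INET ();

$| = 1;   # avoid duplicated buffered output after fork

# Open one listening socket; port 0 asks the OS for any free port.
my $listener = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,
    Listen    => 100,
    Proto     => 'tcp',
) or die "listen failed: $@";
my $port = $listener->sockport;

# A second attempt to listen on the same port is what each extra
# worker runs into when every worker tries to open the socket itself.
my $second = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => $port,
    Listen    => 100,
    Proto     => 'tcp',
);
say defined $second ? 'second listen unexpectedly succeeded'
                    : "second listen failed: $@";

# Forked children simply inherit the already-open listener instead.
my $fd = fileno($listener);
my @pids;
for my $n (1 .. 2) {
    my $pid = fork() // die "fork failed: $!";
    if ($pid == 0) {
        exit(fileno($listener) == $fd ? 0 : 1);   # child: same descriptor
    }
    push @pids, $pid;
}

my $inherited = 0;
for my $pid (@pids) {
    waitpid($pid, 0);
    $inherited++ if $? == 0;
}
say "$inherited of 2 workers inherited the listening socket";
```

This is the shape the final daemon ended up with: the socket is opened before pm_manage() forks, and every worker accepts on the same inherited descriptor.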


I worked really hard to stick to my initial plan, and overcome all obstacles. I ended up with a thing that would work great for unaltered apps.

Our app was probably amenable to CGI::PSGI and using Server::Starter: as I rewrote things from slow CGI to FastCGI, I also rewrote their output so that they’re templated (not inline print statements) and the response is only sent from the main request loop.  Using CGI::PSGI probably would have been a better outcome—we would have been able to preload and share more code in the parent, without losing graceful reload support.

In the end, I also noticed startup costs are lower than I expected.  In the days where we ran on m1.small instances, it would add a full second to the page load time to store sessions in DynamoDB, because Net::Amazon::DynamoDB and its Moose dependency would take that long to load.  We rolled out, then reverted, that change, because it increased the request time by an order of magnitude.  But now, a t2.medium runs as fast as my Ivy Bridge desktop, able to preload that same code in around 0.34 seconds, and finish the entire set of preloads in 0.5 seconds.

The change is still worth it for the improvement in user experience when we have to do a deployment, though (which, because FastCGI is persistent, always involves a graceful reload).


Like others before it, this blog post has been written without using any resources of my employer.  It is only my own time, my own computer, and the knowledge that I now carry in my nerd brain.
