Monday, January 28, 2013

mod_fcgid and graceful restarts

I see plenty of this in my logs whenever the server needs to be reloaded to pick up fresh Perl:

(43)Identifier removed: mod_fcgid: can't lock process table in pid 3218

tl;dr: this appears to be harmless in practice.

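The leading portion corresponds to EIDRM (see errno(3)), which comes back out of the mutex lock call and cheerfully gets logged as the failure code of proctable_lock_internal. I originally pinned this on pthread_mutex_lock, but EIDRM is what semop(2) reports when a SysV semaphore set is removed out from under a waiter, so the lock is more likely semaphore-backed. The proctable is in turn locked during request handling.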
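To see where an error like this can come from, here is a minimal sketch in C, not mod_fcgid's actual code. It assumes the lock behaves like a SysV semaphore, which is my assumption and not something confirmed above. A child blocks in semop(2), the parent removes the semaphore set, and the child's errno is handed back. The helper name errno_after_removal is made up for illustration, and the union semun definition assumes Linux/glibc.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* On Linux/glibc the caller must define this union for semctl(2). */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical helper: returns the errno the blocked child saw from
 * semop(2) after the parent removed the semaphore set, or -1 if the
 * setup itself failed.
 */
int errno_after_removal(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    /* Start at 0 so that a decrement has to block. */
    union semun arg;
    arg.val = 0;
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    if (pid == 0) {
        /* Child: plays the old-generation process waiting on the lock. */
        struct sembuf op = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
        int rc = semop(semid, &op, 1);
        _exit(rc < 0 ? errno : 0);
    }

    /*
     * Parent: plays the restarting master. The sleep is a crude way to
     * let the child block first; if the removal wins the race, the child
     * would see EINVAL instead.
     */
    sleep(1);
    semctl(semid, 0, IPC_RMID);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Passing the returned value to strerror(3) should give the same text that follows the "(43)" in the log line above.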

My best guess for the order of events is that the Apache parent receives a graceful restart, then unloads and reloads mod_fcgid, which destroys the mutex as a side effect. Once old-generation children wrap up their in-flight requests, they try to notify their parent that they're available again, only to discover that the mutex is gone. The child then exits, but that doesn't hurt any clients because they've already been served at this point.

This problem is not fixable in Apache 2.2 because there aren't any hooks for graceful restart. Apache just unloads DSOs without warning, and a module's first clue that anything happened is that it starts receiving config events. By then, the mutex and process table are gone, so the newly-loaded master can't communicate with old-generation children. Someone did make an attempt to fix this for 2.4 (along with modifying mod_cgid to test their infrastructure), but as far as I can tell nobody has made this available in mod_fcgid for 2.4 yet.
