Decoded Node

Friday, April 12, 2013

Random Idea: BBU SSD

Given that RAM doesn't seem to be that expensive anymore (there are plenty of choices for 2GB DDR3 sticks around $20 at NewEgg right now), why not put a bit more onboard an SSD along with a small rechargeable battery?

The extended RAM would be dedicated to a large page cache, which would do its best to hold frequently-written data (to extend the flash life by avoiding writes to it for as long as possible). A landing zone the size of this cache would be reserved in the NAND, and in the event of sudden power loss, the pages pending in the DRAM would be dumped into the landing zone under battery power.

Presumably, the 400+ MB/s that current SSDs quote for sequential writes includes the overhead of the OS, host interface, and wear-leveling scheme, so it represents a lower bound on the performance of the panic save. For a drive drawing 2 A and writing 2.00 GB (2048 MB), both intended to be conservative numbers, 5.12 seconds of power is required, for a consumption of about 2.8 mAh (2 A × 5.12 s = 10.24 A·s).

(If the manufacturer quoted their transfer speed in SI and I'm actually writing GiB of data, those numbers change to about 381 MiB/s, yielding 5.37 seconds of transfer time and about 3.0 mAh of charge. No big deal.)
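The arithmetic is easy to re-run with other numbers. A minimal sketch, assuming the figures above (2048 MB to flush, a 2 A draw) and a hypothetical helper named panic_save; charge in mAh is amps times seconds divided by 3.6:

```shell
#!/bin/sh
# Back-of-the-envelope numbers for the panic save.
# Assumptions (from the post): 2048 MB to flush, 2 A draw, ~400 MB/s writes.

# panic_save SIZE_MB RATE_MB_PER_S CURRENT_A
# Prints the flush time in seconds and the charge drawn in mAh.
panic_save() {
    awk -v size="$1" -v rate="$2" -v amps="$3" 'BEGIN {
        secs = size / rate          # time to write the whole cache
        mah  = amps * secs / 3.6    # A*s -> mAh (1 mAh = 3.6 A*s)
        printf "%.2f s, %.2f mAh\n", secs, mah
    }'
}

panic_save 2048 400 2       # 5.12 s, 2.84 mAh
panic_save 2048 381.47 2    # 5.37 s, 2.98 mAh (400 SI MB/s is ~381.47 MiB/s)
```

Either way the charge involved is only a few mAh; the same 10.24 A·s is roughly 171 mA-minutes if divided by 60 instead of 3600.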

Wednesday, April 10, 2013

Lost in the Complexity

I was working on a post about VirtualBox’s networking capabilities and how none of the modes provided for what I wanted out-of-the-box. But the tirade was interrupted by a simple thought: VirtualBox allows up to four virtual network cards per guest. I could simply configure a guest with two of them—one connected to NAT for reaching the outside world, and the other connected to host-only networking so I could reach it without having to set up port mapping rules. (Bridge mode is unsuitable because I want the machine to be externally invisible; also, the LAN is DHCP and I want the machine to have a static IP without involving anyone else.)

That turned out to work, by the way. The machine still has access to the Internet, and nmap run against its static (host-only) IP can see all the open services at their native, unmapped ports.
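For the record, the two-adapter setup can also be scripted rather than clicked through. A rough sketch, assuming the guest is powered off and that the host-only interface is called vboxnet0 (the usual name on Linux and Mac hosts; Windows hosts name it differently). The function name dual_nic and the VM name are placeholders of mine:

```shell
#!/bin/sh
# Sketch: give one guest a NAT adapter plus a host-only adapter.
# Assumptions: the VM is powered off, and "vboxnet0" is the host-only
# interface name on this host.

# VBOXMANAGE can be overridden, e.g. VBOXMANAGE=echo for a dry run.
: "${VBOXMANAGE:=VBoxManage}"

# dual_nic VM_NAME [HOSTONLY_IF]
dual_nic() {
    vm=$1
    hostif=${2:-vboxnet0}
    # Adapter 1: NAT, for outbound access to the Internet.
    "$VBOXMANAGE" modifyvm "$vm" --nic1 nat || return
    # Adapter 2: host-only, so the host can reach the guest directly.
    "$VBOXMANAGE" modifyvm "$vm" --nic2 hostonly --hostonlyadapter2 "$hostif"
}

# Example (hypothetical VM name):
#   dual_nic devbox
```

The static address itself is still configured inside the guest, on the second interface; this only wires up the virtual hardware.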

In the moment I realized that a dual-card configuration would work, I was also struck by the amount of time I had spent coming up with a single-card solution to the external access problem… only to have it turn out to be the wrong problem to be solving. Or, since it wasn’t technically infeasible, a problem made over-complex by the accidental assumption of a single network.

This illuminates one of the main problems of programming: the tension between breadth and depth. To determine whether a plan is technically feasible, one needs to dive deeply into all the details and try to fit the final product together in one's mind. But feasibility alone is not a fitness function. One must avoid getting so lost in the details that this becomes the only approach visible, and actively “back out” to search for hidden assumptions and gratuitous decoupling.

As a younger programmer, I spent some happy hours working on database abstraction layers, and the projects never changed databases. These were all in-house projects for in-house purposes where all available (and foreseeable) DBA knowledge was built on MySQL. Building systems that “could” be changed to other ANSI-compliant systems was both irrelevant and unnecessarily limiting: I didn’t allow any MySQL-specific optimizations, so that all queries could be represented faithfully on any DBMS.

However, the Serendipity weblog system can run on MySQL or Postgres and for them, it isn’t gratuitous. Their software is externally distributed and not every admin using the software will necessarily be either conversant with or favorable toward MySQL. Thus, Serendipity’s user base becomes larger if it has support for other engines. The same decoupling, but no longer gratuitous, and they probably implemented it better than me anyway.

When the VirtualBox Network Quest began, I made the assumption that I wanted one network, and because that assumption was invisible to me, I chased the details down to completion before spotting the alternative.

OTOH, thinking so deeply about it led to a couple of other interesting observations, but they'll have to wait for another post.

Wednesday, March 13, 2013

PHP unpack()

In Perl, unpack is pretty easy to use. You unpack something with the format string used to pack it, and you get a list of the values that were packed. I'm not sure of the historical reasoning behind PHP's version of unpack, but they certainly made it as horrible as it could possibly be.

To get Perl-like behavior, the simplest path appears to be:

<?php list(...) = array_values(unpack("vlen/Vcrc", $header)); ?>

Instead of the simple "vV" it would be in Perl, you give each format code a unique name and separate them with slashes. You have to provide a name and not an index because PHP interprets a number as the repeat count. There's nowhere to place an index in the unpack format. Then, array_values() gives you back the items in the order specified in the unpack string, since PHP associative arrays maintain their ordering. Finally, the field names must be unique, or else unpack will happily overwrite them.

If you try to use "vV" as the format code, there will only be one value unpacked... named "V". If you try "v/V", there will be a second value... at index 1, where it overwrote the first value.

If you're unpacking just one value, you might try to write list($x) = unpack(...) but this won't work: unpack inexplicably returns a 1-based array. PHP will generate a notice that index 0 doesn't exist and assign NULL to $x.

Saturday, March 9, 2013

Theming Drupal 7: Block vs Node vs Field et al

What is the difference among pages, regions, nodes, blocks, and fields?

There's one page that may contain multiple regions. Regions are containers for nodes and/or blocks. Multiple nodes can appear within a region if the URL being accessed is a listing view, in which case the nodes will be rendered in teaser form with "Read More" links leading to the full view. Multiple blocks can also appear in a region.

Blocks seem to correspond to 'widgets' in other systems: chunks of content that can be dropped into the sidebar and remain fairly static on many pages. For instance, Drupal's search box is a block, and it lands in the sidebar_first region of both default node types. Blocks do not have fields.

Nodes, on the other hand, correspond to the content that actually gets added to the CMS. Blog posts, static pages, whatever. Nodes get URLs assigned to them, either through the URL aliasing features, or the default node/4 style path. (Or through any SEO friendly URL generation modules you may have installed.) Nodes have a type and node types have fields. Fields receive values per-node that are displayed on the node type.

Everything related to a node is rendered inside a single container, corresponding to the page content variable. Overall, the URL, node with its type and fields, and page content variable are all one interrelated thing when viewing a single node. Pages and regions are related to the theme as a whole. Blocks are strongly related to a theme, but also customizable based on node types through the block's Configure link.

Blocks, nodes, and fields are fairly customizable and appear in the admin interface under the "Structure" menu item. Controlling blocks is done with the Blocks item in the menu (straightforward enough); controlling nodes is done through Content Types.

I should probably note that the paths through the admin interface given are for the default setup where Bartik is the main theme and Seven is the administration overlay theme. Those paths are not guaranteed for other themes, since themes are PHP and can do nearly anything they want.

If you want "a content area" with several "pieces" to it, the path of least resistance is to construct node.tpl.php with the contents of that content area inside, using fields to display each individual "piece" desired. Then in your admin interface, establish the fields so they show up when editing the page.

To make that clearer, let's say MyCorp wants a video on their front page with a blurb to the side, a graphical separator, and another couple of paragraphs below. I could make a mycorp_video content type, and add two fields (field_video_embed and field_video_blurb), then create a node--mycorp-video.tpl.php file with the container divs, central bar, and calls to <?php print render($content['field_video_embed']); ?> inside their respective containers. Then I could leave the couple of paragraphs as "body" content and print that below the separator. Once the template is ready, the node type can be created in the admin interface, the node actually added, and finally set to be the front page of the site.

Controlling something outside of the content area based on the content (node) type is not possible by default, but can be done with an override in the template.php file for the theme:

function themeName_preprocess_page(&$vars, $hook) {
  if (isset($vars['node'])) {
    /* If the node type is "blog_madness" the template suggestion will be "page--blog-madness.tpl.php". */
    $vars['theme_hook_suggestions'][] = 'page__'. $vars['node']->type;
  }
}

The above code was posted by JamieR at drupal.org/node/1089656. By default, Drupal's page renderer only knows about specific nodes (like page--node--4.tpl.php) and not content types (aka node types) in general, which is what this override adds.

A second approach is to use the CCK Blocks module to convert fields into blocks. This allows them to appear on the block layout and be placed in regions in spite of being node-specific. The blocks are then made visible in region templates with a cck_blocks_field prefix, for instance cck_blocks_field_video_embed for a video_embed field.

The latter approach is actually the one I ended up taking. I needed to handle several optional areas in various combinations. Instead of a big list of node types and duplication of the markup for any fields shared between types, I have two basic node types, and regions handle sharing the markup and displaying the available fields (or nothing, when no fields are set).

Monday, February 11, 2013

EC2 utilities vs. $AWS_CREDENTIAL_FILE

Most of the AWS command line tools accept their login credentials from a file named in the AWS_CREDENTIAL_FILE environment variable and formatted like so:

AWSAccessKeyId=AKIAEXAMPLE
AWSSecretKey=Base64FlavoredText/Example

The EC2 tools predate this scheme and still refuse to use it, preferring the credentials to be set directly in the environment. I decided to over-engineer it and pull the EC2 environment variables from the file:

export AWS_ACCESS_KEY=$(grep '^AWSAccessKeyId' "$AWS_CREDENTIAL_FILE" | cut -d= -f2)
export AWS_SECRET_KEY=$(grep '^AWSSecretKey'   "$AWS_CREDENTIAL_FILE" | cut -d= -f2)

(Those will probably wrap on blogger; in code, they're two lines, each beginning with "export".) Now I can put the credentials in one place, and they're available to all of the tools.
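One caveat with cut -d= -f2 is that it stops at the next "=", so a value that itself contained one would be truncated. A slightly more defensive sketch, using a hypothetical helper called cred_value (my name, not part of any AWS tool) that keeps everything after the first "=":

```shell
#!/bin/sh
# Sketch: read one value from an AWS_CREDENTIAL_FILE-style file.
# cred_value is a hypothetical helper name, not part of any AWS tool.

# cred_value KEY FILE
# Prints everything after the first "=" on the first line starting with KEY=.
cred_value() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Usage, assuming AWS_CREDENTIAL_FILE is already set:
#   export AWS_ACCESS_KEY=$(cred_value AWSAccessKeyId "$AWS_CREDENTIAL_FILE")
#   export AWS_SECRET_KEY=$(cred_value AWSSecretKey   "$AWS_CREDENTIAL_FILE")
```

It also matches the key name exactly (up to the "="), rather than any line that merely starts with the same prefix.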

Thursday, January 31, 2013

Notes on FastCGI and webservers

This post is a distillation of what I've learned over the past couple of months. There's both new information here, and links to everything else on the FastCGI topic that I've written so far.


Wednesday, January 30, 2013

Minimal, Working Perl FastCGI Example, version 2

This is an update to a previous post. File layout remains the same: "site" is a placeholder for the actual site name, and /home/site/web is the actual repository of the project. Static files then appear under public, and Perl modules specific to the site in lib/Site (i.e. visible in Perl as Site::Modname when lib is put in @INC). I am still using mod_fcgid as the FastCGI process manager.

The major improvement: This version handles FCGI-only scripts which have no corresponding CGI URL. I discovered that limitation of the previous version when I tried to write some new code, where Apache or mod_fcgid realized that the CGI version didn't exist, and returned a 404 instead of passing it through the wrapper. As a consequence of solving that problem, FcgidWrapper is no longer necessary, which gives the dispatch.fcgi code a much cleaner environment to work in.

Everything I liked about the previous version is preserved here: I can create Site/Entry/login.pm to transparently handle /login.pl as FastCGI, without requiring every other URL to be available in FastCGI form. It also stacks properly with earlier RewriteRules that turn pretty URLs into ones ending in ".pl".

Apache configuration:

# Values set via SetEnv will be passed in the request;
# to affect Perl startup, it must be FcgidInitialEnv
FcgidInitialEnv PERL5LIB /home/site/web/lib
RewriteCond /home/site/web/lib/Site/Entry/$1.pm -f
RewriteRule ^/+(.+)\.pl$ /home/site/web/dispatch.fcgi [L,QSA,H=fcgid-script,E=SITE_HANDLER:$1]
<Directory /home/site/web/fcgi>
    Options ExecCGI FollowSymLinks
    # ...
</Directory>

Again, the regular expression of the RewriteRule is matched before RewriteCond is evaluated, so the backreference $1 is available to test whether the file exists. This time, I also use the environment flag of the RewriteRule to pass the handler to the dispatch.fcgi script. Since I paid to capture it and strip the leading slashes and extension already, I may as well use it.

That means the new dispatch.fcgi script doesn't have to do as much cleanup to produce the module name:

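#!/home/site/bin/perl
use warnings;
use strict;
use FindBin qw($Bin);
use CGI::Fast;
use Site::Response;
use Site::Preloader ();
while (my $q = CGI::Fast->new) {
    # SITE_HANDLER is set by the E= flag of the RewriteRule
    my ($base, $mod) = ($ENV{SITE_HANDLER});
    $base =~ s#/+#::#g;      # path separators become package separators
    $base =~ s#[^\w:]##g;    # drop anything but word characters and colons
    $base ||= 'index';
    $mod = "Site::Entry::$base";
    # The response starts out as a generic error; the handler fills in the page
    my $r = Site::Response->new($base, "$Bin/templates");
    eval {
        eval "require $mod;"
            and $mod->invoke($q, $r);
    } or warn "$mod => $@";
    $r->send($q);
}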

I remembered to include the $r->send call this time. I pass the CGI query object so the response can call $q->header. That's not strictly necessary—FCGI children process one request at a time and copy $q to the default CGI object, meaning header should work fine alone, but I didn't know that yet.

I also remove non-{word characters or colons} from the inbound request for security, since my site uses URLs like /path/somereport.pl. You may need to carefully adjust that for your site.
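To see what that cleanup does to a given path without going through Apache, the two substitutions can be mirrored in sed. This is only a sketch for experimenting from a shell, not part of the dispatcher: handler_module is a name I made up, and the bracket expression stands in for Perl's \w, which matches the same characters for plain ASCII input.

```shell
#!/bin/sh
# Sketch: the dispatcher's two substitutions, mirrored in sed so sample
# paths can be tried from a shell. handler_module is a hypothetical name.
# [_[:alnum:]] stands in for Perl's \w (equivalent for ASCII input).

# handler_module PATH  -> prints the Perl module name that would be built
handler_module() {
    base=$(printf '%s\n' "$1" | sed -E 's#/+#::#g; s#[^:_[:alnum:]]##g')
    printf 'Site::Entry::%s\n' "${base:-index}"
}

handler_module path/somereport      # Site::Entry::path::somereport
handler_module ''                   # Site::Entry::index
handler_module '../../etc/passwd'   # Site::Entry::::::etc::passwd
```

The last case shows why the adjustment needs care: the dots are stripped, but the slashes around them still turn into colons, leaving empty components in the name.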

Site::Response is initialized as a generic error so that if the module dies, the response written to the client is a complete generic error. Otherwise, the template is selected and data set, so the send call ships the completed page instead.

The only thing left that I'd like to do is make this configuration more portable between web servers instead of dependent on Apache's mod_rewrite and mod_fcgid, but since Apache isn't killing us at work, it probably won't happen very soon.