Saturday, September 14, 2013

Go Slices and the Hash API

I finished my first little program in Go months ago.  One of the mysteries was "Why does Hash.Sum take a byte slice if it doesn't write into it?"

Well, the question is broken.  Let's look at some code that attempts to hash a file's content and save the result into a file-info structure, but which doesn't actually work:
sumbuf := make([]byte, hsh.Size())
hsh.Sum(sumbuf)
fi.hash = sumbuf
What happens?  This allocates a byte slice with both len and cap equal to the output size of the underlying hash, filled with 0x00 bytes.  It then passes that slice into hsh.Sum (which, in this particular case, was crypto/sha512), which copies the internal state, finalizes the hash, then wraps everything up with return append(in, digest[:size]...).

Since sumbuf didn't have any spare capacity (make(T, n) is equivalent to make(T, n, n)), append had nowhere to put the digest.  So it did what it had to do: it allocated a new backing array of sufficient size, then copied both in (aka sumbuf) and the elements from the digest slice into it.  The expanded slice backed by this new array then gets returned... only to be discarded by me.  fi.hash ends up being the slice I allocated, containing all zeros, which made the program think all files were duplicates.  Oops!
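Here is a minimal sketch of that failure, assuming crypto/sha512 as in the original program.  The names brokenSum and oneShot are made up for illustration; they are not from my file-hashing code.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"fmt"
)

// brokenSum (hypothetical helper) repeats the mistake: the buffer has
// len == cap == hsh.Size(), so Sum's append has to reallocate.
// It returns both the buffer the caller kept and what Sum returned.
func brokenSum(data []byte) (kept, returned []byte) {
	hsh := sha512.New()
	hsh.Write(data)

	sumbuf := make([]byte, hsh.Size()) // 64 zero bytes, no spare capacity
	returned = hsh.Sum(sumbuf)         // zeros plus digest, in a new array
	return sumbuf, returned
}

// oneShot (hypothetical helper) computes the digest directly, for comparison.
func oneShot(data []byte) []byte {
	sum := sha512.Sum512(data)
	return sum[:]
}

func main() {
	data := []byte("hello")
	kept, returned := brokenSum(data)

	// The kept buffer is untouched: still 64 zero bytes.
	fmt.Println(len(kept), bytes.Equal(kept, make([]byte, sha512.Size)))

	// The returned slice is the 64 zeros followed by the real digest.
	fmt.Println(len(returned), bytes.Equal(returned[sha512.Size:], oneShot(data)))
}
```

The real digest only ever exists in the slice that Sum returns, which is exactly the value the broken code throws away.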

What works?
sumbuf := make([]byte, 0, hsh.Size())
fi.hash = hsh.Sum(sumbuf)
First, the return value must be assigned: the original slice's backing array is shared, but append creates a new slice with a larger length to hold the data.  fmt.Printf's %p will show that fi.hash and sumbuf live at the same address, and they have the same capacity as well, but the len of each differs.

Second, if we don't want append to allocate a fresh array, we need a slice with enough free space after its length to hold the data it wants to copy there.  A slice with nothing in it (yet) and the capacity of the hash length is precisely the smallest thing to fit this description.
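The two points above can be checked with a short sketch, again assuming crypto/sha512.  The helper name sumInto is hypothetical.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// sumInto (hypothetical helper) hashes data into a caller-supplied buffer
// that starts empty but has room for exactly one digest.
func sumInto(data []byte) (sumbuf, hash []byte) {
	hsh := sha512.New()
	hsh.Write(data)

	sumbuf = make([]byte, 0, hsh.Size()) // len 0, cap 64
	hash = hsh.Sum(sumbuf)               // append fits, so no reallocation
	return sumbuf, hash
}

func main() {
	sumbuf, hash := sumInto([]byte("hello"))

	// For a slice, %p prints the address of element 0 of the backing array.
	fmt.Printf("%p %p\n", sumbuf, hash)

	// Same array and capacity, different lengths.
	fmt.Println(len(sumbuf), cap(sumbuf), len(hash), cap(hash)) // 0 64 64 64
}
```

Both slices point at the same array, but only hash has a length that covers the digest, which is why the return value has to be kept.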

Now that we have some working code, let's reflect on the point of passing in a slice to Hash.Sum in the first place.  The goal is to avoid allocating inside Sum if the caller already has space for the result.  But Sum still has to copy: it needs to finalize the hash without disturbing the original state, so that writers can still add data afterward.  By working in a temporary buffer on the stack and copying at the end, it avoids a separate heap allocation, but it still needs to ask append to copy the digest into place.
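The "without disturbing the original state" part is easy to observe from the outside.  This is a small sketch, assuming crypto/sha512 and a made-up helper named incremental: it calls Sum midway, keeps writing, and compares the final digest with hashing the whole input in one go.

```go
package main

import (
	"bytes"
	"crypto/sha512"
	"fmt"
)

// incremental (hypothetical helper) calls Sum in the middle of a stream
// of writes, then again at the end.
func incremental() (partial, full, whole []byte) {
	hsh := sha512.New()

	hsh.Write([]byte("hello, "))
	partial = hsh.Sum(nil) // digest of "hello, " only

	hsh.Write([]byte("world")) // the hash still accepts data after Sum
	full = hsh.Sum(nil)

	sum := sha512.Sum512([]byte("hello, world"))
	return partial, full, sum[:]
}

func main() {
	partial, full, whole := incremental()
	fmt.Println(bytes.Equal(full, whole))   // true: the earlier Sum changed nothing
	fmt.Println(bytes.Equal(partial, full)) // false: more data was written
}
```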

Why not begin d := append(in, _____) and then work inside the caller's buffer directly?  My guess is that working in a private buffer prevents Sum from leaving partial state visible to the rest of the program.  That is only a guess, though: append makes no atomicity promise to other goroutines, so a buffer shared between goroutines still needs its own synchronization.  What Sum clearly does need is a private copy of the hash state, so that finalizing the digest doesn't disturb later writes.

Tuesday, September 10, 2013

The Forgotten Gem: LDAP

I keep wanting a "configuration lookup service" with the following properties:
  • Optimized for a read-heavy workload (config stays the same for months).
  • Hierarchical structure (by data center, by server group, by specific host if necessary).
  • Queryable at any level.
From what I understand, it looks like LDAP was practically designed for this.  But I've never seen anyone use it for anything but user accounts, which seems like kind of a waste.

Saturday, September 7, 2013

Pen and Paper

One of the surprises I had when I wrote some golang code was that static typing was a real bear to wrestle with again.  Once I had the type errors sorted out, though, programs ran fairly well.  By the time I could get it past the compiler, I had been forced to think about it deeply enough that there were far fewer bugs in the final result.
Of course, it's not magic.  My code has still deadlocked.

The mindset that Go enforces, by virtue of checking the types, has led me to try being more careful in my regular work.  I now try to make notes on everything that I need to come back to as I'm exploring a problem (such as: "what code interacts with this variable?") so that I can check my plan against all relevant cases.  Likewise, a multi-step change will end up with a few bullet points like...
EmailBounce (#3351)
  • set in SES bounce processor
  • clear when email changes
  • react in notifier code
This also helps in the event of interruptions, such as realizing the office manager's radio is playing the same annoying pop songs again today, for the 110th day in a row.  There's a list of stuff to help re-establish concentration, knowing I'm not forgetting anything.

For complex issues and solutions, doing the design on paper has an extra benefit.  Paper encourages brevity.  There's only so much room within the confines of a page, and it takes a while to write.  Consequently, pseudo code tends to stay pseudo code, at an appropriate level of abstraction, when writing out steps.  It's the exact opposite of the temptation to switch to writing the real code when trying to write pseudo code in a text editor. It looks like a convenient shortcut to combine the two coding steps, but then the faults in the overall plan don't get noticed until (often) most of the way through—then a random amount of already-written code is made irrelevant or wrong.

Modifying half-implemented approach A to become approach B (hopefully not with the same last-lap change of course) adds a lot of mental state to juggle.  Now there are four states complected: original code, usable code from A, unusable but not yet changed code from A, and code from B.  It can be quite useful, and a bit simpler, to work out some of those false starts in advance, on paper.  Then there's both a record of where my thoughts have been, and a clean body of current code to implement the final result.

As useful as paper can be, there are also times when it's at a disadvantage.  For picking apart tricky, complex control flow with nested ifs, repeated conditionals, and the like, I find it easiest to copy the code into a new window and then start deleting lines and replacing them with pseudo-code comments.  Often, hundreds of lines can be reduced to a single window, which makes it trivial to get a higher-level overview of what's being accomplished.

Paper's other major disadvantage is that it lacks undo.  I've learned by the number of cross-outs on the page just how frequently I rename things as I get a better sense of the problem and what those parts represent in the whole.  (I have even been known to choose a name, cross it out in favor of an alternate, then cross out the alternate to restore the original.)

Overall, though, for the appropriate tasks, it's been a great advantage these past few months to put more effort into paper and less into backtracking in vim.

Thursday, September 5, 2013

Moose Alternatives

I once wrote about Mouse when I had discovered it.  Amongst all this recent research, I have also discovered Moo.

Moo strives to be even lighter and faster to load than Mouse, which was already Moose without antlers.

I swear, next year I'm going to find an even lighter module called "Mc".

Wednesday, September 4, 2013

More FastCGI: Apache 2.4, PHP-FPM, PSGI, and hot deployment

Driven by mod_fcgid's failings at graceful restart and preforking, I've been looking hard for alternatives.

tl;dr

External FCGI is best done via mod_fastcgi on Apache prior to 2.4, or mod_proxy_fcgi on 2.4.  mod_proxy in general is super awesome on 2.4.

php-fpm ships with an init script and reloads gracefully via SIGUSR2, so that's all you really need there.

For Perl, gracefully restarting an FCGI app is difficult, so the better approach for PSGI apps is to run them in an HTTP server with Server::Starter support (e.g. Starman or others).