Saturday, April 14, 2018

Effectively Using Future in Perl

I’ve been working a lot with the Future library and IO::Async in Perl recently.  There was a bug in our memcache/dynamo proxy, so I ended up doing a lot of investigation about Futures in order to simulate the bug and verify the fix.  So, I want to talk about how Futures work and why it was so difficult for me.



The thing that stands out the most about the design of the library is that all the methods are available on the Future objects; there’s no separate deferred object (jQuery style) that hides the producer-side “implementation” methods.  Code that returns a Future is trusting that the caller will not call any of the producer-side methods that pass results (e.g. done).  The caller is expected to be disciplined, and only call the consumer-side “user” methods like on_done to set callbacks.

This is all an optimization: it allows the Future implementation to use stackless iteration, without being required to recurse.
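To make the producer/consumer split concrete, here is a minimal sketch. The start_work sub and the $pending variable are hypothetical names made up for illustration; only Future->new, on_done, done, and is_done come from the library.

```perl
use strict;
use warnings;
use Future;

# Producer side (hypothetical): keeps a reference so it can complete
# the future later.
my $pending;
sub start_work {
    $pending = Future->new;
    return $pending;
}

# Consumer side: only attaches a callback.
my @seen;
my $f = start_work();
$f->on_done(sub { push @seen, "consumer got: @_" });

# The consumer *could* call $f->done(...) itself, since $f and $pending
# are the same object; by convention only the producer does.
$pending->done("result");

print "$_\n" for @seen;
```

Nothing in the object enforces the split; it is purely a matter of which side calls which methods.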

Sequencing vs. User Methods


The next major thing about the library is the division between “sequencing” and “user methods.”  There is a group of methods that sequence futures and return a new future wrapping the entire computation, including then, else, and followed_by.  There are easy-to-miss notes in the documentation that the callbacks given to these methods return “a future” and the method as a whole “returns a new sequencing future.”

The user methods exist only to take a callback for notifications.  If such a callback returns another Future, that Future does not become part of any chain; it just gets lost.  User methods include on_done, on_fail, and on_ready.

For a useless demonstration (nothing is ever completed, because no code to do so is scheduled for the event loop), consider the following:

use feature 'say';
use Future;

$a = Future->new;
$a->on_done(sub {
    say "a wins: @_";
});

This is a minimal example, which creates only one future, uses a user method to attach a callback to it, and the whole thing returns $a.  Specifically, the last expression, which is the on_done() method call, returns $a again.

The on_done() block can do whatever it wants, but it can’t cause further work on $a to happen by returning another future.  When $a is done, it’s done.

This approach is good for notifications that a process happened.  (Or has not happened, in the case of cancelation.)
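As a small sketch of what “it just gets lost” means, the following completes the future by hand instead of using an event loop. The names $first and $ignored are illustrative only (I avoid $a and $b here because they double as Perl’s sort variables).

```perl
use strict;
use warnings;
use Future;
use Scalar::Util qw(refaddr);

my $first   = Future->new;
my $ignored = Future->new;    # stands in for "more work"

# The callback returns a future, but on_done discards the return value.
my $returned = $first->on_done(sub {
    return $ignored;
});

$first->done("finished");

print "on_done returned the same future\n"
    if refaddr($returned) == refaddr($first);
print "first is done\n"            if $first->is_done;
print "ignored is still pending\n" if !$ignored->is_ready;
```

The future returned from the callback never influences $first; nothing is waiting on it.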

Compare to the following:

$a = Future->new;
$c = $a->then(sub {
    $b = Future->new;
});

This involves three futures: $a and $b, created explicitly, and $c, the sequencing future returned by then, which takes its result from one of them depending on the success of $a.

That is, if $a becomes done (finishes successfully), the anonymous sub passed to then will run.  then requires that it returns a future, so it starts another piece of work to be returned, which we name $b.  Once $b finishes (for any reason), then $c finishes with that reason.

If $a originally failed or was canceled instead, then $c finishes with that reason, without ever starting up computation for $b.

The important point here is that then returns its own Future that becomes the main point of control for the entire operation.  Canceling $c would cancel either $a or $b if one of them were running at the time.  (It’s sequencing, so they can’t both be running simultaneously, but they could both be ready.)

As before, the code block as a whole returns $c, as the result of its last expression (the value returned from the then method call).
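The three outcomes described above (success, failure, and cancelation) can be sketched by completing the futures by hand rather than through an event loop. The variable names are illustrative; $first, $second, and $chain play the roles of $a, $b, and $c.

```perl
use strict;
use warnings;
use Future;

# Success: $first completes, the callback starts $second,
# and $chain takes $second's result.
my $first = Future->new;
my $second;
my $chain = $first->then(sub {
    $second = Future->new;
    return $second;
});
$first->done("step one");     # callback runs now; $chain is still pending
$second->done("step two");    # $chain becomes done with this result
print "chain result: ", $chain->get, "\n";

# Failure: the callback never runs, and the failure passes through.
my $ran    = 0;
my $failed = Future->new;
my $chain2 = $failed->then(sub { $ran = 1; return Future->done });
$failed->fail("boom");
print "callback ran: $ran\n";

# Cancelation: canceling the sequencing future cancels the
# component that is currently pending.
my $pending = Future->new;
my $chain3  = $pending->then(sub { return Future->done });
$chain3->cancel;
print "pending was cancelled\n" if $pending->is_cancelled;
```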

Equivalence


This means there are two sets of methods that trigger in the same situation, but one set contains the user methods, and one set contains the sequencing methods.

Event   : User method / Sequencing method
-----------------------------------------
success : on_done     / then
failure : on_fail     / else
always  : on_ready    / followed_by

(For similarity to JavaScript Promises, a two-callback then exists as well; it takes a pair of callbacks for success and failure, respectively.)

In all cases (then(x), then(x,y), else(y), and followed_by(z)), the callbacks given to sequencing methods must return Futures.  Callbacks given to user methods don’t have any special behavior: their return values are ignored.
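Here is a short sketch of the sequencing column of that table, using already-completed futures (Future->done and Future->fail as class methods) so that no event loop is needed. The messages are made up for illustration.

```perl
use strict;
use warnings;
use Future;

# else: runs only on failure; its callback receives the failure message.
my $recovered = Future->fail("disk full")->else(sub {
    my ($message) = @_;
    return Future->done("recovered from: $message");
});

# then: skipped on failure, so the failure passes straight through.
my $skipped = Future->fail("disk full")->then(sub {
    return Future->done("never reached");
});

# followed_by: runs either way; its callback receives the first future itself.
my $always = Future->done("fine")->followed_by(sub {
    my ($f) = @_;
    return Future->done($f->is_done ? "first was done" : "first was not done");
});

# Two-callback then: one callback for success, one for failure.
my $either = Future->fail("disk full")->then(
    sub { return Future->done("success path") },
    sub { return Future->done("failure path") },
);

print join("\n",
    $recovered->get,
    scalar $skipped->failure,
    $always->get,
    $either->get,
), "\n";
```

Every callback here returns a Future, which is what lets the method hand back a sequencing future that covers both steps.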

How do I decide which type of callback to use?  If I need to kick off another Future and wait for its results, then it requires a sequencing method.  If I don’t want to return anything, then it must be a user method.

I guess, at some level, $f->then(sub { say "ok"; Future->done(@_); }) would be equivalent to $f->on_done(sub { say "ok"; })?  But I haven’t actually tested that.
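A quick way to check that guess is a sketch like the following, which completes two futures by hand with the same values. The variable names are illustrative. The callbacks and results match, but the two calls are not interchangeable, because then returns a new future and on_done returns the original one.

```perl
use strict;
use warnings;
use Future;
use Scalar::Util qw(refaddr);

my @log;

my $f1       = Future->new;
my $via_then = $f1->then(sub { push @log, "then ok"; return Future->done(@_) });

my $f2          = Future->new;
my $via_on_done = $f2->on_done(sub { push @log, "on_done ok" });

$f1->done(1, 2);
$f2->done(1, 2);

print "$_\n" for @log;
print "then result:    @{[ $via_then->get ]}\n";
print "on_done result: @{[ $via_on_done->get ]}\n";

# The difference: then hands back a separate sequencing future,
# while on_done hands back the future it was called on.
print "then returned a new future\n"
    if refaddr($via_then) != refaddr($f1);
print "on_done returned the original future\n"
    if refaddr($via_on_done) == refaddr($f2);
```

So the two are equivalent in what the caller observes, as long as the caller only cares about the result and not about which object it is holding.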

Retrying


Before I understood sequencing methods, I wrote the original version of our memcache/dynamo proxy.  (This is a proxy that speaks memcached to clients, but actually uses an AWS DynamoDB table as data storage.  This allows us to store sessions in “memcached on localhost,” but share a common storage pool across web servers.  For reasons.)

One of the things we want to do in the proxy is back off and retry if a DynamoDB query fails for a retryable reason, such as “rate limit exceeded.” This took a lot of work to avoid building memory leaks and reference errors, and in the end, it’s mostly just repeat from Future::Utils (conveniently bundled with Future itself.)

The old code defines a retry and a _retry to accomplish it.  The outer retry takes a “task” and a timeout, builds an initial state hash for the inner _retry, and starts the whole thing running.  Outer retry returns a “king” future, which represents the result after any retry attempts are made.

Inner _retry calls the “task” (a callback which returns a future), and sets a handler to respond to the task future’s completion.  If it succeeded, the results are forwarded to the king future.  If it failed for a non-retryable reason, the failure is likewise forwarded, failing the king future. Otherwise, inner _retry gets called again with the updated state after a delay.

I’m pretty sure there’s a third named subroutine involved here somehow, along with some gnarly closures and frequent use of ->on_ready(sub { undef $f }) to garbage-collect the task futures when they complete.

But all that can be replaced by this code structure:

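use Future::Utils qw(repeat);
my ($king, $retrying);
$king = repeat {
    # Default to not repeating; only a retryable failure sets this again.
    $retrying = 0;

    $task->()
    ->else_with_f(sub {
        # else_with_f passes the failed future first; returning it
        # propagates a non-retryable failure unchanged.
        if (! is_retryable()) {
            return shift;
        }

        $retrying = 1;
        my $delay = Future->new;
        $mainloop->watch_time(
            after => 1,
            code => sub { $delay->done }
        );
        $delay;
    })
} while => sub { $retrying };
return $king;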

This is simplified (no backoff, is_retryable() isn’t specified) but it looks like even at production quality, it will be a lot clearer than three functions knitted together.  And while we still call the top-level future “king,” we don’t have to build it, carry it around in state, and forward success/failures to it ourselves.

We reset $retrying to 0 each pass, so that by default, we don’t repeat. Successful results, or non-retryable failures, will be returned as-is.

When a retry is to be made, we build a future that will be done after the delay, and return it from else_with_f.  This means we do the task, do the else_with_f callback, do the delay, and then check the while condition to decide whether repeat runs the block again.
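To see the control flow without an event loop, here is a self-contained sketch of the same structure. Everything specific in it is an assumption for illustration: the $task sub (which fails twice with a retryable message and then succeeds), the is_retryable regex, and the use of an already-done Future in place of the real timed delay.

```perl
use strict;
use warnings;
use Future;
use Future::Utils qw(repeat);

# Hypothetical task: two retryable failures, then success.
my $attempts = 0;
my $task = sub {
    $attempts++;
    return $attempts < 3
        ? Future->fail("rate limit exceeded")
        : Future->done("row data");
};

# Hypothetical retry policy, based only on the failure message.
sub is_retryable { return $_[0] =~ /rate limit/ }

my $retrying;
my $king = repeat {
    $retrying = 0;
    $task->()->else_with_f(sub {
        my ($f, $message) = @_;
        return $f unless is_retryable($message);   # pass the failure through
        $retrying = 1;
        return Future->done;    # stands in for the delay future
    });
} while => sub { $retrying };

print "attempts: $attempts\n";
print "result: ", $king->get, "\n";
```

In the real proxy the delay would come from the event loop; besides the watch_time approach shown earlier, IO::Async::Loop also has a delay_future method that returns a future completing after a given time.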

The Documentation


It’s taken a bit of experimenting and bumping into walls for me to reach this point.  I couldn’t find much about Future itself on the internet, because a search for “perl future” is too generic, and “perl future cookbook” didn’t return anything useful, either.

This is because the author called it Future::Phrasebook instead.

I also checked the SEE ALSO section of the Future perldoc, but it doesn’t mention either the phrasebook or Future::Utils.  The Utils only rate an offhand mention at the end of Description, and the Phrasebook only appears in the Examples section (which I totally missed!  It’s a big page.)

Overall, the main Future documentation was overly terse and inscrutable until I understood the underlying concepts.  Now, it’s just dense.

Other Bits


IO::Async::Loop::Poll is still broken for SSL connections, as far as I know. IO::Async::Loop::EV frequently throws “can’t call method "code" on undefined” errors.  We plan on trying out IO::Async::Loop::Epoll.

We still have a ticket open to rewrite this thing in another language.  I was hoping to get some time to do it last summer.
