Saturday, June 7, 2014

Musing on Iterator Design

PHP’s standard library (not the standard set of extensions, but the thing known as the “SPL”) includes a set of iterators with a kind of curious structure.

At the root is Traversable, which is technically an empty interface, except that there’s some C-level magic to enable C-level things to declare themselves Traversable. This enables them to be used in foreach loops even though they don’t support any Iterator method.

Iterator exists as the basic interface for regular PHP code to implement iteration, internally. Notably, while Traversable is a once-only iterator, Iterator is not. rewind() is part of its interface.

For classes that wrap a collection, there’s an IteratorAggregate interface, which lets them return any Traversable to be used in their place. And who could forget ArrayObject, since a plain array (in spite of being iterable in the engine) is not actually a valid return from getIterator()?

There are some interesting compositions, like AppendIterator, which can take multiple Iterators and return each of their results in sequence. AppendIterator is a concrete class that implements OuterIterator, meaning that it not only provides regular iteration but the world can ask it for its inner iterator.

There’s the curiously named IteratorIterator, which I never appreciated until today, when I tried to pass a Traversable to AppendIterator. For no apparent reason, AppendIterator can only append Iterators, not Traversables. But IteratorIterator implements Iterator and can be constructed on Traversable, thus “upgrading” them.

Long ago discovered and just now remembered, PHP provides DirectoryIterator and RecursiveDirectoryIterator—but to make the latter actually recurse if used in a foreach loop, it needs to be wrapped in RecursiveIteratorIterator, because the caller of a plain RecursiveIterator must decide whether to recurse, or not, at an element with children.

The designer in me thinks this is all a bit weird. Because PHP defined the basic Iterator as rewind-able, a bunch of stuff that isn’t rewind-able (including IteratorAggregate) can’t be used in places that expect a base Iterator. Like AppendIterator. That in turn exposes a curious lack of lazy iterators. When I wrap an IteratorAggregate in an IteratorIterator, getIterator() is called right then, before the result can even be added to an AppendIterator.

So if I want to fake a SQL union operation by executing multiple database queries in sequence, and I don’t want to call the database before the AppendIterator reaches that point, there’s no simple way to do it with SPL’s built-in tools.

I can build a full Iterator that doesn’t query until current() is actually called, by implementing a handful of functions, each of which must carefully check the current iteration state before returning a value. These would satisfy the native AppendIterator, but be lazy themselves.

Or, I can build a full LazyAppendIterator that accepts Traversable and doesn’t try to resolve anything until the outer iteration actually reaches them. Then I can add IteratorAggregates that perform the query in getIterator(), since I won’t be calling that right away.

Those two choices amount to the exact same amount of work. In each case, I need to build a lazy iterator that tracks the “current iteration state” and launches the expensive call only when starting a new iterable/iteration sequence.

In other languages I’ve used, the basic one-pass/forward-only iterator is the type accepted by all the compositing types, and everyone must live with the idea that the iterators in use may not be seekable/replayable. In PHP, I have to live with the idea that the not all iterable things may work with all Iterators, which seems supremely odd.

P.S. They didn't pack any RegexFilterIterator in the box, either?

No comments: