Saturday, October 13, 2018

Idea: Type Propagation for Gradual Typing

Regarding this paper recently featured on Reddit, I got to thinking.

Perhaps it’s best to add type information starting in high-level modules; intuitively, having a low-level leaf function (especially one that is frequently called) checking and re-checking its type arguments on every call would certainly be slower than a higher-level function that gets called only a few times throughout the course of a run.

For instance, for a program that does data processing, added type checks in “once per file” functions would have less effect on the execution time than type checks in “once per line” functions.

But maybe we’re missing something, here.  The paper adds complete type information to one module at a time, but does nothing about inter-module calls at each step.  That is, a module may define that it accepts a string argument, but callers in other modules won’t be declaring that they are passing strings until that module has types added.



What if, once a function (or method, for object-oriented languages) were typed, that type information could be propagated to call sites?  It’s a sort of static analysis, where the analyzer would work backward from the provided types, and add/suggest types based on the given declarations.  For instance, once a function is specified to take strings and return strings, then callers could be updated to constrain the variables (and result) as string types.  If those strings are passed to other functions that don’t have their types declared, the analyzer would add the string type to their definitions.

Of course, it would have to stop if it could prove conflicting types (string passed via X, integer passed via Y.) It’s unclear in this case whether the programmers intended to accept multiple types, or if one of the call sites is in error.  This would have to be examined by hand.

If the union type is intended but the language doesn’t allow declaring it, then the gradual typing must be incomplete.  For example, PHP allows ?int but not int|false as a declaration, even though strpos() returns exactly the latter.  This might be fine, since I don’t think PHP uses type information to achieve a whole-program speedup the way TypedRacket can.

(All this means the concept of “union” types still exists in the language, independent of the type-declaration system’s ability to express them.  Another interesting case is, for instance, where Twig will accept string|array for specifying one or multiple search paths.  PHP’s use of int|false for the return type of strpos() is effectively returning Option<int> except that Option isn’t reified.  The programmer must know, out-of-band, to use === in order to distinguish 0 from false.)

Another potential problem is types in test code—in a dynamic language, part of the test suite may be intentionally passing bad types to verify that failures occur as expected.  Trying to infer types based on test code may cause unnecessary conflicts.  Type-propagation may work from the main code into tests, but be limited the other direction.  Or if the test is just about the types, then maybe those tests should be deleted once type checks are implemented into the main program.

Still, type-propagation in a gradually typed system seems like it would be interesting.  One easy target might be TypeScript, where type information is available for both core language functionality and popular libraries (lodash, React, etc.) Interacting with these libraries would give the propagation tool a lot of information to work with.

Perhaps we could even reach a level where an unexpected type conflict identifies a bug in the program design.  Ideally, we’d have a dynamic-like development experience, but end up with a fully-typed program.  (Of course, I don’t know if that would really work in practice.  My experience with static types has been that to change the program from A to B, you have to plan first, and then ignore IDE warnings for the time it takes to execute.)

No comments: