Over the past couple of weeks, I have assembled a reader in PHP, such that it understands code of the form
(print (== (+ 4 4 6) (- 30 15 1)))
and generates PHP source that ultimately prints out "1". It's kind of brokenly stupid in other ways, but it's the bare-bones skeleton of a working compiler, something I had never managed to build before this attempt, largely because I wanted to tokenize something superficially like PHP and always got bored of defining all the stupid tokens. Going with s-expressions left only a handful of token types, so I could get on with the interesting bits instead of grinding out pages of
/&&|\|\|/
crud. Because almost anything can go in an identifier, I can treat everything as identifiers for now.
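To make the "handful of token types" point concrete, here is a minimal sketch of what such a reader and code generator could look like. It is not the actual code: the function names (sexp_tokenize, sexp_read, sexp_compile) are made up for illustration, and it assumes every head other than print is an n-ary infix operator that PHP already understands.

```php
<?php
// Hypothetical names throughout; a sketch, not the real reader.

// Three token kinds only: "(", ")", and "anything else" (an identifier).
function sexp_tokenize($src) {
    preg_match_all('/[()]|[^\s()]+/', $src, $m);
    return $m[0];
}

// Recursive-descent read: a list becomes a nested array, an identifier stays a string.
function sexp_read(array &$tokens) {
    if (!$tokens) {
        throw new Exception('unexpected end of input');
    }
    $tok = array_shift($tokens);
    if ($tok === ')') {
        throw new Exception('unexpected )');
    }
    if ($tok !== '(') {
        return $tok; // identifier (numbers are just identifiers for now)
    }
    $list = array();
    while ($tokens && $tokens[0] !== ')') {
        $list[] = sexp_read($tokens);
    }
    if (!$tokens) {
        throw new Exception('missing )');
    }
    array_shift($tokens); // drop the closing ")"
    return $list;
}

// Assumption: every head except "print" is an infix operator joining its arguments.
function sexp_compile($node) {
    if (!is_array($node)) {
        return $node;
    }
    $op = array_shift($node);
    $args = array_map('sexp_compile', $node);
    if ($op === 'print') {
        return 'print ' . $args[0] . ';';
    }
    return '(' . implode(" $op ", $args) . ')';
}

$tokens = sexp_tokenize('(print (== (+ 4 4 6) (- 30 15 1)))');
$ast = sexp_read($tokens);
$php = sexp_compile($ast);
echo $php, "\n"; // print ((4 + 4 + 6) == (30 - 15 - 1));
```

Both sides of the comparison come to 14, so running the generated statement prints "1".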
There are a few obvious things it needs next: string types. Variables. defun. defmacro. Separate namespaces for functions and variables, defined by context, so you can say
(array_map htmlspecialchars row)
and it will know that the first argument passed is a callable and the second is an expression, so that they can compile to 'htmlspecialchars' and $row, respectively. And to serve its original purpose as an "enhanced PHP"-to-PHP compiler, it needs to read that source language rather than s-expressions. Of course, with a language that isn't built on s-expressions, macros might not work out so well, but I do want to be able to run code that rewrites the AST (or even the tokenizer itself, aka reader macros) at compile time.
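One way the context-dependent namespaces could work is a lookup table that says which argument positions of a known function expect a callable. The sketch below assumes exactly that; the table ($callable_positions) and the function name (compile_call) are hypothetical, and it only handles bare identifiers as arguments, not nested forms.

```php
<?php
// Hypothetical table: zero-based argument positions that expect a callable.
$callable_positions = array(
    'array_map' => array(0),
    'usort'     => array(1),
);

// Compile a flat call form such as array('array_map', 'htmlspecialchars', 'row').
function compile_call(array $form, array $callable_positions) {
    $fn = array_shift($form); // remaining arguments are reindexed from 0
    $callables = isset($callable_positions[$fn]) ? $callable_positions[$fn] : array();
    $args = array();
    foreach ($form as $i => $arg) {
        if (in_array($i, $callables, true)) {
            $args[] = "'" . $arg . "'"; // function namespace: emit a callable name
        } else {
            $args[] = '$' . $arg;       // variable namespace: emit a variable
        }
    }
    return $fn . '(' . implode(', ', $args) . ')';
}

$out = compile_call(array('array_map', 'htmlspecialchars', 'row'), $callable_positions);
echo $out, "\n"; // array_map('htmlspecialchars', $row)
```

A function missing from the table gets every argument treated as a variable, which is probably the right default until defun can record signatures itself.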
There's a bunch of features I want to add, too. Proper named arguments. Multiple-value return. Ubiquitous lexical scope, so obviously let and its function equivalent (flet, perhaps?). Something else that I'm forgetting at the moment.
In the long run, I also want to do some optimizations; ideally, I could turn
$efoo = array_map('htmlspecialchars', $foo);
into
$efoo=array(); foreach ($foo as $k=>$v) $efoo[$k]=htmlspecialchars($v);
as well as doing simple optimizations like turning a standalone $i++ into ++$i. I'd also love to be able to compile some 5.3 code, like $foo::bar("baz"), the ?: operator, and "nowdoc" syntax, into 5.2-compatible renditions. (The answer to the first is call_user_func(array($foo, 'bar'), "baz"), though my accumulated wisdom now considers such things to be a code smell.)
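Before trusting rewrites like these, it is worth checking that each one preserves behavior. The sketch below does that for three of them. The class Foo, the sample data, and the temporary $__t are invented for the check. The ?: rewrite shown is one possible 5.2 rendition, not necessarily the one the compiler would emit.

```php
<?php
// 1. array_map with a single array preserves keys, so the foreach expansion matches it.
$foo = array('a' => '<b>', 'c' => 'x & y');
$mapped = array_map('htmlspecialchars', $foo);
$efoo = array();
foreach ($foo as $k => $v) $efoo[$k] = htmlspecialchars($v);
$same_map = ($mapped === $efoo);

// 2. $foo::bar("baz") with the class name in a variable, written the 5.2 way.
class Foo {
    public static function bar($s) {
        return 'bar:' . $s;
    }
}
$cls = 'Foo';
$called = call_user_func(array($cls, 'bar'), "baz");

// 3. "$a ?: $b" evaluates $a once; a temporary ($__t, hypothetical) keeps that property.
$a = 0;
$b = 'fallback';
$short = ($__t = $a) ? $__t : $b;

var_dump($same_map, $called, $short);
```

The temporary in the third case matters when the left-hand side has side effects, such as a function call, since naively repeating it would run it twice.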
The weird thing about this is that if I succeed, I'll be doing what Rasmus did to create PHP—riffing on an existing system in the domain to come up with something a little better.