Friday, June 3, 2011

The First Step of a Long Journey

Over the past couple of weeks, I have assembled a reader in PHP that understands code of the form (print (== (+ 4 4 6) (- 30 15 1))) and can turn it into PHP source that prints "1" when run. It's kind of brokenly stupid in other ways, but it's the bare-bones skeleton of a working compiler, which is something I have never managed to build before. That's largely because I wanted to tokenize something superficially like PHP, and I always got bored of defining all the stupid tokens. Going with s-expressions left only a handful of token types, so I could get on with the interesting bits instead of grinding out pages of /&&|\|\|/ crud. Because almost anything can go in an identifier, I can treat everything except parentheses as an identifier for now.
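The post doesn't show the reader itself, so here is a minimal sketch of the same pipeline (tokenize, read, emit) written in Python rather than the author's PHP. All function names are hypothetical, and the set of supported operators is an assumption chosen just large enough to handle the example expression. It shows why s-expressions keep the tokenizer so small: there are only three kinds of token.

```python
import re

# Only three kinds of token: "(", ")", and everything else, which is
# treated as an identifier (numbers included) for now.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")

# Variadic arithmetic operators map directly onto PHP infix operators.
ARITHMETIC_OPS = {"+", "-", "*"}
# PHP comparison operators are non-associative, so allow exactly two operands.
COMPARISON_OPS = {"==", "!=", "<", ">"}


def tokenize(src):
    """Split source text into a flat list of tokens, skipping whitespace."""
    return TOKEN_RE.findall(src)


def read(tokens):
    """Consume tokens from the front of the list and build a nested-list AST."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == ")":
        raise SyntaxError("unexpected )")
    if token != "(":
        return token  # an atom (identifier or number)
    form = []
    while tokens and tokens[0] != ")":
        form.append(read(tokens))
    if not tokens:
        raise SyntaxError("missing )")
    tokens.pop(0)  # discard the closing ")"
    return form


def emit(form):
    """Translate one AST node into a PHP expression string."""
    if isinstance(form, str):
        return form  # atoms pass through verbatim
    if not form:
        raise SyntaxError("empty form")
    op, *args = form
    if op in ARITHMETIC_OPS:
        return "(" + f" {op} ".join(emit(a) for a in args) + ")"
    if op in COMPARISON_OPS:
        if len(args) != 2:
            raise SyntaxError(f"{op} takes exactly two arguments")
        return f"({emit(args[0])} {op} {emit(args[1])})"
    if op == "print":
        return "print " + emit(args[0])
    # Anything else is compiled as an ordinary PHP function call.
    return op + "(" + ", ".join(emit(a) for a in args) + ")"


def compile_program(src):
    """Read one top-level form and emit it as a PHP statement."""
    return emit(read(tokenize(src))) + ";"


print(compile_program("(print (== (+ 4 4 6) (- 30 15 1)))"))
# print ((4 + 4 + 6) == (30 - 15 - 1));
```

Both sides evaluate to 14, so the generated PHP prints true, which PHP renders as "1". Parenthesizing every sub-expression sidesteps operator precedence entirely, at the cost of slightly noisier output.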

There are a few obvious things it needs next: string types, variables, defun, defmacro, and separate namespaces for functions and variables, determined by context. That way you could write (array_map htmlspecialchars row) and the compiler would know that the first argument is a callable and the second is an expression, so that they compile to 'htmlspecialchars' and $row, respectively. And to serve its original purpose as an "enhanced PHP"-to-PHP compiler, it needs to read that source language rather than s-expressions. Of course, macros might not work out so well with a syntax that isn't based on s-expressions, but I do want to be able to run code at compile time that rewrites the AST (or even the whole tokenizer, a.k.a. reader macros).
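One way to pick the namespace by context is a lookup table recording which argument positions of a given function expect a callable. The sketch below is in Python and assumes that approach; the `CALLABLE_ARGS` table and both function names are hypothetical illustrations, not anything from the post. A bare symbol in a callable position becomes a quoted function name, and anywhere else it becomes a `$` variable.

```python
# Hypothetical table: for each known PHP function, the argument positions
# that expect a callable (function namespace) rather than a value.
CALLABLE_ARGS = {
    "array_map": {0},       # array_map($callback, $array)
    "usort": {1},           # usort($array, $callback)
    "call_user_func": {0},  # call_user_func($callback, ...)
}


def emit_arg(arg, in_callable_position):
    """Compile one argument, choosing a namespace from its position."""
    if isinstance(arg, list):
        return emit_call(arg)   # nested call: compile recursively
    if in_callable_position:
        return "'" + arg + "'"  # function namespace -> callable string
    if arg.lstrip("-").isdigit():
        return arg              # integer literal passes through
    return "$" + arg            # variable namespace -> PHP variable


def emit_call(form):
    """Compile (name arg ...) into name(arg, ...) using CALLABLE_ARGS."""
    name, *args = form
    callable_positions = CALLABLE_ARGS.get(name, set())
    parts = [emit_arg(a, i in callable_positions) for i, a in enumerate(args)]
    return name + "(" + ", ".join(parts) + ")"


print(emit_call(["array_map", "htmlspecialchars", "row"]))
# array_map('htmlspecialchars', $row)
```

A hand-written table only covers the functions listed in it. For any other function, this sketch treats every bare symbol as a variable, so a fuller version would need a broader source of signature information.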

There are a bunch of features I want to add, too: proper named arguments, multiple-value return, and ubiquitous lexical scope, which obviously means let and its function equivalent (flet, perhaps?). And something else that I'm forgetting at the moment.

In the long run, I also want to do some optimizations. Ideally, I could turn $efoo = array_map('htmlspecialchars', $foo); into $efoo=array(); foreach ($foo as $k=>$v) $efoo[$k]=htmlspecialchars($v); and there are simpler rewrites, like turning i++; into ++i;. I'd also love to be able to compile some PHP 5.3 constructs, such as $foo::bar("baz"), the ?: shorthand, and nowdoc syntax, into 5.2-compatible equivalents. (The answer for the first is call_user_func(array($foo, 'bar'), "baz"), though my accumulated wisdom now considers such things to be a code smell.)
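A rough sketch of the two rewrites above, in Python. A real compiler would apply them to the AST. To keep the sketch self-contained, this toy version matches whole PHP statements with regular expressions, so it only recognizes the exact shapes shown. The function name and patterns are hypothetical. The array_map rewrite is only equivalent when the callback is a plain named function. It copies keys because PHP's array_map preserves keys when exactly one array is passed.

```python
import re

# Only the exact statement shape from the post is recognised:
#   $target = array_map('callback', $source);
ARRAY_MAP_STMT = re.compile(
    r"^\$(\w+)\s*=\s*array_map\(\s*'(\w+)'\s*,\s*\$(\w+)\s*\);$"
)
# A post-increment used as a whole statement, so its value is discarded.
POST_INC_STMT = re.compile(r"^\$(\w+)\+\+;$")


def optimize_statement(stmt):
    """Apply the two rewrites to a single PHP statement, if they match."""
    stmt = stmt.strip()
    m = ARRAY_MAP_STMT.match(stmt)
    if m:
        target, callback, source = m.groups()
        # With a single input array, array_map preserves keys, so the loop
        # copies $k as well. NOTE: $k and $v are fixed names here and could
        # clobber existing variables; a real compiler would pick fresh ones.
        return (f"${target}=array(); "
                f"foreach (${source} as $k=>$v) ${target}[$k]={callback}($v);")
    m = POST_INC_STMT.match(stmt)
    if m:
        # Safe only because the statement discards the expression's value.
        return f"++${m.group(1)};"
    return stmt  # anything unrecognised is left untouched


print(optimize_statement("$efoo = array_map('htmlspecialchars', $foo);"))
# $efoo=array(); foreach ($foo as $k=>$v) $efoo[$k]=htmlspecialchars($v);
```

The post-increment rewrite matters only in statement position. In an expression like $a = $i++; the two forms produce different values, so the pattern deliberately requires the increment to be the entire statement.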

The weird thing about this is that if I succeed, I'll be doing what Rasmus Lerdorf did to create PHP: riffing on an existing system in the domain to come up with something a little better.
