Sunday, June 5, 2011

Python's sum()

In Python, the sum() builtin gives you the ability to take a list, say [1, 2, 10] and find the sum of it as if you had written out 1 + 2 + 10.

The + operator is also defined for lists, where if you write out [1] + [2] + [10] you'll get a list back: [1, 2, 10]

What happens if we put these two observations together? Can we sum() a list of lists to get one flattened list?
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) 
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> print sum([[1],[2],[10]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'list'
>>> 
Nope. By default, sum() starts from 0, so the first thing it computes is "0 + (first element of sequence)". Unless you pass a different start value as the second argument, you can only sum things that can be added to integers.
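
For what it's worth, the start value is an optional second argument to sum(), so flattening does work if you ask for it. A minimal sketch, assuming the same list of lists as above (the variable names are mine); itertools.chain.from_iterable is shown alongside because it avoids building a fresh intermediate list at every addition:

```python
from itertools import chain

nested = [[1], [2], [10]]

# Passing [] as the start value makes sum() compute [] + [1] + [2] + [10].
flat_sum = sum(nested, [])

# chain.from_iterable walks each inner list in turn without repeated list copies.
flat_chain = list(chain.from_iterable(nested))

print(flat_sum)
print(flat_chain)
```

Both give the flattened list; the sum() version gets slow on long inputs because every + copies the accumulated list.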

Friday, June 3, 2011

The First Step of a Long Journey

Over the past couple of weeks, I have assembled a reader in PHP that understands code of the form (print (== (+ 4 4 6) (- 30 15 1))) and can create PHP source that ultimately prints out "1".  It's kind of brokenly stupid in other ways, but it's the bare-bones skeleton of a working compiler, which is something I had never been able to build prior to this attempt, largely because I wanted to tokenize something superficially like PHP, and I always got bored of defining all the stupid tokens.  Going with s-expressions left only a handful of token types, so I could get on with the interesting bits instead of grinding out pages of /&&|\|\|/ crud.  Because almost anything can go in an identifier, I can treat everything as identifiers for now.
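
To give a feel for how little machinery s-expressions need, here is a rough sketch of the same idea in Python rather than PHP. It is not the actual reader; the function names tokenize, read, and emit are made up for illustration, and it assumes well-formed input where every operator other than print is an infix PHP operator:

```python
import re

def tokenize(source):
    # Only three token types: "(", ")", and anything else (an identifier).
    return re.findall(r"[()]|[^\s()]+", source)

def read(tokens):
    # Consume tokens from the front and build a nested list.
    # Assumes balanced parentheses; unbalanced input raises IndexError.
    token = tokens.pop(0)
    if token == "(":
        form = []
        while tokens[0] != ")":
            form.append(read(tokens))
        tokens.pop(0)  # discard the closing ")"
        return form
    return token

def emit(form):
    # Atoms are emitted verbatim; lists are treated as operator applications.
    if isinstance(form, str):
        return form
    op, args = form[0], [emit(arg) for arg in form[1:]]
    if op == "print":
        return "print " + args[0] + ";"
    # Everything else is assumed to be an infix operator such as + or ==.
    return "(" + (" " + op + " ").join(args) + ")"

php = emit(read(tokenize("(print (== (+ 4 4 6) (- 30 15 1)))")))
print(php)
```

This produces the PHP statement print ((4 + 4 + 6) == (30 - 15 - 1)); and PHP prints a true comparison as "1".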

There are a few obvious things it needs next: string types.  Variables.  defun.  defmacro.  Separate namespaces for functions and variables, defined by context, so you can say (array_map htmlspecialchars row) and it will know that the first argument passed is a callable and the second is an expression, so that they can compile to 'htmlspecialchars' and $row, respectively.  And to serve its original purpose as an "enhanced PHP"-to-PHP compiler, it needs to read that source language rather than s-expressions.  Of course, with a non-sexp-based language, macros might not work out so well, but I do want to be able to run code to rewrite the AST (or the whole tokenizer: aka reader macros) at compile time.
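
One way the context-dependent namespaces could work is a table recording which argument positions of each function expect a callable. The sketch below is only a guess at a design, again in Python; CALLABLE_POSITIONS and emit_call are hypothetical names, and it assumes every argument is a bare identifier:

```python
# Hypothetical table: which argument positions of a PHP function expect a callable.
CALLABLE_POSITIONS = {"array_map": {0}}

def emit_call(form):
    # form is a parsed call such as ["array_map", "htmlspecialchars", "row"].
    name, args = form[0], form[1:]
    callable_slots = CALLABLE_POSITIONS.get(name, set())
    rendered = []
    for position, arg in enumerate(args):
        if position in callable_slots:
            rendered.append("'" + arg + "'")  # function namespace: quoted name
        else:
            rendered.append("$" + arg)        # variable namespace: PHP variable
    return name + "(" + ", ".join(rendered) + ")"

call = emit_call(["array_map", "htmlspecialchars", "row"])
print(call)
```

Functions missing from the table fall back to treating every argument as a variable, so (strlen row) would come out as strlen($row).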

There's a bunch of features I want to add, too.  Proper named arguments.  Multiple-value return.  Ubiquitous lexical scope, so obviously let and its function equivalent (flet perhaps?). Something else that I'm forgetting at the moment.

In the long run, I also want to do some optimizations; ideally, I could turn $efoo = array_map('htmlspecialchars', $foo); into $efoo=array(); foreach ($foo as $k=>$v) $efoo[$k]=htmlspecialchars($v); as well as doing simple optimizations like i++; to ++i;.  I'd also love to be able to compile some 5.3 code like $foo::bar("baz"), ?:, and "nowdoc" syntax into 5.2-compatible renditions (answer to the first: call_user_func(array($foo, 'bar'), "baz") though my accumulated wisdom now considers such things to be a code smell).

The weird thing about this is that if I succeed, I'll be doing what Rasmus did to create PHP—riffing on an existing system in the domain to come up with something a little better.