I’ve been toying around with Haskell lately, after I bought the excellent book Real World Haskell (there is also an online version where readers can comment on every paragraph). Every programmer should enjoy programming with a rigid but powerful type system like Haskell’s at least once in their career. It really puts things into perspective. Sure, Ruby and Python are cool for quick hacking, but nothing beats knowing that GHC accepts a program.
Indeed, I do catch most of my coding mistakes at compile time. Of course, nothing prevents you from confusing (1-x) with (x-1), but Haskell really lets you focus on the actual code (once you get your program past the type checker).
But this post isn’t about me being a new fan of Haskell; it’s about something much more insignificant: yesterday I built a tiny compiler. For a tiny language.
```
(bind.bind (x. x + 5) lambda.bind 2 calculus.lambda calculus)(x. f. f x)
```
I’ve taken the liberty of adding integer literals and basic arithmetic, since they can be expressed in lambda calculus anyway (via Church numerals). If you want proof:
```
(bind.bind (f. x. x) zero.bind (f. x. f x) one.bind (f. x. f (f x)) two.bind (n. f. x. f (n f x)) succ.bind (n. m. m succ n) plus.bind (n. n (x. x + 1) 0) toLiteral.toLiteral (plus one (succ two)))(val. body. body val)
```
Just by using `one` and `succ`, you can already express every natural number. Church numerals can be mapped onto any other number system by supplying a successor function and a zero element. In the case of the built-in integers, those would be `x. x + 1` and `0`, respectively.
And this example indeed results in “4” just as expected.
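The same construction carries over directly to Haskell. Here is a minimal sketch of the Church numerals above (the `czero`, `cone`, `csucc` names are mine; `plus` is written as `n f . m f` rather than `m succ n`, because the latter needs rank-2 types in Haskell):

```haskell
-- Church numerals: a numeral applies a successor function n times to a zero.
type Church a = (a -> a) -> a -> a

czero :: Church a
czero _ z = z

cone :: Church a
cone f z = f z

csucc :: Church a -> Church a
csucc n f z = f (n f z)

-- Addition: apply f n times on top of m applications of f.
cplus :: Church a -> Church a -> Church a
cplus n m f z = n f (m f z)

-- Map a numeral back to a built-in Int: successor is (+ 1), zero is 0.
toLiteral :: Church Int -> Int
toLiteral n = n (+ 1) 0
```

Evaluating `toLiteral (cplus cone (csucc (csucc cone)))` mirrors `plus one (succ two)` from the example and yields 4.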
I would love to reproduce the compiled assembler code here, but it is not pretty. The code consists of over 20 tiny functions, one for each `.` in the code above. And yes, there is no shorthand for curried multi-parameter functions. This is an insignificant language, after all.
Now why the effort?
Implementing Parvum was an interesting experience in Haskell. I got to use the parser combinator library `parsec` which embeds nicely into Haskell. Working with parsec really is great because you use one environment for the whole compiler. No separate grammar file and no machine generated code files. If I need a convenience function for my grammar (like a parameterised production) I just go ahead and create it.
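Parvum’s actual grammar isn’t reproduced here, but a minimal parsec parser for this kind of notation looks roughly like the following sketch (the `Term` constructors are hypothetical, and arithmetic is omitted for brevity):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST, not Parvum's actual one.
data Term = Var String
          | Lam String Term   -- "x. body"
          | App Term Term
          | Lit Integer
  deriving (Eq, Show)

ident :: Parser String
ident = many1 letter <* spaces

atom :: Parser Term
atom = (Lit . read <$> many1 digit <* spaces)
   <|> between (char '(' <* spaces) (char ')' <* spaces) term
   <|> try (Lam <$> ident <* char '.' <* spaces <*> term)
   <|> (Var <$> ident)

-- Juxtaposition is left-associative application.
term :: Parser Term
term = foldl1 App <$> many1 atom

parseTerm :: String -> Either ParseError Term
parseTerm = parse (spaces *> term <* eof) ""
```

A parameterised production is then just a Haskell function that returns a `Parser`, which is exactly the convenience mentioned above.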
Unlike all my previous parsers, this one really focuses on building an AST solely. No name lookups, no transformations, just an abstract syntax tree. This tree is then fed into the compiler, which translates it into a representation of the program not entirely unlike what Prexonite uses: There are applications which consist of global variables and functions which in turn have parameters, local variables, shared names (via closures) and byte code. This model of the application is then printed via the HughesPJ pretty printer library.
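As a rough sketch of what such a model and its HughesPJ rendering can look like (all names here are hypothetical, not Prexonite’s actual model):

```haskell
import Text.PrettyPrint.HughesPJ

-- Hypothetical application model: globals plus a flat list of functions.
data Application = Application
  { appGlobals   :: [String]
  , appFunctions :: [Function]
  }

data Function = Function
  { fnName   :: String
  , fnParams :: [String]
  , fnLocals :: [String]
  , fnShared :: [String]   -- names captured from the enclosing function
  , fnCode   :: [String]   -- byte code instructions, kept as plain text here
  }

-- Render one function: a header line, then the indented instruction list.
prettyFn :: Function -> Doc
prettyFn f =
  text "function" <+> text (fnName f)
    <+> parens (hsep (punctuate comma (map text (fnParams f))))
  $$ nest 2 (vcat (map text (fnCode f)))

prettyApp :: Application -> Doc
prettyApp a = vcat (map prettyFn (appFunctions a))
```

Calling `render (prettyApp app)` then turns the whole model into the final assembler text in one pass.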
Apart from the highly ambiguous grammar of lambda calculus notation, the biggest problem was implementing the translation step. Not because walking an AST in post-order is particularly difficult (Prexonite assembler is stack-based), but because emitting embedded functions while generating code is not trivial.
- First of all, you need a steady supply of unique names: the state monad
- You need to keep track of the current function environment (for shared names): the reader monad and its `local` function
- Finally, the embedded functions have to be stored in the application: the writer monad
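In Haskell, those three effects combine naturally into the RWS monad from `mtl`. A minimal sketch (the types and the `compileLam` helper are hypothetical, not Parvum’s actual code):

```haskell
import Control.Monad.RWS

type Env = [String]                 -- names in scope (reader)
type Out = [(String, [String])]     -- emitted functions (writer)
type Gen = RWS Env Out Int          -- fresh-name counter (state)

-- A steady supply of unique names: bump the counter in the state.
freshName :: Gen String
freshName = do
  n <- get
  put (n + 1)
  pure ("fn" ++ show n)

-- Compile a lambda: pick a fresh name, compile the body with the
-- parameter in scope, and record the finished function via the writer.
compileLam :: String -> Gen [String] -> Gen String
compileLam param compileBody = do
  name <- freshName
  code <- local (param :) compileBody
  tell [(name, code)]
  pure name
```

`runRWS` then returns the result, the final name counter, and all emitted functions in one go.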
I’m not really happy with this solution, as it is yet another example of how you can fall back on imperative concepts once things get a bit more complicated. In this particular case, the decision to design the application model (an application contains functions and global variables etc.) the way I did in C# is probably the culprit. In Prexonite, I build these data structures incrementally, adding and tweaking bits here and there. That just isn’t the way to go in Haskell. In Haskell, you’re supposed to find one final expression for every value, which in this case proved to be far from trivial.
Since Parvum is just a prototype, I needed a high-level back end, which basically reduced the choice to
- any actual programming language
- Prexonite bytecode assembler
In particular, the back end had to support closures out of the box. Out of these, I chose the Prexonite bytecode assembler because it is high-level but still requires a translation effort (I actually want to learn something by implementing Parvum) and because it resembles CIL to some extent.
The latter is important because I might implement my next serious language on top of the CLR directly instead of the DLR.
No, Parvum is not intended to replace Prexonite as my workhorse language of choice. It really is just a research object.
I plan to investigate the following:
- Arity checking
  - (1 + 5) :: *
  - (x. x + 1) :: * -> *
  - (x. y. x + y) 5 :: * -> *
- Extend the language with a 2nd data structure: Bool
  - Allows for predicates (comparison, and, not etc.)
  - Makes finite recursion possible
- Simple type checker (Int vs. Bool) in addition to arity checking
  - My first static type checker *ever*
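A simply typed checker along these lines would subsume arity checking, since the arity of a term can be read off its function type. A minimal sketch, assuming annotated lambda parameters to keep type inference out of the picture (the constructors are hypothetical):

```haskell
data Ty = TInt | TBool | TFun Ty Ty
  deriving (Eq, Show)

data Term = Lit Integer
          | BoolLit Bool
          | Add Term Term
          | If Term Term Term
          | Var String
          | Lam String Ty Term   -- parameter type is annotated in this sketch
          | App Term Term

-- Check a term against an environment of variable types.
-- Failed pattern matches in the Maybe monad become Nothing (type errors).
check :: [(String, Ty)] -> Term -> Maybe Ty
check _   (Lit _)     = Just TInt
check _   (BoolLit _) = Just TBool
check env (Add a b)   = do
  TInt <- check env a
  TInt <- check env b
  Just TInt
check env (If c t e)  = do
  TBool <- check env c
  ty  <- check env t
  ty' <- check env e
  if ty == ty' then Just ty else Nothing
check env (Var v)     = lookup v env
check env (Lam x t b) = TFun t <$> check ((x, t) : env) b
check env (App f a)   = do
  TFun p r <- check env f
  p' <- check env a
  if p == p' then Just r else Nothing
```

In this setting, `(x. x + 1)` checks to `TFun TInt TInt`, i.e. arity 1, exactly the `* -> *` judgement from the list above.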
Unfortunately, Parvum is strict, at least until I’ve found a way to encode lazy computations in the strict Prexonite VM. This is one place where I might cheat and just provide a `thunk` mechanism as part of Prexonite (say, a command that wraps a computation and its arguments). We’ll see.
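Modeled in Haskell (which is lazy anyway, so the explicit unit argument only simulates what a strict VM would need), such a thunk mechanism might look like:

```haskell
-- A thunk is a closure that delays a computation until it is forced.
newtype Thunk a = Thunk (() -> a)

delay :: (() -> a) -> Thunk a
delay = Thunk

force :: Thunk a -> a
force (Thunk f) = f ()
```

A real implementation would also memoise the result on the first `force`, so each thunk is evaluated at most once.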