Prexonite 1.2.3 – Best Friends Forever

I was tempted to just skip this silly version number, but here we go: I humbly present to you

Prexonite v1.2.3

(No, I couldn’t resist :3 )

As always, Prexonite release 1.2.3 comes with

Sadly, Linux compatibility didn't make the cut, so Prexonite still only builds on Windows. The binaries might or might not work with Mono.

Stability

In spite of the silly version number, v1.2.3 is a very serious release. A lot of the work of the past couple of months went into making sure that you could depend on what I ship as part of Prexonite. I extended the automated test suite to cover the PSR, while also running existing test cases in multiple configurations.

I don’t really want to call it a long-term-support version, because unless there actually is a need for support (in the form of backported patches), it would be an empty phrase. The idea is more that, starting with the next version, I will be bold and start extending the language in a way that will change how you write Prexonite Script programs.

Breaking changes

This release of Prexonite Script comes with two breaking changes that affect a number of existing programs, both compiled and in script form.

var args fallback was removed

In Prexonite, all local variables are initialized to null, with the notable exception of var args, which holds the list of arguments passed to the function. However, previous versions of Prexonite had a fallback rule: if a function had a parameter called "args", that parameter would not get overwritten with the argument list. Instead, the "magic variable" for that particular function would be named "\args". If there was also a parameter called "\args", the magic variable would be called "\\args", and so on.

This behavior is unintuitive to say the least. It also makes macros that access the argument list hard to write. So starting with this release, the var args fallback mechanism is no longer part of Prexonite. If you have a parameter called args, tough luck. This change is implemented in the Prexonite execution engines themselves, which means that both compiled programs and scripts are affected.
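
As a minimal illustration (the function name is made up), this is the kind of code var args is meant for; after this release, a parameter named args would simply be overwritten by the argument list:

function sum_args()
{
    var total = 0;
    foreach(var x in var args)   // var args holds every argument passed to sum_args
    {
        total = total + x;
    }
    return total;
}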

psr\ast.pxs SI is now implemented as a function

SI from psr\ast.pxs is a very handy shortcut for Prexonite compiler constants and utility functions. SI stands for “SymbolInterpretation”, but has grown to also provide call and return mode constants. In short, it appears very often in macro code. Previously it was implemented as a global variable that was initialized by some code in a build block. While this works perfectly for both the interactive interpreter and just running the script, it doesn’t support serialization (storing the in-memory representation of an application in a *.c.pxs file). While a serialized application can be loaded and executed just fine, it can’t be used to continue compilation, because the build-code that initialized the SI variable at compile-time is absent.

The solution is to implement SI as a function that lazily initializes the global variable. Due to the principle of information hiding underlying Prexonite Script's function invocation syntax (parentheses are optional if the argument list is empty), this doesn't pose a problem for existing script code. Existing compiled code doesn't matter because of static linking. The one area where the change breaks code is with meta macros (macros that write macro code). Code generated by macro code often refers to SI, because that is simpler (and faster) to generate than accesses to the corresponding static CLR fields.

So: if your code refers to ast\gvar("SI"), you're in trouble. Now, you could just use ast\func("SI"), but psr\macro.pxs comes with a better option: ast\getSI gives you a node referring to SI, whatever SI is in your distribution. Just Ctrl+F your code for "SI" (with the quotes) and replace the ast\gvar expression with ast\getSI.
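
As a minimal before/after sketch (the surrounding node variable impl is hypothetical, borrowed from the style of the macro examples further down):

// before: refers to SI as a global variable; breaks once SI becomes a function
impl.Arguments.Add(ast\gvar("SI"));

// after: lets psr\macro.pxs decide what SI is in this distribution
impl.Arguments.Add(ast\getSI);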

Improved round-tripping

Up to this point, storing a compiled application into a *.c.pxs file altered that application in a significant way: in memory, not all functions and global variables are represented by a symbol. Inner functions, for instance, had no global name that you could refer to. When you store such an application, all these "hidden" functions are represented as top-level functions. When the file is loaded again, the compiler naturally can't differentiate between functions that were "originally" top-level and "hidden" ones. True, the `ParentFunction` meta key could be consulted as a heuristic, but there are other kinds of functions that have no global symbol (e.g. those generated by a macro).

Starting with Prexonite v1.2.3, there is a new meta switch called '\sps', which is short for "Suppress Primary Symbol". It instructs the compiler not to create a global symbol for the defined name of the function (the id that immediately follows the function keyword). Instead, all symbols are declared explicitly in the symbols section at the end of the file.

If you previously relied on all functions getting global symbols after round-tripping (storing, then loading an application), you'll be in trouble: the storage algorithms now add \sps to all functions (even originally top-level ones) and declare all global symbols explicitly.
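
A rough sketch of what this looks like on a stored function (the function name is made up and the exact layout of a *.c.pxs file may differ):

// the defined name no longer produces a global symbol of its own
function main\nested\0 [\sps]
{
    // ...
}

// the symbol pointing at this function is instead declared explicitly
// in the symbols section at the end of the stored file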

Safety and coroutines

Prexonite comes with two execution engines: a bytecode interpreter and a CIL compiler. The former is used by default; you have to explicitly request CIL compilation by issuing the CompileToCil command at runtime (at this time, you cannot compile Prexonite byte code ahead of time). From that point on, the CIL implementation of a function is always the preferred way of executing it; the interpreter only acts as a fallback mechanism. And there is a good reason for that besides performance: safety.

Exception handling in the bytecode interpreter is really just "best effort". After an exception occurs, I cannot guarantee that the interpreter will be able to execute the corresponding catch or finally block. This is mostly due to the open nature of the interpreter API. It is possible for a stack context to remove itself from the stack, preventing the interpreter from applying that stack frame's handlers to exceptions. Even worse, from the interpreter's point of view, functions can just stop executing at any time for no apparent reason, for instance when they are used as coroutines.

While exception handling itself will usually work as advertised, finally blocks in coroutines are fundamentally broken. They only work when an exception is thrown or when control is transferred naturally (i.e., reaches the end of the try block). But especially for coroutines there is a very important third scenario: the sequence stops being evaluated at some point (code might just look at the first couple of elements and then abandon the sequence). If your sequence is holding onto an open file, you'll be in trouble.

Exceptions are a cool concept but they are also hard to implement correctly (at least in relation to any particular language feature). Coroutine support was easy to add to the interpreter for precisely the same reason it is not safe: The interpreter does not have tight control over the code it executes. The ability to abandon a computation at any point made coroutines almost trivial to implement, while making correct exception handling almost impossible.

For the nearer future this has the following consequences:

  • You should use the CIL execution engine whenever possible. Prexonite v1.2.3 still does not compile to CIL by default, however!
  • The interpreter will remain part of Prexonite, mostly as a “quick and dirty” way of executing code (build blocks, initialization code, REP-loops)
  • The syntactic support for coroutines will remain in Prexonite language. I might one day implement a coroutine transformation targeting CIL.
  • The compiler tries to detect and warn about usages of yield inside protected blocks. Yield statements outside protected blocks (i.e., in a naked for-loop) remain perfectly safe.
  • You can use the syntax `new enumerator(fMoveNext, fCurrent, fDispose)` to create native Prexonite sequences (no boxing of CLR objects involved), as sketched below. This can be more performant than coroutines, since the individual functions can be compiled to CIL.
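
A minimal, hypothetical sketch of such a native sequence (the function names and the exact signatures expected of the three callbacks are my assumptions):

function count_to(n)
{
    var i = 0;
    function moveNext() { i++; return i <= n; }   // advance; report whether another element is available
    function current()  { return i; }             // the element at the current position
    function dispose()  { }                       // nothing to clean up in this example
    return new enumerator(->moveNext, ->current, ->dispose);
}

// foreach(var x in count_to(3)) println(x);  // 1, 2, 3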

Making the language more friendly

Prexonite Script has never been a simple language, but I have used this release to make some aspects of it a bit less verbose and more programmer-friendly:

Lists allow for trailing comma (‘,’)

Whenever you encounter a comma-separated list in Prexonite Script, it is now legal to add an additional comma at the end. This even goes for argument lists, making `max(a,b,)` a perfectly valid invocation of the max function. Similar features can be found in many of the more pragmatic languages out in the wild. It makes manipulating lists that represent data (often found in embedded domain-specific languages) much easier and simplifies Prexonite Script code generators (they no longer have to worry about suppressing the comma after the last entry).
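
For instance, a data-carrying list literal can now be written like this (a made-up example):

var fruits = [
    "apple",
    "orange",
    "banana",   // the trailing comma is now legal
];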

More sane meta information blocks

In a similar move, meta entries in a meta block no longer need to be terminated by a semicolon (';'); they are instead separated by semicolons, while a trailing semicolon remains allowed. Existing code (where there always is a trailing semicolon) thus remains valid, while the frequently encountered single-entry meta blocks look less silly.

[is test;] vs. [test]

But wait! Where did the 'is' keyword go? Well, that is now optional too. Essentially, whenever you omit the value of a meta entry, it is interpreted as a boolean switch. This also works on the global level, but there the (terminating) semicolon remains mandatory.
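
Put together, these relaxations mean that the following two spellings are equivalent (a hypothetical test function, using the test tag and assert from psr\test.pxs):

// before: 'is' and the trailing semicolon spelled out in full
function addition_works() [is test;] { assert(1 + 2 == 3); }

// now: both the 'is' and the semicolon can be omitted
function addition_works() [test] { assert(1 + 2 == 3); }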

Reliability

Up to this point, the scripts found in the PSR (please don't ask what this acronym stands for) were never really considered a part of Prexonite/Prexonite Script. I just shipped them with every release and made them available to all scripts loaded via Prx.exe. Their interface changed frequently and they were often quite buggy.

There never was a need for more. The PSR isn't really a standard or runtime library; it's just a collection of scripts that I found useful for day-to-day hacking in Prexonite Script. Prexonite itself doesn't depend on the PSR in any way (which makes writing the command line tool a bit awkward at times).

Things have changed a bit with the inclusion of psr\macro.pxs, which is pretty much the only sane way to write macros in Prexonite Script and has de facto attained the status of a proper standard library. As a consequence, I have started writing automated tests (of the unit/integration sort) for the most widely used PSR files.

I don't have a full automated testing framework for Prexonite Script files yet, but you can use psr\test.pxs today to start implementing your own tests. In the spirit of xUnit, it features a meta tag "test" and an "assert" function. Running all tests in an application is accomplished via "run_tests". You can also specify additional tags that need to be present for a test to be included in the suite. By default, "run_tests" renders its results to standard output via what is called the "basic_ui". You can provide your own "ui" (a structure that supports certain operations). For more details, have a look at the implementation of basic_ui. You can of course use psr\struct.pxs to create that structure; I just didn't want psr\test.pxs to depend on psr\struct.pxs and, through it, on psr\macro.pxs.
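
A minimal sketch of how such a test file might look (the function names are made up):

build does require(@"psr\test.pxs");

function addition_is_commutative() [test]
{
    assert(1 + 2 == 2 + 1);
}

function main()
{
    run_tests;   // renders results to standard output via basic_ui
}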

Through some arcane trickery (a bit of T4 and a Prexonite Script file), the automated tests written using psr\test.pxs are executed as part of the NUnit test suite for Prexonite itself. One test case (a function) in Prexonite Script is mapped to a set of test cases in the NUnit world.

The way forward

The next release, v1.2.4, will not be an ordinary version. After this rather uninteresting iteration, I will be a bit bolder in the next one, as hinted at in the first section. While staying largely backwards compatible, v1.2.4 will include a number of experimental features that might be disabled by default. It will also break the Prx.exe command line interface. Look for more details in future posts.

Prexonite 1.2.2 – Power to the macro

TL;DR The new Prexonite release 1.2.2 is here!

In the nick of time…

When I announced my 3-month milestone cycle last time, I was sure that three months were waaay too long. Well, my gut feeling hasn't betrayed me: the 1.2.2 release is late by over a week. Why? Two reasons: I had solved all the interesting problems at the beginning of the iteration; towards the end, all that was left was grunt work (re-implementing all call\* commands to support partial application). So my free time got more and more taken over by other, more interesting activities, like StarCraft 2, exploring C++ and a super secret compiler-related tool/library. I only managed to finish the milestone by cutting one high-profile feature: automatic partial application of macros. But more on that later.

The Prexonite compiler architecture is really starting to show its age (read: less-than-ideal design). It is currently very easy to have the compiler generate invalid byte code (evaluation stack underflow and/or overflow), and there is nothing I can do to prevent it that isn't a giant, ugly hack. I'd really love to completely rewrite the entire compiler of Prexonite (starting with the AST and code generation), but that is not something for the near future.

Macro Commands

Up until now, macros were in a very bizarre situation. For the first time, Prexonite Script code offered a feature with no counterpart in the managed (C#) world. Everywhere else there is an equivalent: instead of functions, there are commands; Prexonite structs are mirrored by plain CLR types implementing IObject; compiler hooks and custom resolvers can both be implemented in either Prexonite Script or managed languages. But for macros, there was no equivalent – until now.

Let’s face it, the first implementation of Prexonite Script macros was a hack. I mean, injecting a fixed set of ‘magic variables’ into the function smelled right from the beginning. Obviously, the new managed macros and the interpreted ones should share as much architecture as possible, have the same capabilities and protections etc. Could you have imagined the method signature of the managed mechanism had I kept those 6 variables around? And the versioning hell?!

Ever wondered why it is common practice to pack all your .NET event arguments into an EventArgs class? Same reason: you want to be able to evolve the interface without breaking the event (changing its signature). Accordingly, the macro system now operates with a single value being passed to the macro implementation (whether it is written in Prexonite Script or managed code).

That one object, the macro context, provides access to the information previously available through those 6 variables. Mostly. The new interface is much more restrictive. It doesn’t give you direct access to the loader or the compiler target. This is a first step in turning macros from a compiler feature into a language/library feature. Ultimately, macros should have their own AST/internal representation, to be completely independent of the compiler.

Partial Application for macros

When designing programming language features, one should always strive for orthogonality. If there is a sensible way in which two features could be used together, it should probably be possible to use them together. This is not always easy, of course, and sometimes it is the one reason why you should not include a certain language feature.

In Prexonite Script, compile-time macros and partial application are kind of at odds with one another. Macros generate new code to be substituted, while partial application creates function values without generating new functions. These two features are not easily combined: in general, macros can expand to unique and completely unrelated code, and partially applying an arbitrary block of code is not possible (control flow into and out of the block cannot be captured by a function object, not in a transparent fashion).

Consequently, macro authors have to implement partial applications manually. Let's see how you could implement a very simple logging macro that also supports partial application. It should accept a .NET format string, followed by an arbitrary number of arguments. When expanded, it should print the name and signature of the function from which it was called (expanded), along with the containing script file name, line and column, followed by the formatted message. It should obviously be possible to disable logging, so the macro should expand to null when the meta switch "debugging" is not set.

Now, this just happens to be one of those macros that can be partially applied. It only uses the macro expansion mechanism to gather additional information (function name, source file, line and column). It doesn't generate return statements, nor does it need to expand other macros in the same context. This means that you can implement the bulk of your macro in an ordinary function that takes two additional arguments: the source position and a reference to the calling function. The macro then only needs to gather this information and assemble a call to the implementing function.

function log\impl(pos, func, msg)
{
    // the first three arguments are fixed; the remaining var args are the format arguments
    var finalMsg = call(msg.Format(?), var args >> skip(3));
    println(pos, ", ", func, ": ", finalMsg);
}

Generating this code via the AST wouldn’t be very fun anyway. Now on to the interesting part, the actual log macro.

macro log(msg) [is partial\macro;]
{
    if(context.Application.Meta["debugging"].Switch)
    {
        var funcStr = ast\const(context.Function.ToString); 
        var posStr = call\macro([__POS__]);
        
        var impl = ast\func(->log\impl.Id);
        impl.Call = context.Call;    
        impl.Arguments.Add(posStr);
        impl.Arguments.Add(funcStr);
        impl.Arguments.AddRange(var args);
        
        context.Block.Expression = impl;
    }
    
    return true;
}

On line 3, we check whether the debugging switch is enabled and skip the entire code generation if it isn’t. If we don’t generate any code, the macro system will supply an expression that evaluates to null.

The variable funcStr is initialized to a string literal containing the function’s name and signature (line 5). Calling ToString on the function object is necessary, because the constant node constructor ast\const expects its argument to be either a boolean, an integer, a double or a string. No implicit conversion is performed in order to avoid ambiguity.

On line 6, the source code position (file, line, column) is handled. We use a shorthand form of call\macro, where we specify the macro call we'd like to perform inside a list literal, and the implementation of call\macro (a macro command since this release) translates it into a proper invocation. Written out in full, that invocation would look somewhat like this:

var posStr = call\macro(macro\reference(__POS__));

Having to type macro\reference every time you call a macro is no fun, especially not if the callee macro is statically known (which is quite often the case). You can still supply argument lists as additional arguments, just like with all other call\* commands.

On line 8, we start assembling the invocation of our implementation function. We could have just supplied the string "log\\impl" to the function call node constructor ast\func, but the proper way to do this is to acquire a reference to that function and retrieve its Id. This way, if we change the physical name of the function (but retain the log\impl symbol), our macro will still work.

Copying the call type of our macro to the implementation function call on line 9 is very important to maintaining the illusion of log being an ordinary function. If you write log = "hello", you will expect the "value" of this expression to be "hello", with the side-effect of logging that message. If we don’t propagate the call type of the macro expansion to our function call expression, it will default to Get, which is not what the user expects.

Arguably, a macro should have no control over arguments outside of its argument list, but the way Prexonite understands the assignment operator, the right-hand side is just another argument and is thus fair game as far as the macro system is concerned. It's also not easy to detect a forgotten propagation of context.Call: even though most macros that are Set-called will expand to a Set-call expression, that is not a necessity for a correct implementation.

On lines 10 through 12, we add our arguments to the implementation function call: the source position, the function name and all arguments passed to the log macro. Don't forget to actually assign the function call to the expanded code block. I can't remember how many times I've forgotten this. The result would be the macro-system-supplied null expression. So, if your macros ever produce null, double-check whether you have actually returned the generated AST nodes.

Almost done. Up until now, we have just written a normal macro with the Prexonite 1.2.2 macro system. To make it partially applicable, we need two more things. Firstly, we need to mark our macro with the macro switch partial\macro. This signals that our macro can attempt to expand a partial application. Because there are some situations where a macro cannot be partially applied (when the one thing that needs to be known at compile time is missing), a macro needs to be able to reject expansion.

A partial macro (short for partially applicable macro) must return true; otherwise the macro system reports this particular expansion of the macro as "not partially applicable". Unlike normal macros, partial macros cannot return their generated AST; they must assign it to context.Block and/or context.Block.Expression. Also, the same macro code is used for both partial and total applications of the macro. You can use context.IsPartialApplication to distinguish the two cases.

However, if you build your macro carefully, you might not need to make that distinction. Our log macro doesn’t use this property at all, because the code we generate is the same in both scenarios. By building an ordinary invocation of log\impl, we don’t have to worry about partial application ourselves and can instead let the rest of the compiler worry about the details.
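
To close the example, here is a hypothetical snippet of how the finished macro could be used, both totally and partially applied (assuming the debugging switch is set):

log("answer is {0}", 42);            // ordinary (total) application
var warn = log("warning: {0}", ?);   // partial application; ? maps to the first format argument
warn.("disk almost full");           // expr.(args) performs the indirect call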

Calling Conventions

Prexonite now provides a whole family of very similar commands: call, call\member, call\async and call\macro, collectively known as call\*. They allow you to perform calls with a larger degree of control or different semantics. With the exception of call\async, these are also the "calling conventions" found in Prexonite. call uses the IIndirectCall interface (used by ref variables and the expr.(args) syntax), call\member is used for object member calls, and call\macro allows you to invoke other macros rather than expanding them.

Unfortunately, these call\* commands didn't play nicely with partial application. They were partially applicable, but only in a subset of useful cases. Take this usage of call\member, for instance:

call\member(obj,?,[arg,?])

You might think that it creates a partial application with two mapped arguments: the name of the member to invoke as well as the second argument. All other arguments to that partial application would be treated as argument lists. Prior to Prexonite 1.2.2, this was not the case, because partial application doesn't look for placeholders in nested expressions. The list constructor would be partially applied and passed to call\member as an argument. And since partial applications are not argument lists, this results in an exception at runtime.

Prexonite 1.2.2 contains a new calling convention, call\star, which, well, handles calling call\*-family commands/functions. While that alone isn't very exciting, it also scans its arguments for partially applied lists and adopts their placeholders. It enables you to do this:

call\star(call\member(?),[obj,?,arg,?])

This expression will yield a partial application (even though none of call\star's direct arguments is a placeholder) that does what we expected our previous example to do. Of course, this is kind of unwieldy, which is why all other call\* commands have been rewritten as macros that take advantage of call\star. This means that in Prexonite 1.2.2, our example from above will actually work the way we expected it to.

Automatic partial application compatibility

One of the features I had planned for this release was a mechanism for automatically making some macros partially applicable. The user would have to announce (via a meta switch) that their macro conformed to certain constraints, and the macro system would automatically have made it partially applicable. My idea was to take the code generated by the partially applied macro and "extra-line" it into a separate function, sharing variables where necessary.

Unfortunately, the reality is somewhat more complicated. This kind of transplantation cannot be done in a completely transparent way. I'm also not sure the Prexonite compiler is equipped to perform the necessary static analysis. (It would certainly be possible, but at the cost of maintainability; a hack, in other words.) Finally, this is not really what partial application stands for: the promise of not generating new functions for every call site would be broken.

I had a number of ideas for dealing with this issue: comparing generated functions with existing ones, trusting the user to generate the same code, and re-using the "extra-lined" function after the first macro expansion. But you have seen how straightforward partial macros are when you follow the pattern of implementing the bulk of the code in an implementation function. At this point, automatic partial application compatibility doesn't seem worth pursuing. There are more important issues at hand, such as improving code re-use in macros.

The next milestone: Prexonite 1.2.3

The next milestone will focus on usability, compatibility and reliability. All very important topics (if a bit dull). For this milestone, I would like to

  • compile, run and test Prexonite on Ubuntu via Mono 2.8+
  • support Unix #! script directives
  • have a unit-testing "framework" for Prexonite
  • provide automated tests for the Prexonite standard repository (PSR)
  • integrate those tests into the automated testing of Prexonite
  • make the language a little bit more friendly
  • automate testing of round-tripping (compile & load should be equivalent to interpreting directly)

While this doesn’t seem like a lot of new features for Prexonite itself, it will actually involve quite a lot of work. I don’t know, for instance, how I will address the issue of directory separator characters in build does require("psr\\macro.pxs"); and similar statements. Also, for #!-compatibility, I might have to completely change the command-line interface of Prx.exe. We’ll see how far I get. The fact that Ubuntu still doesn’t ship with a .NET 4.0 compatible version of mono doesn’t help.

After that? That depends. I have some neat ideas for allowing scripts to specify minimum and maximum compiler and runtime versions. The macro system still needs some love; reusing code is difficult, to say the least. Block expressions are a possibility, though in its current form the compiler can't handle those. Eventually I want to overcome this limitation, but that requires a rewrite of the entire compiler (or at least large parts of it).

Prexonite 1.2.1 – Partial Application and Mercurial

Release Policy

I haven’t released a "named" version of Prexonite in a long time. It just wasn’t worth the effort. I’m giving explicit version numbers another shot, starting with this release.

Versioning Scheme

I’m looking at a very simple versioning scheme:

lang.major.minor[.patch]

Three or four decimal numbers, separated by dots (the patch number is optional). The first number (lang) stands for the version of the language. This number will probably stay the same as long as I don't rewrite the entire implementation from scratch, including breaking changes to a significant fraction of existing programs written in Prexonite Script or targeting the Prexonite VM. This would be something like the jump from PHP4 to PHP5.

The second number (major) will change infrequently and only as a consequence of profound changes – in the language or in the underlying implementation. The removal of built-in operators from the VM was one such change. But more on that later.

The third number (minor) is where most of the action is. Every increment in the minor version will correspond to the completion of a milestone, which I’ll talk about shortly. Patch versions, finally, are intended for backports from the current development version.

Milestones

I would like to get more consistent releases of Prexonite, not just a bunch of development snapshots that you have to correlate to blog posts and commit logs in order to find out what exactly is new.

In order to achieve this, I will define milestones that are achievable in no more than three months. At the end of each milestone, I will have a version of Prexonite that is ready to be used "in the wild". Additionally, each milestone will be associated with a "theme": I will try to focus my work on Prexonite on one area at a time. The 1.2.1 milestone, for instance, was focused on partial application.

Mercurial

For 1.2.1 I switched my version control system from Subversion over to Mercurial, still hosted as an open source project at assembla.com. Subversion is great but for me there are some benefits in using Mercurial.

Easier than Subversion

I personally find Mercurial easier to use than Subversion, even though there are more concepts involved (local repository versus working copy, pull versus update). The main advantage of Mercurial is that you can clone the main repository, experiment around, create branches, create tags, attempt complicated merges, and when it doesn't work out, you can just throw away your local repo and make a new clone (or evacuate the successful commits by cloning locally up to the last "good" revision). In Subversion, all the action happens on the central server; if you make a mistake, it's logged for all eternity.

Local commits

The second big advantage of Mercurial is being able to commit your changes locally. Sure, every commit should be a working system, but that's more of an ideal than a realistic scenario. I want to be able to save my work in a state that is at least semi-working before I attempt a risky rewrite. I also don't want to work without revision control when I'm on the road. Sure, there is 3G, but with Mercurial and local commits my laptop battery lasts longer.

Partial Application

On to the big new feature in this and some past releases. Partial application is a new feature of Prexonite Script that allows you to define function values for very simple expressions in a succinct and, hopefully, intuitive way.

f(1, ?) // as a lambda: x => f(1, x)

You can turn almost any simple expression into a partial application by denoting the open arguments as question marks (?). The value of such a partially applied expression will be a function-like value. When the partial application is applied, the question marks will be substituted with the provided arguments and the expression will be evaluated. Any operand of the expression that is not a question mark is evaluated only once, at the point where the partial application is created.

Many expressions support partial application:

new List(?), 
x.Append(?), 
?.ToString(), 
System::Console.WriteLine(?),
? is List, 
?~Real

Including all unary and binary operators

1 + ?, 
?*2, 
x & ?, 
y and ?

You can also define multiple open parameters

f(?, 1, ?), 
? == ?

Or map the supplied arguments in arbitrary orders or multiple times

f(?1, 2, ?0, ?0)

Limited to simple Expressions

A key concept of partial application is that placeholders can only appear as direct operands of the operation/function to be partially applied.

f(1+?,g(16*3)+2)

Will not produce a partial application of f, but pass the partial application (1 + ?) to f as an argument. You either have to resort to a full blown lambda expression for this

x => f(1+x,g(16*3)+2)

Or use function composition (also newly introduced)

? + 1 then f(?, g(16*3)+2)

The `then` keyword combines two functions into one:

g then f //means about the same as `x => f(g(x))`

Note how all non-placeholder operands of partial applications can be arbitrarily complex.

Differences between lambda expressions and partial applications

Lambda expressions and partial applications are in most cases not equivalent. The key difference is that all non-placeholder arguments, the so-called "closed arguments", are evaluated when the partial application object is created, not when the resulting object is called (as is the case with lambda expressions). In the example above, it might therefore be desirable to resort to the more difficult to understand function composition syntax (using `then`) in order to avoid re-computing `g(16*3)+2` over and over again.
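
A hypothetical illustration of this difference (f and expensive are made-up names):

var p = f(expensive(), ?);       // expensive() is evaluated once, right here
var l = x => f(expensive(), x);  // expensive() is re-evaluated on every call

p.(1);  // does not run expensive() again
l.(1);  // does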

Partial applications are sadly not as fast as lambda expressions. Invoking a partial application involves copying the closed arguments and the freshly supplied actual arguments into an effective arguments array, a step that lambdas don't have to perform. The effect isn't that dramatic, though. Partial application is measurably slower, yes, but within the same range.

The creation of partial applications, on the other hand, is a different story. Whereas for a lambda expression, the runtime just has to capture references to all shared variables, in the case of partial applications, a mapping consisting of entries for both placeholders and the closed arguments has to be computed. The creation of a partial application is an order of magnitude slower than the creation of a closure (the lambda expression function object). So if you care deeply about performance, don’t create partial applications in tight inner loops.

Other Changes

Macro references

The illusion that macros can be used just like any function is pretty good, but not perfect. It breaks down miserably when you try to take a reference to a macro or attempt to create a partial application thereof. Starting with this release, the compiler will detect and reject any such attempts. Macro writers can make use of the `macro\reference` macro to work around this limitation.

For most macros, taking a reference or creating a partial application doesn't really make sense anyway. But there are some exceptions. I hope that the next release will address these scenarios and allow macro writers to make their macros compatible with partial application and reference-taking.

Single Quotes

This recent release has also seen the addition of the single quote (') as a valid character in identifiers (except at the beginning) and as a separator in number literals.

function bob(cat)
{
  var bob's_cat = cat + "of bob";
  return bob's_cat + " weighs " + 100'000 + "kg";
}

Single quotes in number literals are accepted, but otherwise ignored. They’re thrown away right after lexical analysis, so not even macros can find out if there ever were single quotes.

In identifiers, single quotes have no special meaning, they’re just characters and you can use them in any way you like. As long as the identifier doesn’t start with a single quote, you’re fine.

Implementation of operators

Except for the lazy logical ones, all operators in Prexonite are now implemented as commands and are no longer part of the various virtual machines. This doesn't just simplify the latter, it also makes partial application of those operators possible. As a user of Prexonite Script, you won't notice the change at all, even if you have assembler code that uses the operator instructions (these are transparently translated into appropriate command calls).

If you want, you can redefine the built-in operators. Whenever you write `1 + 2` it is compiled as `(+)(1,2)`. If your function `my_plus` is declared as `(+)`, it will be called instead. You probably don’t want to do that, but it’s possible. (And yes, (+) is a proper identifier. You can even use it in meta data sections)

CIL Extensions

A requirement for implementing partial application efficiently in CIL, without adding new VM instructions, was the addition of CIL Extensions, a new API that allows commands to gain access to compile-time-constant arguments during CIL code generation. This essentially enables variable-argument instructions. So far, partial application is the only feature that makes use of this.

v.Next

The next planned release is going to be 1.2.2, focusing on generalising the macro system. The most important feature is making the macro system also available to commands, and thus to managed code.

As this will involve a partial rewrite of a substantial portion of the macro system, it could take a while. Hopefully, it’ll also support the partial application of macros and a cleaner, more isolated interface for macros.

Project Announcement: TinCal

I'd like to announce a project of mine that has been in planning since the end of January 2010: "TinCal". It's a project I've been working on and off (mostly off) for the past ten months, besides school, Prexonite and another secret project. At this point, TinCal can really become anything from vaporware to my next "successful" (i.e. finished) programming language (and maybe bachelor's thesis). So yes, it is a programming language and thus the successor project to Prexonite, and as before, I'm aiming much higher now. TinCal is definitely going to be statically typed and compiled directly to Common Intermediate Language (.NET byte code). But I don't plan to stop there: true to my love for functional programming in general and Haskell in particular, TinCal is going to be a language extremely similar to the most successful lazy programming language to date.


Prexonite Macro Support

Over a year ago, I considered the implementation of meta programming facilities in Prexonite Script. In that post, I concluded that advanced meta-programming features like quotation and splicing were not realistic for a compiler architecture as convoluted as Prexonite's.

I, however, think that the implementation of macro functions is feasible and useful. Macros won’t revolutionise how the language is used, but will complement the existing dynamic features to enable concise solutions, should the programmer decide to invest enough time into the macro system.

And that’s just what I did. Prexonite now supports a new keyword `macro` that can be used to define macro functions. Macro functions are invoked at compile time (more precisely just before code generation) with the AST nodes of their arguments passed as parameters. They’re expected to return a single AST node that is then inserted into the final AST in place of the macro function call.

The mechanism is already in use: The implementation of the `struct` mechanism has been ported to the macro system.

For those unfamiliar with `struct`:
Prexonite Script does not provide any mechanism for users to define their own data types (classes, structures, records). Instead there is a special type of object that you can extend with new fields and methods: The `Structure`. However, using this object to create custom structures/objects is quite verbose and cumbersome. Fortunately there is a PSR file `psr\struct.pxs` that contains the handy function of the same name. `struct` creates an empty structure and adds all nested functions of the calling function to it. This comes quite close to JavaScript-prototype-definition level of verbosity.

build does require(@"psr\struct.pxs");
function create_foo(x){
  function report() {
    println(x);
  }
  function increment(this) {
    x++;
    this.report();
    return x;
  }
  return struct;
}

function main(){
  var foo1 = new foo(5); //fancy syntax for `create_foo(5)`
  var foo2 = create_foo(foo1.increment()); //prints 6
  foo2.report; //prints 6 too
}

The function `create_foo` (you can call it a constructor if you like) creates a new `foo` object every time it is invoked. The resulting object has two methods `report` and `increment`. The variable `x` is shared by all methods via the nested function/closure mechanism. It is not formally part of the foo object.

Ok, now back to macro functions. The implementation of the struct function is actually quite simple. It makes heavy use of helper functions defined in `psr\macro.pxs`. All of the `ast\<something>` functions plus `tempalloc` and `tempfree` are defined in that file.

//part of `macro struct()` in `psr\struct.pxs`
//creates a new block expression node and allocates a local
//  variable for the structure object.
var block = ast("BlockExpression");
var structV = tempalloc;

//create the structure object and assign it to
//  the local variable
var assignStructure = ast\lvar(SI.set, structV);
assignStructure.Arguments.Add(ast\new("Structure"));
//don't forget to add the statement to the block
block.Add(assignStructure);

//assign the "ctorId" (name of the constructor function) to the structure
var assignCtorId = ast\member(ast\lvar(SI.get, structV), SI.get, @"\");
assignCtorId.Arguments.Add(ast\const(CTORID));
assignCtorId.Arguments.Add(ast\const(parentId));
block.Add(assignCtorId);

//Add all of the methods to the struct
foreach(var method in methods){
    var addMethod = ast\member(ast\lvar(SI.get, structV), SI.get, "\\");
    addMethod.Arguments.Add(ast\const(method.Key));
    addMethod.Arguments.Add(ast\lvar(SI.get, method.Value));
    block.Add(addMethod);
}

//return the constructed structure object
block.Expression = ast\lvar(SI.get, structV);

//Don't forget to free our temporary variable and actually return the code block
tempfree(structV);
return block;
The macro code is not that difficult to understand. So writing macros in Prexonite Script is easy, right? Not exactly, no. You're directly operating on the compiler's AST, an API that was never designed to be consumed by user code. That `ast("BlockExpression")` creates an object of type `Prexonite.Compiler.Ast.AstBlockExpression`, and the `Arguments` member of the `assignCtorId` node is the same member that the compiler uses. Why is that a bad thing? For one, it creates a very strong dependency on an implementation detail of the Prexonite Script compiler. It is also an API that relies heavily on a good IDE, which means it is ugly (often many parameters) and irregular (I myself constantly have to look up the ever-changing parameter ordering and exact member names).
In other words: you won't get far without a copy of the compiler source code next to your Prexonite Script code editor.

Ok, so it's a bit tricky to use. Anything else I should know?

Yes.

  • macro functions can only be used after they have been defined. It is not possible to forward-declare a macro function.
  • macro functions cannot be nested. They have to appear on the global level.
  • macros used as part of macro functions behave like in any other function. If you want to call another macro (for code reuse), you’ll have to use the `call\macro` function (defined in `psr\macro.pxs`). It behaves just like the `call` command.
  • A macro function has access to a number of implicitly defined variables:
    • loader, a reference to the Prexonite.Compiler.Loader instance
    • callType, the call type of this invocation (get vs. set), see Prexonite.Types.PCall
    • justEffect, a boolean indicating whether the caller just wants the side effects. (return values can safely be dropped or not computed at all)
    • locals, access to the collection of local variable definitions (not PVariable objects! We’re at compile-time)
    • macroInvocation, a reference to the AST node for this macro invocation
    • target, a reference to the current compiler target, Prexonite.Compiler.CompilerTarget; provides access to the calling function, amongst other things (a small sketch using some of these variables follows after this list)
  • macros are applied from the outside to the inside, that is, the outermost macro is applied first. This means that your macro might actually disappear from the AST and never be applied.
  • macro functions are not automatically stripped from the application after compilation (simply because Prexonite has no concept of "after compilation").
    You can use the function `unload_compiler` defined in `psr\ast.pxs` to remove all macros and functions/variables marked with the `compiler` tag (including any nested functions)
  • Global variables are only initialized at runtime, not at compile-time, and they don’t exist until they’re defined (and not just declared)
  • You don’t have access to local symbols (declarations) in macro functions.
    • The symbol table provided by `target.Symbols` reflects the state at the end of the calling function. Don't ask.
    • Any symbols you add to the symbol table won’t be available to the calling function. But if you add them to `loader.Symbols` they will be available to functions defined after the calling function. This might be interesting for initialization code.
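
As a small, hypothetical sketch (the macro name and the message are made up, and ast\const requires psr\macro.pxs to be loaded), this is roughly what a macro using some of these implicit variables looks like:

macro answer()
{
    // `justEffect` and `callType` describe how the call site uses the result
    println("expanding `answer`; justEffect = ", justEffect, ", callType = ", callType);
    return ast\const(42); // expand to the integer constant 42
}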

Multithreaded Prexonite

Yesterday, I watched the Google TechTalk presentation of the programming language Go (see golang.org). Even though I don’t believe Go will become very popular, there was one aspect of the language that “inspired” me. One of Go’s major advantages is the fact that concurrency (and coordination thereof) is built deep into the language. Every function has the potential to become what they call a “goroutine” that works in parallel to all the other code. Communication and coordination is achieved through an equally simple system of synchronous channels (a buffer of size 0).

The code samples all looked so incredibly simple to me that I thought: I can do that too. I started by writing a case study (a code sample) and a sketched implementation of the synchronization mechanisms. It all looked nice on paper (in the Prexonite Script editor) but there was one major problem: None of the Prexonite execution engines is really capable of running on multiple threads. Even though the CIL compiled functions behave nicely, they don’t cover the whole language. Coroutines and other dynamic features are just too important to be ignored.

Fortunately, the design of the interpreter is centered around one abstract class: the StackContext. StackContexts come in a variety of forms and provide complete access to the current state of the interpreter. For the most part, these objects are passed around in the call hierarchy. And that's good, because functions/methods that call each other run on the same thread, so stack contexts stay thread-local. At least the ones obtained via method arguments. But there is also another way to get your hands on stack contexts, and that's the interpreter's stack itself.

Of course a multithreaded Prexonite will have multiple stacks but just providing those isn’t enough. I had to ensure that direct queries to/manipulation of the stack are only directed at the stack of the currently executing thread. Since stack contexts are sometimes “captured” by other objects (coroutines for instance), they can on occasion slip into other threads and wreak havoc with the stack of their original thread.

The only solution I saw was to use thread-local storage via unnamed data slots. This has disadvantages, of course:

  • data slots are rather slow (not Reflection slow, but also not virtual call fast)
  • it is difficult to manipulate a different stack if you want to

but

  • it works!

//name~String, source~Channel
function work(name, source)
{
    println("\tWhat did $name say again?");
    var a = source.receive;
    println("\t$name said \"$(a)\"");
    return name: a;
}

function main()
{
    var source = chan; //Creates a new channel
    println("Get to Work!");
    //`call\async` invokes `work` in the background
    //`resp` is a channel that will receive the return value of `work`
    var resp = call\async(->work, ["Chris", source]);
    
    println("Oh, I forgot to tell you this: Chris said Hello!");
    source.send("Hello!"); //supply the missing value
    
    println("I'm waiting for you to finish...");
    var res = resp.receive;
    println("So your answer is $res then...");
    
    return 0;
}

results (sometimes) in

Get to Work!
        What did Chris say again?
Oh, I forgot to tell you this: Chris said Hello!
I'm waiting for you to finish...
        Chris said "Hello!"
So your answer is Chris: Hello! then...

Lists, coroutines and all other kinds of sequences appear very often in Prexonite. It is only logical to combine multithreading with the IEnumerable interface. The result is the `async_seq` command that takes an existing sequence and pushes its computation into a background thread.

The following example applies the two fantasy functions `sumtorial` and `divtorial` to the numbers 1 through 100. The ordinary `seq` version can only use one processor core, while the `async_seq` version pushes the computation of `sumtorial` into the background and onto the second core. Even though the management overhead is gigantic, a performance improvement can be measured (a modest factor of ~1.3x).

function sumtorial(n) =
    if(n <= 1) 1
    else       n+sumtorial(n-1);
function divtorial(n) =
    if(n <= 1) 1
    else       n/divtorial(n-n/38-1);

function main()
{    
    var n = 100;
            
    var seq = 
        1.To(n) 
        >> map(->sumtorial)
        >> map(->divtorial);
   
    println("Sequential");
    println("\t",seq >> sum);

    
    var par = 
        1.To(n) 
        >> map(->sumtorial)
        >> async_seq
        >> map(->divtorial);
   
    println("Parallel");
    println("\t",par >> sum);
}

Multithreaded Prexonite currently lives in its own SVN branch as I am still experimenting with concrete implementations. A very nice improvement would be the move away from CLR threads as they don’t scale well. Communicating sequential processes spend a lot of time waiting for messages, something that does not require its own thread. Go can use 100’000 goroutines and more. I don’t even want to try this with CLR threads…

Lazy factorial

This post refers to Parvum, a research compiler/language of mine presented in the previous post. It's a compiler written in Haskell that translates a "lambda calculus"-like language to Prexonite byte code assembler.

Did you know that a tiny modification to Parvum makes the language (most likely) Turing-complete? By adopting non-strict (lazy) evaluation, one can implement all sorts of control structures in nothing but Parvum, even without language support for boolean values and conditions.

Proof? Yes, please! Here you see the implementation of the factorial function in Parvum (currently hardwired to compute the factorial of 9):

(\bind.
  bind (\n. n (\x. x + 1) 0) \toInt.
  bind (\f x. x) \zero.
  bind (\f x. f x) \one.
  bind (\f x. f (f (f (f x)))) \four.
  bind (\n f x. f (n f x)) \succ.
  bind (\p a b. p a b) \ifthenelse.
  bind (\x y. x) \true.
  bind zero \false.
  bind (\n. n (\x. false) true) \isZero.
  bind (\n m. m succ n) \plus.
  bind (\n. n (\g k. isZero (g one) k (plus (g k) one) ) (\v. zero) zero) \pred.
  bind (\m n f. m (n f)) \mul .
  bind (\self n. 
    ifthenelse (isZero n)
      (one)
      (mul n (self self (pred n)))
    ) \facrec.
  bind (\n. facrec facrec n) \fac.

  toInt (fac (succ (succ (succ (succ (succ four))))))
)(\var body. body var)

76 seconds and a peak working set of over 650 MiB later, the number 362880 appears on the console. It works indeed.

I spare you the compiled byte code as it consists of 47 functions. Note that Prexonite integer arithmetic is strictly only used in the lambda expression bound to `toInt`, which is applied after the factorial has been computed.

The actual computation is all done in church numerals, so yes, the number 362880 is indeed a function `c` that applies a supplied function `f` 362880 times. The necessary control structures (`ifthenelse`) are entirely implemented using functions too.

Recursion was a bit tricky as Parvum does not (yet) have let-bindings. As you can see in the code, I solved this by passing `facrec` a reference to itself. It does look a bit strange, especially the `self self` bit, but it works.

The price paid

I admit, I cheated (a little): the laziness mechanisms are actually implemented in C# as part of Prexonite (they live in the SVN trunk). There is a new command called `thunk` which takes an expression (something callable, a lambda expression for instance) and an arbitrary number of parameters for that expression (optional). The return value is, surprise, a `Thunk` object. This object really has only one member: `force`.

`force`, well, forces the evaluation of the supplied expression until some concrete value is obtained. That value can of course contain further thunks. A lazy linked list, for instance:

function _consT hT tT = [hT,tT];
function _headT xsT = xsT.force[0];
function _tailT xsT = xsT.force[1];
function _refT xT = xT.force.();

function repeatT(x)
{
  var xsT;
  var xsT = thunk(->_consT, x, thunk(->_refT, ->xsT));
  return xsT;
}

//Equivalent in Haskell:
//repeat x = let xs = x:xs in xs
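
A hypothetical usage sketch of this lazy list (relying on `force` forcing until a concrete value is obtained, as described above):

var onesT = repeatT(1);
println(_headT(onesT)); // prints 1; the infinite tail is never evaluated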

Identifiers ending in `T` represent thunks (or functions that take and return thunks), whereas a prepended `_` identifies what I call "primitive expressions". They are the building blocks for more complex expressions, and most are equivalents of actual assembler instructions: `_refT`, for instance, is the primitive for `indarg.0`, the instruction responsible for dereferencing variable references (among other things).

Justification

In the last post I justified the decision to compile to Prexonite byte code assembler with the lack of challenge when compiling into an actual high level language like Neko or JavaScript (or Haskell).

Why is it that I suddenly fear challenge? First of all, the laziness implementation in effect right now could also have been implemented in Prexonite Script (in fact, it has already been implemented: `create_lazy` in psr\misc.pxs is an example; I will probably remove it now that I have a fully managed implementation).

Actually, `thunk` was more difficult to write in C# than it would have been in Prexonite Script. As I mentioned, the factorial program used over 650 MiB of RAM, and even though all Prexonite objects are stored on the heap, the stack frame overhead is gigantic. I had to add a new mechanism to the FunctionContext (the actual byte code interpreter) to allow the injection of managed code into the Prexonite stack.

That mechanism (the co-operative managed context) is similar to managed coroutines but behaves more like a function. Also managed code that was initially not intended to be run co-operatively (i.e. invoked as a member or as an indirect call) can “promote” itself onto the Prexonite stack.

Of course the Prexonite stack is not exactly known for its excellent performance, but unlike the managed stack, it is only limited by the amount of memory available.

A truly insignificant post

I’ve been toying around with Haskell lately, after I bought the excellent book Real World Haskell (there is also an online version where readers can comment on every paragraph). Every programmer should enjoy programming with a rigid but powerful type system like Haskell’s once in his or her career. It really puts things into perspective. Sure Ruby and Python are cool for quick hacking but nothing beats knowing that GHC accepts a program.

Indeed, I do catch most of my coding mistakes at compile time. Of course nothing prevents you from confusing (1-x) with (x-1) but Haskell really lets you focus on the actual code (once you get your program past the type checker).

But this post isn't about me being a new fan of Haskell, but about something much more insignificant: yesterday I built a tiny compiler. For a tiny language.

Parvum

It's called Parvum (which is Latin for, well, insignificant) and translates expressions in a "lambda calculus"-like language to a corresponding program in Prexonite bytecode assembler:

(bind.bind (x. x + 5) lambda.bind 2 calculus.lambda calculus)(x. f. f  x)

I've taken the liberty of adding integer literals and basic arithmetic, since they can be expressed in lambda calculus anyway (via Church numerals). If you want proof:

(bind.bind (f. x. x) zero.bind (f. x. f x) one.bind (f. x. f (f x)) two.bind (n. f. x. f (n f x)) succ.bind (n. m. m succ n) plus.bind (n. n (x. x + 1) 0) toLiteral.toLiteral (plus one (succ two)))(val. body. body val)

Just by using `one` and `succ` you can already express every natural number. Those church numerals can be mapped to any other number system by supplying the successor function and the zero element. In case of the built-in integers, that would be `x. x + 1` and `0` respectively.

And this example indeed results in “4” just as expected.

I would love to reproduce the compiled assembler code here, but it is not pretty. The code consists of over 20 tiny functions, one for each `.` in the code above. And yes, there is no shorthand for curried multi-parameter functions; this is an insignificant language, after all.

Now why the effort?

Implementing Parvum was an interesting experience in Haskell. I got to use the parser combinator library `parsec`, which embeds nicely into Haskell. Working with parsec really is great because you use one environment for the whole compiler: no separate grammar file and no machine-generated code files. If I need a convenience function for my grammar (like a parameterised production), I just go ahead and create it.
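
To give a flavour of what that looks like, here is a minimal parsec sketch for a Parvum-like core (just variables, abstractions and application; the `Expr` type and all names are invented for this post, this is not Parvum's actual code, and it omits the integer literals and arithmetic):

import Text.Parsec
import Text.Parsec.String (Parser)

-- A sketch of a lambda-calculus AST, not Parvum's actual definitions.
data Expr = Var String | Lam String Expr | App Expr Expr
  deriving Show

-- A "parameterised production": wrap any parser in trailing whitespace.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

identifier :: Parser String
identifier = lexeme (many1 letter)

-- Abstraction in Parvum-like notation:  x. body
abstraction :: Parser Expr
abstraction = try (Lam <$> identifier <* lexeme (char '.') <*> expr)

atom :: Parser Expr
atom = abstraction
   <|> (Var <$> identifier)
   <|> between (lexeme (char '(')) (lexeme (char ')')) expr

-- Application is plain juxtaposition, associating to the left.
expr :: Parser Expr
expr = foldl1 App <$> many1 atom

parseExpr :: String -> Either ParseError Expr
parseExpr = parse (spaces *> expr <* eof) "<parvum>"

The `lexeme` wrapper is exactly the kind of parameterised production meant above: an ordinary Haskell function that happens to build parsers.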

Unlike all my previous parsers, this one focuses solely on building an AST. No name lookups, no transformations, just an abstract syntax tree. This tree is then fed into the compiler, which translates it into a representation of the program not entirely unlike what Prexonite uses: there are applications, which consist of global variables and functions, which in turn have parameters, local variables, shared names (via closures) and byte code. This model of the application is then printed via the HughesPJ pretty printer library.
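
Roughly along these lines; the field names in this sketch are invented, and the real model carries more detail (shared names, metadata and so on):

import Text.PrettyPrint.HughesPJ

-- A rough application model in the spirit described above.
data PxsApp  = PxsApp  { appGlobals :: [String], appFuncs :: [PxsFunc] }
data PxsFunc = PxsFunc { funcName   :: String
                       , funcParams :: [String]
                       , funcLocals :: [String]
                       , funcCode   :: [String]  -- one instruction per entry
                       }

prettyFunc :: PxsFunc -> Doc
prettyFunc f =
      text "function" <+> text (funcName f)
  <+> parens (hsep (punctuate comma (map text (funcParams f))))
   $$ nest 4 (text "var" <+> hsep (punctuate comma (map text (funcLocals f))))
   $$ nest 4 (vcat (map text (funcCode f)))

prettyApp :: PxsApp -> Doc
prettyApp app =
     vcat [text "var" <+> text g | g <- appGlobals app]
  $$ vcat (map prettyFunc (appFuncs app))

-- putStrLn (render (prettyApp someApp)) then yields the assembler text.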

Apart from the highly ambiguous grammar of lambda-calculus notation, the biggest problem was implementing the translation step. Not because walking an AST in post-order is particularly difficult (Prexonite assembler is stack based), but because emitting embedded functions while generating code is not trivial.

  • First of all, you need a steady supply of unique names: the State monad
  • You need to keep track of the current function environment (for shared names): the Reader monad and its local function
  • Finally, the embedded functions have to be stored in the application: the Writer monad

which results in quite a monster of a monad: ReaderT PxsFunc (WriterT [PxsFunc] (State Int))
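
A minimal sketch of how the three layers cooperate; only the transformer stack itself is taken from the actual code, while the `PxsFunc` stand-in, the helper names and the emitted instruction are made up:

import Control.Monad.Reader
import Control.Monad.Writer
import Control.Monad.State

-- Hypothetical stand-in for the function model; the real PxsFunc is richer.
data PxsFunc = PxsFunc { funcName :: String, funcCode :: [String] }

-- The stack from above: current function environment (Reader),
-- emitted nested functions (Writer), fresh-name counter (State).
type Compile = ReaderT PxsFunc (WriterT [PxsFunc] (State Int))

-- The steady supply of unique names, backed by the State Int layer.
freshName :: String -> Compile String
freshName prefix = do
  n <- get
  put (n + 1)
  return (prefix ++ show n)

-- Compiling a lambda: invent a name for the nested function, record the
-- function via the Writer layer and return the instruction that creates
-- a closure over it.
emitLambda :: [String] -> Compile String
emitLambda body = do
  enclosing <- ask
  name      <- freshName (funcName enclosing ++ "\\lambda")
  tell [PxsFunc name body]
  return ("newclo " ++ name)

runCompile :: PxsFunc -> Compile a -> (a, [PxsFunc])
runCompile topLevel action =
  evalState (runWriterT (runReaderT action topLevel)) 0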

I’m not really happy with this solution as it is yet another example of how you can duck behind imperative concepts once things get a bit more complicated. In this particular case, the decision to design the application model (application contains functions and global variables etc.) the way I did in C# is probably the culprit. In Prexonite, I build these data structures incrementally, adding and tweaking bits here and there. This just isn’t the way to go in Haskell. In Haskell you’re supposed to find one final expression for every value, which in this case proved to be far from trivial.

Why Prexonite?

By now there are really a lot of options for compiler back ends and/or platforms, ranging from the Java VM, the CLR, Neko and LLVM to plain old C. Why would Prexonite be just the right choice?

Since Parvum is just a prototype, I needed a high-level back end, which basically reduced the choice to

  • Neko
  • Prexonite
  • any actual programming language

In particular, the back end had to support closures out of the box. Out of these I chose Prexonite bytecode assembler because it is high level but still requires a translation effort (I actually want to learn something by implementing Parvum) and because it resembles CIL to some extent.
The latter is important because I might implement my next serious language on top of the CLR directly instead of the DLR.

The future

No, Parvum is not intended to replace Prexonite as my workhorse language of choice. It really is just a research object.

I plan to investigate the following:

  1. Arity checking (a minimal sketch follows this list)
    • (1 + 5) :: *
    • (x. x + 1) :: * -> *
    • (x. y. x + y) 5 :: * -> *
    • etc.
  2. Extend language with 2nd data structure: Bool
    • Allows for predicates (comparison, and, not etc.)
    • Conditions
    • Makes finite recursion possible
  3. Simple type checker (Int vs. Bool) in addition to arity checking
    • My first static type checker *ever*
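
To make item 1 a little more concrete, here is a deliberately naive Haskell sketch of such an arity checker (not Parvum code; lambda parameters are simply assumed to be plain values, so it cannot handle higher-order arguments):

-- Every expression gets a shape: either a plain value (*) or a
-- function from one shape to another (* -> *).
data Expr  = Lit Integer
           | Var String
           | Lam String Expr
           | App Expr Expr
           | Add Expr Expr

data Arity = Value              -- *
           | Arrow Arity Arity  -- e.g. * -> *
           deriving (Eq, Show)

type Env = [(String, Arity)]

check :: Env -> Expr -> Either String Arity
check _   (Lit _)   = Right Value
check env (Var x)   = maybe (Left ("unbound " ++ x)) Right (lookup x env)
check env (Add l r) = do
  la <- check env l
  ra <- check env r
  if la == Value && ra == Value then Right Value
                                else Left "arithmetic on a function"
check env (Lam x b) = Arrow Value <$> check ((x, Value) : env) b
check env (App f a) = do
  fa <- check env f
  _  <- check env a
  case fa of
    Arrow _ result -> Right result
    Value          -> Left "applying a non-function"

-- check [] (Add (Lit 1) (Lit 5))              ==> Right Value               (*)
-- check [] (Lam "x" (Add (Var "x") (Lit 1)))  ==> Right (Arrow Value Value) (* -> *)
-- check [] (App (Lam "x" (Lam "y" (Add (Var "x") (Var "y")))) (Lit 5))
--                                             ==> Right (Arrow Value Value) (* -> *)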

Unfortunately Parvum is strict, at least until I’ve found a way to encode lazy computations in the strict Prexonite VM. This is one place where I might cheat and just provide a `thunk` mechanism as part of Prexonite (say a command that wraps a computation and its arguments). We’ll see.
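
To illustrate what such a `thunk` command would have to do, here is a minimal Haskell sketch of an explicit, memoizing thunk, using an IORef to make the sharing visible (Haskell itself is of course already lazy; none of these names exist in Prexonite):

import Data.IORef

-- An explicit thunk: either a pending computation or its cached result.
newtype Thunk a = Thunk (IORef (Either (IO a) a))

-- Wrap a computation without running it.
delay :: IO a -> IO (Thunk a)
delay comp = Thunk <$> newIORef (Left comp)

-- Force the thunk: run the computation at most once, then cache the value.
force :: Thunk a -> IO a
force (Thunk ref) = do
  state <- readIORef ref
  case state of
    Right value -> return value
    Left comp   -> do
      value <- comp
      writeIORef ref (Right value)
      return value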

Meta Programming in Prexonite Script?

The idea of implementing a macro system in Prexonite Script was inspired by reading an article on lambda-the-ultimate.org about Converge. I realized that with a powerful API like that of the Prexonite compiler, it might be possible to implement similar features.

The basic idea would be to have a new kind of function, the macro function, which is executed at compile time whenever it is encountered in code. Instead of evaluated arguments, it receives the AST nodes of its arguments. These can then be used to dynamically assemble a resulting AST node that is returned to the compiler. That node is then inserted instead of an ordinary call to the macro function.

A further extension of this mechanism would be a feedback interface for the declaration of macro (and possibly other) functions. It would be possible to declare an argument to be something other than an expression. The implementation of control structures would be one obvious use.

Apart from the invocation of macro functions and the later insertion of the resulting AST node, there are other requirements for a useful macro system, chief among them the AST manipulation API. Working with the AST in Prexonite can be painful at times, as it is geared towards statically compiled code. It is not only a very complex API but also a very irregular one; it was never meant to be exposed in such a direct fashion. Moreover, it is not trivial to refactor the API as it is used from generated code, which means that the usual refactoring tools will only find a small subset of all uses in the system.

The ast and SI features of the psr\ast.pxs library are a start but far from providing a convenient way of creating new AST nodes. Both projects mentioned in the article about meta programming employ quotation and splicing.

Quotes

An expression that represents the AST of itself. (Probably the worst definition ever).

Example of an expression quote:

var node = (. x.member(5) + 6 .);
//  equivalent to something like
var node = ast("BinaryOperator", BinaryOperator.Addition, 
    ast("GetSetMember",ast("GetSetSymbol","x"),"member",[ast("Constant",5)]),
    ast("Constant",6)); //Simplified

Example of a statement quote:

var node = 
{. 
  while(x < 5)
    println(x++);
.};
//   equivalent to something like:
var node = ast("WhileLoop",
  ast("BinaryOperator",BinaryOperator.LessThan,
    ast("GetSetSymbol","x"), ast("Constant", 5)),
     ast("Block", [
       ast("GetSetSymbol","println",[
         ast("UnaryOperator",UnaryOperator.PostIncrement,
           ast("GetSetSymbol","x"))
       ]);
]));

One easily sees that constructing AST nodes procedurally is no fun. The actual API is even more complicated. With a whole army of helper and wrapper functions, the most common constructs could be simplified but nothing beats expressing the code fragment you want with that same fragment. Quotes, however, are practically useless as they are static in nature. Enter

Splicing

Insertion of dynamically generated AST nodes into static ones.

Splicing would come in two forms with my approach. First of all, splicing happens when “calling” a macro function in normal code. The resulting AST node is spliced into the normal code in place of the macro call.

The second form is splicing into quotes, in a way similar to string interpolation.

Example:

macro my_foreach(iterator, seq, block)
{
  var node = 
  {.
    foreach(.(iterator). in .(seq).)
     .{ block }.
  .};
  return node;
}

The above example happens to uncover the challenge of delaying the effective construction of the foreach loop. The block passed in via the macro's arguments might employ structured gotos (`break` and `continue`), which are normally resolved, or at least linked, when the AST is constructed. But since not even the logical target is known until the block is spliced into the quote, the task of linking the control flow of the block and the loop together might be non-trivial.

Hygiene is another topic. The compiler must generate unique names for all symbols used literally inside a quote. Otherwise the macro might accidentally use unrelated variables inside the calling function. This, however, has severe implications on either usability or complexity of quotes, as the following example demonstrates:

macro for_upto(sym, max, action)
{
  var i_var = (. var i; .); //Does not work. Variables are not 
  //  declared by AST nodes
  
  var i = create_variable("i");
  
  return 
  {.
    for(.(i). = 0; .(i). < .(max). ; .(i).++)
      .{ action }.
  .};
}

Here, create_variable would be some sort of helper function that generates a unique (shadow) variable within the target function and returns a reference to the corresponding node. One might come up with a slightly friendlier notation for splicing expressions; something like $$i, maybe.

And then there is the issue of AST nodes not being designed to be used in multiple places (they are not immutable). Therefore, the splicing implementation would have to insert copies of the nodes. Not to mention that the AST doesn’t implement copying naturally.

Conclusion

Implementing a meta programming system similar to that of Converge on top of Prexonite Script is not realistic. One has to design both the language and the compiler with such a goal in mind. Adapting the existing compiler would most definitely result in a mess.

I, however, think that the implementation of macro functions is feasible and useful. Macros won’t revolutionise how the language is used, but will complement the existing dynamic features to enable concise solutions, should the programmer decide to invest enough time into the macro system.

Shadow ids in Prexonite Script

It’s time for a new release of the Prexonite Script language:

New this time around is the very small but useful addition of shadow ids, a feature not entirely unlike the concept of shadowing in C#, thus the name.

Imagine you are working with two comparatively complex libraries, say a convenient wrapper around the Prexonite compiler and some sort of templating engine. If the respective authors of those two libraries both decide to use the identifier `load` for one of their functions, you end up in trouble.

Even though it is absolutely possible to define different symbols (i.e., names) to differentiate between those functions, Prexonite’s architecture does not allow two functions with the same physical name inside one application. And since the physical id happens to be the id the function has been defined with, a collision is inevitable.

Enter shadow ids

Have a look at the following example:

function as load, compiler_load (path, loader)
{
  if(loader is Null)
  {
    var engine = asm(ldr.eng);
    var app = asm(ldr.app);
    loader = new Prexonite::Compiler::Loader
      (engine, app);
  }
  loader.LoadFromFile(path);
}

This function is a wrapper around the Prexonite.Compiler.Loader.LoadFromFile method in some fictional compiler library file. Notice the keyword as in front of the list of names for that function. It means that the function itself is anonymous, i.e. its actual name is generated by the compiler. To access the function, you may use any of the alternative names specified.

In addition to that, a generalization for shadow ids is provided for easy function alias definition:

function compiler_load as load (path, loader)
{
  …
}

…which defines an ordinary function compiler_load and immediately adds an alias load for convenient access. This is really just syntactic sugar for

function compiler_load (path, loader)
{
  …
}
declare compiler_load as load;

Limitations

  • Shadow ids do not provide obfuscation; they are formed by appending a unique string to the first id in the alias list. So function as f, g(){} will be named something like f\gene05c25a4869140c599e4982665b48417. Also, f will be persisted in meta data as the function's LogicalId. This, however, is an implementation detail and must not be relied upon.
  • There is currently no equivalent for global variable definitions, although such a feature is planned.

Future of Prexonite Script

Bad news first: Prexonite and Prexonite Script are both as mature as they will get. The architecture has its limits and expanding it further to improve the language/runtime is unreasonably complicated.

Prexonite Script will, however, remain my favourite sandbox for experimenting and improving my language design skills. At least until a successor is written. There are two particular areas the next releases will focus on:

Debug support

The Prexonite Script architecture was never built with debugging in mind. Although line/column/file information is (somewhat) propagated to the AST, it is not used thereafter. There is no mechanism to map instructions to code lines (or vice-versa). Nonetheless, I am currently prototyping a simple, byte code based, command line debugger.

It will probably come in the form of a library with a special breakpoint function that invokes an environment not unlike the interactive mode of Prx.exe. It supports inspection of the stack, local variables, watch expressions and stepping (through byte code, not statements), with step-into functionality planned.

Have a look at this example session:

Local variables of main
 var y = 2;
 var x = 1;

Stack of context of function main
 1: 
 2: 
code = function main
[
LogicalId main;import False;
\symbol_mapping {0 -> 0,
1 -> 1};
]
 does asm {
var y,x
   00: @func.0 debug_break
        ldc.int 1
        stloc x
        ldloc x
        ldc.int 1
        add
        stloc y
        ldloc y
        ldloc x
-> 09:  cgt
        jump.f 14
        ldloc x
       @cmd.1 println
        jump 16
   14:  ldloc y
       @cmd.1 println
   16:  ldc.string "Bye!"
        cmd.1 println
   18:  ret.value
   19: }

main@9>

Notice how references to local variables are shown by name even though the underlying bytecode actually uses the faster by-index opcodes. At the top you see the list of local variables, followed by a display of the current stack. Above the prompt is an enhanced view of the bytecode with only important addresses marked (beginning, end, current and jump targets).

What is missing right now is step-into, because the debugger is invoked on a per-stack-context basis (aka function activation). Enabling the debugger for calls made from the current context requires intercepting all corresponding calls. The task is non-trivial because closure, coroutine and structure method calls are statically indistinguishable from calls to arbitrary implementations of IIndirectCall and friends.

Macro system

Another area of development is the addition of a more user friendly macro system into Prexonite Script. Currently, one can rewrite undefined ids, a mechanism that is not supposed to be a replacement for macros but rather a means of changing the default handling of unknown ids.

Alternatively, compiler hooks can be used to transform entire functions. Not only is it very tedious to find the specific calls to replace, it’s also not guaranteed that you find every instance of a certain expression by traversing the tree. All uses of compiler hooks share a similar pattern of searching a certain dummy function reference and replacing it with custom nodes, using the transformed function itself and any of its parameters as arguments.

A possible simplification of this scheme would be the explicit support for macro functions:

macro square(x)
{
  //Define a temporary variable to hold the macro argument
  var tmp = define("tmp");
  var assignment = ast("GetSetSymbol",SetCall,tmp,[x]);
  var computation = multiply(tmp,tmp);
  return block([assignment, computation]);
}

The macro square transforms a call like ( square(x+2) ) into an expression like ( let tmp = x+2 in (tmp)*(tmp) ). The details are not fleshed out yet; a more concrete discussion follows.