Prexonite 1.2.1 – Partial Application and Mercurial

Release Policy

I haven’t released a "named" version of Prexonite in a long time. It just wasn’t worth the effort. I’m giving explicit version numbers another shot, starting with this release.

Versioning Scheme

I’m looking at a very simple versioning scheme:

lang.major.minor[.patch]

Four decimal numbers, separated by dots. The first number (lang) stands for the version of the language. This number will probably stay the same as long as I don’t rewrite the entire implementation from scratch, including breaking changes to a signification fraction of existing programs written in Prexonite Script or targetting the Prexonite VM. This would be something like the jump from PHP4 to PHP5.

The second number (major) will change infrequently and only as a consequence of profound changes – in the language or in the underlying implementation. The removal of built-in operators from the VM was one such change. But more on that later.

The third number (minor) is where most of the action is. Every increment in the minor version will correspond to the completion of a milestone, which I’ll talk about shortly. Patch versions, finally, are intended for backports from the current development version.

Milestones

I would like to get more consistent releases of Prexonite, not just a bunch of development snapshots that you have to correlate to blog posts and commit logs in order to find out what exactly is new.

In order to achieve this, I will define milestones that are achievable in no more than three months. At the end each milestone, I will have a version of Prexonite that is ready to be used "in the wild". Additionally, each milestone will be associated with a "theme". I will try to focus my work on Prexonite to one area at a time. The 1.2.1 milestone for instance was focused on partial application.

Mercurial

For 1.2.1 I switched my version control system from Subversion over to Mercurial, still hosted as an open source project at assembla.com. Subversion is great but for me there are some benefits in using Mercurial.

Easier than Subversion

I personally find Mercurial easier to use than Subversion, even though there are more concepts involved (local repository versus working copy/pull versus update). The main advantage of Mercurial is that you can clone the main repository, experiment around, create branches, create tags, attempt complicated merges and when it doesn’t work out, you can just throw away your local repo and make a new clone (or evacuate the successful commits by cloning locally up the last "good" revision). In Subversion, all the action happens on the central server. If you make a mistake it’s logged for all eternity.

Local commits

The second big advantage of Mercurial is being able to commit your changes locally. Sure, every commit should be a working system, but that’s more of an ideal than a realistic scenario. I want to be able to save my work in a state that is at least semi-working before I attempt a risky rewrite. I also don’t want to work without revision control wen I’m on the road. Sure there is 3G, but with Mercurial and local commits my laptop battery lasts longer.

Partial Application

One to the big new feature in this and some past releases. Partial application is a new feature of Prexonite Script that allows you to define function values for very simple expressions in a succinct and, hopefully, intuitive way.

f(1, ?) // as a lambda: x => f(1, x)

You can turn almost any simple expression into a partial application by denoting the open arguments as question marks (?). The value of such a partially applied expression will be a function-like value. When the partial application is applied, the questions marks will be substituted with the provided arguments and the expression will be evaluated. Any operand of the expression, that is not a question mark will only be evaluated once at the point where the partial application is created.

Many expressions support partial application:

new List(?), 
x.Append(?), 
?.ToString(), 
System::Console.WriteLine(?),
? is List, 
?~Real

Including all unary and binary operators

1 + ?, 
?*2, 
x & ?, 
y and ?

You can also define multiple open parameters

f(?, 1, ?), 
? == ?

Or map the supplied arguments in arbitrary orders or multiple times

f(?1, 2, ?0, ?0)

Limited to simple Expressions

A key concept of partial application, is that the placeholders can only as direct operands to the operation/function to be partially applied.

f(1+?,g(16*3)+2)

Will not produce a partial application of f, but pass the partial application (1 + ?) to f as an argument. You either have to resort to a full blown lambda expression for this

x => f(1+x,g(16*3)+2)

Or use function composition (also newly introduced)

? + 1 then f(?, g(16*3)+2)

The `then` keyword combines to functions into one:

g then f //means about the same as `x => f(g(x))`

Note how all non-placeholder operands of partial applications can be arbitrarily complex.

Differences between lambda expressions and partial applications

Lambda expressions and partial applications are in most cases not equivalent. The key difference is, that all non-placeholder arguments, the so-called "closed arguments", are evaluated when the partial application object is created, not when the resulting object is called — as is the case with lambda expressions. In the example above, it might therefore be desirable to resort to the more difficult to understand function composition syntax (using `then`) in order to avoid re-computing `g(16*3)+2` over and over again.

Partial applications are sadly not as fast as lambda expressions. Invoking a partial application involves the copying the closed arguments and the freshly supplied actual arguments into an effective arguments array, a step that lambdas don’t have to perform. The effect isn’t that dramatic, though. Partial application is measurably slower, yes, but within the same range.

The creation of partial applications, on the other hand, is a different story. Whereas for a lambda expression, the runtime just has to capture references to all shared variables, in the case of partial applications, a mapping consisting of entries for both placeholders and the closed arguments has to be computed. The creation of a partial application is an order of magnitude slower than the creation of a closure (the lambda expression function object). So if you care deeply about performance, don’t create partial applications in tight inner loops.

Other Changes

Macro references

The illusion that macros can be used just like any function is pretty good, but not perfect. It breaks down miserably, when you try to take the reference of a macro or attempt to create a partial application thereof. For this release, the compiler will detect and reject any such attempts. Macro writer can make use of the `macro\reference` macro to work around this limitation.

For most macros, taking a reference or creating a partial application, doesn’t really make sense anyway. But there are some exceptions. I hope that the next release will address these scenarios, and allow macro writers to make their macros compatible with partial application and reference-taking.

Single Quotes

This recent release has also seen the addition of the single quote (') as valid character for identifiers (except at the beginning) and as a separator for number literals.

function bob(cat)
{
  var bob's_cat = cat + "of bob";
  return bob's_cat + " weighs " + 100'000 + "kg";
}

Single quotes in number literals are accepted, but otherwise ignored. They’re thrown away right after lexical analysis, so not even macros can find out if there ever were single quotes.

In identifiers, single quotes have no special meaning, they’re just characters and you can use them in any way you like. As long as the identifier doesn’t start with a single quote, you’re fine.

Implementation of operators

Except for the lazy logical ones, all operators in Prexonite are now implemented as commands and no longer part of the various virtual machines. This doesn’t just simplify the latter, it also makes partial application of those operators possible. As a user of Prexonite Script, you won’t notice the change at all, even if you have assembler code, that uses the operator instructions (these are transparently translated into appropriate command calls).

If you want, you can redefine the built-in operators. Whenever you write `1 + 2` it is compiled as `(+)(1,2)`. If your function `my_plus` is declared as `(+)`, it will be called instead. You probably don’t want to do that, but it’s possible. (And yes, (+) is a proper identifier. You can even use it in meta data sections)

CIL Extensions

A requirement to implementing partial application efficiently in CIL, but without adding new VM instructions, was the addition of CIL Extensions, a new API that allows commands to gain access to compile-time-constant arguments during CIL code generation. This essentially enables variable arguments instructions. So far, partial application is the only feature that makes use of this.

v.Next

The next planned release is going to be 1.2.2, focusing on generalising the macro system. The most important feature is making the macro system also available to commands, and thus to managed code.

As this will involve a partial rewrite of a substantial portion of the macro system, it could take a while. Hopefully, it’ll also support the partial application of macros and a cleaner, more isolated interface for macros.

Shadow ids in Prexonite Script

It’s time for a new release of the Prexonite Script language:

New this time around is the very small but useful addition of shadow ids, a feature not entirely unlike the concept of shadowing in C#, thus the name.

Imagine you are working with two comparatively complex libraries, say a convenient wrapper around the Prexonite compiler and some sort of templating engine. If the respective authors of those two libraries both decide to use the identifier `load` for one of their functions, you end up in trouble.

Even though it is absolutely possible to define different symbols (i.e., names) to differentiate between those functions, Prexonite’s architecture does not allow two functions with the same physical name inside one application. And since the physical id happens to be the id the function has been defined with, a collision is inevitable.

Enter shadow ids

Have a look at the following example:

function as load, compiler_load (path, loader)
{
  if(loader is Null)
  {
    var engine = asm(ldr.eng);
    var app = asm(ldr.app);
    loader = new Prexonite::Compiler::Loader
      (engine, app);
  }
  loader.LoadFromFile(path);
}

This function is a wrapper around the Prexonite.Compiler.Loader.LoadFromFile method in some fictional compiler library file. Notice the keyword as in front of the list of names for that function. It means that the function itself is anonymous, i.e. it’s actual name is generated by the compiler. To access the function, you may use any of the alternative names specified.

In addition to that, a generalization for shadow ids is provided for easy function alias definition:

function compiler_load as load (path, loader)
{
  …
}

…which defines an ordinary function compiler_load and immediately adds an alias load for convenient access. This is really just syntactic sugar for

function compiler_load (path, loader)
{
  …
}
declare compiler_load as load;

Limitations

  • Shadow ids do not provide obfuscation, they are formed by appending a unique string to the first id in the alias list.So function as f, g(){} will be named something like f\gene05c25a4869140c599e4982665b48417. Also, f will be persisted in meta data as the functions LogicalId. This, however is an implementation detail and must not be relied upon.
  • There is currently no equivalent for global variable definitions, although such a feature is planned.

Future of Prexonite Script

Bad news first: Prexonite and Prexonite Script are both as mature as they will get. The architecture has its limits and expanding it further to improve the language/runtime is unreasonably complicated.

Prexonite Script will, however, remain my favourite sandbox for experimenting and improving my language design skills. At least until a successor is written. There are two particular areas the next releases will focus on:

Debug support

The Prexonite Script architecture was never built with debugging in mind. Although line/column/file information is (somewhat) propagated to the AST, it is not used thereafter. There is no mechanism to map instructions to code lines (or vice-versa). Nontheless I am currently prototyping a simple, byte code based, command line debugger.

It will probably come in the form of a library with a special breakpoint function, that invokes an environment not unlike the interactive mode for Prx.exe. It supports inspection of the stack, local variables, watch expressions, stepping (through byte code, not statements) as well as step into functionality.

Have a look at this example session:

Local variables of main
 var y = 2;
 var x = 1;

Stack of context of function main
 1: 
 2: 
code = function main
[
LogicalId main;import False;
\symbol_mapping {0 -> 0,
1 -> 1};
]
 does asm {
var y,x
   00: @func.0 debug_break
        ldc.int 1
        stloc x
        ldloc x
        ldc.int 1
        add
        stloc y
        ldloc y
        ldloc x
-> 09:  cgt
        jump.f 14
        ldloc x
       @cmd.1 println
        jump 16
   14:  ldloc y
       @cmd.1 println
   16:  ldc.string "Bye!"
        cmd.1 println
   18:  ret.value
   19: }

main@9>

Notice how references to local variables are shown by-name even though the underlying bytecode actually uses the faster by-index opcodes. At the top you see the list of local variables, followed by a display of the current stack. Above the prompt is an enhanced view of the bytecode with only important addresses marked (beginning, end, current and jumpt targets).

What is missing right now, is step-into. Because the debugger is invoked on a per stack context basis (aka function activation). Enabling the debugger for calls made from the current context requires intercepting all corresponding calls. The task is non-trivial because closure, coroutine and structure method calls are statically indistinguishable from calls to aribrary implementations of IIndirectCall and friends.

Macro system

Another area of development is the addition of a more user friendly macro system into Prexonite Script. Currently, one can rewrite undefined ids, a mechanism that is not supposed to be a replacement for macros but rather a means of changing the default handling of unknown ids.

Alternatively, compiler hooks can be used to transform entire functions. Not only is it very tedious to find the specific calls to replace, it’s also not guaranteed that you find every instance of a certain expression by traversing the tree. All uses of compiler hooks share a similar pattern of searching a certain dummy function reference and replacing it with custom nodes, using the transformed function itself and any of its parameters as arguments.

A possible simplification of this scheme would be the explicit support for macro functions:

macro square(x)
{
  //Define a
  var tmp = define("tmp");
  var assignment = ast("GetSetSymbol",SetCall,tmp,[x]);
  var computation = multiply(tmp,tmp);
  return block([assignment, computation]);
}

The macro square transforms calls to it like ( square(x+2) ) to an expression like ( let tmp = x+2 in (tmp)*(tmp) ). The details are not fleshed out yet. A more concrete discussion follows.

ACFactory Day 3: Authoring XAML

This is the third part of a series documenting the development of ‘ACFactory’ (ACFabrik in German), an application that generates printable character sheets for the Pen & Paper role playing game Arcane Codex (English page).

You might also want to read Day 2: ExtendableEnum and XAML Serialization.

File Formats

ACFactory is going to use two file formats: *.aclib and *.achero. The former basically contains data for rules implemented in code. Library files are ordinary XAML (and therefore also XML) files with a Library root node.

Hero files (*.achero) will most likely be more important to ordinary users. They contain, well, heroes to be loaded into the application. Milestone 1 doesn’t require editing of heroes, neither does it require that the user can change the hero once the application runs. I do however plan to implement that functionality, should I find the time to revise the application. In the distant future, ACFactory might even handle multiple heroes at once via a tabbed interface.

User Interface

As I am not an artist, ACFactory will most probably not look very pretty. I do, however, plan to take advantage of at least some of WPF’s rich styling capabilities. I had a pretty cool Office 2007-like page preview control assembled, only to find out, that bitmap effects are damn slow, like in it-takes-a-second-to-highlight-a-hyperlink-slow. Why on earth they have to be rendered in software, on the UI thread, is beyond me (and I hope it doesn’t have anything to do with WPF being rushed to market). Tim Cahill blames the limitations of graphics cards in one of his blog posts.

Adding seemingly simple tweaks (e.g., clipping, bitmap effects) to our scene causes us to fall back to software, and software rending in WPF is slower than GDI+ software rendering.

First, the WPF software rendering code is derived from the GDI+ codebase. There are certain limits to what can be accomplished in hardware, and we have to work around what the hardware vendors give us. As graphics hardware evolves, those limits are likely to become better over time. If at least some portion of your scene is rendered in hardware, the cost of rendering is already going to be faster than it was in GDI+. Finally, we shipped a tool at the PDC called ‘Perforator’ to help identify where software rendering occurs.

I don’t know about you, but I have seen outer glow effects rendered at over 30 frames per second in many computer games, Fable only being one example. Maybe they just want to keep us developers with our uber-machines from terrorising poor users with too-flashy applications.

ACFactory_Day3