Prexonite May 2008 Update

Downloads

Update

There have not been any significant changes since the introduction of the CIL compiler into Prexonite, yet the current version comes with a number of performance optimizations regarding the generated CIL byte code.

The majority of the built-in commands and types now implement the ICilCompilerAware interface, which the CIL compiler uses to let commands and types emit highly customized code. Calling println with no arguments, for instance, results in a static call to void System::Console.WriteLine() directly in the compiled method.

Similarly, type expressions in CIL functions are no longer implemented via type expression parsing but by directly referencing the corresponding singleton PType objects.

But the most important improvement is the possibility to statically link Prexonite function calls in CIL compiled methods, which makes yet another hashtable lookup redundant at the cost of additional memory: a dynamically generated class has a static field for each and every function used by the compiled application. This can be a problem if you plan to re-compile your CIL implementations, as dynamic types, unlike dynamic functions, cannot be garbage collected by design. It is, however, possible to disable the generation of such a class by passing false to CompileToCil.

And on a side note: the often used library function struct has been implemented as a compiler hook for improved performance. Resolving the members at compile time not only saves run time but also removes the need for dynamic lookups, which in turn enables the use of CIL compilation for struct functions. This is especially helpful for immutable structs.

My Backpack

I just came across a series of posts on lifehacker.com in which the contents of different people's bags and backpacks are presented. A really cool idea, if you ask me. So let’s do the same here:

  1. Good old Texas Instruments TI-30XIIS.
  2. USB-to-Mini-USB cable. Used for my cell and my MP3 player
  3. 3.5mm stereo TRS connector. In case a bunch of people want to listen to my cool MP3s
  4. iriver driver software
  5. Cheap Panasonic headphones
  6. Sunglasses. The bag can also be used to clean screens
  7. The iriver Clix2, a really cool MP3 player
  8. Dice: 3xW20, 2xW10, 2xW6. Yes, I do play pen&paper role playing games
  9. Motorola RAZR V3i, the one everyone has
  10. 2 GB memory stick
  11. Wallet
  12. Agenda
  13. Pencil sharpener (vital for role playing)
  14. Set square
  15. Pills against hay fever
  16. “Formulae and Tables”, a very handy collection of commonly used formulae
  17. A5 notepad. I use an A4 pad for note taking in school but everything else gets written down on its smaller brother.
  18. My favourite black pen
  19. My favourite blue pen
  20. A pencil with an eraser
  21. A better eraser
  22. Tipp-Ex. In case something goes wrong in a test (barely used :-P )
  23. Suncream

CIL compilation hints and their effects

*Update 2008/02/20* I have just merged the CIL compiler branch into the trunk. The CompileToCil command is now officially part of Prexonite: Prexonite source code (trunk)

In the last article, I presented the Prexonite CIL compiler and the huge performance improvements it comes with. Unfortunately, the compiled code has to be dynamically typed as the CIL compiler does not perform any data flow analysis and can therefore not possibly infer the correct types. It does not even create its own representation of the byte code program.

However, to say Prexonite Script (the language) is strictly dynamically typed is actually wrong, as the Prexonite compiler emits code for which the types and even the method overloads are known at compile time. It’s just that the virtual machine does not provide a way to take advantage of this knowledge.

One such example is the foreach loop, a construct that consists of

  • An expression (the list)
  • A block of statements
  • A left-hand value (the element)

and gets transformed into

var enum = $list$.GetEnumerator~IEnumerator;
while(enum.MoveNext())
{
    $element$ = enum.Current;
    $block$;
}

This pseudo code represents what is emitted by the Prexonite compiler for foreach AST nodes. It is clear that enum has to be at least of type IEnumerator in all cases. This information could enable the CIL compiler to statically type the variable enum, turning two late-bound calls (MoveNext and Current) into virtual calls.
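To make the lowering a bit more tangible, here is a minimal sketch in Python (not Prexonite Script) of what "a foreach loop rewritten around an explicit enumerator" means. The function name lowered_foreach and its parameters are made up for this illustration; Python's next() stands in for the MoveNext/Current pair.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def lowered_foreach(items: Iterable[T], body: Callable[[T], None]) -> None:
    """Run `body` once per element, driving the enumerator by hand."""
    enumerator: Iterator[T] = iter(items)  # var enum = $list$.GetEnumerator
    while True:
        try:
            # Python folds MoveNext() and Current into a single next() call.
            element = next(enumerator)
        except StopIteration:
            break
        body(element)  # $block$ with $element$ bound


collected = []
lowered_foreach([1, 2, 3], collected.append)
```

The point of the sketch is only that the enumerator variable has one fixed, known role for the whole loop, which is exactly the knowledge a compiler could use to type it statically.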

CIL compilation hints

CIL compilation hints are basically a reverse mapping from byte code to AST nodes, reduced to the minimal amount of information required by the CIL compiler to emit optimized code. It is not that the whole AST is now encoded in the Meta tables of functions. Only nodes for which the CIL compiler could generate better code emit CIL hints.

One example is the foreach node, which emits the name of the enumerator variable and the addresses of the late-bound calls to be optimized. The CIL compiler decodes this information and performs the necessary steps. The enumerator variable, for instance, will be of type IEnumerator<PValue> and won’t be initialized ahead of time.

Impact on performance

The two main paradigms to interact with sequences in Prexonite are the combination of coroutines (sequence operators like where and map) and the use of foreach loops. While coroutines have the advantage of composability and deferred execution, foreach loops are usually faster.

Again, I used micro benchmarks to demonstrate the impact on performance. For practical reasons the number of iterations depends on the size of the set to iterate over in the inner loop. N = 200’000 is the baseline. With sets of 10 and 100 elements, N is reduced to 20’000 and 2’000 respectively.

[Figure: Iterations over a set (Measurements)]

What you are seeing here are performance improvements of 950 to 3’400%, but keep in mind that those are very specialized micro benchmarks and that, unless your program consists exclusively of mindless foreach loops, you are unlikely to experience such speed-ups.

Nonetheless, iteration over lists is a very important aspect of many of the programs I have written in Prexonite Script.

Prexonite CIL Functions

Save the "What the f…" for later and just look at the two snippets below.

ldloc.1
ldc.i4.5
add
stloc.1

Listing 1: a = a + 5 in CIL assembler

ldloci  1
ldc.int 5
add
stloci  1

Listing 2: a = a + 5 in Prexonite assembler

In Listing 1 you see four CIL assembler op codes, while Listing 2 represents the exact same program, just written in Prexonite byte code assembler. The fact that the two programs look so similar is no coincidence, as the Prexonite virtual machine was actually modelled after the CIL’s execution model. This close similarity can be exploited to make Prexonite a lot faster.
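If you have never looked at a stack machine before, the following toy interpreter in Python might help. It is only a sketch under my own assumptions: the function run and the tuple encoding of instructions are invented here and have nothing to do with how the Prexonite VM is actually implemented. It merely executes the four instructions of Listing 2.

```python
def run(program, local_vars):
    """Execute a list of (opcode, operand) pairs on a tiny stack machine."""
    variables = list(local_vars)  # work on a copy of the local variable slots
    stack = []
    for opcode, operand in program:
        if opcode == "ldloci":      # push local variable by index
            stack.append(variables[operand])
        elif opcode == "ldc.int":   # push integer constant
            stack.append(operand)
        elif opcode == "add":       # pop two values, push their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "stloci":    # pop into local variable by index
            variables[operand] = stack.pop()
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return variables


# a = a + 5, with `a` living in local slot 1 (as in Listing 2)
LISTING_2 = [("ldloci", 1), ("ldc.int", 5), ("add", None), ("stloci", 1)]
result = run(LISTING_2, [0, 37])
```

Both CIL and Prexonite byte code follow this load, operate, store pattern, which is why the two listings line up instruction by instruction.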

A Prexonite to CIL compiler

Now before you get too excited, Prexonite Script still is what they call a “Dynamic Language” and a lot of its features are implemented in the underlying Prexonite virtual machine instead of the language compiler. Also, Prexonite byte code is not statically typed, which makes a straight translation to CIL impossible without very sophisticated data flow analysis and complete type inference. As I am not familiar with either of these topics, I decided to keep the Prexonite functions untyped. This is where the PValue class comes into play. It encapsulates a dynamically typed piece of data and provides many methods to interact with the contained data via late binding.

In all cases, an implementation of a Prexonite function in CIL must show the exact same behaviour as the original, interpreted implementation. Functions that interact with Prexonite stack frames cannot be compiled to CIL as they are no longer executed on the virtual machine’s stack but on the CLR’s instead. Therefore, CIL implementations must be able to exist alongside interpreted implementations, as transparently as possible. Also, since the Prexonite virtual machine allows for code generation and manipulation at runtime, CIL implementations must be replaceable. This unfortunately also means that function calls inside CIL implementations cannot be statically linked, as the target function might change its implementation strategy (interpreted, CIL) at any moment.

How it’s done

Since the Prexonite to CIL compiler operates on Prexonite byte code, it would not make much sense to use the C# or VB CodeDOM and the corresponding compiler. Instead, System.Reflection.Emit provides the necessary API. Since implementations must be replaceable, dynamic types are not an option and the so-called lightweight functions are used.

The compiler is designed to operate at runtime, invoked by the running program itself. This is because it analyses the whole application to identify functions that are not compatible with compilation to CIL. Such functions are marked with the Meta entry volatile.

The compilation process itself is actually quite straightforward. First the function is analysed in order to determine the number of temporary variables required, to build up a symbol table and to identify shared (via closures) and non-shared variables. Then the common function header is emitted, including the creation of PVariable objects for shared variables and the initialisation of non-shared variables with PType.Null.
Then, the variables representing arguments are initialised with either PType.Null or the value supplied in the arguments array and finally the special variable args is set to a list of those same arguments if required by the function.

What follows is a huge loop that iterates over every instruction in the function's code and passes it into a giant switch statement, which translates every Prexonite byte code instruction into a series of CIL op codes.

Therefore, the CIL implementation of the program in Listing 2 looks like the pseudo CIL in Listing 3.

As you can see, an untyped implementation of this simple program expands into quite a bit of code. Notice that due to the absence of a rotation op code, the implementation requires temporary variables to insert the local stack context in the call to Addition.

ldloc var1
ldc.i4.5
box int32
call IntPType PType::get_Int()
newobj instance void PValue::.ctor(object, PType)
stloc temp1
ldloc sctx
ldloc temp1
call instance class PValue PValue::Addition(StackContext, PValue)
stloc var1

Listing 3: Actual CIL implementation of the program in Listing 2
Note: I have shortened the fully qualified type names for better readability.
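The "huge loop with a giant switch" can be sketched in a few lines of Python. Please treat this as an illustration only: the function translate, the tuple encoding of the byte code and the fixed temporary name temp1 are assumptions of mine, and the emitted strings are simply the shortened pseudo CIL from Listing 3 (with the generic ldc.i4 form instead of ldc.i4.5). The real compiler emits op codes through System.Reflection.Emit, not text.

```python
def translate(program):
    """Translate (opcode, operand) pairs into a list of pseudo CIL lines."""
    cil = []
    for opcode, operand in program:
        if opcode == "ldloci":
            cil.append(f"ldloc var{operand}")
        elif opcode == "ldc.int":
            # An integer constant has to be wrapped in a PValue of type Int.
            cil += [
                f"ldc.i4 {operand}",
                "box int32",
                "call IntPType PType::get_Int()",
                "newobj instance void PValue::.ctor(object, PType)",
            ]
        elif opcode == "add":
            # No rotation op code: park the right operand in a temporary so
            # the stack context can be pushed in between.
            cil += [
                "stloc temp1",
                "ldloc sctx",
                "ldloc temp1",
                "call instance class PValue PValue::Addition(StackContext, PValue)",
            ]
        elif opcode == "stloci":
            cil.append(f"stloc var{operand}")
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return cil


pseudo_cil = translate([("ldloci", 1), ("ldc.int", 5), ("add", None), ("stloci", 1)])
```

Four Prexonite instructions turn into ten CIL instructions here, which matches the blow-up visible in Listing 3.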

Is it worth the effort?

As with all optimization techniques, we must ask ourselves whether the effort for implementing it is worth the gain in performance (be it memory or speed). At this point, let me just throw the results of an amateurish micro benchmark at you.

[Figure: CIL micro benchmark]

One can clearly see that CIL implementations are superior. They perform the same tasks in 60% (empty_loop) to 30% (rec_echo x 100) of the time required by the interpreted versions. Since the CIL compiler performs many of the Meta data lookups required for the creation of a stack frame at compile time, function calls to CIL implementations are much faster. Keep in mind though that only interpreted functions can take advantage of tail calls. To prevent an overflow of the managed stack, you should implement infinite recursive loops in interpreted functions.

Overall, you could say that compilation to CIL will result in a free performance improvement of over 65 percent in most cases.

function rec_echo(n) =
    if(n == 0)
        0
    else
        1 + rec_echo(n-1);

function rec_echo_direct(n,r) =
    if(n == 0)
        r
    else
        rec_echo_direct(n-1,(r??0)+1);

A functional touch

Over the last few days, I've been working on two things: the reorganization of built-in commands and the improvement of the "Functional Experience".

Why do commands need reordering? Because it gets difficult to find the right file among over 40 commands.

Why the sudden increase in numbers? I added proxies for System.Math methods for both easy and fast access to mathematical functions such as Sqrt and Sin, but also Pi.

Additionally, the most important coroutines from the Prexonite Standard Repository for list processing have been implemented in managed code, again for performance reasons. Map, Where, Limit, Skip and friends now inject managed coroutines into the stack.

The commands are now organized in the namespaces Core, List, Math and Text. The latter currently contains the fixed layout functions SetCenter, SetLeft and SetRight, which fill a given string with some character sequence until it has a certain length and is aligned correctly.

Now what the hell do you mean by "Functional Experience"?

I haven't told anyone, but the Prexonite VM is absolutely terrible when it comes to recursion. Unfortunately, recursion happens to be one of the key elements in functional programming and, as you might have noticed, Prexonite Script comes with a lot of syntactic sugar that makes it look like a functional programming language.

OK, lambda expressions and closures are "true" functional features, but the lack of a sophisticated type system makes it almost impossible to reason about a program in the way functional compilers do. Nonetheless, I added two features with the last commit that make PXS a tiny little bit more functional.

First of all: Tail Call Optimization
Yes, the thing that helps with recursion.
function fac(n,r) =
    if(n == 1)
        r
    else
        fac(n-1,n*r);

I benchmarked this function three times, with different tail call optimization strategies. The difference is huge. See for yourself (10'000 computations of 16!):

[Figure: Comparison of different tail call optimization strategies]

Two strategies are employed. The first is an implementation of tail call optimization for directly recursive functions inside the compiler, which turns recursive calls into direct iterations (jumps to the beginning of the function with different arguments). The second, which I call "virtual machine optimization", is a special tail call instruction that removes the current stack frame after having called the function or closure.

Now apparently the virtual machine "optimization" is not particularly fast but uses far less memory than the normal invocation.

Prexonite will never be able to recognize indirect recursion due to the lack of control flow analysis. This, however, does not mean that return statements inside conditions or calls in tail position are not recognized. I'm not sure if Prexonite will ever handle simple recursive return expressions like the normal definition of the factorial:
function fac n =
    if (n==1)
        1
    else
        n*fac(n-1);
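For readers who want to see the compiler strategy spelled out, here is a small Python sketch of turning the directly recursive, accumulator-style fac into an iteration. Python itself performs no tail call optimization, so the second function is written by hand the way I imagine the transformed code: the recursive call becomes a reassignment of the arguments followed by a jump back to the top. The names fac_recursive and fac_iterative are made up for this example.

```python
def fac_recursive(n, r=1):
    """Accumulator-style factorial; the recursive call is in tail position."""
    if n == 1:
        return r
    return fac_recursive(n - 1, n * r)


def fac_iterative(n, r=1):
    """The same function after turning the tail call into a jump."""
    while True:  # "jump to the beginning of the function"
        if n == 1:
            return r
        # Rebind the arguments exactly as the tail call would have passed them.
        n, r = n - 1, n * r


sixteen_factorial = fac_iterative(16)
```

The normal definition n*fac(n-1) cannot be rewritten this way without further analysis, because the multiplication still has to happen after the recursive call returns.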

Also in the repository is an experimental and partial implementation of the famous call-with-current-continuation from Scheme. In PXS it is known as callcc.

I must admit that I don't really know much about call/cc and how it works, especially regarding the stack. Creating a callable object from the current state of a function invocation is no problem. I just don't understand some of the Scheme samples I've been looking at (terribly difficult to read...).

The following snippet stores a continuation of the function two in the global variable plusone. Invoking this continuation with, say, 6 returns 7, as the name suggests.
var plusone;

function two =
    1 + call\cc(->one);
       
function one(continuation)
{
    plusone = continuation;
    return 1;
}

The Philosophy Behind: The Prexonite Type System

This is the second article in the "Philosophy Behind" series, picking up a specialty of one of my projects and explaining how it came to be made. Last time I wrote about the "auto dereferencing" concept in Prexonite Script.

In today's article I will explain the reasons behind the design of the Prexonite type system.

Prexonite faces the same problem as other implementations of late-bound languages on the .NET platform: how to map the CTS to the language's type system.

[Figure: Prexonite type system]

I think the basic types Int32, Double, Boolean and String are more suited for a statically typed environment, so my type system must allow me to provide wrappers around third-party classes/structs.

Wrapping and unwrapping objects must be as transparent as possible. Return values from base class library methods have to be wrapped in their Prexonite equivalent.

At the same time, it is not practical to write a custom wrapper for every possible C# or VB.NET library, so there must be some sort of universal wrapper for CLR objects. With users of Prexonite being able to write their own wrappers, it must be possible to have multiple wrappers for the same CTS type. Also, some wrappers might handle more than one type.

The solution for Prexonite is the abstract class PType and some concrete subclasses, including the universal ObjectPType, which does all the late binding. Since Prexonite Script performs type checks at runtime, type information has to be associated with every data object, which is just what the class PValue does.

What might surprise you is the fact that Null is considered a type. Every null reference automatically has type Null. Unlike the sturdy null references in C#, instances of Prexonite Null are completely functional objects. They react to operators, can be converted to basic values (Int, String, ...) and even provide a ToString method. However, Null does have a special position in the Prexonite type system: it is not possible to write and use your own null reference wrapper.
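The wrapper idea is perhaps easier to grasp with a toy model. The following Python sketch borrows the names PType, PValue, IntPType and ObjectPType from the text, but everything else (the methods, the wrap function, what Null's to_string returns) is an assumption made for illustration and not the actual Prexonite API.

```python
class PType:
    """Describes how values of one kind behave (hypothetical, simplified)."""
    name = "?"

    def to_string(self, raw):
        return str(raw)

    def call(self, raw, member, *args):
        raise TypeError(f"{self.name} has no member {member}")


class IntPType(PType):
    name = "Int"


class NullPType(PType):
    name = "Null"

    def to_string(self, raw):
        return ""  # assumed behaviour: null is a usable value, not a crash


class ObjectPType(PType):
    """Universal wrapper: resolves members by name at run time."""
    name = "Object"

    def call(self, raw, member, *args):
        # Late binding, and the result is wrapped again on the way out.
        return wrap(getattr(raw, member)(*args))


INT, NULL, OBJECT = IntPType(), NullPType(), ObjectPType()


class PValue:
    """A piece of data together with its run time type information."""

    def __init__(self, raw, ptype):
        self.raw = raw
        self.ptype = ptype

    def to_string(self):
        return self.ptype.to_string(self.raw)

    def call(self, member, *args):
        return self.ptype.call(self.raw, member, *args)


def wrap(raw):
    """Pick a wrapper type for a host value (dispatch rules are assumptions)."""
    if raw is None:
        return PValue(None, NULL)
    if isinstance(raw, int) and not isinstance(raw, bool):
        return PValue(raw, INT)
    return PValue(raw, OBJECT)


shout = wrap("abc").call("upper")
```

Note how every value, including the null reference, ends up with a PType attached, and how return values of host methods are wrapped before they get back into the scripting world.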

Prexonite standard repository finally released.

I just checked the collection of helper functions I call the Prexonite standard repository (psr) into SVN. As I am too lazy to create a full release, I will just supply you with a Trac-generated zip file.

Following is a short documentation (or rather an overview) of psr:

The Prexonite Standard Library is a collection of scripts that help in day-to-day hacking with Prexonite Script. This page briefly outlines the contents of each of the currently available files.

debug.pxs

Dependencies
none

The script enables special treatment of the debug command using compiler hooks for increased performance. For each call to the debug command, it checks whether the function requests debugging (through the debugging MetaKey). Unless that is the case, the call will be removed. if-blocks using debug as their condition will be evaluated at compile time with respect to the debugging key.

It is possible to use the debug command without including this script. In that case, however, your scripts will also contain calls to debug when not being debugged.

The actual functionality of this script has been moved to managed code inside the Prexonite.dll for performance reasons in #18. CompilerHooks have to be used with care. While the loss in compiler performance is barely noticeable with just one user-defined CompilerHook, many of them can really slow the translation down. The managed implementation uses a shared CompilerHook to further save time, should Prexonite.dll ever include additional CompilerHooks.
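As a rough illustration of what such a hook does, here is a Python sketch of a compile-time pass that drops debug calls unless the function asks for debugging. The statement encoding (tuples starting with "call"), the function strip_debug_calls and the use of a plain dict for the meta table are all assumptions of mine; the real hook works on the Prexonite AST.

```python
def strip_debug_calls(statements, meta):
    """Return the statements of a function with debug calls removed,
    unless the function's meta table requests debugging."""
    if meta.get("debugging", False):
        return list(statements)  # keep everything when debugging is requested
    return [
        stmt
        for stmt in statements
        if not (stmt[0] == "call" and stmt[1] == "debug")
    ]


body = [
    ("call", "println", ["start"]),
    ("call", "debug", ["x"]),
    ("call", "println", ["done"]),
]

release_body = strip_debug_calls(body, {})
debug_body = strip_debug_calls(body, {"debugging": True})
```

Because the decision is made once at compile time, the non-debugging version of the function pays nothing for the debug calls at run time.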

Is Digital Distribution Flawed?

 

Coding Horror: The Sad State of Digital Software Distribution just opened my eyes.

Jeff Atwood writes about prices of digitally distributed software. Have you noticed that digital versions normally cost as much as the physical ones? Also keep in mind that the physical version actually comes with additional value in the form of a DVD box and a manual. Additionally, the digital version is often crippled with DRM, so shouldn't it actually cost less?

Even worse: Jeff found occasions where the digital version is actually more expensive:

Instead, I find that download options for commercial software are quite rare. Even when the download option is available, you end up paying the same price as retail or even more. Here's a typical example. I purchased Titan Quest: Gold from Steam about a month ago. I paid $29.95, which is the standard retail box price. But online discounters sell boxed copies of the very same game for $22.90.

Digital Distribution: $29.95
Retail Copy: $22.90

Selling directly to the consumer via download means bypassing the entire brick and mortar sales chain. This should result in cheaper prices than retail, not the same prices-- and it should never result in higher prices. Paying a premium for the privilege of downloading software is complete ripoff, and yet it happens all the time.

Selling digital copies is very profitable as servers are normally cheaper than retail stores. Now imagine what distribution costs, if the publisher actually owns the online shop...

But is all that criticism really justified? Is there a compensation for the missing molecules in digital distribution? Normally, there is: You can (re-)download your games/software whenever you want, wherever you want. I don't have to worry about scratching or losing my dongle (read DVD/CD).

However, this privilege is not necessarily restricted to digital versions. The retail version of Half-Life 2 is nothing more than a license key paired with a copy of the game. Something I really like, as I must confess that I have absolutely no clue where my Half-Life 2 DVD is. At the same time, I don't really care because I can always get another copy via Steam.

There is an additional advantage: I prefer to play games in their "native" language, should I know it, that is. In most cases, this is English, but not all developers / publishers ship multi-language editions of their games, Oblivion being one example.
With Half-Life 2, this was never a concern, as I just downloaded the English version instead of installing the German one from the DVD.

I suspect, though, that there might be a drawback to these privileges. Half-Life 2 resisted falling into the budget sector for over two years, which I think is remarkable. Valve dictated prices via Steam and by constantly releasing new bundles, making it actually quite difficult to get Half-Life 2 alone.

Nonetheless, I strongly support Steam as it does not bind me to those fragile DVDs, though I do not normally buy my games via Steam due to the lack of a credit card. It doesn't have to be just Steam. Digital distribution is the future, especially when it comes to fighting piracy. I, however, will refuse to install a utility for each of the major vendors. Now it's just Steam and EA Link, but I am sure more will follow.

Creating a programming language

On October 31, I handed in a paper I have been working on for the past few months. It briefly outlines the process of creating a programming language with a focus on compiler construction:

Compilers

Since computers only process instructions that are part of their instruction set, programs written in programming languages have to be translated into functionally equivalent programs in machine language prior to their execution. This process is called compilation and is performed by compilers.

To cope with this task, the translation is commonly split up into multiple steps, also referred to as phases. A compiler starts with reading the input file byte by byte, character by character.

Like our eye splits up a text into individual words, the second step is to group meaningful characters together and remove those without meaning (e.g., white spaces). This step is called lexical analysis and results in a stream of tokens: short, categorized strings of characters.

In the next phase, the compiler determines the relationship between the tokens. It applies the syntax of the programming language and is therefore called syntactical analysis. It results in a tree structure that represents the program.

At this stage, some compilers apply additional transformations to the program to increase performance before finally generating code in the language of the target machine.
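The lexical analysis step described above can be sketched in a few lines of Python using the standard re module. The token categories (NUMBER, IDENT, OP) and the function name tokenize are arbitrary choices for this example and are not taken from the paper.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# One named group per token category; white space is matched but thrown away.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Group characters into categorized tokens, dropping white space."""
    position = 0
    while position < len(source):
        match = MASTER.match(source, position)
        if match is None:
            raise SyntaxError(f"unexpected character {source[position]!r} at {position}")
        position = match.end()
        if match.lastgroup != "SKIP":
            yield Token(match.lastgroup, match.group())


tokens = list(tokenize("a = a + 5"))
```

The resulting token stream is what the syntactical analysis would consume next to build the tree structure.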

So, if you want to know how your favorite compiler works, have a look at "Creating a programming language" (PDF, 400KiB).

LaTeX, the anti-tool

I used it a lot in the past few days. Or rather: I got prevented from doing so by LaTeX itself.

I wrote a paper about creating a programming language and my supervisor suggested using LaTeX to typeset the document. This sounded like a good idea to me, as I had played around with LaTeX before. I wouldn't have to worry about coming up with a layout, keeping track of citations and the numbering and positioning of figures. Well...

Before I go on, I would like to state that I love the basic principle of LaTeX: text, annotated with semantic information, that gets automatically turned into a good-looking document. Just like HTML...

The only problem: LaTeX is a chaotic chunk of hacks, glued together by a more or less robust package system. There is absolutely no consistency among the different extensions. A simple example is the pair of packages pstricks and pdftricks: one abbreviates picture, the other does not. Why can't they just agree on a naming scheme?

Definitely the worst thing about LaTeX is the uselessness of its error messages. To a complete novice, it must look as if one would have to have invented LaTeX in order to understand it.

Also very annoying is the fact that compilation is slow as hell, even on modern machines. I mean, taking 5 or so seconds would be understandable if I were compiling a 41'000-lines-of-code project, but not in the case of just 2000 lines.
Well, this statement is actually wrong as I must consider that the whole compiler is actually interpreted. LaTeX's flexibility comes at a price.

Looking back, I must say that I spent at least as much time fighting LaTeX as I did writing my paper. Something must be seriously wrong. Isn't a tool supposed to make things easier? Did I miss anything?

The sad thing about this story: There is not really an alternative for texts that heavily rely on mathematics. At least none I know of.
So this probably wasn't my last paper typeset by LaTeX.