Tamarin Interpreter Architecture Overview

I finally have a general interpreter architecture view of Tamarin in my head. I'll get into the tracing aspect of it later this week. Tamarin is pretty convoluted, so you may want to make yourself some coffee prior to jumping into the rabbit hole.

So far, my count on the number of languages used, in the order that I'll explain how they are used is five: Java, ActionScript, Forth, C, and Python. These languages can be broken down into two subgroups: Support code which includes Java/ActionScript, and the actual interpreter/tracing JIT: written in Python, Forth, and C.

The first half, which I call support code, helps Tamarin get up and running. Java is the easiest to understand. Java is used by Adobe Flex to compile JavaScript/ActionScript source files into .abc files.

During compile time, Flex is used to compile the native ActionScript files into .abc files. These native ActionScript files contain some of the support code of the ActionScript spec, such as Math.min()/Math.Max(); They also define all the native function calls for some Objects. For example, Array::Splice is defined in Array.as. Some of these functions also have hooks into C such as Math.Cos which is defined with:

    public static native function acos(x:Number):Number;

Such calls are expanded into C function calls during compilation time. They are all glued together via builtin.as, which compile all the ActionScript into .abc files.

Finally, the really convoluted part is how Python, Forth, and C interact with each other. Here is where the meat and potatoes are. Tamarin does double interpretation, meaning any .abc bytecode being executed is actually being interpreted twice: Once in C and once in Forth. The ABC bytecode implementations are in Forth. Therefore, the whole JavaScript implementation is in Forth. However, the Forth interpreter is written in C. So now you have two different languages being interpreted: ABC via Forth, and Forth via C. These Forth words are implemented as C functions. So in the end, you have a C interpreter which interprets Forth, which interprets the ABC. Therefore, in reality, any ABC bytecode is being interpreted twice in C.

Here is a quick flowchart showing the execution of a single ABC bytecode:

* Side note: Forth words are like functions in other languages

Python acts as the glue between Forth and C. During compilation, Python reads the JavaScript Interpreter Forth source file, and outputs two C header files: One which contains the Forth language interpreter, and one which represents the JavaScript interpreter implemented in Forth. A quick diagram is as follows:

The labels of inner/outer interpreter are taken from the Tamarin source. Hopefully this isn't too confusing. I'll be getting into implementation specific details over the week. Some of the hooks are pretty nifty.