Tamarin's New Interpreter - Converting from ABC to IL

Ever since the huge patch that removed double interpretation from Tamarin, most of the previous blog posts about Tamarin's internals have been relegated to the dustbin of Mercurial history. Guess what that means? New posts on how the interpreter works!

Today I'll just be diving back into how the interpreter works in general. Instead of executing ABC via Forth, Tamarin now converts the ABC directly into Forth, which serves as its Intermediate Language (IL). David Mandelin also has a few posts about the internals of Tamarin and how some of these parts interact. One key difference that may not be obvious is that Forth is not Tamarin's Intermediate Representation (IR). In a traditional compiler, you convert the bytecode or source into some custom IR, run optimizations on the IR, and finally lower the IR to native machine code, e.g. x86. Tamarin instead takes the JavaScript ABC, converts it to Forth (the IL), and converts the Forth into IR during tracing. As an analogy, think of compiling Java source into .class files, whose Java bytecodes are the IL, and then translating those bytecodes into a separate intermediate representation (IR).
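To make the ABC-to-IL step concrete, here is a toy sketch in C++. It assumes a made-up stack bytecode (PushByte, Add, Multiply, ReturnValue) and a made-up set of Forth-style words (Lit, Plus, Star, Exit); neither matches Tamarin's real ABC opcodes or Forth vocabulary. It only shows the shape of the idea: translate the bytecode once into a flat IL, then run the IL. The later stage, where the tracer turns the Forth into IR, is not modeled here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Toy stack bytecode standing in for ABC (hypothetical opcodes, not real ABC values).
enum class Abc : uint8_t { PushByte, Add, Multiply, ReturnValue };

// Toy Forth-style words making up the IL (hypothetical, not Tamarin's vocabulary).
enum class Word { Lit, Plus, Star, Exit };

// One IL cell: a word plus an inline operand (only used by Lit).
struct Cell {
    Word w;
    int32_t operand;
};

constexpr uint8_t op(Abc a) { return static_cast<uint8_t>(a); }

// Stage 1: translate the bytecode into IL once, before anything executes.
std::vector<Cell> translate(const std::vector<uint8_t>& code) {
    std::vector<Cell> il;
    for (std::size_t pc = 0; pc < code.size(); ++pc) {
        switch (static_cast<Abc>(code[pc])) {
        case Abc::PushByte:
            if (pc + 1 >= code.size()) throw std::invalid_argument("PushByte missing operand");
            // In this toy, the operand byte is treated as a signed byte.
            il.push_back({Word::Lit, static_cast<int8_t>(code[++pc])});
            break;
        case Abc::Add:         il.push_back({Word::Plus, 0}); break;
        case Abc::Multiply:    il.push_back({Word::Star, 0}); break;
        case Abc::ReturnValue: il.push_back({Word::Exit, 0}); break;
        default: throw std::invalid_argument("unknown opcode");
        }
    }
    return il;
}

// Stage 2: execute the IL on a data stack.
int32_t run(const std::vector<Cell>& il) {
    std::vector<int32_t> stack;
    auto pop = [&stack]() {
        assert(!stack.empty());
        int32_t v = stack.back();
        stack.pop_back();
        return v;
    };
    for (const Cell& c : il) {
        switch (c.w) {
        case Word::Lit:  stack.push_back(c.operand); break;
        case Word::Plus: { int32_t b = pop(), a = pop(); stack.push_back(a + b); break; }
        case Word::Star: { int32_t b = pop(), a = pop(); stack.push_back(a * b); break; }
        case Word::Exit: return pop();
        }
    }
    throw std::runtime_error("IL ended without Exit");
}
```

For example, translating the bytecode for 6 * 7 + 2 yields six IL cells (Lit 6, Lit 7, Star, Lit 2, Plus, Exit), and running them returns 44.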

As David Mandelin pointed out, the verifier does the actual conversion from ABC to Forth. So how do we get to the verifier? The old entry points described in previous posts no longer apply, so let's start from the beginning of an avmplus invocation:

    int main(int argc, char *argv[]) {
        // ...
    }

    int _main(int argc, char *argv[]) {
        // ...
        avm vm = shell_init(gc, c);
        // ...
    }

    avm shell_init(/* ... */) {
        // ...
        if (abc_data == NULL) { abc_data = shell_abc_data; length = 110658; }
        avm _vm = INIT_VM(gc, c, shell, abc_data, length);
        INIT_SCRIPT(_vm, 8);
        // ...
    }

    #define INIT_SCRIPT(vm, script_id) \
        ((avmplus::AbcEnv*)vm)->core()->interp.interpMethodEnv(vm, script_id, -1, 0, NULL)

Here we initialize the shell, which is the ABC code from shell.as. Since we haven't passed in a custom .abc file, abc_data is NULL and we default to executing the shell's built-in ABC (shell_abc_data). INIT_SCRIPT then calls interpMethodEnv, which does:

    Box::RetType Interpreter::interpMethodEnv(/* ... */) {
        // ...
        InterpState state(CODE_POS(w_callin), sp, rp, f);
        mode = do_interp(*this, state);
        // ...
    }

CODE_POS refers into the Forth words and their definitions; here it gives the interpreter the position of w_callin. I'll get into that in another post. Interpretation always starts at the Forth word w_callin, which does the following:

    EXTERN: w_callin ( te args argc env -- result )
        pickreceiver1 global_from_scripttraits replacereceiver1
        DUP IF callenv THEN DONE ;

    : callenv ( xobj xargs argc env -- result )
        enterenv ( result )

    : enterenv ( obj args argc -- result )
        ckint RSTKREM 0<= IF stkover THEN ENV getforthword EXECUTE ;
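The last step of enterenv, `ENV getforthword EXECUTE`, is the interesting one: a native function computes which Forth word to run, and EXECUTE runs it. Here is a toy threaded-code inner interpreter in C++ that models just that pattern. It is a sketch under assumptions, not Tamarin's do_interp: the VM layout, the word set (W_LIT, W_DUP, W_ADD, ...), and the W_GETWORD primitive standing in for getforthword are all hypothetical, and ckint, RSTKREM, and stkover are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct VM;
using Prim = void (*)(VM&);

// A word is either a C++ primitive or a colon definition stored in `mem`.
struct WordDef {
    Prim prim;         // non-null: primitive implemented in C++
    std::size_t body;  // otherwise: index of the definition's first cell in mem
};

struct VM {
    std::vector<int> ds;          // data stack
    std::vector<std::size_t> rs;  // return stack of saved instruction pointers
    std::vector<int> mem;         // threaded code: word ids plus inline literals
    std::vector<WordDef> dict;    // word id -> definition
    std::vector<int> envWord;     // hypothetical: env id -> word id
    std::size_t ip = 0;
    bool running = false;

    void push(int v) { ds.push_back(v); }
    int pop() { assert(!ds.empty()); int v = ds.back(); ds.pop_back(); return v; }

    // Run a primitive directly, or nest into a colon definition.
    void execute(int xt) {
        const WordDef& w = dict.at(static_cast<std::size_t>(xt));
        if (w.prim != nullptr) {
            w.prim(*this);
        } else {
            rs.push_back(ip);
            ip = w.body;
        }
    }

    // Inner interpreter: fetch the next word id and execute it until DONE.
    void run(std::size_t start) {
        ip = start;
        running = true;
        while (running) execute(mem.at(ip++));
    }
};

void p_lit(VM& vm)     { vm.push(vm.mem.at(vm.ip++)); }  // push the inline literal cell
void p_exit(VM& vm)    { assert(!vm.rs.empty()); vm.ip = vm.rs.back(); vm.rs.pop_back(); }
void p_dup(VM& vm)     { int v = vm.pop(); vm.push(v); vm.push(v); }
void p_add(VM& vm)     { int b = vm.pop(); int a = vm.pop(); vm.push(a + b); }
void p_done(VM& vm)    { vm.running = false; }
void p_execute(VM& vm) { vm.execute(vm.pop()); }          // EXECUTE a computed word
// Hypothetical native helper in the spirit of getforthword: env id -> word id.
void p_getword(VM& vm) { vm.push(vm.envWord.at(static_cast<std::size_t>(vm.pop()))); }

enum WordId : int { W_LIT, W_EXIT, W_DUP, W_ADD, W_DONE, W_EXECUTE, W_GETWORD, W_DOUBLE };

int demo(int arg) {
    VM vm;
    vm.dict = {{p_lit, 0}, {p_exit, 0}, {p_dup, 0}, {p_add, 0},
               {p_done, 0}, {p_execute, 0}, {p_getword, 0}, {nullptr, 0}};
    vm.mem = {W_DUP, W_ADD, W_EXIT};  // : DOUBLE ( n -- 2n ) DUP + ;  starts at mem[0]
    vm.dict[W_DOUBLE].body = 0;
    vm.envWord = {W_DOUBLE};          // hypothetical env 0 resolves to DOUBLE
    std::size_t start = vm.mem.size();
    // Caller: arg 0 GETWORD EXECUTE DONE  (cf. "ENV getforthword EXECUTE")
    vm.mem.insert(vm.mem.end(), {W_LIT, arg, W_LIT, 0, W_GETWORD, W_EXECUTE, W_DONE});
    vm.run(start);
    return vm.pop();
}
```

`demo(21)` pushes the argument, pushes env 0, resolves it to DOUBLE with W_GETWORD, EXECUTEs it (nesting into the colon definition and returning via W_EXIT), and stops at W_DONE, leaving 42 on the data stack.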

A previous post on calling native C methods still applies here. In the end, enterenv calls the native function getforthword, and EXECUTE then runs the Forth word it returns:

    FOpcodep FASTCALL getforthword(MethodEnv* env) {
        // env->core()->console << "getforthword "<<env<<"\n";
        return env->getForthWord();
    }

getforthword simply forwards to MethodEnv::getForthWord(), which does the following:

    FOpcodep MethodEnv::getForthWord() const {
        if (flags & ME_NEEDS_VERIFY)
            return CODE_POS(w_verifyabc);
        // ...
    }

If the method still needs verification (ME_NEEDS_VERIFY is set in its flags), this returns w_verifyabc, telling the interpreter to execute that Forth word next. I'll get into the details of the verifier's translation in another post.
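The lazy-verification dispatch in getForthWord() can be sketched as a small C++ model. This is hypothetical, not Tamarin code: Method, NEEDS_VERIFY, verifyAndTranslate, and getWord are invented names, and the idea that the verifier caches its translation and clears the flag (so later calls skip verification) is an assumption, since the details of w_verifyabc are left for another post.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model, not Tamarin code: a "word" is just a callable here.
using Word = std::function<int(int)>;

// Plays the role of ME_NEEDS_VERIFY in this sketch.
constexpr unsigned NEEDS_VERIFY = 1u;

struct Method {
    unsigned flags = NEEDS_VERIFY;  // every method starts out unverified
    std::vector<std::string> abc;   // hypothetical stand-in for the method's ABC
    Word translated;                // filled in by the (hypothetical) verifier
    int verifyCount = 0;            // demo bookkeeping: how many times we verified
};

// Hypothetical verifier: check each op, translate the method once,
// cache the result, and clear NEEDS_VERIFY (an assumption of this sketch).
void verifyAndTranslate(Method& m) {
    std::vector<Word> steps;
    for (const std::string& op : m.abc) {
        if (op == "inc")      steps.push_back([](int x) { return x + 1; });
        else if (op == "dbl") steps.push_back([](int x) { return x * 2; });
        else throw std::invalid_argument("verify error: unknown op " + op);
    }
    m.translated = [steps = std::move(steps)](int x) {
        for (const Word& step : steps) x = step(x);
        return x;
    };
    m.flags &= ~NEEDS_VERIFY;
    ++m.verifyCount;
}

// Modeled on MethodEnv::getForthWord(): an unverified method is routed
// through a verify step first; afterwards the cached translation is used.
Word getWord(Method& m) {
    if (m.flags & NEEDS_VERIFY)
        return [&m](int x) {
            verifyAndTranslate(m);
            return m.translated(x);
        };
    assert(m.translated);
    return m.translated;
}
```

For a method whose ops are {"inc", "dbl"}, the first `getWord(m)(3)` verifies and returns 8; a second call such as `getWord(m)(4)` returns 10 from the cached translation without verifying again.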