Understanding Tamarin's Debug Output

The debug output from Tamarin can be quite a dizzying mix of ABC, LIR, and x86, especially if you are trying to learn all three at once. Hopefully this post will guide you through the clutter so that the debug output becomes a useful tool. Before executing a method, Tamarin verifies the input ABC, and the ABC is converted to LIR during that verification. The LIR then goes through a few optimization passes, after which it is finally assembled into ye olde x86. The whole compilation pipeline is:

  • // Notation: original -> modifier -> output
  • ActionScript 3 Source -> Flex -> ABC
  • ABC -> verifier -> LIR
  • LIR -> optimizations -> optimized LIR
  • optimized LIR -> Assembler -> x86
The stages blur together in the debug output, and each stage must be read in its own special lingo. Consider the following AS3 source:

function printLoop(limit:int):void {
    var i:int;
    var sum:int;

    for (i = 0; i < limit; i++) {
        sum += i;
    }

    print(sum);
}

printLoop(10000);


In the debug spew produced by the -Dverbose flag, the beginning and end of each stage look like this:

ABC->verifier->LIR

15:callproperty {public,function_loop.as$1}::printLoop 1 // the call to printLoop(10000)
verify Function-0/<anonymous>()
start
// Tamarin uses the FASTCALL calling convention, which means some params are passed in registers
ebx = iparam 0 ebx
esi = iparam 1 esi
edi = iparam 2 edi
env = iparam 0 ecx // MethodEnv env
argc = iparam 1 edx // number of args
ap = iparam 2 // pointer to the arguments passed by the caller
vars = ialloc 48 // allocate 48 bytes on the stack: 48 / 8 bytes per var = 6 vars
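
To make the 48 / 8 arithmetic concrete, here is a minimal Python sketch of how the vars[...] byte offsets in the listings that follow could be mapped back to VM slots. It assumes that every slot is 8 bytes wide and that the frame holds the 4 locals first, followed by the 2 operand-stack entries. That layout is inferred from the offsets in this particular dump, and describe_slot is a hypothetical helper, not part of Tamarin.

```python
SLOT_SIZE = 8      # bytes per slot, from "48 / 8 bytes = 6 vars"
FRAME_BYTES = 48   # from "vars = ialloc 48"
NUM_LOCALS = 4     # assumption: matches the four entries on the "locals:" line


def describe_slot(offset):
    """Map a vars[offset] byte offset to a local or operand-stack slot."""
    if offset % SLOT_SIZE != 0 or not 0 <= offset < FRAME_BYTES:
        raise ValueError(f"offset {offset} is not a slot in this frame")
    index = offset // SLOT_SIZE
    if index < NUM_LOCALS:
        return f"local {index}"
    return f"stack {index - NUM_LOCALS}"


for offset in range(0, FRAME_BYTES, SLOT_SIZE):
    print(f"vars[{offset}] -> {describe_slot(offset)}")
```

Under these assumptions vars[16] is local 2 (i), vars[24] is local 3 (sum), and vars[32] and vars[40] are the two operand-stack slots. That is consistent with getlocal3 loading vars[24] and storing the value to vars[32].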


What follows is a one-to-one translation of ABC to LIR. Let's zoom in on the AS3 sum += i line:

ABC -> LIR
15:getlocal3 // ABC
ld1 = ld vars[24] // LIR
sti vars[32] = ld1 // LIR

// The state of the AS3 VM
stack: int@ld1
scope: [global]
locals: Object?@ldc1 int@ldc2 int@0 int@ld1

16:getlocal2
ld2 = ld vars[16]
sti vars[40] = ld2

stack: int@ld1 int@ld2
scope: [global]
locals: Object?@ldc1 int@ldc2 int@ld2 int@ld1

17:add
i2f1 = i2f ld1 // notice we convert to float prior to addition
stqi vars[32] = i2f1
i2f2 = i2f ld2
stqi vars[40] = i2f2
fadd1 = fadd i2f1, i2f2 // and do an fadd instead of an integer add, even though both i and sum are declared as ints
stqi vars[32] = fadd1


This is the initial translation to LIR. Next we need to look at the optimization passes, which begin where the line "killing dead stores after ... " appears:

LIR Optimization
killing dead stores after 2 LA iterations.
live 4
live #core
live vars
live env
ret 4
- sti vars[32] = add2 // kill
add2 = add ld2, 1
1
- sti vars[32] = ld2
sti vars[24] = add1
- sti vars[32] = add1
add1 = add ld1, ld2
- stqi vars[32] = fadd1
fadd1 = fadd i2f1, i2f2 // still floating point add
- stqi vars[40] = i2f2
i2f2 = i2f ld2 // convert int to float
40
- stqi vars[32] = i2f1
i2f1 = i2f ld1
32
- sti vars[40] = ld2
ld2 = ld vars[16] // up
16 // way
- sti vars[32] = ld1 // your
ld1 = ld vars[24] // working
// Start reading here

A "-" next to a store indicates that the LIR instruction is dead and doesn't need to be compiled. Note that this listing has to be read from the bottom up: the first instruction in program order sits at the end of the listing. This is because nanojit works through the LIR in reverse order, a consequence of its tracing-compiler design. Finally, we know we have an optimized LIR stream when we see something like "live instruction count 58, total 75, max pressure 7". The optimized LIR can be read in the regular top-down fashion.
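
The dead-store marking can be sketched in a few lines of Python. This is a simplified model written for this post, not nanojit's actual implementation: the instruction tuples, the mark_dead_stores name, and the assumption that the two operand-stack slots (offsets 32 and 40) are never read after the block are all hypothetical. The final store to vars[16] is taken from the optimized LIR, since the dead-store listing above is cut off before it. The sketch walks the block backwards, just as the listing is printed, and marks a store as dead when its slot is overwritten (or dropped) before any later load.

```python
def mark_dead_stores(instrs, dead_at_exit=()):
    """Return (is_dead, instr) pairs in program order.

    Each instr is ("ld", dst, slot), ("st", slot, src) or ("op", dst).
    """
    unread = set(dead_at_exit)  # slots whose current value is never read later
    marked = []
    for ins in reversed(instrs):          # last instruction first
        dead = False
        if ins[0] == "st":
            slot = ins[1]
            if slot in unread:
                dead = True               # overwritten or dropped before any load
            else:
                unread.add(slot)          # earlier stores to this slot are now dead
        elif ins[0] == "ld":
            unread.discard(ins[2])        # an earlier store to this slot is needed
        marked.append((dead, ins))
    marked.reverse()
    return marked


# The sum += i / i++ block from the post, in program order.
block = [
    ("ld", "ld1", 24), ("st", 32, "ld1"),
    ("ld", "ld2", 16), ("st", 40, "ld2"),
    ("op", "i2f1"), ("st", 32, "i2f1"),
    ("op", "i2f2"), ("st", 40, "i2f2"),
    ("op", "fadd1"), ("st", 32, "fadd1"),
    ("op", "add1"), ("st", 32, "add1"), ("st", 24, "add1"),
    ("st", 32, "ld2"),
    ("op", "add2"), ("st", 32, "add2"), ("st", 16, "add2"),
]

# Assumption: the operand-stack slots (offsets 32 and 40) are not read after the block.
for dead, ins in mark_dead_stores(block, dead_at_exit=(32, 40)):
    print("-" if dead else " ", ins)
```

In this model only the stores to vars[24] and vars[16] survive, which lines up with the two sti instructions left in the optimized LIR. Once their stores are gone, the i2f and fadd results have no remaining uses, which would explain why they vanish as well.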

Optimized LIR  
B14:
24
ld1 = ld vars[24]
16
ld2 = ld vars[16]
add1 = add ld1, ld2 // the adds are now integer adds!
sti vars[24] = add1
1
add2 = add ld2, 1
sti vars[16] = add2
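
One way to convince yourself that swapping the fadd for an integer add is safe is to compare the two paths directly. The Python sketch below assumes that the float result is converted back to an int with ECMAScript-style wrap-around before being stored into the int local. That conversion rule is my assumption about what happens here, and to_int32 is a simplified helper that only handles finite values.

```python
def to_int32(x):
    """Simplified ToInt32 for finite values: truncate, wrap modulo 2**32, reinterpret as signed."""
    n = int(x) % 2**32
    return n - 2**32 if n >= 2**31 else n


def add_via_float(a, b):
    """Mirror of the unoptimized LIR: i2f both operands, fadd, convert back to int."""
    return to_int32(float(a) + float(b))


def add_via_int(a, b):
    """Mirror of the optimized LIR: a 32-bit signed add that wraps on overflow."""
    return ((a + b + 2**31) % 2**32) - 2**31


samples = [(1, 2), (-5, 3), (2**31 - 1, 1), (-2**31, -1), (2**31 - 1, 2**31 - 1)]
for a, b in samples:
    assert add_via_float(a, b) == add_via_int(a, b)
    print(a, "+", b, "->", add_via_int(a, b))
```

The two paths agree because the sum of two 32-bit ints always fits exactly in a double, so nothing is lost before the wrap-around is applied.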

Once we have the optimized LIR, we can start looking at the generated x86, which begins at the [prologue] marker:

Begin x86
00C47F28 [prologue]
nop
nop
nop
push ebp
mov ebp,esp
00C47F2E [patch entry]
sub esp,88

After the prologue we see interleaved sequences of LIR instructions, each followed by the x86 generated for it. Each sequence appears to be delimited by a basic block of the AS3 code.

   B14: // Basic block ID
ld1 = ld vars[24] // Optimized LIR
ld2 = ld vars[16]
add1 = add ld1, ld2
sti vars[24] = add1
add2 = add ld2, 1
sti vars[16] = add2

// Compiled x86
00C47F79 [B14]
mov esi,-32(ebp)
mov ebx,-40(ebp) esi(ld1) // ld1 = LIR ld1
add esi,ebx ebx(ld2) esi(ld1) // an actual integer add
mov -32(ebp),esi ebx(ld2) esi(add1)
add ebx,1 ebx(ld2)
mov -40(ebp),ebx ebx(add2)
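
To line the x86 up with the LIR, it helps to translate vars[...] offsets into ebp-relative addresses. The Python sketch below assumes the vars area starts at -56(ebp). That base is inferred only from the two pairs visible in this listing (vars[24] becomes -32(ebp), vars[16] becomes -40(ebp)) and will differ for other methods. vars_to_ebp is a hypothetical helper.

```python
VARS_BASE = -56  # assumption: inferred from vars[24] -> -32(ebp) and vars[16] -> -40(ebp)


def vars_to_ebp(offset):
    """Translate a vars[offset] reference into the ebp-relative operand seen in the x86."""
    return f"{VARS_BASE + offset}(ebp)"


for offset in (16, 24):
    print(f"vars[{offset}] -> {vars_to_ebp(offset)}")
```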


We're all done!