Essays

Entries in Tamarin (42)

Tuesday
01Dec2009

Calling C++ in LIR

JIT compiled code needs to call C++ methods to do the heavy lifting. Metadata about C++ methods such as the parameters, the return type, etc is needed to create LIR that can call into those methods. Tamarin does this through the use of the CallInfo data structure:

struct CallInfo
{
    uintptr_t   _address;        // Address of the method
    uint32_t    _argtypes:27;    // 9 3-bit fields indicating arg type
    AbiKind     _abi:3;
 
    verbose_only ( const char* _name; )
};

The _address field is the address of the C++ method.

The _argtypes field is a bit encoding of the number of parameters, the types of each parameter, and the return type. For example, consider a C++ method that has the declaration:

void someFunction(int someParameter, double otherParameter);

 

Void types are represented by the decimal number 0 (binary 000). Integer types are represented by the decimal number 2 (binary 010). Double types are represented by the decimal number 1 (binary 001). The _argtypes, in binary would look like:

        010 | 001 | 000

The parameters are laid out in reverse order, with the return type being the right most 3 bits. Since there are only 27 bits, LIR can only make calls to a C++ method that have at most 8 parameters.

The AbiKind represents how the C++ method is called.

enum AbiKind {
    ABI_FASTCALL,
    ABI_THISCALL,
    ABI_STDCALL,
    ABI_CDECL
};

 

FastCall stores a few parameters into registers instead of pushing them onto the stack. A THISCALL means that the C++ method is part of a class, and requires that the instance of an object be passed in. I haven't seen STDCALL be used at all. A CDECL call stands for C declaration, where all parameters are pushed onto the stack prior to calling the method.

All these CallInfo structures are manually created and maintained in Tamarin in the file core/jit-calls.h. Consider the C++ method declaration:

static void AvmCore::atomWriteBarrier(MMgc::GC *gc, const void *container, 
Atom *address, Atom atomNew);

 

The macro declaration to create the CallInfo structure for atomWriteBarrier is:

FUNCTION(FUNCADDR(AvmCore::atomWriteBarrier), SIG4(V,P,P,P,A), atomWriteBarrier)

 

Each of these funky words is another macro:

  • FUNCTION - AvmCore::atomWriteBarrier uses the ABI_CDECL calling convention.
  • FUNCADDR - This is a static method.
  • SIG4(V,P,P,P,A) - Represents the CallInfo::_argtypes field. V is a void method. The next 3 parameters are (P)ointer types. The last parameter is an (A)tom type.
  • atomWriteBarrier - The name of the method, which is used for debugging purposes.

The current list of C++ methods all manually created and maintained, which is a really annoying hassle. With the LLVM bitcode to LIR translator, the number of CallInfos grows dramatically because C++ methods usually call lots of other C++ methods. Consider the atomWriteBarrier code:

void AvmCore::atomWriteBarrier(MMgc::GC *gc, const void *container, Atom *address, Atom atomNew)
{ 
    decr_atom(*address);
    incr_atom(gc, container, atomNew);
    *address = atomNew;
}

 

There are no CallInfo structures for decr_atom() and incr_atom(), but they have to be created. If a targeted inline C++ method calls a C++ method that doesn't already have a CallInfo structure, a new CallInfo for the newly called C++ method must be created.

While the list can manually be created, it would be horrendously annoying. Instead, we automatically create a new file called jit-calls-generated.h that contains all the new generated CallInfo structures.

Automatically creating a CallInfo structure:

Consider the C++ method declaration for decr_atom:

static void decr_atom(Atom const a);

 

The declaration in LLVMbitcode:

define void @AvmCore::decr_atomEi(i32 %a); 

 

When translating bitcode to LIR, the bitcode contains the parameter and return types. It also provides the name of the method. The last missing piece of the AbiKind.

LLVM bitcode has an explicit FASTCALL modifier for methods that use the FASTCALL calling convention. However, bitcode contains no explicit distinction between a CDECL and THISCALL calling convention. The call site in bitcode only says that a pointer is being passed into a method. The distinction between a CDECL/THISCALL is found at the function definition, NOT declaration.

The LLVM function definition for a THISCALL looks like:

define i32 @Toplevel::add2(%"struct.avmplus::Toplevel"* %this, i32 %leftOperand, i32 %rightOperand); 

 

The first parameter is explicitly named "this". If the first parameter of a function definition is named "this", then the C++ method uses the THISCALL AbiKind. This detection scheme is safe because "this" is a keyword in C++, and you can't name a value "this". The function declaration would show the name of the instance being passed in, rather than the name "this", therefore being incorrect.

What can't be called:

While the majority of C++ methods can be called from LIR, there are some limitations. A C++ object's constructor cannot be called as it is against the C++ specification to get the address of a constructor (C++ Standard, section 12.1.12). At the moment, we create a C++ wrapper that calls the constructor. LIR then calls the wrapper. A polymorphic call cannot be called as it's a runtime lookup based on a virtual method table.

Wednesday
11Nov2009

Example C++ to LIR Translation

There are a lot of translation steps to go from C++ to bitcode to LIR. This post hopefully solidifies everything with a concrete example. Consider ActionScript source that adds two variables and assigns the sum to another variable.

var a;
var b;
var sum = a + b; 

 

The LIR that is normally generated is:

left = load left[0]
right = load right[0]
add = icall #add ( left, right )
 
store vars[32] = add

 

First, the two variables left and right, which are a and b in the AS3 source code, are loaded. Next, a call to the VM C++ method add is generated. The result of add is stored into LIR vars[32], which is a location in memory and represents the AS3 variable sum.

Instead of calling Add, the JIT should inline the method. To do that, C++ has to be converted to LIR. The VM C++ method that does the actual add has the source:

Atom Toplevel::add(Atom left, Atom right)
{
    BITCODE_INLINEABLE  // Indicate that we want to translate this method to LIR
    if (areNumbers(left, right)) {        
return
addNumbers(left, right);
    }
    else {
        return addUnknown(left,right);
    }
}

 

The add method is part of a C++ object named Toplevel. ActionScript values are modeled as Atoms in C++. An Atom is a 32 bit integer with the bottom 3 bits used for type information. The C++ source checks to see what the "+" operator does depending on the types of the two values. Once Tamarin is compiled with llvm, it produces bitcode that looks like:

define i32 @add2(%"struct.avmplus::Toplevel"* %this, i32 %left, i32 %right) {
entry:
    call void @enableBitcodeInlining() // The macro expansion of BITCODE_INLINEABLE
 
    %0 = call i8 @areNumbers(%"struct.avmplus::Toplevel"* %this, i32 %left, i32 %right) 
    %toBool = icmp eq i8 %0, 0      
 
    // if true go to addNumbers, otherwise addUnknown
    br %toBool, label %addNumbers, label %addUnknown    
 
addNumbers:     
    %1 = call i32 @addNumbers(%"struct.avmplus::Toplevel"* %this, i32 %left, i32 %right) 
    ret i32 %1
 
addUnknown:       
    %2 = call i32 @addUnknown(%"struct.avmplus::Toplevel"* %this, i32 %left, i32 %right) 
    ret i32 %2
}

 

The LLVM bitcode is in SSA form and retains all of the type information and control flow. The call to enableBitcodeInlining is the C++ source macro BITCODE_INLINEABLE. The static translator looks for a call to enableBitcodeInlining as an indicator to translate the method to LIR. The llvm type i32 represents an integer, which is what a C++ Atom really is. Finally, the resulting LIR once the C++ add method is inlined into the LIR instead of being called is:

left = load leftp[0]
right = load rightp[0]
 
inline( left, right )
    retVal = stackAlloc 4
    isNumberAdd = icall #areNumbers (left, right)
 
    eq1 = eq isNumberAdd, 0
    jump true eq1 -> addNumbersLabel
    jump false eq1 -> addUnknownLabel
 
addNumbersLabel:
    addNumbers = icall #addNumbers (left, right)
    store retVal[0] = addNumbers
    jump -> endInline 
 
addUnknownLabel:
    addUnknown = icall #addUnknown (left,right)
    store retVal[0] = addUnknown
    jump -> endInline 
 
endInline:
 
ld5 = load retVal[0]
sti vars[32] = ld5

 

The LIR still needs to load the left and right operands prior to inlining the method. Space is then allocated on the stack for the return value of the method. The C++ return statements become LIR store values at the allocated stack location. The C++ call statements remain LIR call statements, unless they too are explicitly inlined. In the original LIR, the value returned from call add was stored into the location vars[32]. Now, the return value is loaded from retVal and stored again into vars[32], completing the inlining of C++ add.

Although not in this example, the translator takes the same approach used by return statements for Phi functions. LIR doesn't have a Phi opcode. Instead, stores are pushed up to the basic block where the value is flowing into the Phi function. The actual Phi turns into a load from the store location.

Also, the translator always follows LLVM semantics and creates both the true and false branches in LIR. Future work is to optimize away one of the branches and add a LIR_phi instruction.

Tuesday
03Nov2009

The Developer Productivity Case for C++ Translation

Why go through all this trouble to translate C++ into LIR? The "easiest" or most direct route is to manually create and inline LIR that represents the functionality we want. Consider adding two values:

if (areNumbers(left, right)) {
    // do integer add
    return sum
}
else {
    // do lots of look ups and checks
    return sum
} 

 

We could create the LIR that represents areNumbers and the x86 integer add, and have it up and running tomorrow. The problem is that it becomes a maintanence hassle. Tamarin would have a bunch of manually inlined LIR snippets that is difficult to understand and debug. The following is the simplified equivalent LIR for the code snippet above:

ld1 = load vars[4]
ld2 = load vars[8]
 
areNumbers1 = call areNumbers(ld1, ld2)
eq1 = eq areNumbers, 0
jump false slowPath
 
   // fall through to fast path
   // do integer add
   store returnValue[0], sum
 
slowPath:
   // more checks
   store returnValue[0], sum
 
ld3 = returnValue[0]
ret ld3

 

It's a lot nicer to just code the functionality in C++, where its a lot more maintainable, malleable, and easier to debug. The counter-argument is that you have to develop the translation program and maintain another piece of code. While that's true, overall, there is less work, and a lot less pain. (Debugging LIR is a painful process). Developer productivity is actually the main reason Adobe's investing in what I consider an "infrastructure" software update.

Friday
30Oct2009

Adding More Cowbell to Tamarin

Many dynamic language virtual machines (VM) written in C++ work by compiling code that passes parameters into VM C++ methods which do most of the heavy lifting. For example, adding two values becomes two x86 pushes onto the stack with a x86 call into a C++ method add. When I last looked at Apple's SquirrelFish, that's all it did. The generated x86 code in Tamarin more or less does the same thing with a few optimizations. And having such a simple compiler works surprisingly well, but it only takes you so far. Sooner or later, you'll want to generate better, faster machine code.

An effective way of generating faster code is method inlining. Instead of generating calls, the JIT should inline the methods that do the real work. The problem for Tamarin's JIT, is that it doesn't know how to inline C++ methods. NanoJIT only compiles LIR, not C++. Enter the C++ to LIR translator.

We do this by statically compiling Tamarin with LLVM. LLVM is a pretty awesome GCC replacement that is nicely designed, object oriented, and generally a pleasure to work with. The output of LLVM is like an object file, but instead contains LLVM bitcode. This new tool then translates LLVM bitcode into LIR. At runtime, Tamarin compiles the LIR, which really represents a VM C++ method. Boom, inlinable C++ methods.

We don't have to apply the translator to only method inlining. In essence, Tamarin compiles VM C++ methods as if they were ActionScript methods. Application ActionScript, such as a youtube video, prior to execution, is translated into LIR, which is then compiled into machine code. We also translate most of Tamarin, written in C++, into LIR, which is then compilable into machine code by NanoJIT. NanoJIT has the power to compile, inline, and optimize VM C++ methods for the performance win.

You may be thinking, hmm this sounds an awfully like a self-interpreting VM. A self-interpreting VM is a VM written in the language it executes. For example, writing a JavaScript VM in JavaScript would make it self-interpreting. Well, you're right. This approach hacks in self-interpreting/JITting into a VM written in C++.

Consider the three images above just to clarify the differences. The "standard" VM approach, and what Tamarin did before, is to JIT code that calls into C++ methods. The host C++ compiler (Visual Studio/GCC) does a lot of work, and the JIT has no idea what was going on. In a self-interpreting VM, the JIT compiles everything, application and VM code, and knows everything about itself. This approach mimics a self-interpreting VM by translating C++ methods into LIR, and JITing both application and VM code.

Over the next few weeks, I'll be detailing the implementation/design decisions in Tamarin.

Thanks to Michael Bebenita for proofreading and creating the images. Checkout this great SNL skit if you are wondering what More Cowbell is.

Sunday
04Oct2009

LIR after the NanoJIT merge

When TraceMonkey was born, the team forked the NanoJIT backend from Tamarin. However, over the summer, the TraceMonkey and Tamarin teams wanted to merge their changes back into a shared repository. The intermediate representation (LIR) changed a bit. What's it look like now?

Here is what a basic LIR instruction looks like:

class LIns
{
        union {
            Reservation lastWord;
            // force sizeof(LIns)==8 and 8-byte alignment on 64-bit machines.
            // this is necessary because sizeof(Reservation)==4 and we want all
            // instances of LIns to be pointer-aligned.
            void* dummy;
        };
}

 
What about Reservation?

// The opcode is not logically part of the Reservation, but we include it
// in this struct to ensure that opcode plus the Reservation fits in a
// single word. 
struct Reservation
{
        uint32_t arIndex:16;    // index into stack frame.  displ is -4*arIndex
        Register reg:7;         // register UnknownReg implies not in register
        uint32_t used:1;        // when set, the reservation is active
        LOpcode  opcode:8;
}

A LIR instruction is a padding around a 32 bit Reservation which contains the opcode. But where are the operands to an instruction?

This is the biggest difference after the merge, at least from Tamarin's perspective. Prior to the nanoJIT merge, LIR instructions were inserted into a contiguous chunk of memory. Each LIR instruction was one 32 bit word. The top 8 bits were reserved for the opcode while the lower 24 bits were used as operands. Each operand was represented as an 8 bit offset from the point in memory. The actual LIR instruction structure had no notion of pointers.

Now LIR instructions directly point to their operands. But where are the operands?. There are multiple LIR instruction types depending on the number of operands an instruction requires. For example, return instructions only need to point to the value they are returning. NanoJIT has a LInspOp1 class for instructions that have only one operand:

// 1-operand form.  Used for LIR_ret, unary arithmetic/logic ops,
class LInsOp1
{
        friend class LIns;
        LIns*       oprnd_1;
        LIns        ins;
}

 

"Ins" here points to "this" instruction. NanoJIT also has LInsOp2 for instructions with two operands:

// 2-operand form.  Used for loads, guards, branches, comparisons, binary
class LInsOp2
{
        LIns*       oprnd_2;
        LIns*       oprnd_1;
        LIns        ins;
};

 

And one last one for instructions that have three operands. This means LIR instructions are variable length and can be up to 4 words in length. Now what about constant values such as the number 6? NanoJIT has a few other specialized LIR instructions such as LInsI:

// Used for LIR_int and LIR_ialloc.
class LInsI
{
        int32_t     imm32;
        LIns        ins;
};

 

If you want to get into the gritty details, checkout the NanoJIT merge MDC article. If you want to see how it is all implemented, checkout LIR.h on the mercurial repository on line 210 for a nice comment.