Calling C++ in LIR

JIT-compiled code needs to call C++ methods to do the heavy lifting. To create LIR that can call into those methods, metadata about each C++ method is needed, such as its parameter types and return type. Tamarin stores this metadata in the CallInfo data structure:

struct CallInfo
{
    uintptr_t   _address;        // Address of the method
    uint32_t    _argtypes:27;    // 9 3-bit fields indicating arg type
    AbiKind     _abi:3;
 
    verbose_only ( const char* _name; )
};

The _address field is the address of the C++ method.

The _argtypes field is a bit encoding of the number of parameters, the types of each parameter, and the return type. For example, consider a C++ method that has the declaration:

void someFunction(int someParameter, double otherParameter);

 

Void types are represented by the decimal number 0 (binary 000). Integer types are represented by the decimal number 2 (binary 010). Double types are represented by the decimal number 1 (binary 001). The _argtypes field for someFunction, in binary, would look like:

        010 | 001 | 000

Reading from right to left, the rightmost 3 bits hold the return type, followed by the parameters in reverse order (last parameter first). Since there are only 27 bits, which gives nine 3-bit fields with one taken by the return type, LIR can only call C++ methods that have at most 8 parameters.
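The encoding above can be sketched in a few lines of C++. This is only an illustration, not Tamarin's actual code: ArgKind, packArgTypes and fieldAt are hypothetical names, and only the three type codes given above (void 0, double 1, integer 2) are used.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical names; only the numeric codes come from the text above.
enum ArgKind : uint32_t { KIND_VOID = 0, KIND_DOUBLE = 1, KIND_INT = 2 };

const uint32_t FIELD_BITS = 3;   // each type occupies one 3-bit field
const uint32_t FIELD_MASK = 7;
const uint32_t MAX_PARAMS = 8;   // 27 bits = 9 fields: 1 return type + 8 parameters

// Packs the return type into the lowest field. The last parameter lands in the
// next field up, so the first parameter ends up in the leftmost used field.
inline uint32_t packArgTypes(ArgKind ret, std::initializer_list<ArgKind> params)
{
    assert(params.size() <= MAX_PARAMS);
    uint32_t bits = 0;
    for (ArgKind p : params)             // earlier parameters get shifted further left
        bits = (bits << FIELD_BITS) | p;
    return (bits << FIELD_BITS) | ret;
}

// Reads field number `index`, where field 0 is the return type.
inline uint32_t fieldAt(uint32_t argtypes, uint32_t index)
{
    return (argtypes >> (index * FIELD_BITS)) & FIELD_MASK;
}
```

For someFunction, packArgTypes(KIND_VOID, {KIND_INT, KIND_DOUBLE}) yields binary 010 001 000, matching the layout shown above.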

The AbiKind represents how the C++ method is called.

enum AbiKind {
    ABI_FASTCALL,
    ABI_THISCALL,
    ABI_STDCALL,
    ABI_CDECL
};

 

FASTCALL passes the first few parameters in registers instead of pushing them onto the stack. THISCALL means that the C++ method is a non-static member of a class, and requires that the object instance be passed in. I haven't seen STDCALL used at all. CDECL stands for C declaration, where all parameters are pushed onto the stack prior to calling the method.

All these CallInfo structures are manually created and maintained in Tamarin in the file core/jit-calls.h. Consider the C++ method declaration:

static void AvmCore::atomWriteBarrier(MMgc::GC *gc, const void *container, 
Atom *address, Atom atomNew);

 

The macro declaration to create the CallInfo structure for atomWriteBarrier is:

FUNCTION(FUNCADDR(AvmCore::atomWriteBarrier), SIG4(V,P,P,P,A), atomWriteBarrier)

 

Each of these funky words is another macro:

  • FUNCTION - Declares that AvmCore::atomWriteBarrier uses the ABI_CDECL calling convention.
  • FUNCADDR - Indicates that this is a static method.
  • SIG4(V,P,P,P,A) - Represents the CallInfo::_argtypes field for a method with 4 parameters. V is the (V)oid return type. The next 3 entries are (P)ointer parameters. The last entry is an (A)tom parameter.
  • atomWriteBarrier - The name of the method, which is used for debugging purposes.
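To make the macro layering more concrete, here is one way such macros could expand. Everything in this sketch is hypothetical: the SK_ prefixed macros, SketchCallInfo, the stub function, and the type codes for P and A (assumed here to be integer-sized, code 2) are stand-ins for illustration, and the real definitions in core/jit-calls.h may differ.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins only; not the real Tamarin definitions.
enum SketchAbi { SK_FASTCALL, SK_THISCALL, SK_STDCALL, SK_CDECL };

struct SketchCallInfo {
    uintptr_t   address;
    uint32_t    argtypes;
    SketchAbi   abi;
    const char* name;
};

// Assumed type codes: 0 is void per the text; P and A are assumed to be code 2.
enum { SK_V = 0, SK_P = 2, SK_A = 2 };

// Return type in the lowest field, then the last parameter, and so on upward.
#define SK_SIG4(r, a1, a2, a3, a4) \
    ((uint32_t)(SK_##r | (SK_##a4 << 3) | (SK_##a3 << 6) | (SK_##a2 << 9) | (SK_##a1 << 12)))

// Address of a static (non-member) function as an integer.
#define SK_FUNCADDR(f) (reinterpret_cast<uintptr_t>(&f))

// Emits one CallInfo-like record that uses the CDECL convention.
#define SK_FUNCTION(addr, sig, name) \
    static const SketchCallInfo ci_##name = { addr, sig, SK_CDECL, #name };

// Stub standing in for AvmCore::atomWriteBarrier.
static void atomWriteBarrierStub(void*, const void*, int*, int) {}

SK_FUNCTION(SK_FUNCADDR(atomWriteBarrierStub), SK_SIG4(V, P, P, P, A), atomWriteBarrier)

// Usage: inspect the generated record.
inline void callInfoMacroExample()
{
    assert((ci_atomWriteBarrier.argtypes & 7) == SK_V);   // return type field
    assert(ci_atomWriteBarrier.abi == SK_CDECL);
}
```

The point of the layering is that each table entry stays a single line, while the bit packing and the choice of calling convention are hidden inside the macros.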

The current list of C++ methods is all manually created and maintained, which is a really annoying hassle. With the LLVM bitcode to LIR translator, the number of CallInfos grows dramatically because C++ methods usually call lots of other C++ methods. Consider the atomWriteBarrier code:

void AvmCore::atomWriteBarrier(MMgc::GC *gc, const void *container, Atom *address, Atom atomNew)
{ 
    decr_atom(*address);
    incr_atom(gc, container, atomNew);
    *address = atomNew;
}

 

There are no CallInfo structures for decr_atom() and incr_atom(), but they have to be created. If a targeted inline C++ method calls a C++ method that doesn't already have a CallInfo structure, a new CallInfo for the newly called C++ method must be created.

While the list could be created manually, it would be horrendously annoying. Instead, we automatically create a new file called jit-calls-generated.h that contains all the newly generated CallInfo structures.

Automatically creating a CallInfo structure:

Consider the C++ method declaration for decr_atom:

static void decr_atom(Atom const a);

 

The declaration in LLVM bitcode:

define void @AvmCore::decr_atomEi(i32 %a); 

 

When translating bitcode to LIR, the bitcode contains the parameter and return types. It also provides the name of the method. The last missing piece is the AbiKind.

LLVM bitcode has an explicit FASTCALL modifier for methods that use the FASTCALL calling convention. However, bitcode contains no explicit distinction between the CDECL and THISCALL calling conventions. The call site in bitcode only says that a pointer is being passed into a method. The distinction between CDECL and THISCALL is found at the function definition, NOT the declaration.

The LLVM function definition for a THISCALL looks like:

define i32 @Toplevel::add2(%"struct.avmplus::Toplevel"* %this, i32 %leftOperand, i32 %rightOperand); 

 

The first parameter is explicitly named "this". If the first parameter of a function definition is named "this", then the C++ method uses the THISCALL AbiKind. This detection scheme is safe because "this" is a keyword in C++, and you can't name a value "this". The function declaration cannot be used for this check: it would show the name of the instance being passed in rather than the name "this", so the check would give the wrong answer.
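The detection rule can be sketched as a small function. The AbiKind enum is the one shown earlier; FunctionDef and detectAbi are hypothetical names that stand in for whatever the translator actually reads from a bitcode function definition.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum AbiKind { ABI_FASTCALL, ABI_THISCALL, ABI_STDCALL, ABI_CDECL };

// Hypothetical summary of a bitcode function definition.
struct FunctionDef {
    bool                     markedFastcall;  // explicit FASTCALL modifier present
    std::vector<std::string> paramNames;      // parameter names in order
};

// FASTCALL is explicit in the bitcode; otherwise a first parameter named "this"
// means THISCALL, and everything else is treated as CDECL.
inline AbiKind detectAbi(const FunctionDef& def)
{
    if (def.markedFastcall)
        return ABI_FASTCALL;
    if (!def.paramNames.empty() && def.paramNames.front() == "this")
        return ABI_THISCALL;
    return ABI_CDECL;
}

// Usage: Toplevel::add2 from the text has "this" as its first parameter.
inline void detectAbiExample()
{
    FunctionDef add2 = { false, { "this", "leftOperand", "rightOperand" } };
    assert(detectAbi(add2) == ABI_THISCALL);
}
```

Note that this sketch never returns ABI_STDCALL, which mirrors the text: only FASTCALL, THISCALL and CDECL are distinguished.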

What can't be called:

While the majority of C++ methods can be called from LIR, there are some limitations. A C++ object's constructor cannot be called directly, as the C++ specification forbids taking the address of a constructor (C++ Standard, section 12.1.12). At the moment, we create a C++ wrapper that calls the constructor, and LIR then calls the wrapper. A polymorphic (virtual) method cannot be called either, as its target is looked up at runtime through a virtual method table.
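The constructor workaround looks roughly like the following. Point, newPoint and PointFactory are hypothetical names used only for illustration; the idea is simply that an ordinary function has an address even though the constructor it calls does not.

```cpp
#include <cassert>

// Hypothetical class standing in for a real Tamarin type.
class Point {
public:
    Point(int x, int y) : x_(x), y_(y) {}
    int sum() const { return x_ + y_; }
private:
    int x_, y_;
};

// Taking the address of Point::Point is not allowed, so an ordinary
// function wraps the constructor call.
Point* newPoint(int x, int y)
{
    return new Point(x, y);
}

// The wrapper's address can be taken like any other function's, which is
// what a CallInfo entry would need.
typedef Point* (*PointFactory)(int, int);
const PointFactory pointFactory = &newPoint;

// Usage: call through the function pointer, as generated code would.
inline void constructorWrapperExample()
{
    Point* p = pointFactory(3, 4);
    assert(p->sum() == 7);
    delete p;
}
```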