Creating LIR for Fun and Profit

Low-level Intermediate Representation (LIR) is the IR thats fiddled with and optimized on in Tamarin. Tamarin takes in ABC, converts it to LIR, does all the compilation optimizations on LIR, then feeds LIR into an assembler where it is finally generated into machine code. All the cool stuff happens in LIR. If you want to make Tamarin faster, you need to work with the LIR. Thus it should be important to learn how to use and generate LIR if you want to speed up Tamarin.

Let's start with the basics: x = a + b. We want to build some LIR that adds two numbers.

// load var a. lhs = left hand side
LIns* lhs = loadAtomRep(sp-1);

Remember that ABC is a stack based machine. This line creates a new LIR instruction that represents the Atom one item below the top of the stack. LIR is in Single Static Assignment form and only works on Atoms. Atoms are the internal representation of data items in Tamarin. The bottom 3 bits are used for type tags and the top 29 are used for data:

<---- Data Bits here ----------> <Type>
aaaaaaaa bbbbbbbb cccccccc ddddd zzz

Eg the number 10 as an integer: // 6 represents an integer
00000000 00000000 00000000 01010 101

Now the next LIR:

// Same thing but top of stack
LIns* rhs = loadAtomRep(sp);

// TopLevel contains a bunch of helper functions that are too complex for x86
LIns* toplevel = loadToplevel();

Nothing too big here, same thing as loading the lhs. The TopLevel is in TopLevel.cpp and contains a bunch of helper functions that are called in the compiled x86. They are mostly functions that are too annoying or complex to build in actual x86.

// Call the function TopLevel::add2 with parameters lhs and rhs
LIns* out = callIns(FUNCTIONID(add2), 3, toplevel, lhs, rhs);

This calls the method TopLevel::add2(lhs, rhs). Add2() does all the type checks of lhs and rhs and finally adds the two numbers. What's nice about this is that you don't even have to generate any LIR to sort out parameters. The assembler will figure everything out, move variables and parameters into the right place and generate the correct x86. The LIns* out instruction points to the result of add2. Finally we want to write the result back to the stack:

// store the result in memory
localSet(sp-1, atomToNativeRep(result, out), result);

The add2() method returns an Atom. atomToNativeRep() converts Atoms to their known types if they have been assigned one. The LocalSet function creates the necessary instructions to store the resulting atom to memory. The final sequence of instructions actually used by Tamarin to add two numbers is:

LIns* lhs = loadAtomRep(sp-1);
LIns* rhs = loadAtomRep(sp);
LIns* toplevel = loadToplevel();
LIns* out = callIns(FUNCTIONID(add2), 3, toplevel, lhs, rhs);
localSet(sp-1, atomToNativeRep(result, out), result)

Building Branches

It would be a shame if we couldn't branch, but generating LIR to branch is pretty easy! Let's go back to the add example and say build a piece of x86 code that does the following:

if (lhs == int && rhs == int) {
int add
jump to LABEL
else {
call add2()


Let's focus on building only the check to make sure lhs == int and creating the jump to LABEL. We can check that lhs is really an integer by reading the type tag of the bottom 3 bits. If (lhs & 6 == 6) is true, then the atom is an integer.

// Check that the lhs is an integer.
// InsConst(kIntegerType) creates a constant integer 6
// atom & 6 gives us if this is an int.
LIns *lhs_type = binaryIns(LIR_and, lhs, InsConst(kIntegerType));

// check to see if it is equal to an integer
LIns *lhs_is_int = binaryIns(LIR_eq, lhs_type, InsConst(kIntegerType));

// if not, jump to the normal add2. Notice no label reference.
LIns *lhs_not_integer = branchIns(LIR_jf, lhs_is_int);

// Make sure rhs is an integer
// do an integer add
// jump somewhere else

// tell the lhs_not_inger branch instruction to jump to this location
// Create a LIR_label instruction and point previous branch instructions
// to use this label
LIns *genericAddLabel = Ins(LIR_label);

// Tell lhs_not_inger to point to this new address

// Generate the add2() method call

Tada, like magic!. The backend will find all the labels and generate the appropriate jump. You can just generate LIR, branch to wherever you want, and Tamarin figures out the rest. It's actually quite nice that the backend does all the register allocation and figures everything out for method calls and parameters.