TraceMonkey vs SquirrelFish

With the announcement of TraceMonkey, we saw that SpiderMonkey got a huge improvement. However, in previous benchmarks, SpiderMonkey was still slower than SquirrelFish. So, how does Trace Monkey compare versus SquirrelFish:

Overall, if you aggregate the time for each test, TraceMonkey is about 15% faster. If you do the aggregate speedup, which is the speedup for each test divided by the number of tests, it is about 2.4 x faster. Sometimes TraceMonkey is slower, but give them more than 60 days to hash it out, and I'm sure it'll get faster.


Notes: SquirrelFish and TraceMonkey were with release builds. Tests run on a MacBook Pro 2.4 ghz, 4 gigs of ram, OS X Leopard 10.5.4. TraceMonkey was just pulled about an hour ago. SquirrelFish was last nights SVN.

Good Job Guys. More info on TraceMonkey at Andreas Gal's Blog, Mike Shaver, David Anderson, and for a real visual demo, checkout Mike Schroepfer's blog.

Interpreting the trace tree drawer output

Here's a quick primer on how to interpret the output of the trace tree drawer in Tamarin-Tracing. Let's use the following .as example:

function f() {
var k = 0;
var j = 0;
var l = 0;

for (var i = 0; i < 100; ++i) {
if (i < 20) {
k = i;
}

if (i < 40) {
j = i;
} else
l = i;

}
}

f();


Which generates the following graph:

The blue nodes are the main traces, and the orange blocks are side exits
that were taken during execution of the trace. The numbers such as \4
represent the side exit id. Orange nodes that are labeled with an M eg.
"M2" are merge nodes, which are multiple branches merged into one
compiled piece of code.

The +numbers such as T6...f()+29, at the end of the node labels
represent the Forth IP. Prior to execution, Tamarin translates .abc to
Forth. This resulting Forth is actually executed and traced. You can see
the resulting Forth translation if you run Tamarin with the -Dverbose
flag. In the example, a small translation snippet is:

47:jump 102
+25 LITC1
+26 LBRT 0x47300f2
B0:
* +29 OP_label*
frame: Object? * * * *

The "47:jump 102" represents the ABC opcode from the source file. The
Forth that is being translated for this ABC opcode contains a + next to
the number. Therefore, the main trace in the tree drawer output
T6...f()+29, refers to the OP_label Forth opcode. Each of the +number
next to the nodes in the graph represent their location in the Forth code.

What you are seeing in this picture, is one main trace "T6". The side
exit "\4" has two guards that were actually executed, one of which jumps
back to T6, and the other jumps to another side exit "\13". The other
side exits jump to more side exits in the order that they were compiled,
which eventually hit a merge node "M2". The M2 merge node again side
exits to another merge node M3. M3 finally jumps back to T6.

Another way of looking at the traces is to run Tamarin with the -Dstats
flag, which gives you the same information in text form. It contains a
few more side exits and traces that were unsuccessful in generating
code. The trace tree drawer only shows traces/side exits which contain
compiled code.

* Picture and example used with permission from Zsolt Vilagos and Peter Siket

What's in a Fragment?

Tamarin-Tracing now has trees, which means any piece of code(A fragment) can become attached to any other part of a fragment. Tamarin also uses the context of the tree to optimize a fragment that is attached to the tree. Prior to this patch, every fragment was simply an independent piece of code. This post will deal with the structure of a tree and how each fragment is compiled/relates to other fragments.

Fragmento has a data structure entitled _frags:

Fragmento.h
DWB(FragmentMap*) _frags; /* map from ip -> Fragment ptr */
The DWB is related to the garbage collector. A FragmentMap is really just a sorted hash map. When you iterate over _frags, you get a pointer to a Fragment which is an anchor fragment. An anchor fragment is the first trace that is recorded and becomes the "main" trace where other traces can become attached too. If you look at the visualizer, they are the blue labeled nodes. A fragment is considered compiled if it has code attached to it. A fragment can be kept in the _frags hashmap even if no code is attached to it. This signifies that the trace recorder tried to record something, but failed.

Now lets take a look at the fragment data structure:

Fragmento.h
class Fragment {
LIns* spawnedFrom;
DWB(Fragment*) branches;
TraceKind kind;
NIns* _code;
}

struct SideExit
{
Fragment *target;
verbose_only( uint32_t sid; )
verbose_only(Fragment *from;)
};

Lir.h
class LIns {
SideExit *exit();
}

When a side exit is taken, the recording of a fragment is successful, and the fragment is successfully compiled, the newly established fragment is added to the main traces branches list which is a list of branches attached to the trace. The fragment also has a link of where it spawned from, which returns the SideExit data structure. This side exit contains pointers with target/from fragments. The from points to where the side exit is attached to. The target points to where the fragment ends at, which may be another side exit or the main anchor trace. Finally there is some information regarding what kind of fragment was generated.
Fragmento.h
enum TraceKind {
LoopTrace,
BranchTrace,
MergeTrace
};

The TraceKind signifies if it is a loop, merge, or cross fragment. A loop fragment means that the end of the fragment jumps back to a loop header. A merge fragment means that it merged two fragments into one. Finally, a cross fragment implies that instead of jumping back to its main trace, a fragment instead jumps to another "main" trace. For example, in this post, the fragment \10 Sunspider/bitops-nsieve-bits.js18 is a cross fragment which jumps to T9, even though it originated from T8.

Trace Tree Visualization

I just submitted my first patch to Tamarin-Tracing, w00t! This patch includes some code to visualize compiled trace trees generated by Tamarin. The patch creates a .graphml (an XMLish type format) and uses the program yEd to actually visualize the data. Below is a picture of its prettiness:

This is a picture of what Tamarin compiles for the sunspider test bitops-nsieve-bits. The blue ones are anchor roots, and the \number are side exits. To run this do the following:

1) Build Tamarin-Tracing with Debug
2) run avmplus with -Dtrees and -Ddraw_trees. This only works if you have both flags on.
3) The output is in /path/to/yourFile.abc.graphml
4) Load it in yed

To make it prettier in YED
5) Click on tools->Fit Node To Label
6) Push ok
7) Click on Layout->Hierarchical->Classic
8) Push ok
9) Enjoy!

Reducing the number of guards with Shapes

When Andreas, Michael, and I first started doing trace compilation on JavaScript, we had some interesting unexpected issues. What we discovered with our first implementation, was that there were too many guards during property look ups, and too many guards regarding the type of a specific variable/property. With so many guards, the amount of bailout code increased, which led to in general, an explosion of code size, and a performance hit.

We looked into a way of compressing the type and structure of objects into a single check. The structure of an object is what properties an object has and their corresponding types, not their values. This way, we could remove the type check guards for every property access, and instead replace them with a single check which ensures that an object's properties did not change.

The idea Andreas came up with, was to encapsulate this information into a "Shape", which is a global unique id among all objects in the system. So if you have two different objects which contain the same properties, and all the properties have the same type, they would share the same shape id. How does this help with the guards? How does this affect a trace? You still have issues of type conversions during specific operations, for example with an add. If you want to do integer addition, you still need a check to ensure that the integer doesn't overflow into a double as the semantics of JavaScript say a number is always a double.

Now the question is, where do you insert these checks? This was a bit confusing for a very long time, so please feel free to ask questions if its not clear. As always, its easier to explain if we have an example. The code example we'll use is this:.

var i = 0;
var j = new Object;
j.x = 0;
var temp = 0;

for (i = 0; i < 100000000; i++) {
temp += i;
j.x = temp;
}


Sometime during execution, the var temp will overflow into a double. However, assuming the loop threshold to start a trace is low enough, a trace should have been compiled such that j.x is typed as an integer, temp is an integer, and the generated code does an integer addition. Let's assume now that the current "Shape" for variable j is 10. If a new property is added, or the type of a property changes, the "Shape" would increase to 11. One way of looking at a trace is as follows:

: topOfLoop

Guard to check i < 10000000
Guard to ensure j is shape 10
integer add temp + i

Guard to ensure temp + i is an integer
assign temp to temp + i

Guard to ensure temp is an integer
assign j.x to var temp;

jump topOfLoop


In SSA this would not work since j.x would point directly to the addition result, but let's assume a simple compiler just to make a point. The previous trace works just fine as long as temp stays within the bounds of an INTEGER. It also makes logical sense to always guard on the type of the value being assigned to a given property. If we didn't, you could technically have a case where you assigned some property a value which has a different type than the shape indicated. In this case, the shape identifier of "10" would no longer valid if the property j.x is no longer an integer. So what happens when j.x finally overflows into a double?

The trace would jump into the interpreter, in which case the shape of j would change to "11" since j.x is now a double. Now upon recording, we would insert a guard to ensure that the shape of j is 11, the addition becomes a double with no check since there is nothing else to convert a double to, in terms of adding numbers. The resulting trace would look something like this:

: topOfLoop
Guard to ensure j is shape 11
Guard to check i &lt; 10000000

double add temp + i

assign temp to temp + i
assign j.x to var temp;

jump topOfLoop


However, we could greatly optimize the integer case with the following trace structure:
Guard to ensure j is shape 10

: topOfLoop

Guard to check i < 10000000

integer add temp + i
Guard type to ensure temp + i is an integer

assign temp to temp + i
assign j.x to var temp;

jump topOfLoop

We removed the guard to ensure that j.x was being assigned an integer, and pulled the guard to ensure j was shape 10 outside of the loop. Here is the important part. We no longer need the guard to ensure j.x is being assigned an integer. We had one check outside of the loop which checked j.x is still an integer type. Next, since the value being assigned was ALREADY checked for overflow, we don't need another redundant guard when assigning j.x. If j.x is supposed to overflow, it would be detected by the guard to ensure that the addition result was still an integer. The compiled code would never reach the code to assign the value to j.x. We can carry this idea to any assignment for a property. The type check for a property will always be done by a previous guard at the actual operation time, not the property assignment time.

The most interesting portion, is that SpiderMonkey generally has the same idea, in a slightly different fashion. To get an in depth look at whats going on, there is a 2 page comment in the jsscope.h file of SpiderMonkey. Even though there is an explanation of whats going on, it is difficult to understand, and the code is pretty icky. I'll dissect the actual implementation later on. We also want to implement something like this into Tamarin-Tracing.