SquirrelFish and SpiderMonkey play with SunSpider

How many awesome names can we have? LOL I love these hybrid animals. A comment on Slashdot says its like the new Web 2.0 naming scheme. Reminds me of SpiderPig! Anyway, the Webkit guys released their new JavaScript Interpreter, squirrelfish. The benchmarks show that it is pretty fast. Cameron has a post detailing the overall picture, but what about the micro picture? The following is a graph of each SunSpider test run on SquirrelFish(WebKit-r34342) and SpiderMonkey 1.8 (built on May 22 2008 at 13:31:57). The shorter the bar the better.


If you wish to rerun these tests, heres how to build WebKit. If you specifically want to build just SquirrelFish, use the following command:

1) WebKitTools/Scripts/build-testkjs
2) The binary is /WebKitBuild/Release/testkjs

Props to the webkit guys.

* Edit as Mark mentioned. All tests run on OS X 10.5.3 on a Macbook Pro 2.4 ghz w/ 2 gigs of ram. Both interpreters were built on my machine with optimizations/release build.

SpiderMonkey's secret object sauce

Well not really secret since SpiderMonkey is open source, but I tried to think of a cool title. A lot of the benchmarks show that SpiderMonkey is significantly faster than Tamarin. This has been mostly attributed to the way SpiderMonkey accesses object properties. SpiderMonkey has an implementation of the shape idea. A shape is a unique identifier which details the structure of an object, and allows for a quick lookup of a given property. I think it will be easier to explain shapes by showing how SpiderMonkey uses them. When SpiderMonkey looks up a property, for example with the opcode JSOP_GETPROP, it does the following:

jsinterp.c
do_getprop_with_obj:
do {
JSPropCacheEntry *entry;

if (JS_LIKELY(obj->map->ops->getProperty == js_GetProperty)) {
PROPERTY_CACHE_TEST(cx, regs.pc, obj, obj2, entry, atom);

The macro PROPERTY_CACHE_TEST expands into a large chunk of code, but the important line is here:
jsinterp.c  
JSPropertyCache *cache_ = &JS_PROPERTY_CACHE(cx); \
uint32 kshape_ = (JS_ASSERT(OBJ_IS_NATIVE(obj)), \
OBJ_SCOPE(obj)->shape); \
entry = &cache_->table[PROPERTY_CACHE_HASH_PC(pc, kshape_)]; \
Therefore, properties and the shape, are attached to a Scope. Hence global variables are attached to the global scope, and object properties are attached to the object instance's scope. The PROPERTY_CACHE_HASH_PC expands to:
jsinterp.h
#define PROPERTY_CACHE_HASH(pc,kshape) \
((((jsuword)(pc) >> PROPERTY_CACHE_LOG2) ^ (jsuword)(pc) ^ (kshape)) & \
PROPERTY_CACHE_MASK)

#define PROPERTY_CACHE_HASH_PC(pc,kshape) \
PROPERTY_CACHE_HASH(pc, kshape)

With a few bit operations, given an object's shape and the current program location, we can find a property's address. However, what happens if we have a property cache miss? SpiderMonkey tries to use another combination which includes the object instance of the property. If the quick property cache fails we see the following:
jsinterp.c
#define PROPERTY_CACHE_TEST
// Test quick access

// if property not found
atom = js_FullTestPropertyCache(cx, pc, &obj, &pobj, &entry); \
if (atom) \
PCMETER(cache_->misses++);

jsinterp.c
js_FullTestPropertyCache
entry = &JS_PROPERTY_CACHE(cx).table[PROPERTY_CACHE_HASH_ATOM(atom, obj, NULL)];

#define PROPERTY_CACHE_HASH_ATOM(atom,obj,pobj) \
PROPERTY_CACHE_HASH((jsuword)(atom) >> 2, OBJ_SCOPE(obj)->shape)

If the property cannot be found, SpiderMonkey does a full property lookup in the property tree. This includes going up through the prototype chain to find the property. After enough local properties have been added, the object property lookup simply becomes a hash lookup. In quick summary, based on the current program counter and scope shape, we look for a given property. If it is not found, we include the object instance the property should be in. Finally, if nothing is found, we do a full property look up.

Now the only question is, how do we determine what to put in the cache, and when? The function that fills the property cache is in jsinterp.c in function js_FillPropertyCache. This function will take a given shape and allocate some space in the property cache associated with the given shape. js_FillPropertyCache is called upon the setting/getting of a property when the property could not be found in the cache, and a full property lookup occurred.

Reducing the number of guards with Shapes

When Andreas, Michael, and I first started doing trace compilation on JavaScript, we had some interesting unexpected issues. What we discovered with our first implementation, was that there were too many guards during property look ups, and too many guards regarding the type of a specific variable/property. With so many guards, the amount of bailout code increased, which led to in general, an explosion of code size, and a performance hit.

We looked into a way of compressing the type and structure of objects into a single check. The structure of an object is what properties an object has and their corresponding types, not their values. This way, we could remove the type check guards for every property access, and instead replace them with a single check which ensures that an object's properties did not change.

The idea Andreas came up with, was to encapsulate this information into a "Shape", which is a global unique id among all objects in the system. So if you have two different objects which contain the same properties, and all the properties have the same type, they would share the same shape id. How does this help with the guards? How does this affect a trace? You still have issues of type conversions during specific operations, for example with an add. If you want to do integer addition, you still need a check to ensure that the integer doesn't overflow into a double as the semantics of JavaScript say a number is always a double.

Now the question is, where do you insert these checks? This was a bit confusing for a very long time, so please feel free to ask questions if its not clear. As always, its easier to explain if we have an example. The code example we'll use is this:.

var i = 0;
var j = new Object;
j.x = 0;
var temp = 0;

for (i = 0; i < 100000000; i++) {
temp += i;
j.x = temp;
}


Sometime during execution, the var temp will overflow into a double. However, assuming the loop threshold to start a trace is low enough, a trace should have been compiled such that j.x is typed as an integer, temp is an integer, and the generated code does an integer addition. Let's assume now that the current "Shape" for variable j is 10. If a new property is added, or the type of a property changes, the "Shape" would increase to 11. One way of looking at a trace is as follows:

: topOfLoop

Guard to check i < 10000000
Guard to ensure j is shape 10
integer add temp + i

Guard to ensure temp + i is an integer
assign temp to temp + i

Guard to ensure temp is an integer
assign j.x to var temp;

jump topOfLoop


In SSA this would not work since j.x would point directly to the addition result, but let's assume a simple compiler just to make a point. The previous trace works just fine as long as temp stays within the bounds of an INTEGER. It also makes logical sense to always guard on the type of the value being assigned to a given property. If we didn't, you could technically have a case where you assigned some property a value which has a different type than the shape indicated. In this case, the shape identifier of "10" would no longer valid if the property j.x is no longer an integer. So what happens when j.x finally overflows into a double?

The trace would jump into the interpreter, in which case the shape of j would change to "11" since j.x is now a double. Now upon recording, we would insert a guard to ensure that the shape of j is 11, the addition becomes a double with no check since there is nothing else to convert a double to, in terms of adding numbers. The resulting trace would look something like this:

: topOfLoop
Guard to ensure j is shape 11
Guard to check i &lt; 10000000

double add temp + i

assign temp to temp + i
assign j.x to var temp;

jump topOfLoop


However, we could greatly optimize the integer case with the following trace structure:
Guard to ensure j is shape 10

: topOfLoop

Guard to check i < 10000000

integer add temp + i
Guard type to ensure temp + i is an integer

assign temp to temp + i
assign j.x to var temp;

jump topOfLoop

We removed the guard to ensure that j.x was being assigned an integer, and pulled the guard to ensure j was shape 10 outside of the loop. Here is the important part. We no longer need the guard to ensure j.x is being assigned an integer. We had one check outside of the loop which checked j.x is still an integer type. Next, since the value being assigned was ALREADY checked for overflow, we don't need another redundant guard when assigning j.x. If j.x is supposed to overflow, it would be detected by the guard to ensure that the addition result was still an integer. The compiled code would never reach the code to assign the value to j.x. We can carry this idea to any assignment for a property. The type check for a property will always be done by a previous guard at the actual operation time, not the property assignment time.

The most interesting portion, is that SpiderMonkey generally has the same idea, in a slightly different fashion. To get an in depth look at whats going on, there is a 2 page comment in the jsscope.h file of SpiderMonkey. Even though there is an explanation of whats going on, it is difficult to understand, and the code is pretty icky. I'll dissect the actual implementation later on. We also want to implement something like this into Tamarin-Tracing.

A Better View of Tamarin and SpiderMonkey's Interpreter

There was some discussion this morning about whether or not it was possible to get Tamarin's Interpreter up to speed with SpiderMonkey, and if so, how far is Tamarin? Instead of using JavaGrande, we decided to use SunSpider, since it has some tests which use Objects. If anyone's interested, you can only get SunSpider from their source repository. All benchmarks were done on a MacBook pro 2.4 ghz with 2 gigs of ram. Both were built as Release/Optimizations enabled.

Note that with Tamarin, the -interp flag has to be set to test the interpreter only. I couldn't find a way to run it through the sunspider test harness. So I wrote my own bash script which invoked Tamarin 10 times, and used the unix 'time' results. Also, some of these tests failed.

The picture below is how much slower Tamarin is compared to SpiderMonkey. The smaller the bar, the faster Tamarin executed the test. The graphic only shows tests which passed. There was no test in which Tamarin was faster than SpiderMonkey, therefore this pic only shows how much slower Tamarin was than SpiderMonkey. Below the pics are the actual numbers.


SpiderMonkey 1.8
built on May 22 2008 at 13:31:57

3d: 347.4ms +/- 0.4%
cube: 126.6ms +/- 0.5%
morph: 118.0ms +/- 0.4%
raytrace: 102.8ms +/- 0.4%

access: 415.5ms +/- 1.5%
binary-trees: 43.3ms +/- 4.0%
fannkuch: 193.3ms +/- 1.4%
nbody: 125.1ms +/- 1.4%
nsieve: 53.8ms +/- 0.6%

bitops: 397.8ms +/- 0.4%
3bit-bits-in-byte: 49.6ms +/- 1.0%
bits-in-byte: 79.1ms +/- 0.3%
bitwise-and: 178.3ms +/- 0.4%
nsieve-bits: 90.8ms +/- 0.7%

controlflow: 35.1ms +/- 0.6%
recursive: 35.1ms +/- 0.6%

crypto: 169.9ms +/- 0.3%
aes: 66.1ms +/- 0.3%
md5: 51.9ms +/- 0.8%
sha1: 51.9ms +/- 0.4%

date: 269.3ms +/- 0.4%
format-tofte: 160.3ms +/- 0.3%
format-xparb: 109.0ms +/- 0.7%

math: 288.5ms +/- 0.3%
cordic: 128.9ms +/- 0.4%
partial-sums: 102.0ms +/- 0.3%
spectral-norm: 57.6ms +/- 0.6%

regexp: 251.9ms +/- 0.2%
dna: 251.9ms +/- 0.2%

string: 594.0ms +/- 0.3%
base64: 68.7ms +/- 0.5%
fasta: 148.7ms +/- 0.2%
tagcloud: 134.2ms +/- 0.8%
unpack-code: 169.4ms +/- 0.2%
validate-input: 73.0ms +/- 0.7%

Tamarin:
3d: 2035 ms
cube: 457 ms
morph: 608 ms
raytrace: 970 ms

access: 3548 ms
binary-trees: 363 ms
fannkuch: 1819 ms
nbody: 1047 ms
nsieve: 319 ms

bitops: 3929 ms
3bit-bits-in-byte: 275 ms
bits-in-byte: 371 ms
bitwise-and: 2454 ms
nsieve-bits: 829 ms

controlflow: 282 ms
recursive: 282 ms

crypto: 1440 ms
aes: 785 ms
md5: 322 ms
sha1: 333 ms

date: Fail
format-tofte: Reference Error #1065
format-xparb: Reference Error #1065

math: 2637 ms
cordic: 616 ms
partial-sum 1584 ms
spectral-norm: 437 ms

regexp: Fail
dna: ArgumentError: Error #1063

string: 56445 ms
base64: ReferenceError: Error #1069
fasta: 837 ms
tagcloud: ReferenceError: #1065
unpack-code: 50234 ms
validate-input: 5374 ms

Tamarin is slower. The worst part has to be playing with strings. Ouch.

Tamarin Benchmarks - Now Featuring Type Annotations

Andreas pointed out that the for loop example might be a bit more interesting if we add type annotations to the source. So let's take another look at Tamarin with added type annotations. The new found source:

package{
var j:Number = 0;
for (var i:Number = 0; i < 100000000; i++) {
j++;
}
}
Here are the numbers:
Tamarin-Central Compiler
time ./shell -Dforcemir ~/Projects/tamarin-central/test/custom/forLoop.abc

real 0m0.452s
user 0m0.445s
sys 0m0.007s

Tamarin-Tracing Single Interpreter - Compiler on
time ./avmplus ~/Projects/tamarin-central/test/custom/forLoop.abc

real 0m5.006s
user 0m4.923s
sys 0m0.041s

Tamarin-Tracing Double Interpreter - Compiler on
time ./avmplus ~/Projects/tamarin-central/test/custom/forLoop.abc

real 0m6.529s
user 0m6.496s
sys 0m0.033s

SpiderMonkey 1.8 (Copied from previous post for convenience)
time ./js ~/Projects/tamarin-central/test/custom/forLoop.js

real 0m6.396s
user 0m6.385s
sys 0m0.011s

Even with type annotations, Tamarin squeaks by SpiderMonkey 1.8. Remember, the previous test was untyped code.

Finally, what if we change the type from Number to integer. The new source code:

package{
var j:int = 0;
for (var i:int = 0; i < 100000000; i++) {
j++;
}
}
And the results:
Tamarin-Tracing Single Interpreter - Compiler on
time ./avmplus ~/Projects/tamarin-central/test/custom/forLoop.abc

real 0m2.384s
user 0m2.331s
sys 0m0.034s

Tamarin-Tracing Double Interpreter - Compiler on
time ./avmplus ~/Projects/tamarin-central/test/custom/forLoop.abc

real 0m1.301s
user 0m1.273s
sys 0m0.027s

Tamarin-Central - Compiler on
time ./shell -Dforcemir ~/Projects/tamarin-central/test/custom/forLoop.abc

real 0m0.373s
user 0m0.362s
sys 0m0.007s

The resulting speed boost from using an integer type is quite nice.

*Added the flag -Dforcemir to tamarin-central to force compilation. Thx for the tip Rick.