New SquirrelFish Extreme

A fresh out of the oven Squirrelfish called SquirrelFish Extreme was announced here. They attribute their speedup to bytecode optimizations, Context Threading, a property cache, and a regular expression JIT.

I tried a context threaded interpreter in Tamarin-Tracing, which significantly sped up the interpreter, but jumping from a trace back into the interpreter was too expensive. David Mandelin has been looking at adding inline threading/context threading into SpiderMonkey. Something like a property cache is in SpiderMonkey and has given SpiderMonkey significant performance improvements. Finally, we were thinking about doing a regular expression tracing JIT, but have yet to implement it.

Cameron Zwarich already has some performance benchmarks of SFX (SquirrelFish Extreme) to TraceMonkey and Google's V8. Browser war round 2! FIGHT!

TraceMonkey vs SquirrelFish

With the announcement of TraceMonkey, we saw that SpiderMonkey got a huge improvement. However, in previous benchmarks, SpiderMonkey was still slower than SquirrelFish. So, how does Trace Monkey compare versus SquirrelFish:

Overall, if you aggregate the time for each test, TraceMonkey is about 15% faster. If you do the aggregate speedup, which is the speedup for each test divided by the number of tests, it is about 2.4 x faster. Sometimes TraceMonkey is slower, but give them more than 60 days to hash it out, and I'm sure it'll get faster.


Notes: SquirrelFish and TraceMonkey were with release builds. Tests run on a MacBook Pro 2.4 ghz, 4 gigs of ram, OS X Leopard 10.5.4. TraceMonkey was just pulled about an hour ago. SquirrelFish was last nights SVN.

Good Job Guys. More info on TraceMonkey at Andreas Gal's Blog, Mike Shaver, David Anderson, and for a real visual demo, checkout Mike Schroepfer's blog.

SquirrelFish and SpiderMonkey play with SunSpider

How many awesome names can we have? LOL I love these hybrid animals. A comment on Slashdot says its like the new Web 2.0 naming scheme. Reminds me of SpiderPig! Anyway, the Webkit guys released their new JavaScript Interpreter, squirrelfish. The benchmarks show that it is pretty fast. Cameron has a post detailing the overall picture, but what about the micro picture? The following is a graph of each SunSpider test run on SquirrelFish(WebKit-r34342) and SpiderMonkey 1.8 (built on May 22 2008 at 13:31:57). The shorter the bar the better.


If you wish to rerun these tests, heres how to build WebKit. If you specifically want to build just SquirrelFish, use the following command:

1) WebKitTools/Scripts/build-testkjs
2) The binary is /WebKitBuild/Release/testkjs

Props to the webkit guys.

* Edit as Mark mentioned. All tests run on OS X 10.5.3 on a Macbook Pro 2.4 ghz w/ 2 gigs of ram. Both interpreters were built on my machine with optimizations/release build.

SpiderMonkey's secret object sauce

Well not really secret since SpiderMonkey is open source, but I tried to think of a cool title. A lot of the benchmarks show that SpiderMonkey is significantly faster than Tamarin. This has been mostly attributed to the way SpiderMonkey accesses object properties. SpiderMonkey has an implementation of the shape idea. A shape is a unique identifier which details the structure of an object, and allows for a quick lookup of a given property. I think it will be easier to explain shapes by showing how SpiderMonkey uses them. When SpiderMonkey looks up a property, for example with the opcode JSOP_GETPROP, it does the following:

jsinterp.c
do_getprop_with_obj:
do {
JSPropCacheEntry *entry;

if (JS_LIKELY(obj->map->ops->getProperty == js_GetProperty)) {
PROPERTY_CACHE_TEST(cx, regs.pc, obj, obj2, entry, atom);

The macro PROPERTY_CACHE_TEST expands into a large chunk of code, but the important line is here:
jsinterp.c  
JSPropertyCache *cache_ = &JS_PROPERTY_CACHE(cx); \
uint32 kshape_ = (JS_ASSERT(OBJ_IS_NATIVE(obj)), \
OBJ_SCOPE(obj)->shape); \
entry = &cache_->table[PROPERTY_CACHE_HASH_PC(pc, kshape_)]; \
Therefore, properties and the shape, are attached to a Scope. Hence global variables are attached to the global scope, and object properties are attached to the object instance's scope. The PROPERTY_CACHE_HASH_PC expands to:
jsinterp.h
#define PROPERTY_CACHE_HASH(pc,kshape) \
((((jsuword)(pc) >> PROPERTY_CACHE_LOG2) ^ (jsuword)(pc) ^ (kshape)) & \
PROPERTY_CACHE_MASK)

#define PROPERTY_CACHE_HASH_PC(pc,kshape) \
PROPERTY_CACHE_HASH(pc, kshape)

With a few bit operations, given an object's shape and the current program location, we can find a property's address. However, what happens if we have a property cache miss? SpiderMonkey tries to use another combination which includes the object instance of the property. If the quick property cache fails we see the following:
jsinterp.c
#define PROPERTY_CACHE_TEST
// Test quick access

// if property not found
atom = js_FullTestPropertyCache(cx, pc, &obj, &pobj, &entry); \
if (atom) \
PCMETER(cache_->misses++);

jsinterp.c
js_FullTestPropertyCache
entry = &JS_PROPERTY_CACHE(cx).table[PROPERTY_CACHE_HASH_ATOM(atom, obj, NULL)];

#define PROPERTY_CACHE_HASH_ATOM(atom,obj,pobj) \
PROPERTY_CACHE_HASH((jsuword)(atom) >> 2, OBJ_SCOPE(obj)->shape)

If the property cannot be found, SpiderMonkey does a full property lookup in the property tree. This includes going up through the prototype chain to find the property. After enough local properties have been added, the object property lookup simply becomes a hash lookup. In quick summary, based on the current program counter and scope shape, we look for a given property. If it is not found, we include the object instance the property should be in. Finally, if nothing is found, we do a full property look up.

Now the only question is, how do we determine what to put in the cache, and when? The function that fills the property cache is in jsinterp.c in function js_FillPropertyCache. This function will take a given shape and allocate some space in the property cache associated with the given shape. js_FillPropertyCache is called upon the setting/getting of a property when the property could not be found in the cache, and a full property lookup occurred.

Reducing the number of guards with Shapes

When Andreas, Michael, and I first started doing trace compilation on JavaScript, we had some interesting unexpected issues. What we discovered with our first implementation, was that there were too many guards during property look ups, and too many guards regarding the type of a specific variable/property. With so many guards, the amount of bailout code increased, which led to in general, an explosion of code size, and a performance hit.

We looked into a way of compressing the type and structure of objects into a single check. The structure of an object is what properties an object has and their corresponding types, not their values. This way, we could remove the type check guards for every property access, and instead replace them with a single check which ensures that an object's properties did not change.

The idea Andreas came up with, was to encapsulate this information into a "Shape", which is a global unique id among all objects in the system. So if you have two different objects which contain the same properties, and all the properties have the same type, they would share the same shape id. How does this help with the guards? How does this affect a trace? You still have issues of type conversions during specific operations, for example with an add. If you want to do integer addition, you still need a check to ensure that the integer doesn't overflow into a double as the semantics of JavaScript say a number is always a double.

Now the question is, where do you insert these checks? This was a bit confusing for a very long time, so please feel free to ask questions if its not clear. As always, its easier to explain if we have an example. The code example we'll use is this:.

var i = 0;
var j = new Object;
j.x = 0;
var temp = 0;

for (i = 0; i < 100000000; i++) {
temp += i;
j.x = temp;
}


Sometime during execution, the var temp will overflow into a double. However, assuming the loop threshold to start a trace is low enough, a trace should have been compiled such that j.x is typed as an integer, temp is an integer, and the generated code does an integer addition. Let's assume now that the current "Shape" for variable j is 10. If a new property is added, or the type of a property changes, the "Shape" would increase to 11. One way of looking at a trace is as follows:

: topOfLoop

Guard to check i < 10000000
Guard to ensure j is shape 10
integer add temp + i

Guard to ensure temp + i is an integer
assign temp to temp + i

Guard to ensure temp is an integer
assign j.x to var temp;

jump topOfLoop


In SSA this would not work since j.x would point directly to the addition result, but let's assume a simple compiler just to make a point. The previous trace works just fine as long as temp stays within the bounds of an INTEGER. It also makes logical sense to always guard on the type of the value being assigned to a given property. If we didn't, you could technically have a case where you assigned some property a value which has a different type than the shape indicated. In this case, the shape identifier of "10" would no longer valid if the property j.x is no longer an integer. So what happens when j.x finally overflows into a double?

The trace would jump into the interpreter, in which case the shape of j would change to "11" since j.x is now a double. Now upon recording, we would insert a guard to ensure that the shape of j is 11, the addition becomes a double with no check since there is nothing else to convert a double to, in terms of adding numbers. The resulting trace would look something like this:

: topOfLoop
Guard to ensure j is shape 11
Guard to check i &lt; 10000000

double add temp + i

assign temp to temp + i
assign j.x to var temp;

jump topOfLoop


However, we could greatly optimize the integer case with the following trace structure:
Guard to ensure j is shape 10

: topOfLoop

Guard to check i < 10000000

integer add temp + i
Guard type to ensure temp + i is an integer

assign temp to temp + i
assign j.x to var temp;

jump topOfLoop

We removed the guard to ensure that j.x was being assigned an integer, and pulled the guard to ensure j was shape 10 outside of the loop. Here is the important part. We no longer need the guard to ensure j.x is being assigned an integer. We had one check outside of the loop which checked j.x is still an integer type. Next, since the value being assigned was ALREADY checked for overflow, we don't need another redundant guard when assigning j.x. If j.x is supposed to overflow, it would be detected by the guard to ensure that the addition result was still an integer. The compiled code would never reach the code to assign the value to j.x. We can carry this idea to any assignment for a property. The type check for a property will always be done by a previous guard at the actual operation time, not the property assignment time.

The most interesting portion, is that SpiderMonkey generally has the same idea, in a slightly different fashion. To get an in depth look at whats going on, there is a 2 page comment in the jsscope.h file of SpiderMonkey. Even though there is an explanation of whats going on, it is difficult to understand, and the code is pretty icky. I'll dissect the actual implementation later on. We also want to implement something like this into Tamarin-Tracing.