Saturday, 10 Jan 2009

The Art of SquirrelFish's Bytecode Generation

I have to say, after my summer at Adobe, my C skills have definitely improved. Thanks Edwin Smith, Tom Reilly, and Dan Smith! Going through SquirrelFish seems way easier than going through Tamarin. On that note, let's take a look at how SquirrelFish generates bytecode. SquirrelFish does something pretty nifty. It first parses the JavaScript source code into an Abstract Syntax Tree (AST). The AST is then passed to a bytecode generator that walks the tree, emitting bytecode and assigning registers to values as it goes along. For some reason, I expected bytecode to be generated during the initial parse. SquirrelFish relies heavily on polymorphism, so each AST node generates the bytecode for the construct it represents.


This is done via a base class Node with a virtual function emitBytecode. Various other classes derive from Node, such as NumberNode or ExpressionNode. Each of these classes implements emitBytecode, recursing through the AST to generate the bytecode. For example, the bytecode generated by NumberNode is a load instruction:
Nodes.cpp
RegisterID* NumberNode::emitBytecode(BytecodeGenerator& generator, RegisterID* dst)
{
    if (dst == generator.ignoredResult())
        return 0;
    return generator.emitLoad(dst, m_double);
}
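To make the tree walk concrete, here is a minimal, self-contained sketch of the same pattern. It is not SquirrelFish code: ToyGenerator, ToyNode, ToyNumberNode, ToyAddNode, and compileOnePlusTwo are hypothetical names, registers are plain integers rather than RegisterID pointers, and the "load" and "add" opcodes are illustrative only. It only shows how a virtual emitBytecode lets each node emit its own instructions while recursing into its children.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// One emitted instruction. Opcode names and fields are illustrative only.
struct ToyInstruction {
    std::string opcode;
    int dst;       // destination register
    int lhs;       // source register, or -1 if unused
    int rhs;       // source register, or -1 if unused
    double value;  // constant operand for "load"
};

// Hypothetical stand-in for BytecodeGenerator: hands out register numbers
// and collects the instructions that nodes ask it to emit.
class ToyGenerator {
public:
    int newRegister() { return m_nextRegister++; }

    int emitLoad(int dst, double value) {
        assert(dst >= 0);
        m_code.push_back({"load", dst, -1, -1, value});
        return dst;
    }

    int emitAdd(int dst, int lhs, int rhs) {
        assert(dst >= 0);
        m_code.push_back({"add", dst, lhs, rhs, 0.0});
        return dst;
    }

    const std::vector<ToyInstruction>& code() const { return m_code; }

private:
    int m_nextRegister = 0;
    std::vector<ToyInstruction> m_code;
};

// Base class: every AST node knows how to emit its own bytecode.
struct ToyNode {
    virtual ~ToyNode() = default;
    virtual int emitBytecode(ToyGenerator& generator, int dst) const = 0;
};

// Leaf node: a numeric literal becomes a single load.
struct ToyNumberNode : ToyNode {
    explicit ToyNumberNode(double value) : m_double(value) {}
    int emitBytecode(ToyGenerator& generator, int dst) const override {
        return generator.emitLoad(dst, m_double);
    }
    double m_double;
};

// Interior node: emit both children first, then combine their registers.
struct ToyAddNode : ToyNode {
    ToyAddNode(std::unique_ptr<ToyNode> left, std::unique_ptr<ToyNode> right)
        : m_left(std::move(left)), m_right(std::move(right)) {}
    int emitBytecode(ToyGenerator& generator, int dst) const override {
        int lhsRegister = generator.newRegister();
        int lhs = m_left->emitBytecode(generator, lhsRegister);
        int rhsRegister = generator.newRegister();
        int rhs = m_right->emitBytecode(generator, rhsRegister);
        return generator.emitAdd(dst, lhs, rhs);
    }
    std::unique_ptr<ToyNode> m_left;
    std::unique_ptr<ToyNode> m_right;
};

// Builds the AST for "1 + 2" and returns the instructions emitted for it.
std::vector<ToyInstruction> compileOnePlusTwo() {
    ToyAddNode root(std::make_unique<ToyNumberNode>(1.0),
                    std::make_unique<ToyNumberNode>(2.0));
    ToyGenerator generator;
    int dst = generator.newRegister();
    root.emitBytecode(generator, dst);
    return generator.code();
}
```

Calling compileOnePlusTwo() produces two loads followed by an add. The caller never needs to know which kind of node it is holding, which is the point of dispatching through the virtual function.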
You can see all the other implementations in the file JavaScriptCore/parser/Nodes.cpp. There are two other classes that are important. ParserRefCounted is for memory management purposes (parse tree nodes are reference counted), and ThrowableExpressionData lets a node throw an exception for an error in compiled code at runtime, as opposed to a syntax error. Since a Node has memory allocated to it, it derives from ParserRefCounted. All other parse tree classes derive from these three classes. Some classes, such as RegExpNode, inherit from both Node and ThrowableExpressionData, since they generate their own bytecode and can throw an error. Below is the main class hierarchy for the SquirrelFish parser. There are a ton more classes that aren't included because the hierarchy is huge, but the gist is in the image.
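The inheritance relationships described above can also be sketched in code. This is a rough illustration, not the real declarations: every class here carries a Sketch prefix because its members (the reference count, the stored source offset, the returned opcode strings) are assumptions made for the example. Only the shape of the hierarchy is taken from the text.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for ParserRefCounted: a simple intrusive reference count.
class SketchRefCounted {
public:
    virtual ~SketchRefCounted() = default;
    void ref() { ++m_refCount; }
    // Returns true when the last reference is dropped; the owner would
    // then be responsible for freeing the node.
    bool deref() {
        assert(m_refCount > 0);
        return --m_refCount == 0;
    }
    int refCount() const { return m_refCount; }
private:
    int m_refCount = 1;
};

// Stand-in for ThrowableExpressionData: remembers where in the source the
// expression lives, so a runtime error can point back at it.
class SketchThrowableData {
public:
    void setExceptionSourceOffset(int offset) { m_offset = offset; }
    int exceptionSourceOffset() const { return m_offset; }
private:
    int m_offset = -1;
};

// Stand-in for Node: reference counted, and able to emit bytecode.
class SketchNode : public SketchRefCounted {
public:
    virtual std::string emitBytecode() const = 0;
};

// A node that cannot throw at runtime only needs Node.
class SketchNumberNode : public SketchNode {
public:
    std::string emitBytecode() const override { return "load"; }
};

// A node that emits bytecode and can throw derives from both.
class SketchRegExpNode : public SketchNode, public SketchThrowableData {
public:
    std::string emitBytecode() const override { return "new_regexp"; }
};

static_assert(std::is_base_of<SketchRefCounted, SketchRegExpNode>::value,
              "every node is reference counted through SketchNode");
static_assert(std::is_base_of<SketchThrowableData, SketchRegExpNode>::value,
              "throwing nodes also carry throwable data");
static_assert(!std::is_base_of<SketchThrowableData, SketchNumberNode>::value,
              "non-throwing nodes do not");
```

Keeping the throwable data in a separate base class means that only the nodes that can fail at runtime pay for the extra source-position fields.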



If you want to see the bytecode for any JavaScript source, run a debug build of jsc.exe with the -d flag. For the JavaScript source print("hello world"), you get the following:
6 m_instructions; 72 bytes at 00BAD510; 1 parameter(s); 14 callee register(s)

[ 0] enter
[ 1] mov r3, r0
[ 4] resolve_func r4, r3, print(@id0)
[ 8] mov r5, r1
[ 11] call r3, r3, 2, 14
[ 16] end r3

Identifiers:
id0 = print

Constants:
r0 = undefined
r1 = "hello world"

hello world
End: undefined
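The bracketed numbers in the dump are offsets into the instruction stream, not instruction indices. In this listing, each instruction appears to take one slot for its opcode plus one slot per operand. The sketch below recomputes the offsets from the operand counts visible above. DumpedInstruction, helloWorldDump, instructionOffsets, and totalSlots are hypothetical helpers written for this example. The 4-bytes-per-slot figure is an inference from the "72 bytes" in the header, which would fit a 32-bit build, and is not something the dump states directly.

```cpp
#include <cassert>
#include <vector>

// One line of the dump: the opcode name and how many operands follow it.
struct DumpedInstruction {
    const char* opcode;
    int operandCount;
};

// Operand counts read off the listing for print("hello world").
std::vector<DumpedInstruction> helloWorldDump() {
    return {
        {"enter", 0},
        {"mov", 2},
        {"resolve_func", 3},
        {"mov", 2},
        {"call", 4},
        {"end", 1},
    };
}

// Each instruction occupies 1 slot for the opcode plus 1 per operand, so
// the offset of an instruction is the sum of the sizes of those before it.
std::vector<int> instructionOffsets(const std::vector<DumpedInstruction>& instructions) {
    std::vector<int> offsets;
    int offset = 0;
    for (const DumpedInstruction& instruction : instructions) {
        assert(instruction.operandCount >= 0);
        offsets.push_back(offset);
        offset += 1 + instruction.operandCount;
    }
    return offsets;
}

// Total number of slots in the instruction stream.
int totalSlots(const std::vector<DumpedInstruction>& instructions) {
    int slots = 0;
    for (const DumpedInstruction& instruction : instructions)
        slots += 1 + instruction.operandCount;
    return slots;
}
```

For this listing the offsets come out as 0, 1, 4, 8, 11, and 16, matching the dump. The six instructions fill 18 slots, and 18 slots at an assumed 4 bytes each give the 72 bytes reported in the header.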

All of the bytecode definitions can be found on the WebKit website. Next up, register allocation!


Reader Comments (3)

nice one. Very informative.

January 10, 2009 | Unregistered CommenterMandar Deodhar

Very useful. Much appreciated.

March 24, 2009 | Unregistered Commenterkevin c

Very useful information. I was looking for in-depth posts about squirrelfish like yours.
May I know how I can see my bytecodes which the squirrelfish generates based on my java script code?
Could you give me a direction?
I succeeded to make a sample code and compile it with the squirrelfish.

July 17, 2009 | Unregistered CommenterAlways19
