The Art of SquirrelFish's Bytecode Generation

I have to say, after my summer at Adobe, my C skills have definitely improved. Thanks Edwin Smith, Tom Reilly, and Dan Smith! Going through SquirrelFish seems way easier than going through Tamarin.

On that note, let's take a look at how SquirrelFish generates bytecode. SquirrelFish does something pretty nifty: it parses the JavaScript source code into an Abstract Syntax Tree (AST), then passes the AST to a bytecode generator that walks the tree, emitting bytecode and assigning registers to values as it goes. For some reason, I expected bytecode to be generated during the initial parsing. SquirrelFish leans heavily on polymorphism, so each AST node generates the bytecode for the construct it represents.


This is done via a base class Node with a virtual function emitBytecode. Various other classes derive from Node, such as NumberNode or ExpressionNode. Each of these classes implements emitBytecode, recursively going through the AST to generate the bytecode. For example, the bytecode generated in NumberNode is a load instruction:
Nodes.cpp
RegisterID* NumberNode::emitBytecode(BytecodeGenerator& generator, RegisterID* dst)
{
    if (dst == generator.ignoredResult())
        return 0;
    return generator.emitLoad(dst, m_double);
}
You can see all the other implementations in the file JavaScriptCore/parser/Nodes.cpp. There are two other classes that are important. ParserRefCounted is for garbage collection purposes, and ThrowableExpressionData allows a node to throw an exception upon an error in compiled code rather than a syntax error. Since a Node has some memory allocated to it, it derives from ParserRefCounted. All other parse tree classes derive from these three classes. Some classes, such as RegExpNode, inherit from both Node and ThrowableExpressionData since they generate their own bytecode and can throw an error. Below is the main class hierarchy for the SquirrelFish parser. There are a ton more classes that aren't included because the hierarchy is huge, but the gist is in the image.
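
Separately from the picture, the polymorphic pattern itself fits in a few lines of C++. Here is a minimal sketch of the idea, not the actual WebKit code: ToyGenerator, ToyNode, ToyNumberNode, ToyAddNode, and emitSample are all made-up names for illustration. Unlike the real emitBytecode, this version takes no dst register and simply allocates a fresh register for every result.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the bytecode generator: it records instructions
// as text and hands out register numbers in increasing order.
class ToyGenerator {
public:
    int newRegister() { return m_nextRegister++; }
    void emit(const std::string& instruction) { m_instructions.push_back(instruction); }
    const std::vector<std::string>& instructions() const { return m_instructions; }

private:
    int m_nextRegister = 0;
    std::vector<std::string> m_instructions;
};

// Base class: each node emits its own bytecode and returns the number of
// the register that holds its result.
class ToyNode {
public:
    virtual ~ToyNode() = default;
    virtual int emitBytecode(ToyGenerator& generator) const = 0;
};

// Leaf node: a numeric literal becomes a single load instruction.
class ToyNumberNode : public ToyNode {
public:
    explicit ToyNumberNode(double value) : m_double(value) {}

    int emitBytecode(ToyGenerator& generator) const override
    {
        int dst = generator.newRegister();
        std::ostringstream text;
        text << "load r" << dst << ", " << m_double;
        generator.emit(text.str());
        return dst;
    }

private:
    double m_double;
};

// Interior node: recurse into both children first, then combine their registers.
class ToyAddNode : public ToyNode {
public:
    ToyAddNode(std::unique_ptr<ToyNode> lhs, std::unique_ptr<ToyNode> rhs)
        : m_lhs(std::move(lhs)), m_rhs(std::move(rhs)) {}

    int emitBytecode(ToyGenerator& generator) const override
    {
        int left = m_lhs->emitBytecode(generator);
        int right = m_rhs->emitBytecode(generator);
        int dst = generator.newRegister();
        std::ostringstream text;
        text << "add r" << dst << ", r" << left << ", r" << right;
        generator.emit(text.str());
        return dst;
    }

private:
    std::unique_ptr<ToyNode> m_lhs;
    std::unique_ptr<ToyNode> m_rhs;
};

// Builds the tree for 1 + 2 and returns the instructions the walk produces.
std::vector<std::string> emitSample()
{
    ToyAddNode root(std::make_unique<ToyNumberNode>(1.0), std::make_unique<ToyNumberNode>(2.0));
    ToyGenerator generator;
    root.emitBytecode(generator);
    return generator.instructions();
}
```

Calling emitSample() returns three strings: "load r0, 1", "load r1, 2", and "add r2, r0, r1". The add node never inspects what kind of children it has; it only calls emitBytecode on them, which is the whole point of the virtual function.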



If you want to see the bytecode for any JavaScript source, run a debug build of jsc.exe with the -d flag. For the JavaScript source print("hello world"), you get the following:
6 m_instructions; 72 bytes at 00BAD510; 1 parameter(s); 14 callee register(s)

[ 0] enter
[ 1] mov r3, r0
[ 4] resolve_func r4, r3, print(@id0)
[ 8] mov r5, r1
[ 11] call r3, r3, 2, 14
[ 16] end r3

Identifiers:
id0 = print

Constants:
r0 = undefined
r1 = "hello world"

hello world
End: undefined
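
One thing the dump doesn't spell out is what the bracketed numbers mean. They look like offsets into the instruction stream rather than instruction counts. The sketch below checks that reading under one assumption inferred from the dump, not confirmed from the source: each instruction takes one slot for its opcode plus one slot per operand. DumpedInstruction, helloWorldDump, and slotOffsets are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One line of the dump: the opcode name and how many operands follow it.
struct DumpedInstruction {
    std::string opcode;
    std::size_t operandCount;
};

// The six instructions from the print("hello world") dump above.
std::vector<DumpedInstruction> helloWorldDump()
{
    return {
        {"enter", 0},
        {"mov", 2},
        {"resolve_func", 3},
        {"mov", 2},
        {"call", 4},
        {"end", 1},
    };
}

// Assumption: an instruction takes one slot for its opcode plus one per operand.
// Returns the starting offset of each instruction, followed by the total slot count.
std::vector<std::size_t> slotOffsets(const std::vector<DumpedInstruction>& instructions)
{
    std::vector<std::size_t> offsets;
    std::size_t next = 0;
    for (const DumpedInstruction& instruction : instructions) {
        offsets.push_back(next);
        next += 1 + instruction.operandCount;
    }
    offsets.push_back(next);
    return offsets;
}
```

For helloWorldDump(), slotOffsets returns 0, 1, 4, 8, 11, 16, and a total of 18, which lines up with the bracketed numbers in the dump. If each slot is 4 bytes (a guess on my part), 18 slots also comes to the 72 bytes reported in the header line.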

All of the bytecode definitions can be found on the WebKit website. Next up, register allocation!