Have tracing JIT compilers won - notes

There is a really awesome discussion on whether or not trace compilers have "won" at Lambda the Ultimate. It's pretty dense so here's some background information and synopsis to help follow.

All the comments center on how to pair trace compilation with other execution techniques (e.g. method-at-a-time compilation, interpretation). There's a bit of tracing background required to fully understand everything. The basic problem with tracing is that it fails on really branchy code. If you try to "stay on a trace" (stay in JIT-compiled code instead of switching to a different execution technique), you'll lose overall because you'll spend time compiling code instead of making forward progress on the actual application. You'll also explode your code cache and leave a ton of useless traces lying in memory.
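To make the "branchy code" failure mode concrete, here's a toy illustration (not a real tracer): a region with n independent branches has up to 2**n distinct straight-line paths, and a tracer that compiles one trace per hot path can end up caching a trace for each of them.

```python
def distinct_paths(inputs):
    """Record which straight-line path each input takes through three
    independent branches, as a path-per-trace compiler effectively would."""
    traces = set()
    for x in inputs:
        # Three independent branch outcomes -> up to 2**3 = 8 paths.
        path = (x % 2 == 0, x % 3 == 0, x % 5 == 0)
        traces.add(path)
    return traces

traces = distinct_paths(range(100))
print(len(traces))  # 8 distinct paths, each a candidate trace
```

With more branches the count doubles each time, which is exactly the code-cache explosion described above.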

Thomas Lord's analysis is spot on. His basic premise is that tracing is great in specific cases, but a VM needs multiple compilation strategies. It's also really difficult to generate the "best" code: there are lots of ways to profile running code, but finding an algorithm that reliably detects the optimal code is hard. I agree with him that finding the optimal code will be very difficult. There is a pretty cool paper at CGO 2010 about solving this problem.

Ben Titzer and Andreas Gal discuss traditional compilation vs trace compilation as a whole, and the situations in which a trace compiler will be relevant. Brendan Eich backs up the idea that trace compilation is a viable compilation strategy. Android uses trace trees, which, unlike Dynamo, connect multiple traces together at a single point. The bottom line is that when traces work, they work really well.

That's when Mike Pall, the creator of LuaJIT, chimes in. Generally, LuaJIT is considered to be one of the fastest, if not the fastest, dynamic language VMs. The links in that specific post are very interesting and relevant. LuaJIT has a really fast interpreter with a trace compiler on top. The resulting subthread is the most interesting.

Peter proposes tracing native code. Mike says that instead of tracing native code, you should just build a really fast interpreter and add a tracing JIT on top. Mike thinks that having three execution engines (interpreter, method JIT, and tracing JIT) is too complicated. Brendan agrees and says all you really need is a generic method JIT with a tracing JIT on top. Also, PICs are polymorphic inline caches, and pretty much everyone uses them. At this point the subthread moves on to pairing a method JIT with a tracing JIT, and how you can trace the method JIT's generated code. In my opinion, a generic method JIT + tracing JIT on top is the way to go. LuaJIT takes the fast interpreter + trace compiler route.
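Since PICs keep coming up, here's a minimal sketch of one, in Python for clarity. A real PIC lives in generated machine code at a single call site; here the cache is just a dict from receiver type to the resolved method, so repeated calls with already-seen types skip the (simulated) slow lookup.

```python
class CallSitePIC:
    """Toy polymorphic inline cache for one call site."""
    def __init__(self):
        self.cache = {}        # receiver type -> resolved method
        self.slow_lookups = 0  # how often we hit the slow path

    def call(self, receiver, name, *args):
        t = type(receiver)
        method = self.cache.get(t)
        if method is None:              # cache miss: full method lookup
            self.slow_lookups += 1
            method = getattr(t, name)
            self.cache[t] = method      # extend the cache with this type
        return method(receiver, *args)

class A:
    def area(self): return 1
class B:
    def area(self): return 2

site = CallSitePIC()
results = [site.call(o, "area") for o in [A(), B(), A(), B(), A()]]
print(results, site.slow_lookups)  # only the first A and first B are slow
```

After the site has seen both types, every later call is a cheap type check plus a direct dispatch, which is the whole point.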

Another interesting point Brendan brings up is having a simpler language. Brendan's main argument about LuaJIT is that it traces a lot less code than TraceMonkey does because Lua is simpler. This is where a fundamental issue with tracing comes up: what heuristics do you use to decide when to trace something versus staying in slower generic code? If you keep jumping out of traces, you lose. Brendan seems to take the position that you should try to trace more code.

Mike Pall thinks it's better to have a really fast baseline and trace only when you can guarantee a performance improvement. He thinks trace trees are too complicated without any real performance gain. LuaJIT takes a bunch of traces and sticks them together wherever they connect; trace trees, by contrast, must be linked at the same program location. Mike argues that trace trees only win if you recompile all the traces together to optimize them as a unit. However, in the context of the web, the delay of recompiling everything is too severe, so nobody does it. I really like his idea of cross-trace register coalescing: keeping values in registers that are known across traces. This makes switching between traces a lot faster and solves a real, fundamental issue UC Irvine had with traces. Trace nesting is having a trace tree as part of another trace tree (inner loops). Brendan seems to agree that sticking traces together wherever they connect is the way to go, but that trace trees are useful sometimes. This is also where Brendan's comment about the complexity of JavaScript compared to Lua comes in.
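A rough sketch of what "sticking traces together wherever they connect" means, with Python functions standing in for compiled traces. Each trace is a guarded fast path; when a guard fails (a side exit), we jump straight to a linked trace that starts at that point instead of always falling back to the interpreter. The trace names here are illustrative, not anything from LuaJIT.

```python
def trace_positive(x, links):
    """Trace compiled for the common case x >= 0."""
    if x >= 0:                      # guard
        return x * 2                # fast path
    return links["negative"](x)     # side exit -> jump to linked trace

def trace_negative(x):
    """Trace compiled later for the failing guard's path."""
    return -x * 2

links = {"negative": trace_negative}
out = [trace_positive(x, links) for x in [3, -4, 5]]
print(out)  # [6, 8, 10] -- the -4 case runs via the linked trace
```

Cross-trace register coalescing would then be the machine-level analogue of making that side-exit handoff cheap: values both traces care about stay in the same registers across the jump.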

As a random tangent, Luis Gonzalez talks about PyPy. PyPy tries to be a VM for all languages: you write an interpreter for your target application language, and PyPy then traces the internal interpreter loop to optimize the application program. This is a lot like Scott Peterson's (from Adobe) work. Alexander Yermolovich interned with Scott in 2008 and wrote a paper on it (Optimization of dynamic languages using hierarchical layering of virtual machines). They took the Lua VM (not LuaJIT, I think), ran it through Alchemy, and then ran it on top of Tamarin-Tracing.
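The PyPy idea in miniature: you write a plain interpreter for your language, and the meta-tracer traces the interpreter's dispatch loop itself. In real PyPy you'd mark the loop with JitDriver hints from rlib.jit; in this plain-Python sketch the hint is just a comment, since the point is only to show the structure that gets traced.

```python
def interpret(bytecode, acc=0):
    """Tiny accumulator-machine interpreter; this loop is what a
    meta-tracer like PyPy's would trace, keyed on pc."""
    pc = 0
    while pc < len(bytecode):
        # jit_merge_point(pc=pc)  <- where a PyPy-style hint would go
        op, arg = bytecode[pc]
        if op == "ADD":
            acc += arg
        elif op == "JUMP_IF_POS":
            if acc > 0:
                pc = arg            # backward jump: a loop the tracer can catch
                continue
        pc += 1
    return acc

# A loop in the *guest* program: subtract 1 until the accumulator hits 0.
print(interpret([("ADD", -1), ("JUMP_IF_POS", 0)], acc=3))  # 0
```

Tracing this loop effectively unrolls the interpreter's dispatch overhead away for the guest program's hot loop, which is the layering trick Mike Pall critiques next.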

Mike Pall responds that the layers add up. The most interesting is point #3: PyPy loses all the high-level information, which limits its ability to optimize the application code. I think this is generally true. ABC and LIR in NanoJIT suffer from the same problem.

Some of the smaller notes concern tracing through the browser. I think Andreas Gal was talking about this a while ago. Large swaths of Firefox's UI are written in JavaScript, so it's actually a very important issue for them. I can't quite follow the whole Membranes discussion.

Another thought is whether or not it's possible to write a metacircular tracing JIT. Michael Bebenita is doing it with Maxine. He's hitting the same problem as everyone else: deciding what to trace instead of handling it with another compilation technique. The basic premise is that when you have a method JIT, you really have to restrict your traces, because you're going to win a lot less often than you would with only an interpreter. In fact, if you trace too much, you're going to lose really fast.
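The usual mechanism behind "deciding what to trace" is a hotness counter: bump a counter at each loop header and only start recording a trace once it crosses a threshold. A toy sketch follows; the threshold of 10 and the loop label are made up for illustration, and the point about method JITs above translates to setting that threshold much higher so tracing only kicks in when it's likely to beat the already-decent baseline code.

```python
HOT_THRESHOLD = 10   # assumed value; real VMs tune this carefully
counters = {}
traced = set()

def maybe_trace(loop_header):
    """Bump the loop's counter; return True the moment it gets hot
    enough that trace recording would begin."""
    counters[loop_header] = counters.get(loop_header, 0) + 1
    if loop_header not in traced and counters[loop_header] >= HOT_THRESHOLD:
        traced.add(loop_header)   # a real VM would start recording here
        return True
    return False

fired = [i for i in range(20) if maybe_trace("loop@0x40")]
print(fired)  # recording triggers exactly once, on the 10th execution
```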

Finally, a few random notes that date back to Tamarin-Tracing. When I first interned at Adobe, Edwin Smith actually tasked me with tracing native code. At the time I didn't know it, but I had implemented a call-threaded interpreter. What we found was that switching from one machine-code frame to another is really expensive. I'm sure we could've solved the problem, but Tamarin-Tracing was already canceled. At the moment, my mind is blown realizing that a lot of the problems with Tamarin-Tracing are coming back up in this thread.

The bottom line is that everyone agrees you need some other kind of execution mechanism and a trace compiler on top. The unsettled questions are:

1) Should you pair a trace compiler with a really fast interpreter or with a generic method-at-a-time JIT?
2) What, and how much, should you trace?


Whew, I hope that helped. There are lots of interesting tangents, but I tried to focus on the tracing aspects of the post. Feel free to ask more questions if something is unclear. 


* I'll post updates as the thread progresses.