There is a really awesome discussion at Lambda the Ultimate on whether or not trace compilers have "won". It's pretty dense, so here's some background information and a synopsis to help follow along.
All the comments center on how to pair trace compilation with other execution techniques (e.g. method-at-a-time compilation, interpretation). There's a bit of tracing background required to fully understand everything. The basic problem with tracing is that it fails on really branchy code. If you try too hard to "stay on a trace" (stay in JIT-compiled code instead of switching to a different execution technique), you'll lose overall, because you'll spend your time compiling code instead of making forward progress on the actual application. You'll also blow up your code cache and leave a ton of useless traces lying around in memory.
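To make the failure mode concrete, here is a minimal sketch of the hot-loop heuristic a tracing JIT uses. All names and thresholds (`HOT_THRESHOLD`, `MAX_SIDE_EXITS`, `run`) are invented for illustration, not taken from any real VM: the interpreter counts loop iterations, records a linear trace once the loop is hot, and blacklists the trace if execution keeps bailing out through side exits.

```python
HOT_THRESHOLD = 5    # assumed value; real VMs tune this carefully
MAX_SIDE_EXITS = 3   # assumed: give up on a trace after this many bails

def run(loop_body, iterations):
    """loop_body(i) returns an id for the branch taken on iteration i."""
    hotness = 0
    trace = None          # the branch path we "compiled"
    side_exits = 0
    compiled_runs = 0
    for i in range(iterations):
        path = loop_body(i)
        if trace is None:
            hotness += 1
            if hotness == HOT_THRESHOLD:
                trace = path            # loop is hot: record this path
        elif path == trace:
            compiled_runs += 1          # stayed on trace: fast
        else:
            side_exits += 1             # left the trace: slow bail-out
            if side_exits > MAX_SIDE_EXITS:
                trace = None            # blacklist: stop tracing here
                hotness = -iterations   # and don't bother retrying
    return compiled_runs, side_exits

# A stable loop traces well; a branchy loop keeps exiting and
# eventually gets blacklisted back to the interpreter.
stable = run(lambda i: "A", 100)
branchy = run(lambda i: "A" if i % 2 else "B", 100)
```

The branchy loop spends its time recording, exiting, and discarding traces, which is exactly the "no forward progress" problem described above.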
Thomas Lord's analysis is spot on. His basic premise is that tracing is great in specific cases, but a VM needs multiple compilation strategies. Also, it's really difficult to generate the "best" code: there are lots of ways to profile running code, but finding an algorithm that reliably detects what the optimal code is turns out to be hard. I agree with him there. There is a pretty cool paper at CGO 2010 about attacking this problem.
Ben Titzer and Andreas Gal discuss traditional compilation vs. trace compilation as a whole, and in what situations a trace compiler is relevant. Brendan Eich backs up the idea that trace compilation is a viable compilation strategy. Android uses trace trees, which, unlike Dynamo, connect multiple traces together at a single point. The bottom line is that when traces work, they work really well.
That's when Mike Pall, the creator of LuaJIT, chimes in. LuaJIT is generally considered to be one of the fastest, if not the fastest, dynamic language VMs. The links in that specific post are very interesting and relevant. LuaJIT pairs a really fast interpreter with a trace compiler on top. The resulting subthread is the most interesting part of the discussion.
Peter proposes tracing native code. Mike says that instead of tracing native code, you should just write a really fast interpreter and add a tracing JIT on top; he thinks that having three execution engines (interpreter, method JIT, and tracing JIT) is too complicated. Brendan agrees and says all you really need is a generic method JIT with a tracing JIT on top. (PICs, which come up throughout the thread, are polymorphic inline caches, and pretty much everyone uses them.) At this point the subthread moves on to pairing a method JIT with a tracing JIT, and how you can trace the generic method JIT's code. In my opinion, a generic method JIT + tracing JIT on top is the way to go; LuaJIT takes the fast interpreter + trace compiler route instead.
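For readers who haven't seen PICs before, here is a rough sketch of the idea in Python. The class name, entry cap, and `lookup` API are all invented for illustration: a call site caches (type, method) pairs so repeated lookups on the same types skip the generic path, and past a small number of types the site goes "megamorphic" and stops caching.

```python
class PIC:
    """Hypothetical polymorphic inline cache for one call site."""
    MAX_ENTRIES = 4               # assumed cap before going megamorphic

    def __init__(self, name):
        self.name = name
        self.entries = []         # list of (type, method) pairs
        self.megamorphic = False

    def lookup(self, obj):
        klass = type(obj)
        if not self.megamorphic:
            for cached_type, method in self.entries:   # fast path
                if klass is cached_type:
                    return method
        method = getattr(klass, self.name)              # generic slow path
        if not self.megamorphic:
            self.entries.append((klass, method))
            if len(self.entries) > self.MAX_ENTRIES:
                self.megamorphic = True                 # too many types
                self.entries = []
        return method

class Cat:
    def speak(self): return "meow"

class Dog:
    def speak(self): return "woof"

site = PIC("speak")               # one cache per call site
animals = [Cat(), Dog(), Cat()]
sounds = [site.lookup(a)(a) for a in animals]
```

In a real JIT the "fast path" is a short inline type-check stub patched into the generated code, not a Python loop, but the caching structure is the same.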
Another interesting point Brendan brings up is language simplicity. His main argument about LuaJIT is that it traces a lot less code than TraceMonkey does because Lua is a simpler language. This is where a fundamental issue with tracing comes up: what heuristics do you use to decide when to trace something versus staying in generic, slower code? If you keep jumping out of traces, you lose. Brendan seems to take the position that you should try to trace more code.
As a random tangent, Luis Gonzalez brings up PyPy. PyPy tries to be a VM for all languages: you write an interpreter for your target application language, and PyPy traces the internal interpreter loop in order to optimize the application language. This is a lot like Scott Peterson's (from Adobe) work. Alexander Yermolovich interned with Scott in 2008 and wrote a paper on it ("Optimization of dynamic languages using hierarchical layering of virtual machines"). They took the Lua VM (not LuaJIT, I think), ran it through Alchemy, and then ran it on top of Tamarin-Tracing.
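A toy sketch of the meta-tracing idea, with every name invented for illustration: instead of tracing the user's program directly, you trace the *interpreter* as it executes the user's program. Because the recorded steps are keyed on the interpreter's program counter, one pass over the user-level loop yields a trace specialized to that loop's bytecode rather than to the interpreter's generic dispatch.

```python
def interpret(bytecode, x, tracer=None):
    """Tiny interpreter; if a tracer is given, record each dispatch."""
    pc = 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if tracer is not None:
            tracer.append((pc, op))   # interpreter-level step, keyed on pc
        if op == "INC":
            x += 1
        elif op == "DOUBLE":
            x *= 2
        pc += 1
    return x

user_loop = ["INC", "DOUBLE"]   # the "application language" program
trace = []
result = interpret(user_loop, 3, tracer=trace)
# trace now mirrors the user program's structure: [(0, 'INC'), (1, 'DOUBLE')]
```

A real meta-tracer (PyPy, or the Lua-on-Tamarin experiment) then compiles that recorded sequence, constant-folding away the dispatch on `op` since `pc` and `bytecode` are fixed along the trace.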
Mike Pall responds that the layers add up. The most interesting part is his point #3: by the time PyPy is tracing its own interpreter loop, all the high-level information about the application code is gone, which limits the optimizations it can apply. I think this is generally true; ABC and LIR in NanoJIT suffer from the same problem.
Another thought is whether it's possible to write a metacircular tracing JIT. Michael Bebenita is doing it with Maxine. He's hitting the same problem as everyone else: deciding what to trace versus what to leave to another compilation technique. The basic premise is that when you have a method JIT, you have to restrict your traces much more, because tracing wins a lot less often than it would over a plain interpreter. In fact, if you trace too much, you lose really fast.
Finally, a few random notes dating back to Tamarin-Tracing. When I first interned at Adobe, Edwin Smith tasked me with tracing native code. I didn't know it at the time, but what I had built was a call threaded interpreter. What we found was that switching from one machine-code frame to another is really expensive. I'm sure we could have solved the problem, but Tamarin-Tracing was already canceled. My mind is a little blown realizing that a lot of the problems with Tamarin-Tracing are coming back up in this thread.
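For reference, a call threaded interpreter dispatches each opcode as a separate call through a table of function pointers. Here is a hedged sketch in Python (all names are mine, and the real cost only shows up in native code, where every opcode boundary is an actual call/return crossing machine-code frames):

```python
# One function per opcode; the dispatch loop just calls through a table.
def op_push(stack, arg):
    stack.append(arg)

def op_add(stack, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a + b)

def op_mul(stack, _):
    b, a = stack.pop(), stack.pop()
    stack.append(a * b)

DISPATCH = {"PUSH": op_push, "ADD": op_add, "MUL": op_mul}

def execute(program):
    stack = []
    for opcode, arg in program:      # one indirect call per opcode
        DISPATCH[opcode](stack, arg)
    return stack[-1]

# (2 + 3) * 4
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None),
           ("PUSH", 4), ("MUL", None)]
```

The per-opcode call is exactly the frame switch that turned out to be so expensive in the native-code version.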
The bottom line is that everyone agrees you need some other kind of execution mechanism with a trace compiler on top. The unsettled questions are whether that baseline should be a fast interpreter or a generic method JIT, and what heuristics should decide when to trace versus staying in the baseline code.
Whew, I hope that helped. There are lots of interesting tangents, but I tried to focus on the tracing aspects of the post. Feel free to ask more questions if something is unclear.