Essays
Friday
12Mar2010

Have tracing JIT compilers won - notes

There is a really awesome discussion on whether or not trace compilers have "won" at Lambda the Ultimate. It's pretty dense, so here's some background information and a synopsis to help you follow along.

All the comments center on how to pair trace compilation with other execution techniques (e.g. method-at-a-time compilation, interpretation). There's a bit of tracing background required to fully understand everything. The basic problem with tracing is that it fails when you have really branchy code. If you try to "stay on a trace" (stay in JIT compiled code instead of switching to a different execution technique), you'll lose overall because you'll be spending time compiling code instead of making forward progress on the actual application. You'll also explode your code cache and there will be a ton of useless traces lying in memory.
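
To make the code cache explosion concrete, here is a toy sketch. It assumes a naive recorder that compiles one trace per distinct path through a loop body; real trace compilers are smarter than this, and the names `TraceKey` and `count_traces` are made up for illustration. With k independent two-way branches in the loop body, the recorder can end up with as many as 2^k traces.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical model: a trace is identified by the branch outcomes taken
// during one loop iteration, packed into a bitmask.
using TraceKey = std::uint32_t;

// Simulates `iterations` runs of a loop body containing `branches`
// independent two-way branches. Returns how many distinct traces a naive
// recorder (one trace per distinct path) would have to compile and keep.
std::size_t count_traces(unsigned branches, std::uint32_t iterations) {
    assert(branches < 32);  // the bitmask only has room for 32 outcomes
    std::set<TraceKey> code_cache;
    for (std::uint32_t i = 0; i < iterations; ++i) {
        TraceKey path = 0;
        for (unsigned b = 0; b < branches; ++b) {
            // Stand-in for data-dependent control flow: branch b is taken
            // when bit b of the iteration counter is set.
            bool taken = ((i >> b) & 1u) != 0;
            path |= static_cast<TraceKey>(taken) << b;
        }
        code_cache.insert(path);  // a new path means another compile
    }
    return code_cache.size();
}
```

In this model a loop with one branch needs two traces, three branches need eight, and ten branches need over a thousand, most of which run only a handful of times before the next side exit.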

Thomas Lord's analysis is spot on. His basic premise is that tracing is great in specific cases. However, a VM needs multiple compilation strategies. Also, it's really difficult to create the "best" code. There are lots of ways to profile the running code, but finding an algorithm that can detect what the optimal code is remains hard. I agree with him when he says that it will be very difficult to find the optimal code. There is a pretty cool paper at CGO 2010 about solving this problem.

Ben Titzer and Andreas Gal discuss traditional compilation vs trace compilation as a whole, and in what situations a trace compiler will be relevant. Brendan Eich backs up the idea that trace compilation is a viable compilation strategy. Android uses trace trees which, unlike Dynamo, connect multiple traces together at a single point. The bottom line is when traces work, they work really well.

That's when Mike Pall, the creator of LuaJIT, chimes in. Generally, LuaJIT is considered to be one of the fastest, if not the fastest, dynamic language VM. The links in that specific post are very interesting and relevant. LuaJIT has a really fast interpreter with a trace compiler on top. The resulting subthread is the most interesting. 

Peter proposes tracing native code. Mike says that instead of tracing native code, you should just make a really fast interpreter and add a tracing JIT on top. Mike thinks that having three execution engines (interpreter, method JIT, and a tracing JIT) is too complicated. Brendan agrees and says all you really need is a generic method JIT and a tracing JIT on top. Also, PICs are polymorphic inline caches and are pretty much used by everyone. At this point, the subthread moves on to having a method + tracing JIT and how you can trace the generic method JIT code. In my opinion, a generic method JIT + tracing JIT on top is the way to go. LuaJIT takes the fast interpreter + trace compiler route.
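
For readers who haven't met PICs before, here is a minimal sketch of the idea. It assumes a made-up object model in which every object points at a `Shape` that maps property names to slot indices. `Shape`, `Object`, and `PropertyCache` are illustrative names, not any real VM's API. Each property access site remembers the last few shapes it saw, so repeat visits skip the generic lookup.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical object model: a shape maps property names to slot indices.
struct Shape {
    std::unordered_map<std::string, std::size_t> slots;
};

struct Object {
    const Shape* shape;
    std::vector<int> fields;
};

// One cache per property access site in the program.
struct PropertyCache {
    static constexpr std::size_t kMaxEntries = 4;
    struct Entry { const Shape* shape; std::size_t slot; };

    std::array<Entry, kMaxEntries> entries{};
    std::size_t used = 0;
    int hits = 0;
    int misses = 0;
    std::string name;

    explicit PropertyCache(std::string property) : name(std::move(property)) {}

    int get(const Object& obj) {
        // Fast path: compare the object's shape against the cached shapes.
        for (std::size_t i = 0; i < used; ++i) {
            if (entries[i].shape == obj.shape) {
                ++hits;
                return obj.fields[entries[i].slot];
            }
        }
        // Slow path: generic lookup, then remember the result if there is room.
        ++misses;
        std::size_t slot = obj.shape->slots.at(name);
        assert(slot < obj.fields.size());
        if (used < kMaxEntries) {
            entries[used++] = Entry{obj.shape, slot};
        }
        return obj.fields[slot];
    }
};
```

A real JIT patches the shape checks directly into the generated machine code at the call site rather than walking an array, but the hit and miss behavior is the same.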

Another interesting note Brendan brings up is to get a simpler language. Brendan's main argument about LuaJIT is that it traces a lot less code than TraceMonkey does because Lua is simpler. This is where a fundamental issue with tracing comes up. What kind of heuristics do you need to decide when you should trace something versus staying in generic slower code? If you keep jumping out of traces you lose. Brendan seems to be taking the position that you should try to trace more code. 

Mike Pall thinks it's better to have a really fast baseline and trace when you can guarantee a performance improvement. He thinks trace trees are too complicated without any real performance gain. LuaJIT takes a bunch of traces and sticks them together wherever they appear. Trace trees must be linked at the same program location. Mike argues that trace trees only win if you recompile all the traces together to optimize them together. However, in the context of the web, the delay of recompiling everything is too severe and so nobody does it. I really like his idea of cross-trace register coalescing - putting values into registers that are known across traces. This makes switching between traces a lot faster and solves a real fundamental issue UC Irvine had with traces. Trace nesting is having a trace tree as part of another trace tree (inner loops). Brendan seems to agree that sticking traces together wherever they connect is the way to go, but trace trees are useful sometimes. This is also where Brendan's comment about the complexity of JavaScript compared to Lua comes in.

As a random tangent, Luis Gonzalez talks about PyPy. PyPy tries to be a VM for all languages. It has an interpreter that executes a target application programming language. They then trace the internal PyPy interpreter loop to optimize the application programming language. This is a lot like Scott Peterson's (from Adobe) work. Alexander Yermolovich interned with Scott in 2008 and wrote a paper on it (Optimization of dynamic languages using hierarchical layering of virtual machines). They took the Lua VM (not LuaJIT I think), ran it through Alchemy, and then ran it on top of Tamarin-Tracing.

Mike Pall responds saying that the layers add up. The most interesting is point #3, where PyPy loses all the high level information, limiting their ability to optimize the application code. I think this is generally true. ABC and LIR in NanoJIT suffer from the same problem. 

Some of the small notes come with tracing through the browser. I think Andreas Gal was talking about this a while ago. Large swaths of Firefox's UI are written in JavaScript so it's actually a very important issue for them. I can't quite follow the whole Membranes discussion.

Another thought is whether or not it's possible to write a metacircular tracing JIT. Michael Bebenita is doing it with Maxine. He's hitting the same problem as everyone else, which is deciding what you should trace versus handing off to another compilation technique. The basic premise is that when you have a method JIT, you really have to restrict your traces because you're going to win a lot less often than you would if you only have an interpreter. In fact if you trace too much, you're going to lose really fast.

Finally, a few random notes that date back to Tamarin-Tracing. When I first interned at Adobe, Edwin Smith actually tasked me to trace native code. At the time I didn't know it, but I implemented a call threaded interpreter. What we found out was that switching from one machine code frame to another is really expensive. I'm sure we could've solved the problem, but Tamarin-Tracing was already canceled. At the moment, my mind is blown realizing that a lot of the problems with Tamarin-Tracing are coming back up on this thread. 
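
For anyone unfamiliar with the term, here is a generic sketch of call threading. This is not the Tamarin-Tracing code, and the tiny stack machine with its `VM`, `Instr`, and `op_*` names is invented for illustration. The program is an array of function pointers, and the dispatch loop does nothing but call the next one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stack machine state.
struct VM {
    std::vector<long> stack;
    std::size_t pc = 0;
    bool halted = false;
};

// In a call-threaded interpreter each instruction is a pointer to the
// native function that implements it, plus an optional operand.
using Op = void (*)(VM&, long);
struct Instr {
    Op op;
    long operand;
};

void op_push(VM& vm, long value) { vm.stack.push_back(value); }

void op_add(VM& vm, long) {
    assert(vm.stack.size() >= 2);
    long rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() += rhs;
}

void op_mul(VM& vm, long) {
    assert(vm.stack.size() >= 2);
    long rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() *= rhs;
}

void op_halt(VM& vm, long) { vm.halted = true; }

// The whole dispatch loop: fetch the next function pointer and call it.
long run(const std::vector<Instr>& code) {
    VM vm;
    while (!vm.halted && vm.pc < code.size()) {
        const Instr& instr = code[vm.pc++];
        instr.op(vm, instr.operand);
    }
    return vm.stack.empty() ? 0 : vm.stack.back();
}
```

Every instruction costs an indirect call and a return, which gives a feel for why hopping between machine code frames adds up.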

The bottom line is that everyone agrees you need some other kind of execution mechanism and a trace compiler on top. The unsettled questions are:

1) Should you pair a trace compiler with a really fast interpreter or a generic method-at-a-time JIT?
2) What and how much should you trace?

 

Whew, I hope that helped. There are lots of interesting tangents, but I tried to focus on the tracing aspects of the post. Feel free to ask more questions if something is unclear. 

 

Mason
* I'll post updates as the thread progresses.
Thursday
04Feb2010

Where is Flash going?

When you roam the online forums and look at the reaction to the iPad not having Flash, you can see two distinct camps. People are either angry because they can't use Flash or happy to see Flash not installed. Now with the iPad and the iPhone dominating the mobile market, more than ever, people are predicting that Flash will die.

In some cases, the concerns are absolutely correct. The example everyone is using is video - and I personally agree that Flash should not be the dominant video player. Video should be an open standard because it's so integral to the web. It is in the HTML5 spec so that any browser implementor can play video. You're going to want to use HTML5 for most video, if not all of it. Same with a few animations. Flash should die as the de facto video player. But the death of one application does not mean the death of a whole platform.

If you look at Flash as a ubiquitous video player only, it really is all downhill. The real problem is that the platform almost hit 100% ubiquity. What's so sad is that once you hit 100%, you can only go down and that's what's overlooked. So yes, Flash will probably decline, but death? If Flash remains on 80% of all browsers, that's still a very impressive and quite alive platform. What Flash really needs to do then, is move away from playing video and displaying annoying ads, to becoming a platform to build applications.

This brings up the question of whether or not there is room for Rich Internet Applications and extra plugins in general (Flash, Silverlight, JavaFX, and Google Native Client). I don't pretend to be able to accurately predict the future, but history tells us developers are going to want to do something that HTML/AJAX/CSS can't do. That's why plugins were invented in the first place. All Adobe can do is make something awesome and try to court developers. So who will want to use Flash to build applications?

Most web developers don't seem to care. One argument for Flash is that it streamlines the whole web application development workflow from start to finish. You don't need to mess with HTML, JavaScript, CSS, PHP, SQL, etc. However, this has been the case for a while and you still don't see that many developers jumping onto Flash, Silverlight, or JavaFX. The pain of so many web technologies, of dealing with all the browser incompatibilities, isn't painful enough for people to jump aboard any of these plugin systems. The quality of developer tools doesn't seem compelling enough either. Silverlight's whole ecosystem is light-years ahead of Flash's, yet few people jump to Silverlight. As a whole, web developers just don't want to deal with plugins unless they absolutely have to, and only then do they use Flash because it's installed everywhere. While the use of plugins in general will decline, the first choice will still be Flash.

What about the big mobile web growth? There are mobile versions of web applications and there are native apps. There is a clear want for native applications on phones. The big elephant is the iPhone and its descendants (iPod touch, iPad). While it doesn't support Flash, there is a workaround in Creative Suite 5 to have Flash applications on the iPhone. Every other phone will have native Flash. Near ubiquity on mobile platforms is a compelling case. Unlike the web where you have three main rendering engines (IE, Gecko, WebKit) that are all trying to implement some standard (IE 6 doesn't count), all the cell phones are completely different. Writing an application for a BlackBerry is completely different from writing one for an Android device. The "write once, run anywhere" approach may be worthwhile on mobile and that's also where growth is. I can see many developers saying it's nice that I only have to write three types of applications instead of a billion: a normal web app with PHP, HTML, etc, an iPhone app, and an everyone else application in Flash. At least it stops at three.

Lastly, the market Adobe targets is the designers and content creators. Adobe's products let content creators focus on making content, not coding. Designers should not have to know anything at all about code, period. Only the more advanced users, who want to do something that can't be done in the regular toolset, should have to look at code. Then they should have the ability to tinker with it. As long as Adobe focuses on letting content creators create, Adobe wins. Adobe makes money off tools with no clear competitors. There is nothing out there that says Adobe can't have two buttons on all the designer tools: a "Publish to HTML5" button if that's all they need, and a "Publish to Flash" button if they want to do something HTML doesn't support. I actually see this being an amazing selling point.

When you look at Flash as a pure video player and if the world revolves around the iPhone, it does look like Flash is starting the slow spiral to irrelevance. Realistically Flash is probably not going to remain the dominant video player, but it is going to be on everything but the iPhone - quite a big difference from death. The real question then, is what new areas will Flash be used in, and will it be awesome enough to attract anyone? As a partial Flash VM developer, I find it quite interesting. It frees me to start looking at markets that Flash may never have been used in before. Server Flash? Can it compete with PHP or ASP.NET? Can I install it on my TV and play some games? If Flash isn't constrained to annoying ads and a video player, where could it go? How does Flash give content creators awesome ways to display their great work in a fluid manner? That's a much more interesting question than asking "how can we save Flash?".

Other Opinions:

Random Notes:

  • Tamarin is the virtual machine inside Flash. The VM is all open source.
  • Most of the Tamarin team develops on the Mac. I'm the weird one who uses Windows.

Disclaimer: I am a part time intern in Adobe's research lab (not product! I have no idea what product is doing.) working on the Tamarin VM. I'm also a student which lets me do things that may not be practical from a business perspective. All thoughts expressed are mine and mine alone (I'm sure there is some bias here). They do not represent Adobe or Adobe's position on anything. I don't have any insider knowledge nor influence as to what Flash is going to be doing. And no I was not asked nor paid to write this post by anyone at Adobe.

Monday
01Feb2010

The Real Value of an Internship

New Essay up on what an internship is really for.

Most people assume that the most valuable thing you get out of an internship is a full time job once you graduate. While I'd be silly not to assume that a full time job is extremely valuable, especially with 10% unemployment, there is a second more valuable aspect: The ability to explore.

Check it out here.

Monday
21Dec2009

Coders at Work

Coders At Work, by Peter Seibel, interviews 15 famous computer scientists about both the technical and non-technical issues in computer science.

Seibel starts off by asking every person how they learned to program. The person answers, and Seibel keeps digging for more wherever it goes, creating a conversational tone that makes the book an easy read with immense depth. Whenever a pause in the conversation is hit, Seibel asks another standard question. Rinse and repeat for 4 or 5 main questions. Seibel is quite a skilled interviewer. He goes in depth when he needs to and gives each question enough time to cover the major issues. The result is that you get to see how 15 respected people have totally different views on the same subject.

For example, how is computer science, as a field, going to solve parallelism? We have so many cores and no idea what to do with them. Many say functional languages are the way to go as they give you this nice ability to go do some piece of work given an independent set of data. Others say transactional memory is just one of many ways to solve the concurrency problem.

The book also gives insight as to how differently people approach programming. Some like the "bottom-up" approach: they just start writing code, hashing things out as they go. Some, notably Donald Knuth, like to think about the program for months on end before even touching a computer. Some programmers need nothing more than a text editor and print statements to debug any program. Amazingly, one of the interviewees, who is known for his debugging abilities, simply rewrites everything and magically bugs go away.

Since the programmers were good, many eventually moved into management. Seibel asked about managing computer scientists starting with one of the biggest problems: "how do you find great talent?". Some of the interviewees went with a very unsatisfying answer - it's a gut feeling. They have to talk with the person for an hour. They have to bounce ideas off someone's head, see how their head works. Or they just ask them about their old projects and see what problems they encountered and how they solved them. Thankfully, many of the interviewees didn't like the logic puzzle approach to finding people. (I bomb magnificently at such puzzles!)

Others such as the director of Yahoo pointed out that he looks at their writing skills. If they have great English, they probably have great code. A few other people said the same thing. Then it hit me that it could entirely be true. Both writing and coding are fundamentally about translating an idea into something understandable. English for humans; code for computers. Both use the same thought processes. An essay needs structure, needs to be edited, and rewritten numerous times, which sounds a lot like refactoring.

The great thing is that Coders at Work is filled with many more "ah ha!" moments. These 15 geniuses just keep throwing insight after insight in such explicit glory that I found myself feeling like a better programmer just by reading the book. If you're a programmer who cares about the craft at all, you need this book.

Thursday
10Dec2009

A C++ Name Demangler

C++ compilers mangle function names, or rewrite them into something that is unique across the whole program. However, the new names are usually unreadable. You need to demangle them back into a human readable format.

I needed to demangle some C++ method names for debugging purposes and couldn't find an easy to use/build/embed demangler library. Instead, I stripped out parts of the C++ name demangler from GNU binutils and embedded that.
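
To show what demangling looks like in practice, here is a small sketch that takes a different route from the stripped binutils code: it calls `abi::__cxa_demangle`, which ships with the GCC and Clang C++ runtimes. This assumes an Itanium ABI toolchain, so it is not portable to compilers such as MSVC. The `demangle` wrapper is my own helper name, not part of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Returns the human readable form of an Itanium ABI mangled name, or the
// input unchanged if the runtime reports an error.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // Passing null for the buffer and length makes the runtime malloc the
    // result, which the caller must release with free().
    char* out = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    if (status != 0) {
        return mangled;
    }
    assert(out != nullptr);  // status 0 means a buffer was returned
    std::string result(out);
    std::free(out);
    return result;
}
```

With this helper, `demangle("_Z3fooi")` gives `foo(int)` and `demangle("_ZN3Foo3barEv")` gives `Foo::bar()`.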

If you need such functionality, you can download the bare minimum stripped C++ name demangler here.

* Provided without warranty, under the GPL license.