Re: To which extent LuaJIT is specific to Lua

  • From: Mike Pall <mike-1311@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 29 Nov 2013 19:14:49 +0100

Leo Romanoff wrote:
> Is it possible to use (a correspondingly extended) LuaJIT as a generic 
> tracing JIT for other languages with different semantics and execution 
> models? Or is LuaJIT so tightly coupled to Lua, its semantics and execution 
> model that it is almost impossible to reuse it for something significantly 
> different from Lua without rewriting it almost completely?

You can certainly adapt the overall design and take many parts of
it. Depends a bit on how close the language is to Lua.

> I understand that LuaJIT was initially created specifically for Lua. But I'm 
> wondering to which extent LuaJIT's inner organization, design and 
> implementation are tied to the semantics of Lua. I can imagine that some 
> parts of LuaJIT (e.g. machine code generation, register allocation, some 
> generic optimizations) are not that much dependent on Lua and its semantics, 
> while some other parts (e.g. some of optimizations) are rather tightly 
> coupled to Lua, because they are only possible if Lua semantics is assumed 
> (e.g. table indexing starts with 1, strings are interned, etc).

Two key design decisions that make it difficult to reuse all of the
code are the medium-level IR and the stack snapshots. These reflect
both the semantics of Lua (e.g. hash tables) and its execution
model.
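
To make the snapshot point concrete, here is a toy sketch in Python. It
is not LuaJIT's actual IR or snapshot format; every name in it (Ins,
Snapshot, Trace, the opcode strings, the pc value 7) is made up for
illustration. It only shows the general idea: a guard carries a map from
interpreter stack slots to IR values, plus a bytecode position, so the
interpreter state can be rebuilt when the trace exits.

```python
# Toy model of a trace with stack snapshots. All names are hypothetical;
# this is NOT LuaJIT's real IR or snapshot layout.
from dataclasses import dataclass, field


@dataclass
class Ins:
    op: str      # "SLOAD", "KINT", "ADD" or "GUARD_LT"
    a: int = 0   # first operand (stack slot, constant, or IR ref)
    b: int = 0   # second operand (IR ref)


@dataclass
class Snapshot:
    pc: int      # bytecode position where the interpreter resumes
    slots: dict  # interpreter stack slot -> IR ref holding its value


@dataclass
class Trace:
    ir: list = field(default_factory=list)
    snaps: dict = field(default_factory=dict)  # guard IR ref -> Snapshot

    def emit(self, op, a=0, b=0):
        self.ir.append(Ins(op, a, b))
        return len(self.ir) - 1

    def guard(self, op, a, b, pc, slots):
        ref = self.emit(op, a, b)
        self.snaps[ref] = Snapshot(pc, dict(slots))
        return ref


def run(trace, stack):
    """Run the trace once. On a failed guard, restore the interpreter
    stack from the snapshot and return the pc to resume at."""
    vals = []  # vals[ref] is the value computed by IR instruction ref
    for ref, ins in enumerate(trace.ir):
        if ins.op == "SLOAD":
            vals.append(stack[ins.a])
        elif ins.op == "KINT":
            vals.append(ins.a)
        elif ins.op == "ADD":
            vals.append(vals[ins.a] + vals[ins.b])
        elif ins.op == "GUARD_LT":
            vals.append(None)
            if not vals[ins.a] < vals[ins.b]:
                snap = trace.snaps[ref]
                for slot, r in snap.slots.items():
                    stack[slot] = vals[r]
                return snap.pc
        else:
            raise ValueError(f"unknown op {ins.op}")
    return None  # all guards held; nothing was written back


def build_example():
    # Trace for a loop body like: i = i + 1; if i < n then continue
    t = Trace()
    i0 = t.emit("SLOAD", 0)   # i lives in stack slot 0
    n = t.emit("SLOAD", 1)    # n lives in stack slot 1
    one = t.emit("KINT", 1)
    i1 = t.emit("ADD", i0, one)
    t.guard("GUARD_LT", i1, n, pc=7, slots={0: i1})
    return t
```

Running build_example() against the stack [9, 10] fails the guard,
writes 10 back into slot 0 and returns the resume position; against
[3, 10] the guard holds and the stack is left alone. Note how the
snapshot is phrased entirely in terms of stack slots and bytecode
positions, i.e. in terms of the source language's execution model.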

OTOH the more recent work, e.g. for the FFI, builds upon lower-level
parts of the IR. It would be entirely possible to generate IR for
a C-like language right now, but it would be harder to deal with
the differences in the execution model. Interestingly, this is the
opposite of what you'd be facing when trying to retarget (say)
LLVM to a dynamic language, because it makes various assumptions
about the execution model that are geared towards compiling C/C++.

> So, I'm wondering which parts of LuaJIT are generic and which ones are 
> tightly based on Lua semantics? The reason for this question: I'd like to 
> understand better if LuaJIT could be used as a tracing JIT backend for 
> something very different from Lua in its semantics. I understand that it is 
> most likely not possible out of the box and would require adaptations and 
> extensions of LuaJIT. But the question is - how much work would it be? Would 
> one need to rewrite almost all of LuaJIT or maybe there are only a few places 
> in the code that are very much dependent on the language semantics and 
> therefore need to be adjusted/changed to meet the semantics of a different 
> language? And I'm really interested in using LuaJIT directly, without mapping the 
> original source language to Lua first, even though it could be possible in 
> some cases.

I'm not sure I could quantify this. Maybe ask Thomas Schilling. He
wrote a trace compiler for Haskell whose code is based on the core
parts of the LuaJIT interpreter and trace compiler. Heavily
modified, of course:

http://cp.reddit.com/r/haskell/comments/1r4s7b/what_happened_to_the_tracing_jit_work_by_thomas/

> Some examples of the languages and features I have in mind are:
> - some sort of statically typed language like C/Pascal/etc

Entirely possible.

> - something like JVM-based languages, e.g. Java, Scala? You said yourself 
> that LuaJIT beats JVM in many cases.

The core challenge is proper implementation of the execution model
wrt. concurrency or the GC. And some specific optimizations for
allocations, since Java programs tend to allocate temporaries like
there's no tomorrow (collateral damage from the language and
library design).

> - some scripting languages, which have a different internal object/execution 
> model, e.g. Python or Ruby

Sure, Python has PyPy. The difficulties you'll be facing there
have more to do with legacy C interfaces, the abundance of
specialized core data types or wasteful execution semantics. One
pays for abstractions, one way or another.

> - features like: custom object layouts in memory (e.g. C-like structs vs 
> Lua's tables), custom garbage collectors, support for custom ABIs.

Code reuse should be easy if you base it on the same FFI design.

Custom garbage collectors are troublesome for any VM or compiler.
Sounds nice on paper and certainly interesting for research. But
IMHO only a fully integrated GC offers top performance.

--Mike
