Re: To which extent LuaJIT is specific to Lua

From: Leo Romanoff <romixlev@xxxxxxxxx>
To: "luajit@xxxxxxxxxxxxx" <luajit@xxxxxxxxxxxxx>
Date: Fri, 29 Nov 2013 11:27:26 -0800 (PST)

Mike, first of all, thanks a lot for your very insightful answers!

> Mike Pall <mike-1311@xxxxxxxxxx> wrote at 19:15 on Friday, 29 November 2013:

> > Leo Romanoff wrote:
>>  Is it possible to use (a correspondingly extended) LuaJIT as a generic
>>  tracing JIT for other languages with different semantics and execution
>>  models? Or is LuaJIT so tightly coupled to Lua, its semantics and
>>  execution model that it is almost impossible to reuse it for something
>>  significantly different from Lua without rewriting it almost completely?
> 
> You can certainly adapt the overall design and take many parts of
> it. Depends a bit on how close the language is to Lua.

Yes. I suspected this ;-)

>>  I understand that LuaJIT was initially created specifically for Lua. But
>>  I'm wondering to what extent LuaJIT's inner organization, design and
>>  implementation are tied to the semantics of Lua. I can imagine that some
>>  parts of LuaJIT (e.g. machine code generation, register allocation, some
>>  generic optimizations) are not that much dependent on Lua and its
>>  semantics, while some other parts (e.g. some of the optimizations) are
>>  rather tightly coupled to Lua, because they are only possible if Lua
>>  semantics are assumed (e.g. table indexing starts with 1, strings are
>>  interned, etc).
> 
> Two key design decisions that make it difficult to reuse all of the
> code are the medium-level IR and the stack snapshots. These reflect
> both the semantics of Lua (e.g. hash tables) and its execution
> model.
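
Just to check my understanding of the snapshot part: as far as I can tell, a
snapshot records, at each guard, which interpreter stack slots are live and
which IR value currently holds each of them, so that the interpreter state can
be rebuilt when the trace exits. Below is a toy Python sketch of that general
idea only. The names Snapshot, restore, pc and slots are my own hypothetical
choices and say nothing about LuaJIT's actual data structures.

```python
# Toy model of a trace-exit snapshot. NOT LuaJIT's real layout; all names
# here (Snapshot, restore, pc, slots) are hypothetical.
from dataclasses import dataclass


@dataclass
class Snapshot:
    pc: int      # bytecode position at which the interpreter resumes
    slots: dict  # interpreter stack slot index -> IR reference number


def restore(snapshot, ir_values, stack):
    """On a failed guard, copy the live IR values back into the stack slots."""
    new_stack = list(stack)
    for slot, ref in snapshot.slots.items():
        new_stack[slot] = ir_values[ref]
    return snapshot.pc, new_stack


# Slots 0 and 2 were modified on the trace; slot 1 is untouched.
snap = Snapshot(pc=7, slots={0: 101, 2: 103})
pc, stack = restore(snap, {101: 42, 103: "x"}, [None, "keep", None])
```

Even in this toy form the layout of "the stack" and the meaning of a slot are
dictated by the interpreter, which I suppose is the coupling you are referring
to.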

OK. I'm wondering how much your tracing infrastructure implementation
depends on Lua semantics and its execution model. I.e. could the tracing
implementation mostly be reused (e.g. recording traces, detecting what should
be traced and where, storing traces, purging traces, etc)? Obviously certain
parts of this are almost always language/runtime/execution model dependent,
but I think that a lot of things are also pretty generic, aren't they?
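
To illustrate what I mean by the generic part, here is a minimal Python sketch
of hot-loop detection plus a small trace cache. It needs nothing from the
language beyond some identifier for a loop header. Everything in it is an
assumption of mine for the sake of the example: the class name TraceManager,
the threshold value, and the oldest-first purge policy are hypothetical and
are not taken from LuaJIT.

```python
# Toy model of language-independent trace bookkeeping. Names, threshold and
# purge policy are hypothetical placeholders, not LuaJIT's implementation.
class TraceManager:
    def __init__(self, hot_threshold=3, max_traces=2):
        self.hot_threshold = hot_threshold
        self.max_traces = max_traces
        self.counts = {}  # loop-header id -> number of times seen
        self.traces = {}  # loop-header id -> recorded trace (insertion order)

    def on_loop_header(self, header):
        """Decide what to do at a loop header: 'run', 'record' or 'interpret'."""
        if header in self.traces:
            return "run"
        self.counts[header] = self.counts.get(header, 0) + 1
        if self.counts[header] >= self.hot_threshold:
            return "record"
        return "interpret"

    def store(self, header, trace):
        """Keep a finished trace, purging the oldest one if the cache is full."""
        if len(self.traces) >= self.max_traces:
            oldest = next(iter(self.traces))
            del self.traces[oldest]
            self.counts.pop(oldest, None)
        self.traces[header] = trace


tm = TraceManager(hot_threshold=3, max_traces=2)
decisions = [tm.on_loop_header("A") for _ in range(3)]
tm.store("A", ["op1", "op2"])
after_store = tm.on_loop_header("A")
```

The language-specific work would then sit behind the "record" decision, i.e.
in turning the executed operations into IR.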

> OTOH the more recent work, e.g. for the FFI, builds upon lower-level
> parts of the IR. It would be entirely possible to generate IR for
> a C-like language right now, but it would be harder to deal with
> the differences in the execution model.

Could you elaborate a bit more on these differences and related difficulties?

> Interestingly, this is the
> opposite of what you'd be facing when trying to retarget (say)
> LLVM to a dynamic language, because it makes various assumptions
> about the execution model that are geared towards compiling C/C++.

Indeed. I see your point.
BTW, speaking of LLVM I'd like to mention that it is pretty easy to experiment 
with it, extend it, etc.
It is very modular and most of the important concerns are clearly separated and
abstracted. I'm not saying that it produces better results than LuaJIT. I'm 
just saying that it was developed with extensibility, modularity and 
configurability in mind, which is a good thing when it comes to developing new 
features on top of it.

But see my related questions at the end of this mail.

>>  So, I'm wondering which parts of LuaJIT are generic and which ones are
>>  tightly based on Lua semantics? The reason for this question: I'd like to
>>  understand better if LuaJIT could be used as a tracing JIT backend for
>>  something very different from Lua in its semantics. I understand that it
>>  is most likely not possible out of the box and would require adaptations
>>  and extensions of LuaJIT. But the question is: how much work would it be?
>>  Would one need to rewrite almost all of LuaJIT, or are there maybe only a
>>  few places in the code that are very much dependent on the language
>>  semantics and therefore need to be adjusted/changed to meet the semantics
>>  of a different language? And I'm really interested in using LuaJIT
>>  directly, without mapping the original source language to Lua first, even
>>  though it could be possible in some cases.
> 
> I'm not sure I could quantify this.

LOC is not a very good measure of complexity, but how big is LuaJIT now? And
what would be your estimate (either in LOC or as a percentage of code to be
changed/added to the core LuaJIT) for retargeting it for a new
language/execution model, e.g. for some of the examples mentioned in my
original message?

I totally understand that counting LOC in a piece of software as complex as a
compiler or JIT is not a very good approximation, as every line carries more
complexity than hundreds of lines in an average application. I know it very
well, because I developed optimizing C compilers in my former life. But thanks
to this experience I can roughly translate "compiler LOC" into the required
effort. This is why I ask about it. Actually, my former experience is also the
reason why I started this thread. As an expert in compiler development, I have
a very deep respect for your work on LuaJIT. IMHO, it is really a pity that
this wonderful technology is currently used only for Lua. I think that with
some adaptations/changes it could be applied in a much broader area with great
success.


> Maybe ask Thomas Schilling. He
> wrote a trace compiler for Haskell whose code is based on the core
> parts of the LuaJIT interpreter and trace compiler. Heavily
> modified, of course:
> 
> http://cp.reddit.com/r/haskell/comments/1r4s7b/what_happened_to_the_tracing_jit_work_by_thomas/

Interesting. I've never heard about this attempt. I'll have a look at it. 

>>  Some examples of the languages and features I have in mind are:
>>  - some sort of statically typed language like C/Pascal/etc
> 
> Entirely possible.

OK. Does it mean that it is possible to implement e.g. the usual stack frames
à la C/Pascal for keeping the local variables there? Is it possible to
implement the "address of" operator and pass a variable's address to other
functions? Is it possible to implement most of C's low-level tricks (bit
operations, unions/structs, type casts, etc)? So, basically, should it be
possible to build a full tracing JIT for C using LuaJIT?

>>  - something like JVM-based languages, e.g. Java, Scala? You said yourself
>>  that LuaJIT beats JVM in many cases.
> 
> The core challenge is proper implementation of the execution model
> wrt. concurrency or the GC. And some specific optimizations for
> allocations, since Java programs tend to allocate temporaries like
> there's no tomorrow (collateral damage from the language and
> library design).

Interesting. Do you mean that a not-so-efficient implementation is possible 
with a moderate effort, but an efficient one with a good GC would really be a 
challenge? 

>>  - some scripting languages, which have a different internal
>>  object/execution model, e.g. Python or Ruby
> 
> Sure, Python has PyPy. 

Sure. I understand that there is a JIT for Python, so it is possible to write a 
(tracing) JIT for it. 
But my question is: Are you saying that LuaJIT could be tweaked with a
reasonable effort to replace PyPy by providing a full tracing JIT for Python?

> The difficulties you'll be facing there
> have more to do with legacy C interfaces, the abundance of
> specialized core data types or wasteful execution semantics. One
> pays for abstractions, one way or another.

I see. But I'm wondering if there is anything in the JVM or e.g. Python or
Ruby which cannot be easily mapped to or expressed with LuaJIT? I.e. do you
know of any examples where certain core data types, data structures or maybe
certain language constructs/features/low-level details cannot in principle be
expressed using LuaJIT and would have to be modeled rather inefficiently,
using some workarounds?

>>  - features like: custom object layouts in memory (e.g. C-like structs vs
>>  Lua's tables), custom garbage collectors, support for custom ABIs.
> 
> Code reuse should be easy if you base it on the same FFI design.
> 
> Custom garbage collectors are troublesome for any VM or compiler.
> Sounds nice on paper and certainly interesting for research. But
> IMHO only a fully integrated GC offers top performance.

Yes. I understand that GC is very tightly coupled with the language and its 
runtime.

Based on your responses, I'd also like to ask the following questions:

- How modular is LuaJIT when it comes to retargeting it for a different 
language or runtime? How configurable is it? For example, are Lua-specific 
optimizations and other implementation details provided in a few well-defined 
places or are they spread all over the place? 

- Are most of the things which heavily depend on the language semantics 
implemented in mostly orthogonal ways or are they very deeply inter-dependent? 
How easy/difficult is it to replace implementation of one of such features with 
an alternative implementation? E.g. if one would like to add a new core data 
type (e.g. C-like array or complex numbers, or a different implementation of 
tables/dicts, which is not compliant with Lua's tables)?

- I'm wondering if you ever considered making LuaJIT more generic and more 
modular (in this sense, a bit like LLVM), so that it can serve as a basis for 
tracing JITs for different languages and runtimes? I understand that it was not 
the initial goal, but still... I'm not even suggesting that you should do it 
yourself, if there is no interest from your side. But would you be in favor of 
such a LuaJIT development direction? Could you roughly estimate the effort 
(e.g. amount of redesign and/or refactoring) required to make LuaJIT an easier 
and more approachable target for other languages? Would you consider developing 
future LuaJIT changes taking these multi-language support considerations into 
account?

Thanks,
  -Leo
Follow-Ups:
- Re: To which extent LuaJIT is specific to Lua
  - From: Ryan
- Re: To which extent LuaJIT is specific to Lua
  - From: Mike Pall
References:
- To which extent LuaJIT is specific to Lua
  - From: Leo Romanoff
- Re: To which extent LuaJIT is specific to Lua
  - From: Mike Pall