Re: To which extent LuaJIT is specific to Lua

  • From: Mike Pall <mike-1311@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 29 Nov 2013 22:43:34 +0100

Leo Romanoff wrote:
> OK. I'm wondering how much your tracing infrastructure implementation is 
> dependent on Lua semantics and its execution model? I.e. could the tracing 
> implementation mostly be reused (e.g. taking traces, detecting what and where 
> should be traced, storing traces, purging traces, etc)? Obviously certain 
> parts of this are almost always language/runtime/execution model dependent, 
> but I think that a lot of things are also pretty generic, or?

You may have to adapt it to the predominant control structures of
the source language and re-tune the heuristics. But, yes, most of
it is quite generic.

> > OTOH the more recent work, e.g. for the FFI, builds upon lower-level
> > parts of the IR. It would be entirely possible to generate IR for
> > a C-like language right now, but it would be harder to deal with
> > the differences in the execution model.
>
> Could you elaborate a bit more on these differences and related difficulties?

Simply speaking, the JIT compiler doesn't produce regular C
function prologues, since it has no need for them. It also limits
the acceptable amount of code per function and the number of
simultaneously live variables per function. These limits are fine
for Lua, but they'd need to be lifted for heavily macro-infested
C code or notorious manual loop unrolling. ;-)

> While LOC is not a very good measure of complexity, how big is LuaJIT 
> now?

Depends on how you'd want that to be counted against which
sub-systems. Well ... you know where to find the source. ;-)

> And what would be your estimate (either in number of LOCs or percentage of 
> code to be changed/added to the core LuaJIT) for retargeting it for a new 
> language/execution model, e.g. for some examples mentioned in my original 
> message?

Sorry, but due to the many factors in such a calculation the error
margin is too high for me to dare give even a conservative
estimate.

> OK. Does it mean that it is possible to implement e.g. the usual stack frames 
> a-la C/Pascal for keeping the local variables there?

That's not how LuaJIT (or any modern C compiler) works. C stack
slots are allocated when the register allocator needs to spill
values, not variables. There's no direct correspondence between
variables and stack slots.
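
To illustrate the point about spills, here is a toy linear-scan
allocator in Python. It is only a sketch under simplifying assumptions
(one live range per value, no splitting, no register hints) and is not
how LuaJIT's allocator is written; the function name and data layout
are made up for this example. Stack slots only appear when the
registers run out:

```python
def allocate(intervals, num_regs):
    """Toy linear-scan allocation over values (not source variables).

    intervals: list of (value_name, start, end) live ranges.
    Returns (locations, num_spill_slots); each location is
    ("reg", index) or ("stack", slot).
    """
    free = list(range(num_regs))
    active = []      # (end, name) for values currently held in registers
    where = {}       # value name -> assigned location
    next_slot = 0
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # Release registers whose values are dead before this point.
        for old_end, old in list(active):
            if old_end < start:
                active.remove((old_end, old))
                free.append(where[old][1])
        if free:
            where[name] = ("reg", free.pop())
            active.append((end, name))
            continue
        # Register pressure: spill whichever value lives longest.
        victim_end, victim = max(active)
        if victim_end > end:
            active.remove((victim_end, victim))
            where[name] = where[victim]          # take over its register
            where[victim] = ("stack", next_slot)
            active.append((end, name))
        else:
            where[name] = ("stack", next_slot)
        next_slot += 1
    return where, next_slot


ranges = [("a", 0, 10), ("b", 1, 3), ("c", 2, 4), ("d", 5, 6)]
locations, slots = allocate(ranges, num_regs=2)
print(locations, slots)
```

With two registers, only the long-lived value "a" ends up in a stack
slot. With four registers, no stack slot is allocated at all.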

> Is it possible to implement an "address of" operator and pass a variable's 
> address to other functions?

Sure, if the semantics of the source language allow that. Such a
feature is well known to limit optimization opportunities for the
compiler. But this is in no way specific to a trace compiler.
Actually, alias analysis is much easier on isolated traces, so
this should work out fine.
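
A rough sketch of why a linear trace helps: with no control-flow
merges, one forward pass is enough to forward stored values to later
loads. The Python below is a toy model, not LuaJIT's alias analysis.
The op tuples, the (base, offset) address encoding and the may_alias
oracle are all assumptions made up for this example:

```python
def forward_loads(trace, may_alias):
    """Forward stored values to later loads along a linear trace.

    trace: list of ("store", addr, value) or ("load", addr, dest).
    may_alias(a, b): conservative oracle, True if a and b may overlap.
    Forwardable loads are rewritten to ("copy", value, dest).
    """
    known = {}   # addr -> value last stored there
    out = []
    for op, addr, x in trace:
        if op == "store":
            # A store invalidates every remembered address it may overlap.
            known = {a: v for a, v in known.items() if not may_alias(a, addr)}
            known[addr] = x
            out.append((op, addr, x))
        elif addr in known:
            out.append(("copy", known[addr], x))
        else:
            out.append((op, addr, x))
    return out


def may_alias(a, b):
    # Addresses are (base, offset). An "unknown" base stands for a
    # pointer that escaped, e.g. via an address-of operator.
    if "unknown" in (a[0], b[0]):
        return True
    return a == b


trace = [
    ("store", ("p", 0), "v1"),
    ("store", ("q", 0), "v2"),
    ("load", ("p", 0), "t"),
]
print(forward_loads(trace, may_alias))
```

Here the load from p is replaced by a copy of v1, since the store to q
cannot overlap it. If the second store went through an "unknown" base
instead, the load would have to stay.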

> Is it possible to implement most of C low-level tricks (bit operations, 
> unions/structs, type casts, etc)? So, basically it should be possible to 
> build a full tracing JIT for C using LuaJIT?

The FFI and the bit library allow most of these ops. The ones
missing are still present in the IR or could easily be added.

> >> - something like JVM-based languages, e.g. Java, Scala? You said yourself
> >> that LuaJIT beats JVM in many cases.
> >
> > The core challenge is proper implementation of the execution model
> > wrt. concurrency or the GC. And some specific optimizations for
> > allocations, since Java programs tend to allocate temporaries like
> > there's no tomorrow (collateral damage from the language and
> > library design).
>
> Interesting. Do you mean that a not-so-efficient implementation is possible 
> with a moderate effort, but an efficient one with a good GC would really be a 
> challenge?

The runtime environment is a big part of the Java language
specification. The shared-everything model does hurt. Read about
the Java memory model and how they had to refine it several times
to get somewhat sane semantics a compiler could follow.

> But my question is: Do you say that LuaJIT could be tweaked with a reasonable 
> effort to replace PyPy by providing a full tracing JIT for Python?

Possibly. Not that I'd be interested in working on that, though.

> I see. But I'm wondering if there is anything in JVM or e.g. Python or Ruby 
> which cannot be easily mapped to or expressed with LuaJIT? I.e. do you know 
> of any examples where certain core data types, data structures or maybe 
> certain language constructs/features/low-level details cannot, in principle, 
> be expressed using LuaJIT and would have to be modeled rather inefficiently, 
> using some workarounds?

There are plenty that cannot be easily or efficiently modeled in
the source language of LuaJIT plus its extensions. OTOH modeling
them on top of the IR (with some changes) is certainly feasible.

> - How modular is LuaJIT when it comes to retargeting it for a different 
> language or runtime? How configurable is it? For example, are Lua-specific 
> optimizations and other implementation details provided in a few well-defined 
> places or are they spread all over the place?

The code base is simply not designed for that.

> - Are most of the things which heavily depend on the language semantics 
> implemented in mostly orthogonal ways or are they very deeply 
> inter-dependent? How easy/difficult is it to replace implementation of one of 
> such features with an alternative implementation? E.g. if one would like to 
> add a new core data type (e.g. C-like array or complex numbers,

LuaJIT already has C-like arrays and complex numbers. :-)

> or a different implementation of tables/dicts, which is not compliant with 
> Lua's tables)?

That would be harder. You could decompose it into low-level ops, or
add new mid-level IR instructions and support them in all backends.
Mid-level IR offers strictly better optimization opportunities
because less semantic information is lost (which is why I'm using
it for e.g. Lua hash table operations).

> - I'm wondering if you ever considered making LuaJIT more generic and more 
> modular (in this sense, a bit like LLVM), so that it can serve as a basis for 
> tracing JITs for different languages and runtimes? I understand that it was 
> not the initial goal, but still... I'm not even suggesting that you should do 
> it yourself, if there is no interest from your side. But would you be in 
> favor of such a LuaJIT development direction? Could you roughly estimate the 
> effort (e.g. amount of redesign and/or refactoring) required to make LuaJIT 
> an easier and more approachable target for other languages? Would you 
> consider developing future LuaJIT changes taking these multi-language support 
> considerations into account?

I'd never have finished LuaJIT if I had not opted to write a very
Lua-specific compiler. Right now I don't have the time to
participate in developing a generic compiler framework. Good luck!

--Mike
