Mike,

Thanks a lot for all your comments, explanations and clarifications on
this subject! I'll have a closer look at LuaJIT's implementation and may
come back later with more specific questions, if you don't mind.

Thanks again,
-Leo

> Mike Pall <mike-1311@xxxxxxxxxx> wrote at 22:44 on Friday, 29 November 2013:
>
> Leo Romanoff wrote:
>> OK. I'm wondering how much your tracing infrastructure implementation
>> depends on Lua semantics and its execution model. I.e., could the
>> tracing implementation mostly be reused (e.g. taking traces, detecting
>> what should be traced and where, storing traces, purging traces, etc.)?
>> Obviously certain parts of this are almost always
>> language/runtime/execution-model dependent, but I think that a lot of
>> things are also pretty generic, right?
>
> You may have to adapt it to the predominant control structures of
> the source language and re-tune the heuristics. But, yes, most of
> it is quite generic.
>
>> > OTOH the more recent work, e.g. for the FFI, builds upon lower-level
>> > parts of the IR. It would be entirely possible to generate IR for
>> > a C-like language right now, but it would be harder to deal with
>> > the differences in the execution model.
>>
>> Could you elaborate a bit more on these differences and the related
>> difficulties?
>
> Simply speaking, the JIT compiler doesn't produce regular C
> function prologues, since it doesn't have a need for that. Or it
> limits the acceptable amount of code per function or number of
> simultaneously live variables per function. These limits are fine
> for Lua, but they'd need to be lifted for heavily macro-infested
> C code or notorious manual loop unrolling. ;-)
>
>> LOCs are not a very good measure of complexity, but how big is LuaJIT
>> now?
>
> Depends on how you'd want that to be counted against which
> sub-systems. Well ... you know where to find the source. ;-)
>
>> And what would be your estimate (either in number of LOCs or percentage
>> of code to be changed/added to the core LuaJIT) for retargeting it to a
>> new language/execution model, e.g. for some of the examples mentioned in
>> my original message?
>
> Sorry, but due to the many factors in such a calculation the error
> margin is too high for me to dare even a conservative estimate.
>
>> OK. Does that mean it is possible to implement e.g. the usual stack
>> frames à la C/Pascal for keeping the local variables there?
>
> That's not how LuaJIT (or any modern C compiler) works. C stack
> slots are allocated when the register allocator needs to spill
> values, not variables. There's no direct correspondence between
> variables and stack slots.
>
>> Is it possible to implement an "address of" operator and pass a
>> variable's address to other functions?
>
> Sure, if the semantics of the source language allow that. Such a
> feature is well known to limit optimization opportunities for the
> compiler. But this is in no way specific to a trace compiler.
> Actually, alias analysis is much easier on isolated traces, so
> this should work out fine.
>
>> Is it possible to implement most of the low-level C tricks (bit
>> operations, unions/structs, type casts, etc.)? So, basically, it should
>> be possible to build a full tracing JIT for C using LuaJIT?
>
> The FFI and the bit library allow most of these ops. The ones
> missing are still present in the IR or could easily be added.
>
>> >> - something like JVM-based languages, e.g. Java, Scala? You said
>> >> yourself that LuaJIT beats JVM in many cases.
>> >
>> > The core challenge is proper implementation of the execution model
>> > wrt. concurrency or the GC. And some specific optimizations for
>> > allocations, since Java programs tend to allocate temporaries like
>> > there's no tomorrow (collateral damage from the language and
>> > library design).
>>
>> Interesting. Do you mean that a not-so-efficient implementation is
>> possible with moderate effort, but an efficient one with a good GC
>> would really be a challenge?
>
> The runtime environment is a big part of the Java language
> specification. The shared-everything model does hurt. Read about
> the Java memory model and how they had to refine it several times
> to get somewhat sane semantics a compiler could follow.
>
>> But my question is: Are you saying that LuaJIT could be tweaked with
>> reasonable effort to replace PyPy by providing a full tracing JIT for
>> Python?
>
> Possibly. Not that I'd be interested in working on that, though.
>
>> I see. But I'm wondering if there is anything in the JVM or e.g. Python
>> or Ruby which cannot be easily mapped to or expressed with LuaJIT? I.e.,
>> do you know of any examples where certain core data types, data
>> structures or maybe certain language constructs/features/low-level
>> details cannot in principle be expressed using LuaJIT and could only be
>> modeled rather inefficiently, using some workarounds?
>
> There are plenty that cannot be easily or efficiently modeled in
> the source language of LuaJIT plus its extensions. OTOH modeling
> them on top of the IR (with some changes) is certainly feasible.
>
>> - How modular is LuaJIT when it comes to retargeting it to a different
>> language or runtime? How configurable is it? For example, are
>> Lua-specific optimizations and other implementation details provided in
>> a few well-defined places, or are they spread all over the place?
>
> The code base is simply not designed for that.
>
>> - Are most of the things which heavily depend on the language semantics
>> implemented in mostly orthogonal ways, or are they very deeply
>> inter-dependent? How easy/difficult is it to replace the implementation
>> of one of these features with an alternative implementation? E.g. if one
>> would like to add a new core data type (e.g. C-like arrays or complex
>> numbers,
>
> LuaJIT already has C-like arrays and complex numbers. :-)
>
>> or a different implementation of tables/dicts, which is not compliant
>> with Lua's tables)?
>
> That would be harder. You could decompose it into low-level ops or
> add new mid-level IR instructions plus add the support for all
> backends. MIR offers strictly better optimization opportunities
> due to a lower semantic loss (which is why I'm using that for e.g.
> Lua hash table operations).
>
>> - I'm wondering whether you have ever considered making LuaJIT more
>> generic and more modular (in this sense, a bit like LLVM), so that it
>> could serve as a basis for tracing JITs for different languages and
>> runtimes. I understand that it was not the initial goal, but still...
>> I'm not even suggesting that you should do it yourself, if there is no
>> interest on your side. But would you be in favor of such a LuaJIT
>> development direction? Could you roughly estimate the effort (e.g. the
>> amount of redesign and/or refactoring) required to make LuaJIT an easier
>> and more approachable target for other languages? Would you consider
>> developing future LuaJIT changes with these multi-language support
>> considerations in mind?
>
> I'd never have finished LuaJIT if I had not opted to write a very
> Lua-specific compiler. Right now I don't have the time to
> participate in developing a generic compiler framework. Good luck!
>
> --Mike