Re: To which extent LuaJIT is specific to Lua

  • From: Ryan <rymg19@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx,Leo Romanoff <romixlev@xxxxxxxxx>
  • Date: Fri, 29 Nov 2013 14:18:40 -0600

In reference to your last question: ever heard of PyPy/RPython?

Leo Romanoff <romixlev@xxxxxxxxx> wrote:
>Mike, first of all, thanks a lot for your very insightful answers!
>
>> Mike Pall <mike-1311@xxxxxxxxxx> wrote on Friday, 29 November 2013, 19:15:
>
>> > Leo Romanoff wrote:
>>> Is it possible to use (a correspondingly extended) LuaJIT as a
>>> generic tracing JIT for other languages with different semantics
>>> and execution models? Or is LuaJIT so tightly coupled to Lua, its
>>> semantics and execution model that it is almost impossible to reuse
>>> it for something significantly different from Lua without rewriting
>>> it almost completely?
>> 
>> You can certainly adapt the overall design and take many parts of
>> it. Depends a bit on how close the language is to Lua.
>
>Yes. I suspected this ;-)
>
>>> I understand that LuaJIT was initially created specifically for
>>> Lua. But I'm wondering to what extent LuaJIT's inner organization,
>>> design and implementation are tied to the semantics of Lua. I can
>>> imagine that some parts of LuaJIT (e.g. machine code generation,
>>> register allocation, some generic optimizations) are not that
>>> dependent on Lua and its semantics, while other parts (e.g. some of
>>> the optimizations) are rather tightly coupled to Lua, because they
>>> are only possible if Lua semantics are assumed (e.g. table indexing
>>> starts at 1, strings are interned, etc.).
>> 
>> Two key design decisions that make it difficult to reuse all of the
>> code are the medium-level IR and the stack snapshots. These reflect
>> both the semantics of Lua (e.g. hash tables) and its execution
>> model.
>
>OK. I'm wondering how much your tracing infrastructure implementation
>is dependent on Lua semantics and its execution model? I.e. could the
>tracing implementation mostly be reused (e.g. taking traces, detecting
>what and where should be traced, storing traces, purging traces, etc)?
>Obviously certain parts of this are almost always
>language/runtime/execution model dependent, but I think that a lot of
>things are also pretty generic, aren't they?
>
>> OTOH the more recent work, e.g. for the FFI, builds upon lower-level
>> parts of the IR. It would be entirely possible to generate IR for
>> a C-like language right now, but it would be harder to deal with
>> the differences in the execution model.
>
>Could you elaborate a bit more on these differences and related
>difficulties?
>
>> Interestingly, this is the
>> opposite of what you'd be facing when trying to retarget (say)
>> LLVM to a dynamic language, because it makes various assumptions
>> about the execution model that are geared towards compiling C/C++.
>
>Indeed. I see your point.
>BTW, speaking of LLVM I'd like to mention that it is pretty easy to
>experiment with it, extend it, etc.
>It is very modular and most of the important concerns are clearly
>separated and abstracted. I'm not saying that it produces better results than
>LuaJIT. I'm just saying that it was developed with extensibility,
>modularity and configurability in mind, which is a good thing when it
>comes to developing new features on top of it.
>
>But see my related questions at the end of this mail.
>
>>> So, I'm wondering which parts of LuaJIT are generic and which ones
>>> are tightly based on Lua semantics. The reason for this question:
>>> I'd like to understand better whether LuaJIT could be used as a
>>> tracing JIT backend for something very different from Lua in its
>>> semantics. I understand that it is most likely not possible out of
>>> the box and would require adaptations and extensions of LuaJIT. But
>>> the question is: how much work would it be? Would one need to
>>> rewrite almost all of LuaJIT, or are there maybe only a few places
>>> in the code that are very much dependent on the language semantics
>>> and therefore need to be adjusted/changed to meet the semantics of
>>> a different language? And I'm really interested in using LuaJIT
>>> directly, without mapping the original source language to Lua
>>> first, even though that could be possible in some cases.
>> 
>> I'm not sure I could quantify this.
>
>I know LOCs are not a very good measure of complexity, but how big is
>LuaJIT now? And what would be your estimate (either in number of LOCs
>or percentage of code to be changed/added to the core LuaJIT) for
>retargeting it to a new language/execution model, e.g. for some of the
>examples mentioned in my original message?
>
>I totally understand that counting LOCs in such a complex piece of
>software like a compiler or JIT is not a very good approximation, as
>every line has more complexity than 100s of lines in an average
>application. I know it very well, because I've developed optimizing C
>compilers in my former life. But due to this experience I can roughly
>translate "compiler LOCs" into the required effort. This is why I ask
>about it. Actually, my former experience is also the reason why I
>started this thread. As an expert in the compiler development area, I
>have a very deep respect for your work on LuaJIT. IMHO, it is really a
>pity that this wonderful technology is currently used only for Lua. I
>think with some adaptations/changes it may be applied in a much broader
>area with a great success.
>
>
>> Maybe ask Thomas Schilling. He
>> wrote a trace compiler for Haskell whose code is based on the core
>> parts of the LuaJIT interpreter and trace compiler. Heavily
>> modified, of course:
>> 
>> http://cp.reddit.com/r/haskell/comments/1r4s7b/what_happened_to_the_tracing_jit_work_by_thomas/
>
>Interesting. I've never heard about this attempt. I'll have a look at
>it. 
>
>>>  Some examples of the languages and features I have in mind are:
>>>  - some sort of statically typed language like C/Pascal/etc
>> 
>> Entirely possible.
>
>OK. Does that mean it is possible to implement, e.g., the usual stack
>frames a la C/Pascal for keeping local variables? Is it possible to
>implement an "address of" operator and pass a variable's address to
>other functions? Is it possible to implement most of C's low-level
>tricks (bit operations, unions/structs, type casts, etc.)? So,
>basically, it should be possible to build a full tracing JIT for C
>using LuaJIT?
>
>>> - something like JVM-based languages, e.g. Java, Scala? You said
>>> yourself that LuaJIT beats the JVM in many cases.
>> 
>> The core challenge is proper implementation of the execution model
>> wrt. concurrency or the GC. And some specific optimizations for
>> allocations, since Java programs tend to allocate temporaries like
>> there's no tomorrow (collateral damage from the language and
>> library design).
>
>Interesting. Do you mean that a not-so-efficient implementation is
>possible with a moderate effort, but an efficient one with a good GC
>would really be a challenge? 
>
>>> - some scripting languages, which have a different internal
>>> object/execution model, e.g. Python or Ruby
>> 
>> Sure, Python has PyPy. 
>
>Sure. I understand that there is a JIT for Python, so it is possible to
>write a (tracing) JIT for it. 
>But my question is: are you saying that LuaJIT could be tweaked, with
>reasonable effort, to replace PyPy by providing a full tracing JIT for
>Python?
>
>> The difficulties you'll be facing there
>> have more to do with legacy C interfaces, the abundance of
>> specialized core data types or wasteful execution semantics. One
>> pays for abstractions, one way or another.
>
>I see. But I'm wondering whether there is anything in the JVM or, e.g.,
>Python or Ruby that cannot easily be mapped to or expressed with
>LuaJIT. I.e. do you know of any examples where certain core data types,
>data structures or maybe certain language constructs/features/low-level
>details cannot, in principle, be expressed using LuaJIT and could only
>be modeled rather inefficiently, using some workarounds?
>
>>> - features like: custom object layouts in memory (e.g. C-like
>>> structs vs Lua's tables), custom garbage collectors, support for
>>> custom ABIs.
>> 
>> Code reuse should be easy if you base it on the same FFI design.
>> 
>> Custom garbage collectors are troublesome for any VM or compiler.
>> Sounds nice on paper and certainly interesting for research. But
>> IMHO only a fully integrated GC offers top performance.
>
>Yes. I understand that GC is very tightly coupled with the language and
>its runtime.
>
>Based on your responses, I'd also like to ask the following questions:
>
>- How modular is LuaJIT when it comes to retargeting it for a different
>language or runtime? How configurable is it? For example, are
>Lua-specific optimizations and other implementation details provided in
>a few well-defined places or are they spread all over the place? 
>
>- Are most of the things that heavily depend on the language semantics
>implemented in mostly orthogonal ways, or are they deeply
>interdependent? How easy or difficult is it to replace the
>implementation of one such feature with an alternative one? E.g. what
>if one would like to add a new core data type (e.g. C-like arrays,
>complex numbers, or an implementation of tables/dicts that is not
>compliant with Lua's tables)?
>
>- I'm wondering whether you have ever considered making LuaJIT more
>generic and more modular (in this sense, a bit like LLVM), so that it
>can serve as a basis for tracing JITs for different languages and runtimes? I
>understand that it was not the initial goal, but still... I'm not even
>suggesting that you should do it yourself, if there is no interest from
>your side. But would you be in favor of such a LuaJIT development
>direction? Could you roughly estimate the effort (e.g. amount of
>redesign and/or refactoring) required to make LuaJIT an easier and more
>approachable target for other languages? Would you consider developing
>future LuaJIT changes taking these multi-language support
>considerations into account?
>
>Thanks,
>  -Leo

