Re: Make the VM Lua-version-agnostic and modularize

  • From: "Vyacheslav Egorov" <dmarc-noreply@xxxxxxxxxxxxx> (Redacted sender "vegorov" for DMARC)
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 24 Sep 2015 18:56:32 +0200

TValues can fit doubles. Surely they CAN fit int64s as well.

No.

TValue in LuaJIT is an 8-byte value that uses NaN-tagging[1] to fit both
doubles and other tagged values into those 8 bytes without wasting any
additional space on a separate type tag. This only works because of how
doubles work: anything with exponent 0x7ff and a non-zero mantissa is a
NaN, so you can "hide" anything that fits into 52 bits inside this
NaN-space.

This does not work for int64, because to represent all int64 values
you need *all* 64 bits.
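
A minimal sketch of the NaN-tagging idea in C. The `Slot` type, the tag
numbering and the 48-bit payload split are assumptions made for
illustration; this is not the layout from lj_obj.h. It only shows that
doubles leave spare bit patterns for tagged payloads, and that an arbitrary
int64 does not fit into what is left.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t Slot;  /* hypothetical 8-byte value slot */

#define TAG_SHIFT    48
#define PAYLOAD_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)  /* low 48 bits */
#define TAG_MIN      UINT64_C(0xfff9)  /* top 16 bits >= this: tagged value */
#define CANON_NAN    UINT64_C(0x7ff8000000000000)

/* Store a double. Every real NaN is folded into one canonical pattern,
 * so the remaining NaN patterns are free to carry tagged values. */
static Slot slot_from_double(double d) {
  Slot s;
  if (d != d) return CANON_NAN;
  memcpy(&s, &d, sizeof s);
  return s;
}

static int slot_is_double(Slot s) { return (s >> TAG_SHIFT) < TAG_MIN; }

static double slot_to_double(Slot s) {
  double d;
  memcpy(&d, &s, sizeof d);
  return d;
}

/* Store a tagged payload inside the NaN space.
 * tag is 1..7 (hypothetical numbering), payload must fit in 48 bits. */
static Slot slot_from_tagged(unsigned tag, uint64_t payload) {
  assert(tag >= 1 && tag <= 7);
  assert(payload <= PAYLOAD_MASK);
  return ((UINT64_C(0xfff8) | tag) << TAG_SHIFT) | payload;
}

static unsigned slot_tag(Slot s) { return (unsigned)((s >> TAG_SHIFT) & 7); }
static uint64_t slot_payload(Slot s) { return s & PAYLOAD_MASK; }

/* An int64 needs all 64 bits; only a small subrange fits the payload. */
static int int64_fits_payload(int64_t v) {
  return v >= 0 && (uint64_t)v <= PAYLOAD_MASK;
}
```

With this sketch, `slot_from_tagged(3, 0x1234)` is not mistaken for a
double, while `int64_fits_payload(INT64_MAX)` is false: there is no spare
NaN space large enough for the full int64 range.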

TValue in Lua is a larger structure, because it keeps the payload and
the type tag separate.

The registry would have 3 "global environments" instead of 1: one for Lua
5.1, another for Lua 5.2, another for Lua 5.3.

I do consider what you describe here as complexity and additional
overhead. Maybe we have different definitions of "complexity" and
"overhead".

Also one thing we didn't even touch is API versioning.

I find it hard to imagine a sane way to fit three not entirely
compatible APIs into the same VM.

Yeah, you could do some stuff with the preprocessor, but no, that's not a
sane way.

[1]
https://github.com/LuaJIT/LuaJIT/blob/52ea1a30afc204553c99126ab43c2b16f2bd0182/src/lj_obj.h#L222-L256

// Vyacheslav Egorov


On Thu, Sep 24, 2015 at 6:39 PM, Soni L. <fakedme+lj@xxxxxxxxx> wrote:


On 24/09/15 11:12 AM, Vyacheslav Egorov (Redacted sender vegorov for DMARC)
wrote:

I'd say making a VM that supports multiple incompatible language versions
at the same time is a really wrong way to do this.

You have conditions checking "runtime version" scattered across the code,
complicating any and all attempts to reason about semantics.

Additionally, bytecode space (and interpreter code size) is a rather finite
resource, and wasting it on supporting version-specific bytecodes does not
feel right either.

For 5.3 one of the challenges is figuring out the way to deal with integer
values.

You can't fit the whole int64 range into the TValue, so if you want to
keep the TValue representation you'll have to box it (similar to what FFI
does) and hope that the JIT manages to sink the boxing. However,
performance of the interpreter will drop.

You can start boxing only the "upper" part of the int64 range, keeping the
"middle" part that fits into TValue unboxed. However, this introduces
complexity and still leads to unexpected performance characteristics,
because some operations just tend to produce out-of-range int64 values
(e.g. bitwise manipulation is a common culprit).
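
A rough sketch of the "box only what does not fit" check, in C. The 47-bit
unboxed width and the names `fits_unboxed` and `rotl` are assumptions made
for illustration, not anything LuaJIT defines. It shows how a single bitwise
operation can push a small integer out of the unboxed range.

```c
#include <stdint.h>

#define UNBOXED_BITS 47  /* assumed width of the unboxed signed range */

/* True if v can stay unboxed; otherwise it would need a heap box. */
static int fits_unboxed(int64_t v) {
  const int64_t lim = INT64_C(1) << (UNBOXED_BITS - 1);
  return v >= -lim && v < lim;
}

/* 64-bit rotate left, a common step in hashing and bit manipulation.
 * Works on the unsigned bit pattern; the final conversion back to int64_t
 * assumes the usual two's complement behaviour. */
static int64_t rotl(int64_t x, unsigned r) {
  uint64_t u = (uint64_t)x;
  u = (u << (r & 63)) | (u >> ((64 - r) & 63));
  return (int64_t)u;
}
```

Here `fits_unboxed(0xff)` holds, but `fits_unboxed(rotl(0xff, 56))` does
not: the input was a small unboxed integer and the result has to be boxed,
which is the kind of performance cliff described above.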

To make things worse: for performance reasons LuaJIT already has a DUALNUM
mode, in which it tries to keep floating point numbers from the int32 range
represented as int32 values. Now if you want a VM that supports 5.1 and 5.3
and performs well for both on ARM, you'll find yourself in a spot where you
suddenly have floating point values represented as either a floating point
value or an integer, and you also have integers represented as either a
boxed or an unboxed integer.

This is the level of complexity we are talking about here and it is not
desirable.

// Vyacheslav Egorov

TValues can fit doubles. Surely they CAN fit int64s as well.

At most you'll need bitop opcodes, a new LEN opcode that respects __len
(emitted by the Lua 5.2 and Lua 5.3 parser/loader), maybe opcodes for the
Lua 5.2 gt/lt/le/ge (which call metamethods even for different types), and
new stuff on the stdlib (require"5.2" would return a table similar to the
default _G, but with some things added (such as table.(un)pack), others
removed (such as getfenv/setfenv) and others replaced (such as pairs/ipairs,
which would respect __pairs/__ipairs, etc.)). Even assuming the opcodes
don't have a free argument, that's only a few opcodes!

The registry would have 3 "global environments" instead of 1: one for Lua
5.1, another for Lua 5.2, another for Lua 5.3.

Most functions can be unified, e.g. you don't need 3 separate print()s, just
use the same one!
Even the parser can be unified! Just pass some function pointers around!
(e.g. a pointer for the operator parsers (Lua 5.3), a pointer for the LEN
opcode emitter (Lua 5.2 and 5.3), etc.; alternatively use flags, although
they tend to be a bit less maintainable...)
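
A hedged sketch of what "pass some function pointers around" could look
like in C. The `Dialect` struct, the opcode names and the precedence numbers
are all hypothetical and illustrative; none of this exists in LuaJIT, and
the numbers are not Lua's actual operator priorities.

```c
#include <stddef.h>
#include <string.h>

typedef enum { OP_LEN_RAW, OP_LEN_META } LenOp;  /* hypothetical opcodes */

/* Per-version hooks the shared parser would call. */
typedef struct Dialect {
  const char *name;
  LenOp (*emit_len)(void);     /* which opcode '#' compiles to */
  int (*binop_prec)(int ch);   /* precedence of an operator, 0 if none */
} Dialect;

static LenOp emit_len_raw(void)  { return OP_LEN_RAW; }
static LenOp emit_len_meta(void) { return OP_LEN_META; }

/* Arithmetic operators shared by all versions (illustrative numbers). */
static int binop_base(int ch) {
  if (ch == '+' || ch == '-') return 6;
  if (ch == '*' || ch == '/') return 7;
  return 0;
}

/* The 5.3 hook adds bitwise operators on top of the shared ones. */
static int binop_53(int ch) {
  if (ch == '|') return 1;
  if (ch == '&') return 3;
  return binop_base(ch);
}

static const Dialect dialects[] = {
  { "5.1", emit_len_raw,  binop_base },
  { "5.2", emit_len_meta, binop_base },
  { "5.3", emit_len_meta, binop_53   },
};

/* Look up the hook table for a version string; NULL if unknown. */
static const Dialect *find_dialect(const char *name) {
  size_t i;
  for (i = 0; i < sizeof dialects / sizeof dialects[0]; i++)
    if (strcmp(dialects[i].name, name) == 0) return &dialects[i];
  return NULL;
}
```

The parser would hold one `const Dialect *` per chunk and call through it.
This keeps the version checks in one table, but every hook is still a
per-version branch in the semantics.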

From what I'm seeing there's not much added complexity: everything is
(mostly) already in.

