[ANN] LuaJIT Roadmap 2012/2013

LuaJIT Roadmap 2012/2013
************************

This is the LuaJIT roadmap for 2012/2013, bringing you up to date
on the current and future developments around LuaJIT.

I'm happy to answer your questions here on the LuaJIT mailing list,
on related news aggregators or by mail.

* Status of LuaJIT 2.0, new features and release planning
* Plans for LuaJIT 2.1, new garbage collector and other features
* Call for Sponsors


LuaJIT 2.0
==========

Current status
--------------

Overall, the LuaJIT 2.0 code base is in good shape to become
stable soon. The beta releases are already used in production by
many developers and in many different projects.

LuaJIT 2.0 has grown quite a few more architectural ports than
expected in the last roadmap from 2011. But this is a good thing:
developers get to use a stable VM for their target architectures
*right now*. And it gives me more leeway to introduce some major
changes to the next version.

LuaJIT 2.0 already runs on all major operating systems. Soon,
it'll support close to a dozen architectures or architectural
variations. This pretty much covers the complete desktop and
server markets, almost all of the smartphone market and a sizeable
chunk of the 32 bit embedded CPU market, too. Coverage will become
even better over time, due to expected market shake-outs.

LuaJIT is widely considered to be one of the fastest dynamic
language implementations. It features a compact, innovative
top-of-the-line just-in-time (JIT) compiler.

The integrated LuaJIT FFI library is a major additional benefit:
it largely obviates the need to write tedious manual bindings with
the classic Lua/C API. There's no need to learn a separate binding
language -- it parses plain C declarations! The JIT compiler is
able to generate code on par with a C compiler for access to
native C data structures. Calls to C functions can be inlined in
JIT-compiled code.
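
As a minimal sketch of what this looks like in practice (assuming a
LuaJIT build with the FFI library enabled; the point_t struct is
made up for illustration, strlen is the standard C library
function):

```lua
local ffi = require("ffi")

-- Plain C declarations, parsed at runtime by the FFI library.
-- point_t is a hypothetical struct used only for this example.
ffi.cdef[[
typedef struct { double x, y; } point_t;
size_t strlen(const char *s);
]]

-- Create a native C struct and access its fields like a Lua table.
local p = ffi.new("point_t", 3, 4)
local dist = math.sqrt(p.x * p.x + p.y * p.y)

-- Call a C function from the default C namespace, without any
-- hand-written binding code. size_t may come back as a 64 bit
-- cdata number, hence the tonumber().
local len = tonumber(ffi.C.strlen("LuaJIT"))

print(dist, len)
```

No separate binding language or compile step is involved here: the
declarations are ordinary C, passed as a Lua string.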

LuaJIT 2.0 has extensive architecture-specific and OS-specific
customizations. This, together with excellent cross-compilation
support, makes LuaJIT an ideal tool for developers who need to
embed a nearly universally portable, light-weight *and* high-speed
dynamic VM into their projects.

[Phew! Enough of the marketing speak for now ... ;-) ]

What's next
-----------

Now that LuaJIT 2.0.0-beta10 is out, a couple of reorganizations
will happen in the source tree. After that, one new optimization
and two new ports will be added.

These are (probably) the last major changes to LuaJIT 2.0 before
the final (non-beta) release. All other planned features will have
to wait for LuaJIT 2.1.

Addition of a minified Lua interpreter
--------------------------------------

A customized, heavily stripped and minimized Lua interpreter will
be included to assist the build process. This weighs in at only
173 KB, or 45 KB compressed. It'll be compiled first during the
build process (as a host executable when cross-compiling).

The first use case is to run DynASM. This allows generating the
machine-specific files for the current target architecture at
build time. Which in turn allows the removal of various
pre-translated files.

The addition of a minimal Lua interpreter opens up more options
for customizing and simplifying the build process in the future.
E.g. most of the C code that's only used at build time can be
replaced with Lua code.

The program to generate the (mostly illegible) minified C source
code for the Lua interpreter will be included. Security-conscious
people can check that it generates identical output, given the
original Lua sources. Or they may use the standard Lua 5.1/5.2
interpreter for the build process (build option).

Removal of pre-generated buildvm_${arch}.h files
------------------------------------------------

The pre-generated, architecture-specific files buildvm_${arch}.h
contain the LuaJIT interpreter-generator for each architecture,
ready for consumption by a C compiler to generate the 'buildvm'
executable. The actual sources are in the buildvm_${arch}.dasc
files.

The assembler source code of the interpreter needs to be translated
with DynASM, which is a Lua program. To avoid a chicken-and-egg
situation, those files had to be shipped pre-generated.

Due to the proliferation of architectures and architectural
variations, the pre-generated files have already grown to 844 KB.
Compressed, this adds only 133 KB to the released tar.gz files,
but that's still too much. And more is to come.

Also, even a single-line change in one of the *.dasc files
triggers lots of changes in the corresponding *.h file. This
causes needlessly big commits in the git repository.

The addition of a minified Lua interpreter solves this problem:
the pre-generated buildvm_${arch}.h files can be removed. Only
the output file for the selected target architecture will be
translated with DynASM at build time, utilizing the minified Lua
interpreter. Many more architectural variations can now be added
with no concern over the size of the intermediate *.h files.

In case you're following the git repository: it's recommended that
you do a 'make cleaner' [sic!] to clean up your build tree, right
after the big commits for this change arrive. It should still work
without that step, though.

Move lib/* to src/jit/*
-----------------------

The JIT-compiler-specific Lua modules currently shipped in lib/*
need to be installed in the package path, relative to a 'jit'
directory, before they can be used.

To allow testing of the un-installed command line executable from
within the 'src' directory, the modules will be moved to src/jit/*.
Other hierarchies (e.g. src/ffi/*) may be added in the future.

The 'install' target of the top-level Makefile will of course be
adjusted accordingly. Watch out if you've modified this file or
if you've automated the install process with other tools.

New optimization: Allocation sinking and store sinking
------------------------------------------------------

A corporate sponsor, who wishes to remain anonymous, has sponsored
the development of allocation sinking and store sinking
optimizations for LuaJIT.

Avoiding temporary allocations is an important optimization for
high-level languages. LuaJIT already eliminates many of these with
multiple techniques: e.g. floating-point numbers aren't boxed and
the JIT compiler eliminates allocations for most immutable
objects. Alas, traditional techniques to avoid the remaining
allocations (escape analysis and scalar replacement of aggregates)
are ineffective for dynamic languages.

The goal of this sponsorship is to research the combination of
store-to-load-forwarding (already implemented) with store sinking
and allocation sinking (to be implemented). This innovative
approach is highly effective in avoiding temporary allocations in
the fast paths, even in the presence of many slow paths to which
the temporary object may escape. This approach is most
effective for dynamic languages, but may be successfully applied
elsewhere, when the classic techniques fail.
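
To illustrate the kind of code these optimizations target, here is
a small sketch. The helpers vec and add are made up for this
example, and whether a particular allocation is actually
eliminated is up to the compiler:

```lua
-- Hypothetical helpers: a tiny 2D vector built from plain tables.
local function vec(x, y) return { x = x, y = y } end
local function add(a, b) return vec(a.x + b.x, a.y + b.y) end

local function sum_steps(n)
  local acc = vec(0, 0)
  for i = 1, n do
    -- Each iteration allocates two short-lived tables: the step
    -- vector and the new accumulator. Their fields are read back
    -- almost immediately and the tables become garbage right after.
    acc = add(acc, vec(i, 2 * i))
  end
  return acc.x, acc.y
end

print(sum_steps(100))
```

In the fast path of this loop, only the field values are needed.
The tables themselves would only have to exist if execution left
the loop through some slow path that still needs them.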

Work for this feature is currently in progress.

New port: ARM VFP support and hard-float EABI support
-----------------------------------------------------

A corporate sponsor, who wishes to remain anonymous, has sponsored
the VFP support (hardware FPU) and the hard-float EABI support for
the ARM port. After that work is complete, the ARM port of LuaJIT
can be built for three different CPU/ABI combinations:

* ARMv5+, soft-float EABI, soft-float FP operations (already exists)
* ARMv6+, soft-float EABI, VFPv2+ FP operations
* ARMv6+, hard-float EABI, VFPv2+ FP operations (e.g. Debian armhf)

Work on the VFP support and hard-float support for the ARM port is
scheduled for Q3 2012.

New port: PPC32on64 interpreter for PS3 and Xbox 360
----------------------------------------------------

Current-generation consoles based on PowerPC CPUs cannot run the
existing PPC port of LuaJIT. Several changes are needed:

* The JIT compiler must be disabled for the consoles, as the
  hypervisors do not allow execution of code generated at runtime.

* Changes to the LuaJIT interpreter to run as a 32 bit program on
  PPC64 (PPC32on64). Registers are 64 bit wide, even though
  pointers are still 32 bit. This affects e.g. the carry bit and
  pointer addressing. The assembler code needs to be adapted.

* Some common PPC instructions are micro-coded on the console CPUs,
  which causes unwanted slow-downs. These instructions need to be
  replaced with other instruction sequences.

* Support for modified calling conventions.

These changes allow embedding the LuaJIT 2.0 interpreter in PS3
or Xbox 360 projects, with a substantial speedup compared to the
standard Lua 5.1 interpreter.

The console ports will be integrated some time after the build
process reorganizations are complete.

Minor new features
------------------

The following minor features are on my TODO list for LuaJIT 2.0:

- Add 'goto' statement and labels, compatible with Lua 5.2.

  This feature will also be available from the Lua 5.1 mode of
  LuaJIT 2.0, where 'goto' is not a keyword. The parser figures
  out whether it's a variable name or a statement.

- Support '%a' and '%A' for string.format and parse hexadecimal
  floating-point numbers (0x1.2a7p9 => 596.875) independent of the
  C99-conformance of the C library (works even with MSVCRT).

- Other Lua 5.2-compatibility features:

  Return result status for os.execute() and pipe close.
  Support extra format specifiers for io.lines() and fp:lines().
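
For illustration, the first two items could be used as shown below
once they are implemented. This is a sketch which assumes the
syntax and semantics follow Lua 5.2, as described above:

```lua
-- 'goto' with a label at the end of the loop body, used here as a
-- 'continue' statement.
local odd_sum = 0
for i = 1, 10 do
  if i % 2 == 0 then goto continue end
  odd_sum = odd_sum + i
  ::continue::
end

-- Hexadecimal floating-point literal: 0x1.2a7 * 2^9.
local x = 0x1.2a7p9

-- '%a' formats a number as hexadecimal floating-point. The exact
-- digits printed may vary, but parsing the result back should
-- give the same number.
local s = string.format("%a", x)
local y = tonumber(s)

print(odd_sum, x, s, y)
```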

Feature freeze
--------------

After the above features have been implemented, beta11 will be
released and a feature freeze will be announced: no new features
will be accepted into the LuaJIT 2.0 code base.

Bug fixes to existing features will always be accepted, of course.

I'm willing to make small concessions for the FFI library, as it's
relatively young. Minor upwards-compatible features that are
important for usability might make it into the code base, even
after the feature freeze (e.g. backports from LuaJIT 2.1).

Release plans
-------------

After the feature freeze and a concerted cleanup effort, several
release candidates and the final 2.0.0 release will be put out.

My goal is to complete all of this before the end of 2012.

Bug fixes will be accumulated in the git repository, as usual. New
dot releases (2.0.x), which include all of these fixes, will be
made available at irregular intervals.

I'm planning to give LuaJIT 2.0 LONG-TERM SUPPORT, provided
there's sufficient interest in the community and continued
sponsorship. The LuaJIT 2.0 release will likely be maintained and
supported for several years. It will be updated to fix future
incompatibilities, e.g. with new toolchain or OS releases.


LuaJIT 2.1
==========

After LuaJIT 2.0 has become stable, work on LuaJIT 2.1 may begin.
This section is intended to give you a short overview of my plans
for LuaJIT 2.1.

Compatibility
-------------

A new release is always a good point to do some cleanup. LuaJIT
has accumulated quite a bit of slack during the 2.0 development
phase. And some of that has to go, e.g. the x87-compatibility in
the interpreter for x86 CPUs without SSE2. Other features planned
for removal will be announced in a separate message, before work
on LuaJIT 2.1 starts.

But there's one important message: compatibility with Lua 5.1 is
there to stay!

Many users of LuaJIT, especially those with big code bases, have a
heavy investment in Lua 5.1-compatible infrastructure, tools,
frameworks and in-house knowledge. Understandably, they don't want
to throw away their investment, but still keep up with the newest
developments.

As I've previously said, Lua 5.2 provides few tangible benefits.
LuaJIT already includes the major new features, without breaking
compatibility. Upgrading to be compatible with 5.2, just for the
sake of a higher version number, is neither a priority nor a
sensible move for most LuaJIT users.

To protect the investment of my users and still provide them with
new features, LuaJIT 2.1 will stay compatible with Lua 5.1.

New garbage collector
---------------------

The garbage collector used by LuaJIT 2.0 is essentially the same
as the Lua 5.1 GC. The current garbage collector is relatively
slow compared to implementations for other language runtimes. It's
not competitive with top-of-the-line GCs, especially for large
workloads.

The main innovation in LuaJIT 2.1 is a complete redesign of the
garbage collector from scratch: the new garbage collector will be
an arena-based, quad-color incremental, generational, non-copying,
high-speed, cache-optimized garbage collector.

You can read more about the design of the new GC here:

  http://wiki.luajit.org/New-Garbage-Collector

Note: this page is a work-in-progress! More details will be added
and the gaps will be filled in over time.

Planned features
----------------

Based on recognized needs and suggestions from LuaJIT users, here
are some other features that I'd like to work on. Hopefully, many
of them will make it into LuaJIT 2.1 or future versions.

The list is in no particular order:

- Metatable/__index specialization

  Accesses to metatables and __index tables with constant keys are
  already specialized by the JIT compiler to use optimized hash
  lookups (HREFK). This is based on the assumption that individual
  objects don't change their metatable (once assigned) and that
  neither the metatable nor the __index table is modified. This
  turns out to be true in practice, but those assumptions still
  need to be checked at runtime, which can become costly for
  OO-heavy programming.

  Further specialization can be obtained by strictly relying on
  these assumptions and omitting the related checks in the
  generated code. In case any of the assumptions are broken (e.g.
  a metatable is written to), the previously generated code must
  be invalidated or flushed.

  Different mechanisms for detecting broken assumptions and for
  invalidating the generated code should be evaluated.

  This optimization works at the lowest implementation level for
  metatables in the VM. It should equally benefit any code that
  uses metatables, not just the typical frameworks that implement
  a class-based system on top of it.

- Value-range propagation (VRP)

  Value-range propagation is an optimization for the JIT compiler:
  by propagating the possible ranges for a value, subsequent code
  may be optimized or conditionals may be eliminated. Constant
  propagation (already implemented) can be seen as a special case
  of this optimization.

  E.g. if a number is known to be in the range 0 <= x < 256 (say
  it originates from string.byte), then a later mask operation
  bit.band(x, 255) is redundant. Similarly, a subsequent test for
  x < 0 can be eliminated.

  Note that even though few programmers would explicitly write
  such a series of operations, this can easily happen after
  inlining of functions combined with constant propagation.

- Hyperblock scheduling

  Producing good code for unbiased branches is a key problem for
  trace compilers. This is the main cause for "trace explosion"
  and bad performance with certain types of branchy code.

  Hyperblock scheduling promises to solve this nicely at the price
  of a major redesign of the compiler: selected traces are woven
  together into a single hyper-trace. This would also pave the way
  for emitting predicated instructions, which benefits some CPUs
  (e.g. ARM) and is a prerequisite for efficient vectorization.

- FFI C pre-processor

  The integrated C parser of the FFI library currently doesn't
  support #define or other C pre-processor features. To support
  the full range of C semantics, an integrated C pre-processor is
  needed.

  This would provide a nice solution to the C re-declaration
  problem for FFI modules, too.

- Partial C++ support for the FFI

  Full C++ support for the FFI is not feasible, due to the sheer
  complexity of the task: one would need to write more or less a
  complete C++ compiler.

  However, a limited number of C++ features can certainly be
  supported. Of course, one could argue, anything but full support
  doesn't make sense. But you'll never know, unless you try ...

  It would be an interesting task to evaluate what subset of C++
  can be supported with reasonable effort or which C++ libraries
  can be successfully bound via the FFI. Basically: how far can
  C++ support go, how much effort would be needed and does it
  really pay off in practice?

  Such a project should be split into the evaluation phase and an
  implementation phase, which implements the C++ subset, based on
  the prior evaluation.

- User-definable intrinsics for the FFI

  This is a low-level equivalent to GCC inline assembler: given a
  C function declaration and a machine code template, an intrinsic
  function (builtin) can be constructed and later called. This
  allows generating and executing arbitrary instructions supported
  by the target CPU. The JIT compiler inlines the intrinsic into
  the generated machine code for maximum performance.

  Developers usually shouldn't need to write machine code templates
  themselves. Common libraries of intrinsics for different purposes
  should be provided or contributed by experts.

- Vector/SIMD data type support for the FFI

  Currently, vector data types may be defined with the FFI, but
  you really can't do much with them. The goal of this project is
  to add full support for vector data types to the JIT compiler
  and the CPU-specific backends (if the target CPU has a vector
  extension).

  A new "ffi.vec" module declares standard vector types and
  attaches the machine-specific SIMD intrinsics as (meta)methods.

  Prerequisites for this project are allocation sinking, the
  user-definable intrinsics and the new garbage collector.

  More about the last two features can be read here:
    http://lua-users.org/lists/lua-l/2012-02/msg00207.html
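
The string.byte example from the value-range propagation item
above can be sketched as a tiny interval check. This is plain Lua
for illustration only, not how the JIT compiler would implement
it, and all function names are hypothetical:

```lua
-- A value range is an inclusive interval { lo = ..., hi = ... }.

-- Result range of string.byte(): always an integer in 0..255.
local function range_of_byte()
  return { lo = 0, hi = 255 }
end

-- True for masks of the form 2^k-1 (1, 3, 7, 15, ...), i.e. when
-- mask+1 is a power of two.
local function is_low_bit_mask(mask)
  local m = mask + 1
  while m > 1 and m % 2 == 0 do m = m / 2 end
  return mask >= 1 and m == 1
end

-- bit.band(x, mask) cannot change x if mask is 2^k-1 and x is
-- known to lie within 0..mask.
local function band_is_redundant(r, mask)
  return is_low_bit_mask(mask) and r.lo >= 0 and r.hi <= mask
end

-- Try to decide 'x < k' from the range alone. Returns true or
-- false if the outcome is known, nil if it isn't.
local function fold_less_than(r, k)
  if r.hi < k then return true end
  if r.lo >= k then return false end
  return nil
end

local r = range_of_byte()
print(band_is_redundant(r, 255)) -- the mask can be dropped
print(fold_less_than(r, 0))      -- the test 'x < 0' is known to fail
```

A range that says nothing useful (e.g. fold_less_than(r, 100))
yields nil, in which case the comparison has to stay in the
generated code.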

Most of these features are still in an early planning stage. I'm
sure the community will come up with many more interesting ideas.
Which of these will become a reality depends on the interest in
the community and on sponsorships (see below).


Call for Sponsors
=================

First, I'd like to say a BIG THANK YOU to all LuaJIT sponsors!

Almost all of the recent work on LuaJIT 2.0 has been sponsored by
various corporate sponsors. The full track record is here:

  http://luajit.org/sponsors.html

All of those architectural ports and new features wouldn't have
been possible without your sponsorships!

I think this sends a happy message to the greater open source
community: the open source development model *does* work out and
it can be a sustainable (side) business for its creators!

Nonetheless, I have to look forward: as you've seen above, I've
got big plans for LuaJIT 2.1. In fact, the plans are so big that
I fear it may be hard to get enough sponsorships to cover just the
work on the one major feature, the new garbage collector.

For LuaJIT 2.0, the ports to the various architectures made most
of the money. The companies sponsoring them had a genuine, often
urgent, business need for these ports. Sadly, this source is
drying up, as the major architectures are well covered.

The new garbage collector is certainly a desirable feature and
IMHO the correct next evolutionary step for LuaJIT. Alas,
developers have learned to work around the deficiencies of the
current GC (by carefully avoiding allocations). The benefits of a
new garbage collector are hard to quantify, without actually
implementing it. And that's *a lot of work*, which makes it not
exactly cheap. Maybe too expensive for a single company. It'll be
a tough sell in any case.

So far, I've relied exclusively on corporate sponsorships for
various legal and administrative reasons. Ok, so the recent trend
towards crowd funding got me thinking ...

But let's be realistic: the Lua community is small, the LuaJIT
community is even smaller -- it's growing fast, though. I simply
don't know whether it's possible to gather enough people and
enough money to finance the continued development of LuaJIT.

And there's another issue: to me, it looks like the whole crowd
funding idea is rapidly deteriorating into an arms race of
marketing experts. So many people are jumping on that bandwagon
now ... you'll never make it, unless you permanently stay on the
front pages somehow.

Alas, I'm not good at marketing and a garbage collector is a very
technical and *very* unsexy project (for most people, anyway).
But then, I'd really love to be proven wrong ...

To be fair, I have to make this statement:

I'd really like to work on LuaJIT and I'd like to continue shaping
its future. However, I fear, without sponsorships I'd have to do
more work as a consultant (in unrelated jobs). That doesn't leave
me enough spare time to do a significant amount of work on LuaJIT.

Therefore, I cannot start working on LuaJIT 2.1, before I've got
full covenants for a) maintaining two major code bases, b) the
ground work to clean up the code base and prepare it for c) the
work on the new garbage collector for LuaJIT 2.1.

I estimate this to be worth on the order of EUR 80K+ ($100K+),
only for the near future after the release of LuaJIT 2.0.

We're not in a hurry, though. I'd like to publicly discuss all
options thoroughly with the LuaJIT community and beyond. I'll open
a new topic on the LuaJIT mailing list right after this posting.

If you require anonymity, please write to me by mail, see:
  http://luajit.org/sponsors.html

Thank you!


[Important note: please do NOT send money, checks or anything like
that to me at this time! If there's a crowd funding effort or a
corporate funding pool, this will be announced separately.]

--Mike
