Re: [ANN] LuaJIT Roadmap 2012/2013

  • From: François Perrad <francois.perrad@xxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 8 Jun 2012 11:26:22 +0200

2012/6/7 Mike Pall <mike-1206@xxxxxxxxxx>:
> LuaJIT Roadmap 2012/2013
> ************************
>
> This is the LuaJIT roadmap for 2012/2013, bringing you up to date
> on the current and future developments around LuaJIT.
>
> I'm happy to answer your questions here on the LuaJIT mailing list,
> on related news aggregators or by mail.
>
> * Status of LuaJIT 2.0, new features and release planning
> * Plans for LuaJIT 2.1, new garbage collector and other features
> * Call for Sponsors
>
>
> LuaJIT 2.0
> ==========
>
> Current status
> --------------
>
> Overall, the LuaJIT 2.0 code base is in good shape to become
> stable, soon. The beta releases are already used in production by
> many developers and in many different projects.
>
> LuaJIT 2.0 has grown quite a few more architectural ports than
> expected in the last roadmap from 2011. But this is a good thing:
> developers get to use a stable VM for their target architectures
> *right now*. And it gives me more leeway to introduce some major
> changes to the next version.
>
> LuaJIT 2.0 already runs on all major operating systems. Soon,
> it'll support close to a dozen architectures or architectural
> variations. This pretty much covers the complete desktop and
> server markets, almost all of the smartphone market and a sizeable
> chunk of the 32 bit embedded CPU market, too. Coverage will become
> even better over time, due to expected market shake-outs.
>
> LuaJIT is widely considered to be one of the fastest dynamic
> language implementations. It features a compact, innovative
> top-of-the-line just-in-time (JIT) compiler.
>
> The integrated LuaJIT FFI library is a major additional benefit:
> it largely obviates the need to write tedious manual bindings with
> the classic Lua/C API. There's no need to learn a separate binding
> language -- it parses plain C declarations! The JIT compiler is
> able to generate code on par with a C compiler for access to
> native C data structures. Calls to C functions can be inlined in
> JIT-compiled code.
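
For readers who haven't tried the FFI yet, here is a minimal sketch of what
"plain C declarations" means in practice. It assumes the standard C strlen()
is reachable through the default ffi.C namespace (true on the usual POSIX and
Windows setups, as far as I know). The helper name c_strlen is mine, and the
sketch skips itself on an interpreter without the ffi module:

```lua
-- Minimal FFI sketch: declare a C function, then call it directly.
-- Assumption: strlen() from the C library is reachable through ffi.C.
local ok, ffi = pcall(require, "ffi")

-- Hypothetical helper: returns nil when the ffi module is missing.
local function c_strlen(s)
  if not ok then return nil end       -- not running under LuaJIT
  return tonumber(ffi.C.strlen(s))    -- size_t may come back as boxed cdata
end

if ok then
  -- The declaration is ordinary C, no separate binding language.
  ffi.cdef[[
  size_t strlen(const char *s);
  ]]
  print(c_strlen("hello"))
else
  print("ffi module not available")
end
```
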
>
> LuaJIT 2.0 has extensive architecture-specific and OS-specific
> customizations. This, together with excellent cross-compilation
> support, makes LuaJIT an ideal tool for developers who need to
> embed a nearly universally portable, light-weight *and* high-speed
> dynamic VM into their projects.
>
> [Phew! Enough of the marketing speak for now ... ;-) ]
>
> What's next
> -----------
>
> Now that LuaJIT 2.0.0-beta10 is out, a couple of reorganizations
> will happen in the source tree. After that, one new optimization
> and two new ports will be added.
>
> These are (probably) the last major changes to LuaJIT 2.0 before
> the final (non-beta) release. All other planned features will have
> to wait for LuaJIT 2.1.
>
> Addition of a minified Lua interpreter
> --------------------------------------
>
> A customized, heavily stripped and minimized Lua interpreter will
> be included to assist the build process. This weighs in at only
> 173 KB, or 45 KB compressed. It'll be compiled first during the
> build process (as a host executable when cross-compiling).
>
> The first use case is to run DynASM. This allows generating the
> machine-specific files for the current target architecture at
> build time. Which in turn allows the removal of various
> pre-translated files.
>
> The addition of a minimal Lua interpreter opens up more options
> for customizing and simplifying the build process in the future.
> E.g. most of the C code, that's only used at build time, can be
> replaced with Lua code.
>
> The program to generate the (mostly illegible) minified C source
> code for the Lua interpreter will be included. Security-conscious
> people can check that it generates identical output, given the
> original Lua sources. Or they may use the standard Lua 5.1/5.2
> interpreter for the build process (build option).
>
> Removal of pre-generated buildvm_${arch}.h files
> ------------------------------------------------
>
> The pre-generated, architecture-specific files buildvm_${arch}.h
> contain the LuaJIT interpreter-generator for each architecture,
> ready for consumption by a C compiler to generate the 'buildvm'
> executable. The actual sources are in the buildvm_${arch}.dasc
> files.
>
> The assembler source code of the interpreter needs to be translated
> with DynASM, which is a Lua program. To avoid a chicken-and-egg
> situation, those files had to be shipped pre-generated.
>
> Due to the proliferation of architectures and architectural
> variations, the pre-generated files have already grown to 844 KB.
> Compressed, this adds only 133 KB to the released tar.gz files,
> but that's still too much. And more is to come.
>
> Also, even a single-line change in one of the *.dasc files
> triggers lots of changes in the corresponding *.h file. This
> causes needlessly big commits in the git repository.
>
> The addition of a minified Lua interpreter solves this problem:
> the pre-generated buildvm_${arch}.h files can be removed. Only
> the output file for the selected target architecture will be
> translated with DynASM at build time, utilizing the minified Lua
> interpreter. Many more architectural variations can now be added
> with no concern over the size of the intermediate *.h files.
>
> In case you're following the git repository: it's recommended that
> you do a 'make cleaner' [sic!] to clean up your build tree, right
> after the big commits for this change arrive. It should still work
> without that step, though.
>
> Move lib/* to src/jit/*
> -----------------------
>
> The JIT-compiler-specific Lua modules currently shipped in lib/*
> need to be installed in the package path, relative to a 'jit'
> directory, before they can be used.
>
> To allow testing of the un-installed command line executable from
> within the 'src' directory, the modules will be moved to src/jit/*.
> Other hierarchies (e.g. src/ffi/*) may be added in the future.
>
> The 'install' target of the top-level Makefile will of course be
> adjusted accordingly. Watch out if you've modified this file or
> if you've automated the install process with other tools.
>
> New optimization: Allocation sinking and store sinking
> ------------------------------------------------------
>
> A corporate sponsor, who wishes to remain anonymous, has sponsored
> the development of allocation sinking and store sinking
> optimizations for LuaJIT.
>
> Avoiding temporary allocations is an important optimization for
> high-level languages. LuaJIT already eliminates many of these with
> multiple techniques: e.g. floating-point numbers aren't boxed and
> the JIT compiler eliminates allocations for most immutable
> objects. Alas, traditional techniques to avoid the remaining
> allocations (escape analysis and scalar replacement of aggregates)
> are ineffective for dynamic languages.
>
> The goal of this sponsorship is to research the combination of
> store-to-load-forwarding (already implemented) with store sinking
> and allocation sinking (to be implemented). This innovative
> approach is highly effective in avoiding temporary allocations in
> the fast paths, even in the presence of many slow paths to which
> the temporary object may escape. This approach is most
> effective for dynamic languages, but may be successfully applied
> elsewhere, when the classic techniques fail.
>
> Work for this feature is currently in progress.
>
> New port: ARM VFP support and hard-float EABI support
> -----------------------------------------------------
>
> A corporate sponsor, who wishes to remain anonymous, has sponsored
> the VFP support (hardware FPU) and the hard-float EABI support for
> the ARM port. After that work is complete, the ARM port of LuaJIT
> can be built for three different CPU/ABI combinations:
>
> * ARMv5+, soft-float EABI, soft-float FP operations (already exists)
> * ARMv6+, soft-float EABI, VFPv2+ FP operations
> * ARMv6+, hard-float EABI, VFPv2+ FP operations (e.g. Debian armhf)
>
> Work on the VFP support and hard-float support for the ARM port is
> scheduled for Q3 2012.
>
> New port: PPC32on64 interpreter for PS3 and XBox 360
> ----------------------------------------------------
>
> Current-generation consoles based on PowerPC CPUs cannot run the
> existing PPC port of LuaJIT. Several changes are needed:
>
> * The JIT compiler must be disabled for the consoles, as the
>  hypervisors do not allow execution of code generated at runtime.
>
> * Changes to the LuaJIT interpreter to run as a 32 bit program on
>  PPC64 (PPC32on64). Registers are 64 bit wide, even though
>  pointers are still 32 bit. This affects e.g. the carry bit and
>  pointer addressing. The assembler code needs to be adapted.
>
> * Some common PPC instructions are micro-coded on the console CPUs,
>  which causes unwanted slow-downs. These instructions need to be
>  replaced with other instruction sequences.
>
> * Support for modified calling conventions.
>
> These changes allow embedding the LuaJIT 2.0 interpreter in PS3
> or XBox 360 projects, with a substantial speedup compared to the
> standard Lua 5.1 interpreter.
>
> The console ports will be integrated some time after the build
> process reorganizations are complete.
>
> Minor new features
> ------------------
>
> The following minor features are on my TODO list for LuaJIT 2.0:
>
> - Add 'goto' statement and labels, compatible with Lua 5.2.
>
>  This feature will also be available from the Lua 5.1 mode of
>  LuaJIT 2.0, where 'goto' is not a keyword. The parser figures
>  out whether it's a variable name or a statement.
>
> - Support '%a' and '%A' for string.format and parse hexadecimal
>  floating-point numbers (0x1.2a7p9 => 596.875) independent of the
>  C99-conformance of the C library (works even with MSVCRT).
>
> - Other Lua 5.2-compatibility features:
>
>  Return result status for os.execute() and pipe close.
>  Support extra format specifiers for io.lines() and fp:lines().
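
To make the first two items concrete, here is a small sketch of 'goto' used
as a 'continue', plus the hexadecimal floating-point example worked by hand.
The chunk is compiled from a string so the file still loads where 'goto' is
not supported; the variable names are mine, and the results are nil/false on
an interpreter that lacks the feature:

```lua
-- Hedged sketch of two of the features listed above. Chunks are compiled
-- from strings so that this file still loads on a plain Lua 5.1.
local load_chunk = loadstring or load   -- Lua 5.1/LuaJIT vs. Lua 5.2+

-- 1. 'goto' used as a 'continue': sum the odd numbers from 1 to 10.
local goto_src = [[
  local sum = 0
  for i = 1, 10 do
    if i % 2 == 0 then goto continue end
    sum = sum + i
    ::continue::
  end
  return sum
]]
local f = load_chunk(goto_src)          -- nil where 'goto' is unsupported
local odd_sum = f and f()

-- 2. Hexadecimal floating-point: 0x1.2a7p9 = (1 + 0x2a7/4096) * 2^9.
local by_hand = (1 + 0x2a7 / 4096) * 2^9
local parsed = tonumber("0x1.2a7p9")    -- nil if the parser lacks support

-- '%a' is checked by round-trip, since the exact digits may differ.
local ok, hex = pcall(string.format, "%a", by_hand)
local round_trip = ok and tonumber(hex)

print(odd_sum, by_hand, parsed, round_trip)
```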

There are more minor differences from Lua 5.2:
- CLI option -E
- loadfile with mode
- rawlen
- package.searchers
- string.rep with separator
- table.pack
- math.log with base
- embedded zeros ('\0') in patterns
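
A quick way to see which of the library items a given build already provides
is to probe for them at runtime. This is only a sketch: the probe names and
the table layout are mine, the expected behaviour is the one from the Lua 5.2
manual, and the -E option and the loadfile mode are left out because they
can't be checked this simply:

```lua
-- Probe the running interpreter for some of the Lua 5.2 library deltas.
-- The probe names and this table layout are only for illustration.
local probes = {
  ["rawlen"]            = function() return rawlen({ 1, 2, 3 }) == 3 end,
  ["package.searchers"] = function() return type(package.searchers) == "table" end,
  ["string.rep sep"]    = function() return string.rep("ab", 3, ",") == "ab,ab,ab" end,
  ["table.pack"]        = function() return table.pack(1, nil, 3).n == 3 end,
  ["math.log base"]     = function() return math.abs(math.log(8, 2) - 3) < 1e-9 end,
  ["zero in pattern"]   = function() return string.match("a\0b", "\0b") == "\0b" end,
}

local function probe_all()
  local result = {}
  for name, fn in pairs(probes) do
    local ok, found = pcall(fn)       -- a missing function raises an error
    result[name] = ok and found == true
  end
  return result
end

for name, supported in pairs(probe_all()) do
  print(name, supported and "yes" or "no")
end
```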

In the past, I wrote patches (against beta5/6) for some of these features.
I could update them, if you want.

François

>
> Feature freeze
> --------------
>
> After the above features have been implemented, beta11 will be
> released and a feature freeze will be announced: no new features
> will be accepted into the LuaJIT 2.0 code base.
>
> Bug fixes to existing features will always be accepted, of course.
>
> I'm willing to make small concessions for the FFI library, as it's
> relatively young. Minor upwards-compatible features, that are
> important for usability, might make it into the code base, even
> after the feature freeze (e.g. backports from LuaJIT 2.1).
>
> Release plans
> -------------
>
> After the feature freeze and a concerted cleanup effort, several
> release candidates and the final 2.0.0 release will be put out.
>
> My goal is to complete all of this before the end of 2012.
>
> Bug fixes will be accumulated in the git repository, as usual. New
> dot releases (2.0.x), which include all of these fixes, will be
> made available at irregular intervals.
>
> I'm planning to give LuaJIT 2.0 LONG-TERM SUPPORT, provided
> there's sufficient interest in the community and continued
> sponsorship. The LuaJIT 2.0 release will likely be maintained and
> supported for several years. It will be updated to fix future
> incompatibilities, e.g. with new toolchain or OS releases.
>
>
> LuaJIT 2.1
> ==========
>
> After LuaJIT 2.0 has become stable, work on LuaJIT 2.1 may begin.
> This section is intended to give you a short overview of my plans
> for LuaJIT 2.1.
>
> Compatibility
> -------------
>
> A new release is always a good point to do some cleanup. LuaJIT
> has accumulated quite a bit of slack during the 2.0 development
> phase. And some of that has to go, e.g. the x87-compatibility in
> the interpreter for x86 CPUs without SSE2. Other features planned
> for removal will be announced in a separate message, before work
> on LuaJIT 2.1 starts.
>
> But there's one important message: compatibility with Lua 5.1 is
> there to stay!
>
> Many users of LuaJIT, especially those with big code bases, have a
> heavy investment in Lua 5.1-compatible infrastructure, tools,
> frameworks and in-house knowledge. Understandably, they don't want
> to throw away their investment, but still keep up with the newest
> developments.
>
> As I've previously said, Lua 5.2 provides few tangible benefits.
> LuaJIT already includes the major new features, without breaking
> compatibility. Upgrading to be compatible with 5.2, just for the
> sake of a higher version number, is neither a priority nor a
> sensible move for most LuaJIT users.
>
> To protect the investment of my users and still provide them with
> new features, LuaJIT 2.1 will stay compatible with Lua 5.1.
>
> New garbage collector
> ---------------------
>
> The garbage collector used by LuaJIT 2.0 is essentially the same
> as the Lua 5.1 GC. The current garbage collector is relatively
> slow compared to implementations for other language runtimes. It's
> not competitive with top-of-the-line GCs, especially for large
> workloads.
>
> The main innovation in LuaJIT 2.1 is a complete redesign of the
> garbage collector from scratch: the new garbage collector will be
> an arena-based, quad-color incremental, generational, non-copying,
> high-speed, cache-optimized garbage collector.
>
> You can read more about the design of the new GC here:
>
>  http://wiki.luajit.org/New-Garbage-Collector
>
> Note: this page is a work-in-progress! More details will be added
> and the gaps will be filled in over time.
>
> Planned features
> ----------------
>
> Based on recognized needs and suggestions from LuaJIT users, here
> are some other features, that I'd like to work on. Hopefully, many
> of them will make it into LuaJIT 2.1 or future versions.
>
> The list is in no particular order:
>
> - Metatable/__index specialization
>
>  Accesses to metatables and __index tables with constant keys are
>  already specialized by the JIT compiler to use optimized hash
>  lookups (HREFK). This is based on the assumption that individual
>  objects don't change their metatable (once assigned) and that
>  neither the metatable nor the __index table are modified. This
>  turns out to be true in practice, but those assumptions still
>  need to be checked at runtime, which can become costly for
>  OO-heavy programming.
>
>  Further specialization can be obtained by strictly relying on
>  these assumptions and omitting the related checks in the
>  generated code. In case any of the assumptions are broken (e.g.
>  a metatable is written to), the previously generated code must
>  be invalidated or flushed.
>
>  Different mechanisms for detecting broken assumptions and for
>  invalidating the generated code should be evaluated.
>
>  This optimization works at the lowest implementation level for
>  metatables in the VM. It should equally benefit any code that
>  uses metatables, not just the typical frameworks that implement
>  a class-based system on top of it.
>
> - Value-range propagation (VRP)
>
>  Value-range propagation is an optimization for the JIT compiler:
>  by propagating the possible ranges for a value, subsequent code
>  may be optimized or conditionals may be eliminated. Constant
>  propagation (already implemented) can be seen as a special case
>  of this optimization.
>
>  E.g. if a number is known to be in the range 0 <= x < 256 (say
>  it originates from string.byte), then a later mask operation
>  bit.band(x, 255) is redundant. Similarly, a subsequent test for
>  x < 0 can be eliminated.
>
>  Note that even though few programmers would explicitly write
>  such a series of operations, this can easily happen after
>  inlining of functions combined with constant propagation.
>
> - Hyperblock scheduling
>
>  Producing good code for unbiased branches is a key problem for
>  trace compilers. This is the main cause for "trace explosion"
>  and bad performance with certain types of branchy code.
>
>  Hyperblock scheduling promises to solve this nicely at the price
>  of a major redesign of the compiler: selected traces are woven
>  together to a single hyper-trace. This would also pave the way
>  for emitting predicated instructions, which benefits some CPUs
>  (e.g. ARM) and is a prerequisite for efficient vectorization.
>
> - FFI C pre-processor
>
>  The integrated C parser of the FFI library currently doesn't
>  support #define or other C pre-processor features. To support
>  the full range of C semantics, an integrated C pre-processor is
>  needed.
>
>  This would provide a nice solution to the C re-declaration
>  problem for FFI modules, too.
>
> - Partial C++ support for the FFI
>
>  Full C++ support for the FFI is not feasible, due to the sheer
>  complexity of the task: one would need to write more or less a
>  complete C++ compiler.
>
>  However, a limited number of C++ features can certainly be
>  supported. Of course, one could argue, anything but full support
>  doesn't make sense. But you'll never know, unless you try ...
>
>  It would be an interesting task to evaluate what subset of C++
>  can be supported with reasonable effort or which C++ libraries
>  can be successfully bound via the FFI. Basically: how far can
>  C++ support go, how much effort would be needed and does it
>  really pay off in practice?
>
>  Such a project should be split into the evaluation phase and an
>  implementation phase, which implements the C++ subset, based on
>  the prior evaluation.
>
> - User-definable intrinsics for the FFI
>
>  This is a low-level equivalent to GCC inline assembler: given a
>  C function declaration and a machine code template, an intrinsic
>  function (builtin) can be constructed and later called. This
>  allows generating and executing arbitrary instructions supported
>  by the target CPU. The JIT compiler inlines the intrinsic into
>  the generated machine code for maximum performance.
>
>  Developers usually shouldn't need to write machine code templates
>  themselves. Common libraries of intrinsics for different purposes
>  should be provided or contributed by experts.
>
> - Vector/SIMD data type support for the FFI
>
>  Currently, vector data types may be defined with the FFI, but
>  you really can't do much with them. The goal of this project is
>  to add full support for vector data types to the JIT compiler
>  and the CPU-specific backends (if the target CPU has a vector
>  extension).
>
>  A new "ffi.vec" module declares standard vector types and
>  attaches the machine-specific SIMD intrinsics as (meta)methods.
>
>  Prerequisites for this project are allocation sinking, the
>  user-definable intrinsics and the new garbage collector.
>
>  More about the last two features can be read here:
>    http://lua-users.org/lists/lua-l/2012-02/msg00207.html
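
On the value-range propagation item above: the string.byte example can be
checked by brute force. The sketch below only demonstrates the range fact the
optimization would rely on; it says nothing about how the JIT compiler would
implement it. The helper names are mine, and a modulo stands in for
bit.band(x, 255) where the 'bit' module is not available:

```lua
-- Why the mask is redundant: string.byte already yields 0 <= x < 256.
-- Assumption: the 'bit' module is only present on LuaJIT; elsewhere a
-- modulo stands in for bit.band(x, 255), which is equivalent for x >= 0.
local ok, bit = pcall(require, "bit")
local band255 = ok and function(x) return bit.band(x, 255) end
                    or function(x) return x % 256 end

local function mask_is_redundant()
  for code = 0, 255 do
    local x = string.byte(string.char(code))
    -- Both facts below follow from the known range of x alone.
    if band255(x) ~= x or x < 0 then return false end
  end
  return true
end

print(mask_is_redundant())   -- true
```
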
>
> Most of these features are still in an early planning stage. I'm
> sure the community will come up with many more interesting ideas.
> Which of these will become a reality depends on the interest in
> the community and on sponsorships (see below).
>
>
> Call for Sponsors
> =================
>
> First, I'd like to say a BIG THANK YOU to all LuaJIT sponsors!
>
> Almost all of the recent work on LuaJIT 2.0 has been sponsored by
> various corporate sponsors. The full track record is here:
>
>  http://luajit.org/sponsors.html
>
> All of those architectural ports and new features wouldn't have
> been possible without your sponsorships!
>
> I think this sends a happy message to the greater open source
> community: the open source development model *does* work out and
> it can be a sustainable (side) business for its creators!
>
> Nonetheless, I have to look forward: as you've seen above, I've
> got big plans with LuaJIT 2.1. In fact, the plans are so big that
> I fear it may be hard to get enough sponsorships to cover just the
> work on the one major feature, the new garbage collector.
>
> For LuaJIT 2.0, the ports to the various architectures made most
> of the money. The companies sponsoring them had a genuine, often
> urgent, business need for these ports. Sadly, this source is
> drying up, as the major architectures are well covered.
>
> The new garbage collector is certainly a desirable feature and
> IMHO the correct next evolutionary step for LuaJIT. Alas,
> developers have learned to work around the deficiencies of the
> current GC (by carefully avoiding allocations). The benefits of a
> new garbage collector are hard to quantify, without actually
> implementing it. And that's *a lot of work*, which makes it not
> exactly cheap. Maybe too expensive for a single company. It'll be
> a tough sell in any case.
>
> So far, I've relied exclusively on corporate sponsorships for
> various legal and administrative reasons. Ok, so the recent trend
> towards crowd funding got me thinking ...
>
> But let's be realistic: the Lua community is small, the LuaJIT
> community is even smaller -- it's growing fast, though. I simply
> don't know whether it's possible to gather enough people and
> enough money to finance the continued development of LuaJIT.
>
> And there's another issue: to me, it looks like the whole crowd
> funding idea is rapidly deteriorating into an arms race of
> marketing experts. So many people are jumping on that bandwagon
> now ... you'll never make it, unless you permanently stay on the
> front pages somehow.
>
> Alas, I'm not good at marketing and a garbage collector is a very
> technical and *very* unsexy project (for most people, anyway).
> But then, I'd really love to be proven wrong ...
>
> To be fair, I have to make this statement:
>
> I'd really like to work on LuaJIT and I'd like to continue shaping
> its future. However, I fear, without sponsorships I'd have to do
> more work as a consultant (in unrelated jobs). That doesn't leave
> me enough spare time to do a significant amount of work on LuaJIT.
>
> Therefore, I cannot start working on LuaJIT 2.1, before I've got
> full covenants for a) maintaining two major code bases, b) the
> ground work to clean up the code base and prepare it for c) the
> work on the new garbage collector for LuaJIT 2.1.
>
> I estimate this to be worth on the order of EUR 80K+ ($100K+),
> only for the near future after the release of LuaJIT 2.0.
>
> We're not in a hurry, though. I'd like to publicly discuss all
> options thoroughly with the LuaJIT community and beyond. I'll open
> a new topic on the LuaJIT mailing list right after this posting.
>
> If you require anonymity, please write to me by mail, see:
>  http://luajit.org/sponsors.html
>
> Thank you!
>
>
> [Important note: please do NOT send money, checks or anything like
> that to me at this time! If there's a crowd funding effort or a
> corporate funding pool, this will be announced separately.]
>
> --Mike
>
