2012/6/7 Mike Pall <mike-1206@xxxxxxxxxx>:
> LuaJIT Roadmap 2012/2013
> ************************
>
> This is the LuaJIT roadmap for 2012/2013, bringing you up to date
> on the current and future developments around LuaJIT.
>
> I'm happy to answer your questions here on the LuaJIT mailing list,
> on related news aggregators or by mail.
>
> * Status of LuaJIT 2.0, new features and release planning
> * Plans for LuaJIT 2.1, new garbage collector and other features
> * Call for Sponsors
>
>
> LuaJIT 2.0
> ==========
>
> Current status
> --------------
>
> Overall, the LuaJIT 2.0 code base is in good shape to become
> stable, soon. The beta releases are already used in production by
> many developers and in many different projects.
>
> LuaJIT 2.0 has grown quite a few more architectural ports than
> expected in the last roadmap from 2011. But this is a good thing:
> developers get to use a stable VM for their target architectures
> *right now*. And it gives me more leeway to introduce some major
> changes to the next version.
>
> LuaJIT 2.0 already runs on all major operating systems. Soon,
> it'll support close to a dozen architectures or architectural
> variations. This pretty much covers the complete desktop and
> server markets, almost all of the smartphone market and a sizeable
> chunk of the 32 bit embedded CPU market, too. Coverage will become
> even better over time, due to expected market shake-outs.
>
> LuaJIT is widely considered to be one of the fastest dynamic
> language implementations. It features a compact, innovative
> top-of-the-line just-in-time (JIT) compiler.
>
> The integrated LuaJIT FFI library is a major additional benefit:
> it largely obviates the need to write tedious manual bindings with
> the classic Lua/C API. There's no need to learn a separate binding
> language -- it parses plain C declarations! The JIT compiler is
> able to generate code on par with a C compiler for access to
> native C data structures.
> Calls to C functions can be inlined in
> JIT-compiled code.
>
> LuaJIT 2.0 has extensive architecture-specific and OS-specific
> customizations. This, together with excellent cross-compilation
> support, makes LuaJIT an ideal tool for developers who need to
> embed a nearly universally portable, light-weight *and* high-speed
> dynamic VM into their projects.
>
> [Phew! Enough of the marketing speak for now ... ;-) ]
>
> What's next
> -----------
>
> Now that LuaJIT 2.0.0-beta10 is out, a couple of reorganizations
> will happen in the source tree. After that, one new optimization
> and two new ports will be added.
>
> These are (probably) the last major changes to LuaJIT 2.0 before
> the final (non-beta) release. All other planned features will have
> to wait for LuaJIT 2.1.
>
> Addition of a minified Lua interpreter
> --------------------------------------
>
> A customized, heavily stripped and minimized Lua interpreter will
> be included to assist the build process. This weighs in at only
> 173 KB, or 45 KB compressed. It'll be compiled first during the
> build process (as a host executable when cross-compiling).
>
> The first use case is to run DynASM. This allows generating the
> machine-specific files for the current target architecture at
> build time. Which in turn allows the removal of various
> pre-translated files.
>
> The addition of a minimal Lua interpreter opens up more options
> for customizing and simplifying the build process in the future.
> E.g. most of the C code, that's only used at build time, can be
> replaced with Lua code.
>
> The program to generate the (mostly illegible) minified C source
> code for the Lua interpreter will be included. Security-conscious
> people can check that it generates identical output, given the
> original Lua sources. Or they may use the standard Lua 5.1/5.2
> interpreter for the build process (build option).
>
> Removal of pre-generated buildvm_${arch}.h files
> ------------------------------------------------
>
> The pre-generated, architecture-specific files buildvm_${arch}.h
> contain the LuaJIT interpreter-generator for each architecture,
> ready for consumption by a C compiler to generate the 'buildvm'
> executable. The actual sources are in the buildvm_${arch}.dasc
> files.
>
> The assembler source code of the interpreter needs to be translated
> with DynASM, which is a Lua program. To avoid a chicken-and-egg
> situation, those files had to be shipped pre-generated.
>
> Due to the proliferation of architectures and architectural
> variations, the pre-generated files have already grown to 844 KB.
> Compressed, this adds only 133 KB to the released tar.gz files,
> but that's still too much. And more is to come.
>
> Also, even a single-line change in one of the *.dasc files
> triggers lots of changes in the corresponding *.h file. This
> causes needlessly big commits in the git repository.
>
> The addition of a minified Lua interpreter solves this problem:
> the pre-generated buildvm_${arch}.h files can be removed. Only
> the output file for the selected target architecture will be
> translated with DynASM at build time, utilizing the minified Lua
> interpreter. Many more architectural variations can now be added
> with no concern over the size of the intermediate *.h files.
>
> In case you're following the git repository: it's recommended that
> you do a 'make cleaner' [sic!] to clean up your build tree, right
> after the big commits for this change arrive. It should still work
> without that step, though.
>
> Move lib/* to src/jit/*
> -----------------------
>
> The JIT-compiler-specific Lua modules currently shipped in lib/*
> need to be installed in the package path, relative to a 'jit'
> directory, before they can be used.
>
> To allow testing of the un-installed command line executable from
> within the 'src' directory, the modules will be moved to src/jit/*.
> Other hierarchies (e.g. src/ffi/*) may be added in the future.
>
> The 'install' target of the top-level Makefile will of course be
> adjusted accordingly. Watch out if you've modified this file or
> if you've automated the install process with other tools.
>
> New optimization: Allocation sinking and store sinking
> ------------------------------------------------------
>
> A corporate sponsor, who wishes to remain anonymous, has sponsored
> the development of allocation sinking and store sinking
> optimizations for LuaJIT.
>
> Avoiding temporary allocations is an important optimization for
> high-level languages. LuaJIT already eliminates many of these with
> multiple techniques: e.g. floating-point numbers aren't boxed and
> the JIT compiler eliminates allocations for most immutable
> objects. Alas, traditional techniques to avoid the remaining
> allocations (escape analysis and scalar replacement of aggregates)
> are ineffective for dynamic languages.
>
> The goal of this sponsorship is to research the combination of
> store-to-load-forwarding (already implemented) with store sinking
> and allocation sinking (to be implemented). This innovative
> approach is highly effective in avoiding temporary allocations in
> the fast paths, even under the presence of many slow paths where
> the temporary object may escape to. This approach is most
> effective for dynamic languages, but may be successfully applied
> elsewhere, when the classic techniques fail.
>
> Work for this feature is currently in progress.
>
> New port: ARM VFP support and hard-float EABI support
> -----------------------------------------------------
>
> A corporate sponsor, who wishes to remain anonymous, has sponsored
> the VFP support (hardware FPU) and the hard-float EABI support for
> the ARM port.
> After that work is complete, the ARM port of LuaJIT
> can be built for three different CPU/ABI combinations:
>
> * ARMv5+, soft-float EABI, soft-float FP operations (already exists)
> * ARMv6+, soft-float EABI, VFPv2+ FP operations
> * ARMv6+, hard-float EABI, VFPv2+ FP operations (e.g. Debian armhf)
>
> Work on the VFP support and hard-float support for the ARM port is
> scheduled for Q3 2012.
>
> New port: PPC32on64 interpreter for PS3 and XBox 360
> ----------------------------------------------------
>
> Current-generation consoles based on PowerPC CPUs cannot run the
> existing PPC port of LuaJIT. Several changes are needed:
>
> * The JIT compiler must be disabled for the consoles, as the
>   hypervisors do not allow execution of code generated at runtime.
>
> * Changes to the LuaJIT interpreter to run as a 32 bit program on
>   PPC64 (PPC32on64). Registers are 64 bit wide, even though
>   pointers are still 32 bit. This affects e.g. the carry bit and
>   pointer addressing. The assembler code needs to be adapted.
>
> * Some common PPC instructions are micro-coded on the console CPUs,
>   which causes unwanted slow-downs. These instructions need to be
>   replaced with other instruction sequences.
>
> * Support for modified calling conventions.
>
> These changes allow embedding the LuaJIT 2.0 interpreter in PS3
> or XBox 360 projects, with a substantial speedup compared to the
> standard Lua 5.1 interpreter.
>
> The console ports will be integrated some time after the build
> process reorganizations are complete.
>
> Minor new features
> ------------------
>
> The following minor features are on my TODO list for LuaJIT 2.0:
>
> - Add 'goto' statement and labels, compatible with Lua 5.2.
>
>   This feature will also be available from the Lua 5.1 mode of
>   LuaJIT 2.0, where 'goto' is not a keyword. The parser figures
>   out whether it's a variable name or a statement.
>
> - Support '%a' and '%A' for string.format and parse hexadecimal
>   floating-point numbers (0x1.2a7p9 => 596.875) independent of the
>   C99-conformance of the C library (works even with MSVCRT).
>
> - Other Lua 5.2-compatibility features:
>
>   Return result status for os.execute() and pipe close.
>   Support extra format specifiers for io.lines() and fp:lines().

There are more minor deltas with Lua 5.2:
- CLI option -E
- loadfile with mode
- rawlen
- package.searchers
- string.rep with separator
- table.pack
- math.log with base
- embedded zeros in patterns

In the past, I wrote patches (against beta5/6) for some of these features.
I could update them, if you want.

François

>
> Feature freeze
> --------------
>
> After the above features have been implemented, beta11 will be
> released and a feature freeze will be announced: no new features
> will be accepted into the LuaJIT 2.0 code base.
>
> Bug fixes to existing features will always be accepted, of course.
>
> I'm willing to make small concessions for the FFI library, as it's
> relatively young. Minor upwards-compatible features, that are
> important for usability, might make it into the code base, even
> after the feature freeze (e.g. backports from LuaJIT 2.1).
>
> Release plans
> -------------
>
> After the feature freeze and a concerted cleanup effort, several
> release candidates and the final 2.0.0 release will be put out.
>
> My goal is to complete all of this before the end of 2012.
>
> Bug fixes will be accumulated in the git repository, as usual. New
> dot releases (2.0.x), which include all of these fixes, will be
> made available at irregular intervals.
>
> I'm planning to give LuaJIT 2.0 LONG-TERM SUPPORT, provided
> there's sufficient interest in the community and continued
> sponsorship. The LuaJIT 2.0 release will likely be maintained and
> supported for several years. It will be updated to fix future
> incompatibilities, e.g. with new toolchain or OS releases.
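
A side note on the hexadecimal floating-point example in the minor-features list (0x1.2a7p9 => 596.875): the arithmetic can be checked independently of LuaJIT. The sketch below uses Python's built-in float.fromhex only as a reference parser; the two function names are made up for this illustration, and it says nothing about how LuaJIT itself implements '%a' or the number parser.

```python
def parse_hex_float(text: str) -> float:
    """Parse a C99-style hexadecimal floating-point literal."""
    return float.fromhex(text)


def expand_by_hand() -> float:
    """Expand 0x1.2a7p9 manually: hex significand times a power of two."""
    # Three hex digits after the point, so divide them by 16**3.
    significand = 1 + 0x2A7 / 16**3
    # 'p9' means: multiply by 2**9.
    return significand * 2**9


if __name__ == "__main__":
    print(parse_hex_float("0x1.2a7p9"))  # 596.875
    print(expand_by_hand())              # 596.875
```

Both routes give 596.875, matching the value quoted in the roadmap.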
>
>
> LuaJIT 2.1
> ==========
>
> After LuaJIT 2.0 has become stable, work on LuaJIT 2.1 may begin.
> This section is intended to give you a short overview of my plans
> for LuaJIT 2.1.
>
> Compatibility
> -------------
>
> A new release is always a good point to do some cleanup. LuaJIT
> has accumulated quite a bit of slack during the 2.0 development
> phase. And some of that has to go, e.g. the x87-compatibility in
> the interpreter for x86 CPUs without SSE2. Other features planned
> for removal will be announced in a separate message, before work
> on LuaJIT 2.1 starts.
>
> But there's one important message: compatibility with Lua 5.1 is
> there to stay!
>
> Many users of LuaJIT, especially those with big code bases, have a
> heavy investment in Lua 5.1-compatible infrastructure, tools,
> frameworks and in-house knowledge. Understandably, they don't want
> to throw away their investment, but still keep up with the newest
> developments.
>
> As I've previously said, Lua 5.2 provides few tangible benefits.
> LuaJIT already includes the major new features, without breaking
> compatibility. Upgrading to be compatible with 5.2, just for the
> sake of a higher version number, is neither a priority nor a
> sensible move for most LuaJIT users.
>
> To protect the investment of my users and still provide them with
> new features, LuaJIT 2.1 will stay compatible with Lua 5.1.
>
> New garbage collector
> ---------------------
>
> The garbage collector used by LuaJIT 2.0 is essentially the same
> as the Lua 5.1 GC. The current garbage collector is relatively
> slow compared to implementations for other language runtimes. It's
> not competitive with top-of-the-line GCs, especially for large
> workloads.
>
> The main innovation in LuaJIT 2.1 is a complete redesign of the
> garbage collector from scratch: the new garbage collector will be
> an arena-based, quad-color incremental, generational, non-copying,
> high-speed, cache-optimized garbage collector.
>
> You can read more about the design of the new GC here:
>
> http://wiki.luajit.org/New-Garbage-Collector
>
> Note: this page is a work-in-progress! More details will be added
> and the gaps will be filled in over time.
>
> Planned features
> ----------------
>
> Based on recognized needs and suggestions from LuaJIT users, here
> are some other features, that I'd like to work on. Hopefully, many
> of them will make it into LuaJIT 2.1 or future versions.
>
> The list is in no particular order:
>
> - Metatable/__index specialization
>
>   Accesses to metatables and __index tables with constant keys are
>   already specialized by the JIT compiler to use optimized hash
>   lookups (HREFK). This is based on the assumption that individual
>   objects don't change their metatable (once assigned) and that
>   neither the metatable nor the __index table are modified. This
>   turns out to be true in practice, but those assumptions still
>   need to be checked at runtime, which can become costly for
>   OO-heavy programming.
>
>   Further specialization can be obtained by strictly relying on
>   these assumptions and omitting the related checks in the
>   generated code. In case any of the assumptions are broken (e.g.
>   a metatable is written to), the previously generated code must
>   be invalidated or flushed.
>
>   Different mechanisms for detecting broken assumptions and for
>   invalidating the generated code should be evaluated.
>
>   This optimization works at the lowest implementation level for
>   metatables in the VM. It should equally benefit any code that
>   uses metatables, not just the typical frameworks that implement
>   a class-based system on top of it.
>
> - Value-range propagation (VRP)
>
>   Value-range propagation is an optimization for the JIT compiler:
>   by propagating the possible ranges for a value, subsequent code
>   may be optimized or conditionals may be eliminated. Constant
>   propagation (already implemented) can be seen as a special case
>   of this optimization.
>
>   E.g.
>   if a number is known to be in the range 0 <= x < 256 (say
>   it originates from string.byte), then a later mask operation
>   bit.band(x, 255) is redundant. Similarly, a subsequent test for
>   x < 0 can be eliminated.
>
>   Note that even though few programmers would explicitly write
>   such a series of operations, this can easily happen after
>   inlining of functions combined with constant propagation.
>
> - Hyperblock scheduling
>
>   Producing good code for unbiased branches is a key problem for
>   trace compilers. This is the main cause for "trace explosion"
>   and bad performance with certain types of branchy code.
>
>   Hyperblock scheduling promises to solve this nicely at the price
>   of a major redesign of the compiler: selected traces are woven
>   together to a single hyper-trace. This would also pave the way
>   for emitting predicated instructions, which benefits some CPUs
>   (e.g. ARM) and is a prerequisite for efficient vectorization.
>
> - FFI C pre-processor
>
>   The integrated C parser of the FFI library currently doesn't
>   support #define or other C pre-processor features. To support
>   the full range of C semantics, an integrated C pre-processor is
>   needed.
>
>   This would provide a nice solution to the C re-declaration
>   problem for FFI modules, too.
>
> - Partial C++ support for the FFI
>
>   Full C++ support for the FFI is not feasible, due to the sheer
>   complexity of the task: one would need to write more or less a
>   complete C++ compiler.
>
>   However, a limited number of C++ features can certainly be
>   supported. Of course, one could argue, anything but full support
>   doesn't make sense. But you'll never know, unless you try ...
>
>   It would be an interesting task to evaluate what subset of C++
>   can be supported with reasonable effort or which C++ libraries
>   can be successfully bound via the FFI. Basically: how far can
>   C++ support go, how much effort would be needed and does it
>   really pay off in practice?
>
>   Such a project should be split into the evaluation phase and an
>   implementation phase, which implements the C++ subset, based on
>   the prior evaluation.
>
> - User-definable intrinsics for the FFI
>
>   This is a low-level equivalent to GCC inline assembler: given a
>   C function declaration and a machine code template, an intrinsic
>   function (builtin) can be constructed and later called. This
>   allows generating and executing arbitrary instructions supported
>   by the target CPU. The JIT compiler inlines the intrinsic into
>   the generated machine code for maximum performance.
>
>   Developers usually shouldn't need to write machine code templates
>   themselves. Common libraries of intrinsics for different purposes
>   should be provided or contributed by experts.
>
> - Vector/SIMD data type support for the FFI
>
>   Currently, vector data types may be defined with the FFI, but
>   you really can't do much with them. The goal of this project is
>   to add full support for vector data types to the JIT compiler
>   and the CPU-specific backends (if the target CPU has a vector
>   extension).
>
>   A new "ffi.vec" module declares standard vector types and
>   attaches the machine-specific SIMD intrinsics as (meta)methods.
>
>   Prerequisites for this project are allocation sinking, the
>   user-definable intrinsics and the new garbage collector.
>
>   More about the last two features can be read here:
>   http://lua-users.org/lists/lua-l/2012-02/msg00207.html
>
> Most of these features are still in an early planning stage. I'm
> sure the community will come up with many more interesting ideas.
> Which of these will become a reality depends on the interest in
> the community and on sponsorships (see below).
>
>
> Call for Sponsors
> =================
>
> First, I'd like to say a BIG THANK YOU to all LuaJIT sponsors!
>
> Almost all of the recent work on LuaJIT 2.0 has been sponsored by
> various corporate sponsors.
> The full track record is here:
>
> http://luajit.org/sponsors.html
>
> All of those architectural ports and new features wouldn't have
> been possible without your sponsorships!
>
> I think this sends a happy message to the greater open source
> community: the open source development model *does* work out and
> it can be a sustainable (side) business for its creators!
>
> Nonetheless, I have to look forward: as you've seen above, I've
> got big plans with LuaJIT 2.1. In fact, the plans are so big that
> I fear it may be hard to get enough sponsorships to cover just the
> work on the one major feature, the new garbage collector.
>
> For LuaJIT 2.0, the ports to the various architectures made most
> of the money. The companies sponsoring them had a genuine, often
> urgent, business need for these ports. Sadly, this source is
> drying up, as the major architectures are well covered.
>
> The new garbage collector is certainly a desirable feature and
> IMHO the correct next evolutionary step for LuaJIT. Alas,
> developers have learned to work around the deficiencies of the
> current GC (by carefully avoiding allocations). The benefits of a
> new garbage collector are hard to quantify, without actually
> implementing it. And that's *a lot of work*, which makes it not
> exactly cheap. Maybe too expensive for a single company. It'll be
> a tough sell in any case.
>
> So far, I've relied exclusively on corporate sponsorships for
> various legal and administrative reasons. Ok, so the recent trend
> towards crowd funding got me thinking ...
>
> But let's be realistic: the Lua community is small, the LuaJIT
> community is even smaller -- it's growing fast, though. I simply
> don't know whether it's possible to gather enough people and
> enough money to finance the continued development of LuaJIT.
>
> And there's another issue: to me, it looks like the whole crowd
> funding idea is rapidly deteriorating into an arms race of
> marketing experts.
> So many people are jumping on that bandwagon
> now ... you'll never make it, unless you permanently stay on the
> front pages somehow.
>
> Alas, I'm not good at marketing and a garbage collector is a very
> technical and *very* unsexy project (for most people, anyway).
> But then, I'd really love to be proven wrong ...
>
> To be fair, I have to make this statement:
>
> I'd really like to work on LuaJIT and I'd like to continue shaping
> its future. However, I fear, without sponsorships I'd have to do
> more work as a consultant (in unrelated jobs). That doesn't leave
> me enough spare time to do a significant amount of work on LuaJIT.
>
> Therefore, I cannot start working on LuaJIT 2.1, before I've got
> full covenants for a) maintaining two major code bases, b) the
> ground work to clean up the code base and prepare it for c) the
> work on the new garbage collector for LuaJIT 2.1.
>
> I estimate this to be worth on the order of EUR 80K+ ($100K+),
> only for the near future after the release of LuaJIT 2.0.
>
> We're not in a hurry, though. I'd like to publicly discuss all
> options thoroughly with the LuaJIT community and beyond. I'll open
> a new topic on the LuaJIT mailing list right after this posting.
>
> If you require anonymity, please write to me by mail, see:
> http://luajit.org/sponsors.html
>
> Thank you!
>
>
> [Important note: please do NOT send money, checks or anything like
> that to me at this time! If there's a crowd funding effort or a
> corporate funding pool, this will be announced separately.]
>
> --Mike
>