Re: How does LuaJIT's trace compiler work?

  • From: Robin Heggelund Hansen <skinneyz89@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 29 Nov 2013 16:24:12 +0100

Awesome answer, thanks Mike!
Can I post this answer on Stack Overflow as well? Or perhaps you can post
it, so I can credit your answer?


2013/11/29 Mike Pall <mike-1311@xxxxxxxxxx>

> Robin Heggelund Hansen wrote:
> > From what I understand, LuaJIT's JIT doesn't compile hot methods like
> > Java's HotSpot does, it compiles hot paths originating from loops. Does
> > this mean that if something doesn't originate from a loop (say, I call
> > Lua functions from the C-api) then the code will never be jitted?
>
> Function entries can be trace start points, too. But the hotspot
> detection and the region selection prefer loops.
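As a rough illustration (a hypothetical sketch -- the actual thresholds and start points are chosen by LuaJIT's internal heuristics and are not visible from Lua itself):

```lua
-- A hot loop: the loop itself is the preferred trace start point.
local sum = 0
for i = 1, 1e6 do
  sum = sum + i
end
print(sum)  -- 500000500000

-- A function invoked again and again from outside any hot Lua loop
-- (e.g. repeatedly through the C API) can still get compiled,
-- because its *entry* can become a trace start point as well.
local function handler(x)
  return x * 2 + 1
end
print(handler(20))  -- 41
```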
>
> > And what happens when you hit another loop? Will the path to the
> > second loop be JIT'ed, and then a new path from that loop jitted
> > as well, or will the second loop be a part of the same path?
>
> Depends. If the inner loop has a low iteration count, it'll be
> unrolled and inlined. For most loop nestings there'll be a single
> trace for the outer loop that goes 'around' the inner loop and
> joins its exit with its entry. Otherwise traces link to each other
> in linear pieces.
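For instance (an illustrative sketch; which shape the region selection actually picks depends on iteration counts and heuristics):

```lua
-- Nested loops with a short inner loop.
local t = {}
for i = 1, 1000 do          -- hot outer loop
  local row = 0
  for j = 1, 4 do           -- low iteration count: likely unrolled
    row = row + i * j       -- and inlined into the outer loop's trace
  end
  t[i] = row
end
print(t[1000])  -- 1000 * (1+2+3+4) = 10000

-- With a long inner loop (say 'for j = 1, 1000'), the inner loop would
-- typically get its own trace instead, and traces would link together.
```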
>
> > How does the interpreter choose the most optimal hot path?
>
> The interpreter doesn't choose a path. Region selection uses
> probabilistic algorithms and lots of intertwined heuristics to
> select when and where to start a trace, whether to follow it and
> where to end it. It's really, really hard to describe this
> adequately in prose and not in code.
>
> IMHO LuaJIT has pretty good facilities for looking at the
> generated traces (-jv and -jdump). Try this with some carefully
> crafted examples and check the source code for details. There are
> a few surprises along the way.
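For example, a minimal script to experiment with (the `-jv` and `-jdump` flags are real LuaJIT options; the script itself is made up):

```lua
-- Save as trace_demo.lua, then run:
--   luajit -jv trace_demo.lua      (one summary line per trace)
--   luajit -jdump trace_demo.lua   (bytecode, IR and machine code dumps)
local x = 0
for i = 1, 200 do   -- enough iterations to cross the hotspot threshold
  x = x + i
end
print(x)  -- 20100
```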
>
> > Let's say I have a hash-table of ints -> strings. Now imagine
> > that I've called table[x] with x being 3 and 5 enough times that
> > they've become hot paths and jitted, how does the interpreter
> > decide which jitted code to call for table[x] where x is 4?
>
> Umm, no. Indexing with non-constant integers is not specialized
> directly (the pay-off is too low). There's specialization for tons
> of other things, e.g. function calls. This may indirectly lead to
> specialization of other instructions, of course.
>
> Specialization works by adding checked assertions to the IR. A new
> side trace is spawned if an assertion turns out to be wrong often
> enough. That one either specializes to other pre-conditions or
> generalizes, depending on the circumstances.
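A sketch of how a side trace can arise (hypothetical example; in the recorded trace the branch condition becomes a guard, like any other checked assertion):

```lua
-- The root trace for this hot loop follows the common branch and
-- guards on the condition. The rare branch exits the trace; if that
-- exit is taken often enough, a side trace is compiled for it.
local rare, common = 0, 0
for i = 1, 1000 do
  if i % 100 == 0 then
    rare = rare + 1      -- guard fails here: candidate for a side trace
  else
    common = common + 1  -- common path: stays on the root trace
  end
end
print(rare, common)  -- 10   990
```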
>
> This leads to a graph of traces that probabilistically approach a
> local optimum. Similar to how a C switch statement is turned into
> a decision tree. Except that a JIT compiler has access to runtime
> path statistics (indirectly, via probabilistic hotspot detection)
> and a trace compiler can ignore paths that are never taken. This
> often makes a decision tree a better choice than a jump table
> (which is mainly beneficial for a large equi-weighted fan-out or
> in absence of known weights).
>
> [
> A recent example illustrates the power of this approach:
>
> Cloudflare's WAF (web application firewall) basically generates
> Lua code for the (highly non-linear) maze of firewall rules. An
> incoming attack triggers certain rules and the corresponding paths
> are turned into linearized traces. These can be heavily optimized
> by LuaJIT, much more so than you could ever hope to do with a
> static compiler.
>
> When a different attack wave is started, it'll spawn different
> traces -- i.e. the generated code dynamically adapts to the
> specific attacks.
>
> The result is a much higher firewall throughput than you might be
> able to achieve with a data-driven or a static compilation
> approach. This directly equates into money saved on machines
> needed to handle the load.
> ]
>
> > Another thing that has been racking my brain. Since paths are compiled,
> > not functions, won't a trace compiler require more memory?
>
> Nope, a trace compiler generates much, much less code than a
> method-at-a-time compiler. Only a fraction of the entire program
> is compiled and traces are usually rather short, too.
>
> > Since you can't really re-use compiled code of another path I
> > mean, and since paths will probably be larger than single
> > functions in the general case...
>
> There's some amount of trace overlap and code duplication. But
> that's often beneficial due to better optimization opportunities.
> The effect on code density is rather minor in practice.
>
> --Mike
>
>
