Awesome answer, thanks Mike! Can I post this answer on Stack Overflow
as well? Or perhaps you can post it, so I can credit your answer?

2013/11/29 Mike Pall <mike-1311@xxxxxxxxxx>

> Robin Heggelund Hansen wrote:
> > From what I understand, LuaJIT's JIT doesn't compile hot methods
> > like Java's HotSpot does, it compiles hot paths originating from
> > loops. Does this mean that if something doesn't originate from a
> > loop (say, I call Lua functions from the C API) then the code
> > will never be jitted?
>
> Function entries can be trace start points, too. But the hotspot
> detection and the region selection prefer loops.
>
> > And what happens when you hit another loop? Will the path to the
> > second loop be jitted, and then a new path from that loop jitted
> > as well, or will the second loop be part of the same path?
>
> Depends. If the inner loop has a low iteration count, it'll be
> unrolled and inlined. For most loop nestings there'll be a single
> trace for the outer loop that goes 'around' the inner loop and
> joins its exit with its entry. Otherwise, traces link to each
> other in linear pieces.
>
> > How does the interpreter choose the optimal hot path?
>
> The interpreter doesn't choose a path. Region selection uses
> probabilistic algorithms and lots of intertwined heuristics to
> select when and where to start a trace, whether to follow it and
> where to end it. It's really, really hard to describe this
> adequately in prose rather than in code.
>
> IMHO LuaJIT has pretty good facilities for looking at the
> generated traces (-jv and -jdump). Try them with some carefully
> crafted examples and check the source code for details. There are
> a few surprises on your way.
>
> > Let's say I have a hash table of ints -> strings. Now imagine
> > that I've called table[x] with x being 3 and 5 enough times that
> > they've become hot paths and been jitted. How does the
> > interpreter decide which jitted code to call for table[x] where
> > x is 4?
>
> Umm, no.
> Indexing with non-constant integers is not specialized directly
> (the pay-off is too low). There's specialization for tons of other
> things, e.g. function calls. This may indirectly lead to
> specialization of other instructions, of course.
>
> Specialization works by adding checked assertions to the IR. A new
> side trace is spawned if an assertion turns out to be wrong often
> enough. That one either specializes to other pre-conditions or
> generalizes, depending on the circumstances.
>
> This leads to a graph of traces that probabilistically approaches
> a local optimum. It's similar to how a C switch statement is
> turned into a decision tree, except that a JIT compiler has access
> to runtime path statistics (indirectly, via probabilistic hotspot
> detection) and a trace compiler can ignore paths that are never
> taken. This often makes a decision tree a better choice than a
> jump table (which is mainly beneficial for a large equi-weighted
> fan-out or in the absence of known weights).
>
> [ A recent example illustrates the power of this approach:
>
> Cloudflare's WAF (web application firewall) basically generates
> Lua code for the (highly non-linear) maze of firewall rules. An
> incoming attack triggers certain rules and the corresponding paths
> are turned into linearized traces. These can be heavily optimized
> by LuaJIT, much more so than you could ever hope to do with a
> static compiler.
>
> When a different attack wave is started, it'll spawn different
> traces -- i.e. the generated code dynamically adapts to the
> specific attacks.
>
> The result is a much higher firewall throughput than you might be
> able to achieve with a data-driven or a static compilation
> approach. This directly translates into money saved on machines
> needed to handle the load. ]
>
> > Another thing that has been racking my brain. Since paths are
> > compiled, not functions, won't a trace compiler require more
> > memory?
> Nope, a trace compiler generates much, much less code than a
> method-at-a-time compiler. Only a fraction of the entire program
> is compiled, and traces are usually rather short, too.
>
> > Since you can't really re-use the compiled code of another path,
> > I mean, and since paths will probably be larger than single
> > functions in the general case...
>
> There's some amount of trace overlap and code duplication. But
> that's often beneficial due to better optimization opportunities.
> The effect on code density is rather minor in practice.
>
> --Mike
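P.S. For anyone who wants to experiment: below is a minimal sketch of
the nested-loop shape Mike describes, where a short inner loop is a
candidate for being unrolled and inlined into the outer loop's trace.
The function name, sizes, and file name are made up for illustration;
run it as `luajit -jv nested.lua` to watch the trace output.

```lua
-- Sum a 2D table. The inner loop has a low, constant iteration
-- count (4), so it is the kind of loop that can be unrolled and
-- inlined into a single trace for the outer loop.
local function sum2d(m)
  local total = 0
  for i = 1, #m do
    local row = m[i]
    for j = 1, #row do  -- short inner loop: unrolling candidate
      total = total + row[j]
    end
  end
  return total
end

-- Build a 200x4 matrix so the outer loop gets hot enough to
-- trigger hotspot detection.
local m = {}
for i = 1, 200 do
  m[i] = { 1, 2, 3, 4 }
end

print(sum2d(m))  -- prints 2000 (200 rows, each summing to 10)
```

Running this under `luajit -jv` shows which bytecode ranges were
selected as traces; `-jdump` additionally shows the IR and machine
code, which makes the unrolling visible.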