Re: lua_yield is painfully slower in LuaJIT 2.0 than Lua 5.1.4

  • From: Coda Highland <chighland@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Mon, 8 Oct 2012 19:42:57 -0700

On Mon, Oct 8, 2012 at 5:50 PM, agentzh <agentzh@xxxxxxxxx> wrote:
> Hello, folks!
> I'm one of the authors of the Lua Nginx module [1] and we've been
> using Lua coroutines extensively in the Nginx core to provide
> transparent nonblocking I/O for the Lua programmers (to avoid the
> callback nightmare as seen in the NodeJS world, for example) and for
> our "light-weight threads" [2] too.
> But benchmarks have shown that lua_yield in LuaJIT 2.0.0 beta10 (with
> the default build options) is quite expensive at least on Linux
> x86_64, which can be shown in the following "Flame Graph" [3]
> generated by sampling the user-space backtraces of the Nginx/LuaJIT
> process (under load) via SystemTap:
> The rectangles in the graph represent frames in the user-stack
> samples. The wider a rectangle is, the more often that frame appears
> in the user-stack samples (it also means that that function call frame
> is taking more time). You can put your mouse pointer over the
> rectangle in the graph to see more details shown at the bottom. This
> is *not* a heat map and the colour is irrelevant.
> From this graph, we can see that lua_yield appears in approximately
> 13% of the total user-land stack samples (that is, about 13% of the
> total run time), which is astonishing. The most time-consuming
> sub-call within lua_yield is _lj_err_throw (and _Unwind_RaiseException
> is the hottest within _lj_err_throw).
> I got exactly the same result when using LuaJIT 2.0's git master HEAD
> (commit da682b0e). Painfully slow.
> When using the standard Lua 5.1.4 interpreter, however, lua_yield is
> very cheap for exactly the same test case, as seen on the following
> Flame Graph:
> We can see that lua_yield appeared in only 1% of the total samples
> (i.e., just 1% of the total run time) when using Lua 5.1.4.
> And ab (ApacheBench) also shows that now it is indeed much faster (60k
> q/s) than when we were using LuaJIT 2.0 before (50k q/s).
> And I'm wondering if there's any room for optimizing lua_yield in
> LuaJIT 2.0? If we can make lua_yield fast here, then web apps atop
> ngx_lua can also run significantly faster :)
> Thank you in advance!
> Best regards,
> -agentzh
> [1]
> [2] 
> [3]

A question:

How does it add up in terms of real time?

It may be that lua_yield is proportionally more expensive in LuaJIT,
but I have my doubts that it's ACTUALLY more expensive. Is LuaJIT
still capable of doing more in the same amount of real time?
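
One rough way to put the two profiles on the same scale is to turn the
sample percentages into time per request. This is only a sketch: it
assumes a single CPU-bound worker, so that 1/throughput approximates
the CPU time spent per request (the worker count is not stated in the
thread), and that sample share is proportional to time. The function
name is made up for illustration; the numbers are the ones quoted
above.

```lua
-- Hypothetical helper: microseconds per request attributed to lua_yield,
-- assuming one saturated worker (so 1/qps is the time per request).
local function yield_us_per_request(qps, yield_fraction)
  local us_per_request = 1e6 / qps
  return us_per_request * yield_fraction
end

-- LuaJIT 2.0.0 beta10: 50k q/s, lua_yield in about 13% of samples.
print(string.format("%.2f", yield_us_per_request(50000, 0.13)))
-- Lua 5.1.4: 60k q/s, lua_yield in about 1% of samples.
print(string.format("%.2f", yield_us_per_request(60000, 0.01)))
```

If those assumptions do not hold (several workers, time spent blocked
in the kernel), the absolute figures change, but the comparison still
has to be made in time per request rather than in percentages.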

It's very common in LuaJIT for the biggest overhead in an algorithm
to be the switch between Lua and C. lua_yield is one of those cases
where the entire Lua state has to be nailed down. Another similar
thing that's bad for LuaJIT is invoking Lua callbacks from C code. In
both cases the JIT is unable to compile across the boundary, so you
lose the optimizations that could have been applied.

In the case of callbacks, it's frequently more efficient to write the
entire algorithm in Lua. When you do so, LuaJIT is capable of
including the callback in the trace, usually optimizing away the
function call entirely and compiling the routine to machine code.
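
As a minimal illustration of the pure-Lua case, here is a loop that
calls a callback on every iteration. The names sum_over and square are
invented for this example. Whether LuaJIT actually inlines the call
depends on its trace heuristics; running the script with luajit -jv
should show whether a trace was recorded for the loop.

```lua
-- Hypothetical helper: sums f(i) for i = 1..n, with the loop and the
-- callback both written in Lua so a single trace can cover them.
local function sum_over(n, f)
  local total = 0
  for i = 1, n do
    total = total + f(i)
  end
  return total
end

-- The callback is an ordinary Lua function, not a C function.
local function square(i)
  return i * i
end

local result = sum_over(1000, square)
print(result)
```

If the same loop lived in C and called square through lua_call, every
iteration would cross the Lua/C boundary instead.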

Now, I don't know if the same is true for lua_yield -- I don't know if
pure-Lua coroutines get the same bonus as pure-Lua sequential

/s/ Adam
