lua_yield is painfully slower in LuaJIT 2.0 than Lua 5.1.4

  • From: agentzh <agentzh@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Mon, 8 Oct 2012 17:50:43 -0700

Hello, folks!

I'm one of the authors of the Lua Nginx module [1] and we've been
using Lua coroutines extensively in the Nginx core to provide
transparent nonblocking I/O for the Lua programmers (to avoid the
callback nightmare as seen in the NodeJS world, for example) and for
our "light-weight threads" [2] too.

But benchmarks have shown that lua_yield in LuaJIT 2.0.0 beta10 (with
the default build options) is quite expensive, at least on Linux
x86_64. This can be seen in the following "Flame Graph" [3],
generated by sampling the user-space backtraces of the Nginx/LuaJIT
process (under load) via SystemTap:

    http://agentzh.org/misc/nginx/thread-hello-lj2.svg

The rectangles in the graph represent frames in the user-stack
samples. The wider a rectangle is, the more often that frame appears
in the user-stack samples (which also means that the function call
frame is taking more time). You can put your mouse pointer over a
rectangle in the graph to see more details shown at the bottom. This
is *not* a heat map and the colour is irrelevant.

From this graph, we can see that lua_yield appears in approximately
13% of the total user-land stack samples (that is, about 13% of the
total run time), which is astonishing. The most time-consuming
sub-call within lua_yield is _lj_err_throw (and _Unwind_RaiseException
is the hottest within _lj_err_throw).

I got exactly the same result when using LuaJIT 2.0's git master HEAD
(commit da682b0e). Painfully slow.

When using the standard Lua 5.1.4 interpreter, however, lua_yield is
very cheap for exactly the same test case, as seen on the following
Flame Graph:

    http://agentzh.org/misc/nginx/thread-hello-lua51.svg

We can see that lua_yield appeared in only 1% of the total samples
(i.e., just 1% of the total run time) when using Lua 5.1.4.

And ab (ApacheBench) also shows that the server is indeed much faster
with Lua 5.1.4 (60k q/s) than it was with LuaJIT 2.0 (50k q/s).

And I'm wondering if there's any room for optimizing lua_yield in
LuaJIT 2.0? If we can make lua_yield fast here, then web apps atop
ngx_lua can also run significantly faster :)

Thank you in advance!

Best regards,
-agentzh

[1] http://wiki.nginx.org/HttpLuaModule
[2] http://groups.google.com/group/openresty-en/browse_thread/thread/c14e27a459964056
[3] http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs/
