Re: segmentation fault in lj_vm_growstack_f

  • From: ZNV <mejedi@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 10 Aug 2018 10:34:57 +0200

Hey!

So you have a situation like this:

1) Lua calls a C function (via the Lua C API or FFI, doesn't matter for now).
2) That C function performs task switching (switches to a different C
stack/execution state in the same thread).
3) Now another C function is executing.
4) It invokes Lua, or returns to Lua.
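
To make the four steps concrete, here is a minimal sketch in plain C using
makecontext/swapcontext. It assumes a POSIX system and involves neither LuaJIT
nor OpenSSL; every function name in it is a hypothetical stand-in. It only
shows how a second C function ends up running on a different stack in the same
thread while the first one is still suspended.

```c
#include <assert.h>
#include <ucontext.h>

/* Hypothetical stand-ins: none of these names come from LuaJIT or OpenSSL. */
static ucontext_t main_ctx, task_ctx;
static char task_stack[64 * 1024];
static int trace[8];
static int trace_len;

/* Remember the order in which the steps actually ran. */
static void record(int step)
{
    assert(trace_len < 8);
    trace[trace_len++] = step;
}

/* Step 3: "another C function", now executing on its own stack. */
static void other_task(void)
{
    record(3);
    /* Step 4 would happen here: invoking Lua from this stack while the VM
     * still believes the C function from step 1 is the one executing. */
    record(4);
    /* Returning resumes main_ctx, because uc_link points at it. */
}

/* Steps 1 and 2: a C function called from Lua that switches tasks. */
static void c_function_called_from_lua(void)
{
    record(1);
    if (getcontext(&task_ctx) != 0)
        return;
    task_ctx.uc_stack.ss_sp = task_stack;
    task_ctx.uc_stack.ss_size = sizeof task_stack;
    task_ctx.uc_link = &main_ctx;
    makecontext(&task_ctx, other_task, 0);

    record(2);
    swapcontext(&main_ctx, &task_ctx);  /* same thread, different C stack */
    record(5);                          /* back on the original stack */
}

/* Runs the scenario once and returns how many steps were recorded. */
int run_demo(void)
{
    trace_len = 0;
    c_function_called_from_lua();
    return trace_len;
}

/* Accessor for the recorded order, mainly for checking the result. */
int trace_at(int i)
{
    return trace[i];
}
```

Calling run_demo() from a main() walks through steps 1 to 5 in order. The point
is that between steps 2 and 5 the original C function is still "in progress"
as far as anything tracking the call chain is concerned.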

LuaJIT is not expecting this kind of reentrancy. Don't do it. Just don't!

FYI:

There are some bits of global (per-VM) state that are not properly updated.
For instance, there is the cur_L pointer for tracking the currently active Lua
state. This could result in Lua code using the wrong Lua stack, leading to
memory corruption, which then shows up as a crash in lj_vm_growstack.
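
A toy illustration of that failure mode, in plain C. This is not LuaJIT's
real data layout: ToyState, ToyGlobal, toy_enter and toy_push are all made up
for the example, and the real handling of cur_L is more involved. The sketch
only shows what goes wrong when a per-VM "current state" pointer is left stale
across a task switch.

```c
#include <stddef.h>

/* Hypothetical miniature VM; not LuaJIT's actual structures. */
typedef struct ToyState {
    int stack[4];
    int top;
} ToyState;

typedef struct ToyGlobal {
    ToyState *cur;   /* plays the role of cur_L */
} ToyGlobal;

/* The VM updates the pointer whenever it knowingly switches states. */
static void toy_enter(ToyGlobal *g, ToyState *s)
{
    g->cur = s;
}

/* An internal helper that trusts g->cur instead of taking the state
 * as an explicit argument. */
static void toy_push(ToyGlobal *g, int value)
{
    ToyState *s = g->cur;
    s->stack[s->top++] = value;
}

/* Returns 1 if the value landed on the wrong state's stack. */
int stale_pointer_demo(void)
{
    ToyState a = { {0}, 0 };
    ToyState b = { {0}, 0 };
    ToyGlobal g = { NULL };

    toy_enter(&g, &a);   /* the VM is running state A */

    /* A task switch happens behind the VM's back: the code now running
     * logically belongs to state B, but nobody called toy_enter(&g, &b). */
    toy_push(&g, 42);    /* meant for B */

    return a.top == 1 && b.top == 0;
}
```

In the toy the damage is just a misplaced integer. In a real VM the same kind
of mix-up means frames and slots written onto a stack that belongs to a
different coroutine.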

C functions called via FFI are not permitted to invoke Lua by any means: if
such a call is ever JIT-compiled, the Lua VM is going to be in a funny
inconsistent state while the function is executing.

The JIT compiler itself is likely to get terribly confused by the task
switching: when it decides to JIT-compile something, the VM enters a
special tracing mode, recording every bytecode instruction prior to
execution. Task switching creates unexpected reentrancy here.

If you do the task switching in a Lua C function (one registered through the
Lua C API), this will work, since calls to Lua C functions are never
JIT-compiled. Make sure that the following patches are applied to your LuaJIT:

https://github.com/tarantool/luajit/commit/ed412cd9f55fe87fd32a69c86e1732690fc5c1b0
https://github.com/tarantool/luajit/commit/5ccd25d740476a37d414733b5192d5be0ef06173

Cheers, NZ

On Thu, 9 Aug 2018 at 11:51, tokers <zchao1995@xxxxxxxxx> wrote:

Hello!

We encountered a segmentation fault in LuaJIT (we use LuaJIT inside
OpenResty). The backtrace is:

(gdb) bt
#0  0x00007f0c1a6aade2 in lj_vm_growstack_f () from /usr/local/marco/luajit/lib/libluajit-5.1.so.2
#1  0x0000000000550c13 in ngx_http_lua_run_thread (L=0x41a00378, r=0x1767f80, ctx=0x13793f0, nrets=0)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_util.c:1013
#2  0x000000000057825c in ngx_http_lua_ssl_cert_by_chunk (L=0x41a00378, r=0x1767f80)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_ssl_certby.c:527
#3  0x0000000000577457 in ngx_http_lua_ssl_cert_handler_file (r=0x1767f80, lscf=0x12c2138, L=0x41a00378)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_ssl_certby.c:57
#4  0x0000000000577c2c in ngx_http_lua_ssl_cert_handler (ssl_conn=0x1765790, data=0x0)
    at /disk/ssd2/alex_workflow/marco/deps/lua-nginx-module-0.10.11h/src/ngx_http_lua_ssl_certby.c:315
#5  0x00007f0c1a1e544a in tls_post_process_client_hello (s=0x1765790, wst=WORK_MORE_B) at ssl/statem/statem_srvr.c:2179
#6  0x00007f0c1a1e2d2f in ossl_statem_server_post_process_message (s=0x1765790, wst=WORK_MORE_A) at ssl/statem/statem_srvr.c:1148
#7  0x00007f0c1a1cfe52 in read_state_machine (s=0x1765790) at ssl/statem/statem.c:660
#8  0x00007f0c1a1cf7a9 in state_machine (s=0x1765790, server=1) at ssl/statem/statem.c:428
#9  0x00007f0c1a1cf33b in ossl_statem_accept (s=0x1765790) at ssl/statem/statem.c:251
#10 0x00007f0c1a1b642d in ssl_do_handshake_intern (vargs=0x12c6730) at ssl/ssl_lib.c:3467
#11 0x00007f0c19d0770f in async_start_func () at crypto/async/async.c:154
#12 0x00007f0c199108f0 in __malloc_info (fp=0x7ffea8f29be0, options=<optimized out>) at malloc.c:5196
#13 0x0000000001f6b870 in ?? ()
#14 0x0000000000000000 in ?? ()

    0x7f0c1a6aadb3 <lj_vm_growstack_f>      lea    -0x8(%rdx,%rax,8),%eax
    0x7f0c1a6aadb7 <lj_vm_growstack_f+4>    movzbl -0x3d(%rbx),%ecx
    0x7f0c1a6aadbb <lj_vm_growstack_f+8>    add    $0x4,%ebx
    0x7f0c1a6aadbe <lj_vm_growstack_f+11>   mov    %edx,0x10(%rbp)
    0x7f0c1a6aadc1 <lj_vm_growstack_f+14>   mov    %eax,0x18(%rbp)
    0x7f0c1a6aadc4 <lj_vm_growstack_f+17>   mov    %ebx,0x1c(%rsp)
    0x7f0c1a6aadc8 <lj_vm_growstack_f+21>   mov    %ecx,%esi
    0x7f0c1a6aadca <lj_vm_growstack_f+23>   mov    %ebp,%edi
    0x7f0c1a6aadcc <lj_vm_growstack_f+25>   callq  0x7f0c1a6b4470 <lj_state_growstack>
    0x7f0c1a6aadd1 <lj_vm_growstack_f+30>   mov    0x10(%rbp),%edx
    0x7f0c1a6aadd4 <lj_vm_growstack_f+33>   mov    0x18(%rbp),%eax
    0x7f0c1a6aadd7 <lj_vm_growstack_f+36>   mov    -0x8(%rdx),%ebp
    0x7f0c1a6aadda <lj_vm_growstack_f+39>   sub    %edx,%eax
    0x7f0c1a6aaddc <lj_vm_growstack_f+41>   shr    $0x3,%eax
    0x7f0c1a6aaddf <lj_vm_growstack_f+44>   add    $0x1,%eax
  > 0x7f0c1a6aade2 <lj_vm_growstack_f+47>   mov    0x10(%rbp),%ebx
    0x7f0c1a6aade5 <lj_vm_growstack_f+50>   mov    (%rbx),%ecx
    0x7f0c1a6aade7 <lj_vm_growstack_f+52>   movzbl %cl,%ebp
    0x7f0c1a6aadea <lj_vm_growstack_f+55>   movzbl %ch,%ecx
    0x7f0c1a6aaded <lj_vm_growstack_f+58>   add    $0x4,%ebx
    0x7f0c1a6aadf0 <lj_vm_growstack_f+61>   jmpq   *(%r14,%rbp,8)

It seems that the value in %rbp was corrupted?

(gdb) p/x $rbp
$1 = 0x7009a593
(gdb) x 0x7009a593
0x7009a593:     Cannot access memory at address 0x7009a593

(gdb) info thread
  Id   Target Id         Frame
* 1    LWP 2883818       0x00007f0c1a6aade2 in lj_vm_growstack_f () from /usr/local/marco/luajit/lib/libluajit-5.1.so.2

We are using the asynchronous OpenSSL mode (with the dasync engine), which
uses its own coroutines (implemented with setjmp/longjmp and
makecontext/swapcontext).

The segmentation fault disappeared after disabling the OpenSSL
asynchronous mode. In addition, the crash becomes less frequent when the
JIT is disabled.

We have no idea what is causing this. Maybe the implementation of the
asynchronous OpenSSL mode conflicts with LuaJIT?

Our LuaJIT version is
https://github.com/openresty/luajit2/releases/tag/v2.1-20171103

The Linux kernel version is 4.9.0.

I also opened an issue here:
https://github.com/openssl/openssl/issues/6864

Are there any ideas for a fix or workaround? Thanks!

Regards
Alex Zhang
