Hello,
I maintain a benchmark application that was recently migrated from
PUC Lua to LuaJIT. With some workloads the application may create lots
(thousands) of threads.
After the migration I started hitting mmap() scalability issues,
because LuaJIT uses mmap() intensively during initial trace
generation. This manifests as much lower performance during the
first 10-30 seconds of a benchmark run, with the CPU mostly idle in
workloads involving 1000+ threads.
This is apparently a long-known problem in the Linux kernel:
https://lkml.org/lkml/2013/1/2/299
In a nutshell, mmap()/munmap() calls performed concurrently by
multiple threads within the same process are serialized on a
per-process lock. There have been a number of attempts to fix this,
but apparently it is still there.
Searching the web does not indicate that anyone has encountered this
specific problem with LuaJIT.
Is there anything that can be done at the application or LuaJIT level to
circumvent or relax the mmap() bottleneck?
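One knob I am aware of, though I have not verified that it helps here, is
enlarging LuaJIT's machine-code areas so that trace compilation triggers
fewer mmap() calls. sizemcode and maxmcode are documented JIT parameters;
the values below are guesses, not tuned for this workload:

  -- Config sketch: ask LuaJIT for larger (and therefore fewer) mcode areas.
  -- sizemcode = size of each machine-code area in KB,
  -- maxmcode  = total machine-code limit in KB.
  if jit and jit.opt then
    jit.opt.start("sizemcode=512", "maxmcode=8192")
  end

(The same can be passed on the command line as -Osizemcode=512,maxmcode=8192.)
Is this the right direction, or is there something better?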
Here's a sample perf report captured during the initial period when
benchmark numbers are low:
 88.57%  0.00%  swapper   [unknown]  [k] 0x0000000080b50054
         88.22%
            0x80b50054
            secondary_start_kernel
            cpu_startup_entry
            arch_cpu_idle

 10.74%  0.00%  sysbench  sysbench   [.] worker_thread
          4.04%
             worker_thread
             lua_pcall
             lj_trace_ins
             __mmap64
             el0_svc_naked
             sys_mmap
             sys_mmap_pgoff
             vm_mmap_pgoff
             down_write_killable
             rwsem_down_write_failed_killable
             rwsem_optimistic_spin
             osq_lock

          3.88%
             worker_thread
             lua_pcall
             lj_trace_ins
             __GI___munmap
             el0_svc_naked
             sys_munmap
             down_write_killable
             rwsem_down_write_failed_killable
             rwsem_optimistic_spin
             osq_lock
Best regards,
Alexey.