RFC: Low-overhead profiling for LuaJIT 2.1

  • From: Mike Pall <mike-1306@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Tue, 18 Jun 2013 22:29:09 +0200

I'm hereby soliciting input from LuaJIT users for a new feature:

Existing profilers for Lua and LuaJIT are based on Lua hooks and
debug queries. The use of these generic mechanisms incurs a high
overhead and some guesswork (e.g. tail calls, errors, yields).
Execution of a program under control of such a profiler causes
substantial slow-downs. Actual use of the program (gameplay) may
be impossible in some cases.

I'm happy to announce that GIANTS Software GmbH
  http://www.giants-software.com
is sponsoring the development of a new low-overhead profiling
functionality for LuaJIT 2.1!

  GIANTS Software develops a variety of simulation games for desktop,
  mobile and console platforms. These games make extensive use of Lua for
  scripting and modding. Switching to LuaJIT was instrumental in
  reducing the CPU load and sustaining the required frame rates on
  all platforms.

The goal is to design and implement a new profiling functionality
that has a much lower overhead, better control of detail and high
flexibility. I've already devised a basic design for the new
profiling infrastructure -- it's split into three layers:

* Layer #1: Profile Collection

  Profile events triggered by the LuaJIT interpreter and by
  JIT-generated code are collected in a log buffer.

  This layer calls a C function for event timing, e.g. a dummy
  function for simple event counting or a high-precision timer
  (performance counter).

  Most of this layer is written in assembler for performance
  reasons. The bytecode dispatch table of the interpreter is
  partially patched as needed. Likewise, profiling calls are
  interspersed with the generated machine code.

  This layer allows fine-grained control of detail, down to the
  module, function, basic block, line or bytecode level. Profiling
  overhead depends on the requested level of detail. This feature
  is zero-cost if not enabled.

  Custom marks and hierarchical, user-defined zones (start/stop)
  can be mixed into the collected profile.

* Layer #2: Profile Coalescing

  This layer extracts and combines profile events each time the
  log buffer fills up. Direct access to internal data structures
  is required to perform this efficiently. This layer provides a
  C interface to control the profile coalescing.

  This layer can either be used by user-supplied code to display a
  live status in an embedded context (e.g. in-game performance OSD)
  or by higher-level tools.

* Layer #3: Profile Reporting

  This layer generates textual profile reports based on the
  collected data. It generates either a "Top 10"-style ranking or
  annotated source code. This module is written in Lua and uses
  the FFI to access the lower levels. The module also supplies a
  standard command-line option to invoke the profiler (-jp).

  Developers are welcome to use the underlying APIs and contribute
  3rd-party tools, e.g. coverage analysis, graphical profilers or
  IDE integration.
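
To make the split between layers #1 and #2 a bit more concrete,
here is a minimal C sketch of a fixed-size event log with a
pluggable timing callback that gets coalesced into per-id counters
when it fills up. This is an illustration only, not LuaJIT code:
every name in it (profiler, prof_record, prof_coalesce, timer_dummy,
timer_clock, LOG_SIZE, MAX_IDS) is hypothetical, and the real
collection layer is planned to be mostly assembler.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* All names here are hypothetical; this is not LuaJIT's API. */

#define LOG_SIZE 4  /* tiny on purpose, so the "log full" path is hit */
#define MAX_IDS  8  /* number of distinct event sources (functions, lines) */

/* Timing callback invoked for every event. */
typedef uint64_t (*prof_timer_fn)(void);

typedef struct {
  uint32_t id;     /* which function/line/zone fired the event */
  uint64_t stamp;  /* value returned by the timing callback */
} prof_event;

typedef struct {
  prof_event log[LOG_SIZE];  /* raw event log (layer #1) */
  size_t n;                  /* events currently held in the log */
  prof_timer_fn timer;       /* dummy counter or real timer */
  uint64_t count[MAX_IDS];   /* coalesced per-id event counts (layer #2) */
  uint64_t flushes;          /* how often the log was coalesced */
} profiler;

/* Dummy timer for simple event counting: no timing information. */
uint64_t timer_dummy(void) { return 0; }

/* Stand-in for a high-precision timer. clock() is coarse CPU time;
   a real profiler would use a performance counter instead. */
uint64_t timer_clock(void) { return (uint64_t)clock(); }

void prof_init(profiler *p, prof_timer_fn timer) {
  memset(p, 0, sizeof(*p));
  p->timer = timer;
}

/* Layer #2 sketch: fold the raw log into the counters and empty it. */
void prof_coalesce(profiler *p) {
  for (size_t i = 0; i < p->n; i++)
    p->count[p->log[i].id]++;
  p->n = 0;
  p->flushes++;
}

/* Layer #1 sketch: append one event; coalesce first if the log is full. */
void prof_record(profiler *p, uint32_t id) {
  assert(id < MAX_IDS);
  if (p->n == LOG_SIZE)
    prof_coalesce(p);
  p->log[p->n].id = id;
  p->log[p->n].stamp = p->timer();
  p->n++;
}
```

A caller would run prof_init() once, call prof_record() from each
instrumentation point and call prof_coalesce() before reading the
counters, so that events still sitting in the log are included.
Swapping timer_dummy for timer_clock changes what gets stored per
event without touching the recording path.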

The profiling functionality will be implemented over the coming
months -- you can follow the progress in the v2.1 branch of the
git repository. The profiling functionality will be added to the
interpreter first and then to the JIT compiler. x86 and x64 will
come first; the other supported architectures will follow.

The intention of this posting is to gather feedback for the design
phase. So, if you have any questions or specific needs or related
ideas, please speak up -- thank you!

--Mike
