I'm hereby soliciting input from LuaJIT users for a new feature:

Existing profilers for Lua and LuaJIT are based on Lua hooks and debug queries. The use of these generic mechanisms incurs a high overhead and some guesswork (e.g. tail calls, errors, yields). Execution of a program under control of such a profiler causes substantial slow-downs. Actual use of the program (gameplay) may be impossible in some cases.

I'm happy to announce that GIANTS Software GmbH

  http://www.giants-software.com

is sponsoring the development of a new low-overhead profiling functionality for LuaJIT 2.1!

GIANTS Software develops a variety of simulation games for desktop, mobile and consoles. These games make extensive use of Lua for scripting and modding. Switching to LuaJIT was instrumental in reducing the CPU load and sustaining the required frame rates on all platforms.

The goal is to design and implement a new profiling functionality that has a much lower overhead, better control of detail and high flexibility.

I've already devised a basic design for the new profiling infrastructure -- it's split into three layers:

* Layer #1: Profile Collection

  Profile events triggered by the LuaJIT interpreter and by JIT-generated code are collected in a log buffer.

  This layer calls a C function for event timing. E.g. a dummy function for simple event counting or a high-precision timer (performance counter).

  Most of this layer is written in assembler for performance reasons. The bytecode dispatch table of the interpreter is partially patched as needed. Likewise, profiling calls are interspersed with the generated machine code.

  This layer allows fine-grained control of detail, down to the module, function, basic block, line or bytecode level. Profiling overhead depends on the requested level of detail. This feature is zero-cost if not enabled.

  Custom marks and hierarchical, user-defined zones (start/stop) can be mixed into the collected profile.
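
To make the collection idea more concrete, here is a minimal sketch in plain C of a fixed-size event log with a pluggable timing function. It is an illustration only, not the proposed implementation: the post says most of this layer would be written in assembler, and every name here (`prof_log`, `prof_log_event`, `prof_timer_count`, `prof_flush_tally`, the tiny capacity) is hypothetical and not part of any LuaJIT API. What happens when the buffer fills up is left to a caller-supplied callback; the one shown simply sums the recorded values per event id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names in this sketch are hypothetical; none are part of LuaJIT. */

#define PROF_LOG_CAPACITY 4   /* deliberately tiny so overflow is easy to see */
#define PROF_MAX_IDS      16  /* size of the illustrative per-id totals table */

/* Pluggable event timing function, e.g. a dummy counter or a real timer. */
typedef uint64_t (*prof_timer_fn)(void);

typedef struct {
    uint32_t id;     /* what was hit, e.g. a function, line or bytecode id */
    uint64_t stamp;  /* value returned by the timing function */
} prof_event;

typedef struct prof_log prof_log;

/* Called when the buffer is full, before it is reused. */
typedef void (*prof_flush_fn)(prof_log *log, void *ud);

struct prof_log {
    prof_event events[PROF_LOG_CAPACITY];
    size_t count;         /* number of valid entries in events[] */
    prof_timer_fn timer;
    prof_flush_fn flush;
    void *flush_ud;       /* opaque user data handed to the flush callback */
    int enabled;          /* collection is off until this is set */
};

/* Dummy timing function: every event weighs 1, i.e. simple event counting. */
uint64_t prof_timer_count(void) { return 1; }

void prof_log_init(prof_log *log, prof_timer_fn timer,
                   prof_flush_fn flush, void *flush_ud) {
    assert(log != NULL && timer != NULL && flush != NULL);
    log->count = 0;
    log->timer = timer;
    log->flush = flush;
    log->flush_ud = flush_ud;
    log->enabled = 0;
}

/* Record one event; hands the full buffer to the flush callback first. */
void prof_log_event(prof_log *log, uint32_t id) {
    if (!log->enabled)
        return;  /* this sketch pays for one branch when disabled */
    if (log->count == PROF_LOG_CAPACITY) {
        log->flush(log, log->flush_ud);
        log->count = 0;
    }
    log->events[log->count].id = id;
    log->events[log->count].stamp = log->timer();
    log->count++;
}

/* Example flush callback: add each event's stamp to a per-id total. */
void prof_flush_tally(prof_log *log, void *ud) {
    uint64_t *totals = (uint64_t *)ud;  /* array of PROF_MAX_IDS entries */
    for (size_t i = 0; i < log->count; i++) {
        uint32_t id = log->events[i].id;
        if (id < PROF_MAX_IDS)
            totals[id] += log->events[i].stamp;
    }
}
```

Swapping `prof_timer_count` for a function that reads a high-precision timer would turn the same log from a counting profile into a timing profile without touching the recording path. Note that the `enabled` check here is only a stand-in: the design described above avoids even that cost by patching the dispatch table and the generated code only when profiling is requested.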

* Layer #2: Profile Coalescing

  This layer periodically extracts and combines profile events whenever the log buffer overflows. Direct access to internal data structures is required to perform this efficiently.

  This layer provides a C interface to control the profile coalescing. This layer can either be used by user-supplied code to display a live status in an embedded context (e.g. in-game performance OSD) or by higher-level tools.

* Layer #3: Profile Reporting

  This layer generates textual profile reports based on the collected data. It generates either a "Top 10"-style ranking or annotated source code.

  This module is written in Lua and uses the FFI to access the lower levels. The module also supplies a standard command-line option to invoke the profiler (-jp).

Developers are welcome to use the underlying APIs and contribute 3rd-party tools, e.g. coverage analysis, graphical profilers or IDE integration.

The profiling functionality will be implemented over the coming months -- you can follow the progress in the v2.1 branch of the git repository. The profiling functionality will be added to the interpreter first and then the JIT compiler. x86 + x64 will be first, the other supported architectures will follow.

The intention of this posting is to gather feedback for the design phase. So, if you have any questions or specific needs or related ideas, please speak up -- thank you!

--Mike