Re: Faster Debug Tracing and Code Coverage?

  • From: Peter Cawley <corsix@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 25 Sep 2015 14:44:47 +0100

If you want the VM to call you, the current options are the debug API
and the profiling API, with both APIs being available from both Lua
and C. If the debug API is too slow and the profiling API is too
inaccurate, then the only other thing which comes to mind is
instrumentation. A naive approach would be to insert the equivalent of
"hit[__LINE__]=true" on every line of source (which might be easier to
do via bytecode manipulation), though I'm not sure how well such an
approach would perform. I'd actually like to see better VM support for
cheap instrumentation and cheap breakpoints, possibly along the lines
of:

New jit.setontrap(f) function, which sets the trap handler for the
current Lua universe to be the given function.

New BC_TRAP instruction: INS_AD form, with semantics of:
  1. subtract 1 from D
  2. store updated D field to PC_RD
  3. if D field was zero before subtraction then subtract 4 from PC and
     call the registered trap handler, passing PC as a single
     lightuserdata argument
  4. dispatch next instruction
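
The semantics above can be sketched in C as a small stand-alone
simulation (not LuaJIT code). The field layout (OP in bits 0-7, A in
bits 8-15, D in bits 16-31) follows LuaJIT's AD instruction format; the
opcode value OP_TRAP, all function names, and the reading that the VM
re-dispatches the trap's own slot after the handler returns are
assumptions made for illustration.

```c
#include <stdint.h>

#define OP_TRAP 0x7fu /* hypothetical opcode number; BC_TRAP is only a proposal */

typedef void (*trap_handler)(void *pc); /* stand-in for the jit.setontrap handler */

int trap_calls = 0; /* how often the sample handler below has run */

/* Sample handler: only counts invocations. */
void count_handler(void *pc)
{
  (void)pc;
  trap_calls++;
}

/* Build a TRAP(A, D) instruction word. */
uint32_t make_trap(uint32_t a, uint32_t d)
{
  return OP_TRAP | ((a & 0xffu) << 8) | ((d & 0xffffu) << 16);
}

/* Execute the TRAP stored at *ins; return the address to dispatch next. */
uint32_t *exec_trap(uint32_t *ins, trap_handler handler)
{
  uint32_t old_d = *ins >> 16;
  uint32_t new_d = (old_d - 1u) & 0xffffu; /* subtract 1 from D (0 wraps to 65535) */
  *ins = (*ins & 0xffffu) | (new_d << 16); /* store the updated D field */
  if (old_d == 0) {
    handler(ins); /* "PC - 4": the handler receives the TRAP's own address */
    return ins;   /* assumed: re-dispatch whatever now occupies this slot */
  }
  return ins + 1; /* otherwise continue with the following instruction */
}
```

With TRAP(A, 1), the first execution only decrements D; the second finds
D at zero, wraps it to 65535 and calls the handler.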

The trap handler is expected to cast its argument to uint32_t*, at
which point it can read the A field of the TRAP instruction and/or
replace the TRAP instruction with something else. Using this, one
could implement:

Zero-overhead breakpoints: replace bytecode instruction with TRAP(A,
0), have trap handler put back the original bytecode instruction.
The A field could be used to store extra tracking data (such as the
breakpoint number), or it could be ignored.
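
A minimal sketch of such a one-shot breakpoint, assuming the A field
holds a breakpoint number that indexes a side table of saved
instructions. OP_TRAP, the table, and the function names are
hypothetical; nothing here is existing LuaJIT API.

```c
#include <stdint.h>

#define OP_TRAP 0x7fu       /* hypothetical opcode number for the proposed BC_TRAP */
#define MAX_BREAKPOINTS 256 /* the A field is 8 bits wide */

uint32_t saved_ins[MAX_BREAKPOINTS]; /* original instruction per breakpoint number */
int last_hit = -1;                   /* breakpoint number most recently hit */

/* Replace *ins with TRAP(number, 0), remembering the original instruction. */
void set_breakpoint(uint32_t *ins, uint8_t number)
{
  saved_ins[number] = *ins;
  *ins = OP_TRAP | ((uint32_t)number << 8); /* D = 0, so the first execution traps */
}

/* Trap handler: record the hit and put the original instruction back. */
void breakpoint_handler(void *pc)
{
  uint32_t *ins = (uint32_t *)pc;        /* the lightuserdata argument */
  uint8_t number = (uint8_t)(*ins >> 8); /* A field; D has already wrapped */
  last_hit = number;
  *ins = saved_ins[number];              /* the VM would now run the original */
}
```

Because the handler restores the original instruction, the breakpoint
fires once; a persistent breakpoint would have to be re-armed later.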

Low-overhead yes/no coverage: instrument bytecode, inserting TRAP(A,
0) at various points (once per function, once per line, once per basic
block, whatever), have trap handler replace TRAP instruction with a
no-op (such as MOV(0, 0) or JMP($+0)). After execution has finished,
any bytecode which still contains TRAP instructions wasn't executed.
The no-op instructions give a very slight on-going cost for
interpreted code, but are zero-cost for code which gets JIT compiled.
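
The coverage scheme could look roughly like this in C. OP_TRAP and
INS_NOP are placeholder encodings (a real tool would emit an actual
no-op such as MOV(0, 0)), and both function names are made up for the
sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define OP_TRAP 0x7fu /* hypothetical opcode number for the proposed BC_TRAP */
#define INS_NOP 0x7eu /* placeholder no-op encoding, not a real LuaJIT instruction */

/* Trap handler: the first execution overwrites the TRAP with a no-op. */
void coverage_handler(void *pc)
{
  *(uint32_t *)pc = INS_NOP;
}

/* After the run: count instrumentation points that were never executed. */
size_t count_unexecuted(const uint32_t *code, size_t n)
{
  size_t missed = 0;
  for (size_t i = 0; i < n; i++) {
    if ((code[i] & 0xffu) == OP_TRAP) /* opcode lives in the low byte */
      missed++;
  }
  return missed;
}
```

A reporting tool would map each surviving TRAP back to a source line,
for example through the A field or the function's line table.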

Low-overhead non-sampling profiler: instrument bytecode as for yes/no
coverage, but insert TRAP(0, 65535) at various points. The trap
handler increments the A field of the instruction, which in
combination with the D field, effectively gives an inline 24 bit
counter of how many times the TRAP instruction was executed. If 24
bits are insufficient, the trap handler could either saturate at the
limit, or use an external hash table for the high bits of the counter.
After execution has finished, all of the inline 24 bit counters (and
the external hash table, if used) can be consulted to obtain execution
counts. Note that the JIT compiler would need to understand BC_TRAP in
order to keep this low-overhead (as otherwise it would prevent
JIT compilation, which is a significant price).
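
As a sketch of the inline counter, assuming the instruction starts out
as TRAP(0, 65535) and the handler saturates the A field at 255. The
opcode value and function names are hypothetical.

```c
#include <stdint.h>

#define OP_TRAP 0x7fu /* hypothetical opcode number for the proposed BC_TRAP */

/* Trap handler: runs each time D wraps from 0 to 65535; bumps the A field.
   A saturates at 255, after which the decoded count is only a lower bound. */
void profile_handler(void *pc)
{
  uint32_t *ins = (uint32_t *)pc;
  uint32_t a = (*ins >> 8) & 0xffu;
  if (a < 0xffu)
    *ins += 1u << 8; /* increment A without touching OP or D */
}

/* Decode how many times the TRAP has executed since it was planted. */
uint32_t trap_executions(uint32_t ins)
{
  uint32_t a = (ins >> 8) & 0xffu; /* high 8 bits of the counter */
  uint32_t d = ins >> 16;          /* D counts down from 65535 */
  return (a << 16) | (0xffffu - d);
}
```

If the VM re-dispatches the TRAP after the handler returns (one reading
of the semantics above), each wrap executes the TRAP one extra time, so
the number of passes through the instrumented point would be the
decoded value minus A.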


On Thu, Sep 24, 2015 at 11:20 PM, Benn Bollay <benn.bollay@xxxxxxxxx> wrote:

Profiling is tremendously valuable, especially when you're looking for hot
spots or other opportunities to optimize. In this case I'm very willing to
accept a moderate performance hit in exchange for line-level knowledge about
all of the code paths that are followed. Since these are build-time
unittests that are running, the accuracy is more important than the
performance impact (up to a certain level).

The /functionality/ provided by debug.trace (assuming accuracy, which is,
eh, debatable) is correct. It is just that the performance penalty causes
any unittests (which are, admittedly, poorly written) that depend on
timing to fail.

Cheers,
--B

On Thu, Sep 24, 2015 at 9:58 AM, Peter Cawley <corsix@xxxxxxxxxx> wrote:

Have you looked into the LuaJIT profiler API? (see
http://htmlpreview.github.io/?https://github.com/LuaJIT/LuaJIT/blob/v2.1/doc/ext_profiler.html)

On Thu, Sep 24, 2015 at 5:44 PM, Benn Bollay <benn.bollay@xxxxxxxxx> wrote:

Hi folks -

I'm looking at generating test coverage information for a large quantity of
LuaJIT scripts embedded in a C application. I'm able to generate some data
by using the debug hooks, but the performance impact guarantees that
certain tests will always fail.

Is there a hook I can engage that's purely in C to capture the same
information? Or, even better, has anyone written such a hook to output
gcda/gcno data for lua files? Being able to integrate directly into lcov
would be fabulous.

Cheers,
--B


