Re: FFI array performance

From: Mike Pall <mike-1205@xxxxxxxxxx>
To: luajit@xxxxxxxxxxxxx
Date: Wed, 30 May 2012 20:12:12 +0200

Simon Cooke wrote:
> Running with -jv I get for the first case:
>
> [TRACE   1 ffi_test3.lua:17 loop]
> array(N) : 0.03s   0.89406967163086 ns/element
> [TRACE   2 (1/0) ffi_test3.lua:17 loop]
> float[N] : 0.03s   0.89406967163086 ns/element
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE   3 (2/0) ffi_test3.lua:17 -- fallback to interpreter]
> boxed[N] : 2.83s   84.340572357178 ns/element
>
> as expected, but for the second:
> 
> [TRACE   1 ffi_test3.lua:17 loop]
> float[N] : 0.03s   0.89406967163086 ns/element
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE   2 (1/0) ffi_test3.lua:17 -- fallback to interpreter]
> boxed[N] : 2.667s   79.482793807983 ns/element
> [TRACE   3 ffi_test3.lua:11 return]
> array(N) : 2.699s   80.43646812439 ns/element
> 
> What could be causing the slower performance here?

The way you've set up the benchmark test wrapper, you're actually
performing several non-monomorphic dispatches inside the same
loop, since the types of 'a' and 'c' differ for each invocation.
That's not very representative of real use cases.

The JIT compiler compiles the 'array(N)' loop first. On the second
invocation with 'float[N]', it recognizes that a type check is
failing early on and re-compiles the loop. That new trace is
attached to the first trace as a side trace. The third case doesn't
compile, so a fallback to the interpreter is attached to the second
trace.

If you reorder the tests, the second trace fails and falls back to
the interpreter. There's no way it can recover from that, so the
third invocation of the loop always goes to the interpreter.
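
For illustration, the problematic shape looks roughly like the sketch
below. It is not the original ffi_test3.lua; the name sum_shared and
the array contents are made up. The point is only that one loop sees
a Lua table on one call and a cdata array on the next, so the type
guards of the first compiled trace fail on the second call.

```lua
local ffi = require("ffi")

local N = 100000

-- One shared loop (hypothetical). The type of 'a' differs per call,
-- so a trace specialized for the first type does not fit the second.
local function sum_shared(a, first, last)
  local s = 0
  for i = first, last do s = s + a[i] end
  return s
end

-- A plain Lua table, 1-based.
local t = {}
for i = 1, N do t[i] = 1 end

-- An FFI float array, 0-based.
local f = ffi.new("float[?]", N)
for i = 0, N - 1 do f[i] = 1 end

local sum_t = sum_shared(t, 1, N)      -- loop first sees a table
local sum_f = sum_shared(f, 0, N - 1)  -- same loop now sees cdata
print(sum_t, sum_f)
```

Both calls return the right result either way; only the way the loop
gets compiled is affected by mixing the types.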

Usually, such a loop would only have a single type for each
variable. So pass a separate function (including a loop) for each
test case to the test wrapper and all is well.
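
A minimal sketch of that layout, assuming a made-up wrapper called
bench and two made-up test functions (none of these names are from
the original script). The wrapper only does the timing; every test
case brings its own loop, so each loop sees a single type.

```lua
local ffi = require("ffi")

local N = 1000000

-- Hypothetical wrapper: times one call and reports ns/element.
local function bench(name, fn)
  local t0 = os.clock()
  local result = fn()
  local dt = os.clock() - t0
  print(string.format("%-8s: %.3fs  %.3f ns/element", name, dt, dt * 1e9 / N))
  return result
end

-- Each test function contains its own loops over its own array type.
local function test_table()
  local a = {}
  for i = 1, N do a[i] = i end
  local s = 0
  for i = 1, N do s = s + a[i] end
  return s
end

local function test_float()
  local a = ffi.new("float[?]", N)
  for i = 0, N - 1 do a[i] = i end
  local s = 0
  for i = 0, N - 1 do s = s + a[i] end
  return s
end

local sum_table = bench("table", test_table)
local sum_float = bench("float[N]", test_float)
```

With this layout the order of the bench() calls should no longer
matter, since no loop is shared between test cases.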

BTW: You really don't want to write ffi.new('float', 10). Just use
the number 10.
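
A short sketch of the difference, using a made-up four-element array.
A plain Lua number is converted on the store, while
ffi.new('float', 10) first allocates a boxed scalar cdata object that
then has to be converted.

```lua
local ffi = require("ffi")

local a = ffi.new("float[?]", 4)

-- Preferred: store a plain Lua number; it is converted to float on the store.
a[0] = 10

-- Not recommended: this allocates a boxed scalar cdata object first.
local boxed = ffi.new("float", 10)
a[1] = boxed

print(type(10), type(boxed))
print(a[0], a[1], tonumber(boxed))
```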

Also, all numeric calculations are performed with doubles. Using
floats for storage saves memory (for big arrays). But arithmetic
is usually slower due to the required float<->double conversions.
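
To make the trade-off concrete, here is a small sketch with an
arbitrary element count. It compares the storage size of a float
array and a double array, and shows that a value read back from a
float array is an ordinary Lua number (a double) that has been
rounded to float precision on the store.

```lua
local ffi = require("ffi")

local N = 1000

-- Storage size in bytes for N elements of each type.
local float_bytes  = ffi.sizeof("float[?]", N)
local double_bytes = ffi.sizeof("double[?]", N)
print(float_bytes, double_bytes)

-- A store converts double -> float, a load converts float -> double.
local f = ffi.new("float[?]", N)
local d = ffi.new("double[?]", N)
f[0] = 0.1
d[0] = 0.1

local from_float  = f[0]  -- a Lua number, but only float precision survived
local from_double = d[0]  -- exactly the value that was stored
print(string.format("%.17g %.17g", from_float, from_double))
```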

--Mike
