Simon Cooke wrote:
> Running with -jv I get for the first case:
>
> [TRACE 1 ffi_test3.lua:17 loop]
> array(N) : 0.03s 0.89406967163086 ns/element
> [TRACE 2 (1/0) ffi_test3.lua:17 loop]
> float[N] : 0.03s 0.89406967163086 ns/element
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE 3 (2/0) ffi_test3.lua:17 -- fallback to interpreter]
> boxed[N] : 2.83s 84.340572357178 ns/element
>
> as expected, but for the second:
>
> [TRACE 1 ffi_test3.lua:17 loop]
> float[N] : 0.03s 0.89406967163086 ns/element
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
> [TRACE 2 (1/0) ffi_test3.lua:17 -- fallback to interpreter]
> boxed[N] : 2.667s 79.482793807983 ns/element
> [TRACE 3 ffi_test3.lua:11 return]
> array(N) : 2.699s 80.43646812439 ns/element
>
> What could be causing the slower performance here?

The way you've set up the benchmark test wrapper, you're actually performing several non-monomorphic dispatches inside the same loop, since the types of 'a' and 'c' differ for each invocation. That's not very representative of real use cases.

The JIT compiler compiles the loop for 'array(N)' first. On the second invocation, with 'float[N]', a type check in that trace fails early on, so the loop is recompiled. This new trace is attached to the first trace as a side trace (that's what '(1/0)' in the -jv output means: it starts at exit 0 of trace 1). The third case, 'boxed[N]', can't be compiled (NYI: unsupported C type conversion), so a fallback to the interpreter is attached to the second trace.

If you reorder the tests, the side trace for the second test case ('boxed[N]') is the one that fails to compile and falls back to the interpreter.
There's no way it can recover from that, so the third invocation of the loop ('array(N)') always goes to the interpreter. Usually, such a loop would only see a single type for each variable. So pass the test wrapper a separate function, containing its own loop, for each test case, and all is well.

BTW: You really don't want to write ffi.new('float', 10). Just use the number 10.

Also, all numeric calculations are performed with doubles. Using floats for storage saves memory for big arrays, but arithmetic is usually slower due to the required float<->double conversions.

--Mike
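A minimal sketch of the restructured benchmark suggested above, assuming LuaJIT with its built-in FFI library. The original ffi_test3.lua isn't shown, so the wrapper name bench, the test functions test_table and test_float_array, and the value of N are all hypothetical, and only a plain-table case and a float[N] case are included. The point is structural: the wrapper contains no loop of its own, and each test function owns its loop, so every variable inside a given loop only ever sees one type. The stored value is the plain number 10, not ffi.new('float', 10).

```lua
local ffi = require("ffi")

local N = 1000000  -- hypothetical problem size, not taken from the original test

-- Hypothetical benchmark wrapper: it only times the function it is given.
-- There is no loop here; each test case brings its own loop.
local function bench(name, f)
  local t0 = os.clock()
  local result = f()
  local dt = os.clock() - t0
  print(string.format("%-10s: %.3fs %g ns/element", name, dt, dt * 1e9 / N))
  return result
end

-- Test case 1: plain Lua table. 'a' is always a table in these loops.
local function test_table()
  local a = {}
  for i = 1, N do a[i] = 10 end            -- plain number, no ffi.new('float', 10)
  local sum = 0
  for i = 1, N do sum = sum + a[i] end
  return sum
end

-- Test case 2: FFI float array. 'a' is always a float[?] cdata in these loops.
local function test_float_array()
  local a = ffi.new("float[?]", N)         -- zero-initialized, 4 bytes per element
  for i = 0, N - 1 do a[i] = 10 end        -- number converted to float on store
  local sum = 0
  for i = 0, N - 1 do sum = sum + a[i] end -- each float load is converted to a double
  return sum
end

local s1 = bench("table", test_table)
local s2 = bench("float[N]", test_float_array)
```

Both sums are computed in doubles; the float array only buys memory (ffi.sizeof("float") is 4 bytes versus 8 for a double), at the cost of a float-to-double conversion on every load.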