On Fri, May 25, 2012 at 4:12 PM, Mike Pall <mike-1205@xxxxxxxxxx> wrote:
> Simon Cooke wrote:
>> I've been trying out the FFI library recently, and have tested the
>> variable-length array feature with mixed performance results. For
>> native types (e.g. float, double) it works very efficiently, but for
>> simple structs the performance drops dramatically, by ~ 50x.
>
> http://luajit.org/ext_ffi_semantics.html#status
>
> [...]
> The following operations are currently not compiled and may
> exhibit suboptimal performance, especially when used in inner
> loops:
>
> * Array/struct copies and bulk initializations.
> [...]
>

Thanks for the pointers. My actual use case is for arrays of
fixed-length vectors {x, y, z}. I managed to find a workaround for now
by adding metatables and performing the copy manually:

-----------------------------------------------------------------
local ffi = require("ffi")

ffi.cdef[[
typedef struct { float x; } boxed;
]]

local boxed = ffi.metatype( 'boxed', {
  __index = {
    copy_to = function(self,p) p.x = self.x end
  },
  __tostring = function(self) return '('..self.x..')' end,
})

local array = ffi.metatype([[ struct { boxed p[?]; } ]], {
  __newindex = function(self,i,v) v:copy_to(self.p+i) end,
  __index = function(self,i) return self.p[i] end,
})

local function test(s,N,a,c)
  local t0 = os.clock()
  for i = 0,N-1 do a[i] = c end
  print(s..' : '..os.clock()-t0 ..'s '..(os.clock()-t0)/N*1e9 ..' ns/element')
end

local N = 2^25
test('array(N)',N, array(N), boxed(10))
test('float[N]',N, ffi.new('float[?]',N), ffi.new('float',10) )
test('boxed[N]',N, ffi.new('boxed[?]',N), ffi.new('boxed',10) )
-----------------------------------------------------------------

The first test uses the new array, which gives performance equal to the
native float array:

array(N) : 0.029s 0.86426734924316 ns/element
float[N] : 0.029s 0.86426734924316 ns/element
boxed[N] : 2.661s 79.303979873657 ns/element

However, I find that when I reorder the tests I get very different
results:

float[N] : 0.029s 0.86426734924316 ns/element
boxed[N] : 2.596s 77.366828918457 ns/element
array(N) : 2.864s 85.353851318359 ns/element

Running with -jv I get for the first case:

[TRACE 1 ffi_test3.lua:17 loop]
array(N) : 0.03s 0.89406967163086 ns/element
[TRACE 2 (1/0) ffi_test3.lua:17 loop]
float[N] : 0.03s 0.89406967163086 ns/element
[TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE --- (2/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE 3 (2/0) ffi_test3.lua:17 -- fallback to interpreter]
boxed[N] : 2.83s 84.340572357178 ns/element

as expected, but for the second:

[TRACE 1 ffi_test3.lua:17 loop]
float[N] : 0.03s 0.89406967163086 ns/element
[TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE --- (1/0) ffi_test3.lua:17 -- NYI: unsupported C type conversion]
[TRACE 2 (1/0) ffi_test3.lua:17 -- fallback to interpreter]
boxed[N] : 2.667s 79.482793807983 ns/element
[TRACE 3 ffi_test3.lua:11 return]
array(N) : 2.699s 80.43646812439 ns/element

What could be causing the slower performance here?

Simon