Mike Pall <mike-1208@xxxxxxxxxx> wrote:
> The __call metamethod must be a plain function, you can't pass
> arbitrary callable objects (such as an FFI function). I don't see
> an easy way to work around this -- it would require double
> dispatching (or many more dispatches in the worst case).
>

No problem - it works if I simply wrap the FFI function in a plain
Lua function.

>> I encountered similar behaviour with other metamethods, but with
>> different errors.
>
> Umm, examples? E.g. __add can certainly be an FFI function.
>

The specific cases I tried are __len, __newindex and __index. It seems
that __len does work, but needs the signature len(void * a, void *)
with an extra dummy argument [1]. However, __newindex gives the
following error:

  'void ()' cannot be indexed with 'number'

where the FFI signature is:

  void newindex(void * ud, size_t k, double v)

A similar error occurs for __index. Again I can work around this with
simple wrappers, with no performance penalty, but perhaps there is a
better solution.

My specific use case is a class for large arrays (basically a C++
std::vector<T>) using __call to read elements and __newindex to write.
Using the wrappers I can get things working nicely with 10x improvement
in performance, but now I have encountered a new problem...

When a loop over array elements reaches a certain complexity, it appears
to blacklist the __call function, preventing JIT compilation of array
accesses for the remainder of the run.
The following simplified example illustrates this (using current git
head), where the array is replaced by a simple userdata that ignores
the array index:

/* test.cpp */
#include "lua.hpp"

extern "C" {

// returns a userdata with an empty metatable
int luaopen_testmodule(lua_State * L) {
  void * ud = lua_newuserdata(L,8);
  * reinterpret_cast<double*>(ud) = 0;
  lua_newtable(L);
  lua_setmetatable(L,-2);
  return 1;
}

double get_double(void * ud,size_t k) {
  return * reinterpret_cast<double*>(ud);
}

void set_double(void * ud,size_t k,double v) {
  * reinterpret_cast<double*>(ud) = v;
}

}

------ test.lua
local ffi = require'ffi'

ffi.cdef[[
double get_double(void * ud, size_t k);
void set_double(void * ud, size_t k, double v);
]]

clib = ffi.load'test'

ud = require'testmodule'
local mt = getmetatable(ud)

mt.__call = function(self,k) return clib.get_double(self,k) end -- line 13
mt.__newindex = function(self,k,v) clib.set_double(self,k,v) end

local N = 10^7

require'jit.v'.on'-'

local t0 = os.clock()
for i = 0,N-1 do ud[i] = ud(i) + 1 end -- line 21
print((os.clock()-t0)/N*1e9 .. 'ns/element')

for i = 0,N/1000-1 do
  ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 -- line 25
  ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1
  ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1
  ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1
end

local t0 = os.clock()
for i = 0,N-1 do ud[i] = ud(i) + 1 end -- line 32
print((os.clock()-t0)/N*1e9 .. 'ns/element')

print(ud(0))

------ outputs:
[TRACE   1 test2.lua:21 loop]
4ns/element
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE   2 test2.lua:14 return]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:25]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:25]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
286ns/element
20160000
----------

The same loop takes 4ns/element at the start and 286ns/element at the
end due to the blacklisted __call.

For comparison, if the complex loop has only 15 increments instead of 16
it gives the following:

----------
[TRACE   1 test2.lua:21 loop]
4ns/element
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE   2 test2.lua:14 return]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE   3 test2.lua:24 loop]
[TRACE   4 test2.lua:32 loop]
4ns/element
20150000
----------

It seems that the following event is causing the __call (at line 13) to
be blacklisted:

[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]

Is this the expected behaviour in this case? It seems odd that the
__call should be impacted instead of the loop.

Thanks,
Simon

[1] http://lua-users.org/lists/lua-l/2010-01/msg00160.html
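
As a point of reference for the use case described above (an array
class along the lines of std::vector<T>), the C side of such accessors
could be sketched roughly as follows. This is only an illustration, not
the actual code behind the report: the names array_len, array_get and
array_set are hypothetical, the userdata is assumed to point at a
std::vector<double>, and no bounds checking is done. The length function
takes the extra dummy argument noted above for __len.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch: 'ud' is assumed to point at a std::vector<double>.
typedef std::vector<double> Array;

extern "C" {

// Length function in the shape reported to work for __len:
// the second parameter is an unused dummy argument.
std::size_t array_len(void * ud, void *) {
  return static_cast<Array*>(ud)->size();
}

// Read element k (no bounds check in this sketch).
double array_get(void * ud, std::size_t k) {
  return (*static_cast<Array*>(ud))[k];
}

// Write element k (no bounds check in this sketch).
void array_set(void * ud, std::size_t k, double v) {
  (*static_cast<Array*>(ud))[k] = v;
}

}
```

On the Lua side these would be declared with ffi.cdef and wrapped in
plain Lua functions, in the same way as get_double and set_double in
test.lua.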