Re: FFI methods for userdata objects

  • From: Simon Cooke <sjcfwd@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Wed, 5 Sep 2012 13:59:32 -0400

Mike Pall <mike-1208@xxxxxxxxxx> wrote:
> The __call metamethod must be a plain function, you can't pass
> arbitrary callable objects (such as an FFI function). I don't see
> an easy way to work around this -- it would require double
> dispatching (or many more dispatches in the worst case).
>

No problem - it works if I simply wrap the FFI function in a plain Lua function.

>> I encountered similar behaviour with other metamethods, but with
>> different errors.
>
> Umm, examples? E.g. __add can certainly be an FFI function.
>

The specific cases I tried are __len, __newindex and __index. It seems
that __len does work, but needs the signature len(void * a, void *)
with an extra dummy argument [1]. However, __newindex gives the
following error:

    'void ()' cannot be indexed with 'number'

where the FFI signature is: void newindex(void * ud, size_t k, double
v). A similar error occurs for __index. Again I can work around this
with simple wrappers, with no performance penalty, but perhaps there
is a better solution.
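
For reference, here is a minimal sketch of what the C side of those
signatures could look like. It is illustrative only: the names vec_len,
vec_get and vec_set are hypothetical, and it assumes the pointer passed
in from Lua addresses a std::vector<double>, which is not necessarily
how the real module lays out its userdata.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical accessors (names are illustrative, not from the real module).
// Assumption: the pointer handed over from Lua addresses a std::vector<double>.
extern "C" {

// __len candidate: the second parameter is the extra dummy argument.
std::size_t vec_len(void * ud, void * /* unused */)
{
    return static_cast<std::vector<double>*>(ud)->size();
}

// Element read, reached from __call (or __index) through a Lua wrapper.
double vec_get(void * ud, std::size_t k)
{
    return (*static_cast<std::vector<double>*>(ud))[k];
}

// Element write, reached from __newindex through a Lua wrapper.
void vec_set(void * ud, std::size_t k, double v)
{
    (*static_cast<std::vector<double>*>(ud))[k] = v;
}

}
```

None of these do bounds checking, so the Lua wrappers would have to
guarantee a valid index.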

My specific use case is a class for large arrays (basically a C++
std::vector<T>) using __call to read elements and __newindex to write.
Using the wrappers I can get things working nicely with a 10x
improvement in performance, but now I have encountered a new
problem...

When a loop over array elements reaches a certain complexity, LuaJIT
appears to blacklist the __call function, preventing JIT compilation of
array accesses for the remainder of the run. The following simplified
example illustrates this (using current git head), where the array is
replaced by a simple userdata that ignores the array index:

/* test.cpp */

#include "lua.hpp"

extern "C" {

// returns a userdata with an empty metatable
int luaopen_testmodule(lua_State * L)
{
    void * ud = lua_newuserdata(L,sizeof(double));
    * reinterpret_cast<double*>(ud) = 0;
    lua_newtable(L);
    lua_setmetatable(L,-2);
    return 1;
}

double get_double(void * ud,size_t k) { return *reinterpret_cast<double*>(ud); }
void set_double(void * ud,size_t k,double v) { *reinterpret_cast<double*>(ud) = v; }

}

------ test.lua

local ffi = require'ffi'

ffi.cdef[[
double get_double(void * ud, size_t k);
void   set_double(void * ud, size_t k, double v);
]]
clib = ffi.load'test'

ud = require'testmodule'
local mt = getmetatable(ud)
mt.__call     = function(self,k) return clib.get_double(self,k) end -- line 13
mt.__newindex = function(self,k,v) clib.set_double(self,k,v) end

local N = 10^7

require'jit.v'.on'-'

local t0 = os.clock()
for i = 0,N-1 do ud[i] = ud(i) + 1 end -- line 21
print((os.clock()-t0)/N*1e9 .. 'ns/element')

for i = 0,N/1000-1 do
    ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i]
= ud(i) + 1 -- line 25
    ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i]
= ud(i) + 1
    ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i]
= ud(i) + 1
    ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i] = ud(i) + 1 ; ud[i]
= ud(i) + 1
end

local t0 = os.clock()
for i = 0,N-1 do ud[i] = ud(i) + 1 end -- line 32
print((os.clock()-t0)/N*1e9 .. 'ns/element')

print(ud(0))

------ outputs:

[TRACE   1 test2.lua:21 loop]
4ns/element
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE   2 test2.lua:14 return]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:25]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:25]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
[TRACE --- test2.lua:32 -- blacklisted at test2.lua:13]
286ns/element
20160000
----------

The same loop takes 4ns/element at the start and 286ns/element at the
end due to the blacklisted __call.

For comparison, if the complex loop has only 15 increments instead of
16, the output is:

----------
[TRACE   1 test2.lua:21 loop]
4ns/element
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE   2 test2.lua:14 return]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:26]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:28]
[TRACE --- test2.lua:13 -- leaving loop in root trace at test2.lua:27]
[TRACE   3 test2.lua:24 loop]
[TRACE   4 test2.lua:32 loop]
4ns/element
20150000
----------

It seems that the following event is causing the __call (at line 13)
to be blacklisted:

[TRACE --- test2.lua:24 -- loop unroll limit reached at test2.lua:13]

Is this the expected behaviour in this case? It seems odd that the
__call should be impacted instead of the loop.

Thanks,
Simon

[1] http://lua-users.org/lists/lua-l/2010-01/msg00160.html
