Hi,

It seems that what I am doing is not recommended, i.e. invoking small Lua functions from C many times. Unfortunately my requirements are such that I cannot avoid this. In the product I am working on we want to allow users to specify functions in Lua ... so the callback mechanism is the only option. This is somewhat similar to the numerical integration example that is advised against on the FFI semantics web page.

I find that if I do a Release build using standard Lua and compare its performance with LuaJIT, I do not see much difference. I assume the reason is that the callback overhead is negating any performance benefits.

The implementation is further complicated because the Lua function called from C must itself call C functions to do some things. I tried using the FFI instead of the standard Lua mechanism to call these C functions but, for some reason I don't understand, this performs worse than the standard mechanism. On the other hand, returning the Lua closure as a function pointer to the C code results in better performance than the mechanism I was using before (described in my post below).

The FFI mechanism is astonishing of course. I am able to receive a C structure and assign a Lua closure to a function pointer in the structure - Wow!

From: dibyendu.majumdar@xxxxxxxx
To: luajit@xxxxxxxxxxxxx
Subject: RE: Access C env per LUA instance
Date: Tue, 29 Oct 2013 23:59:49 +0000

Mike wrote:
> If you need top performance and you're calling lots of C functions,
> then you need to switch from the classic Lua/C API to the FFI.
> But this implies you cannot get at the lua_State pointer from a C
> function called by the FFI, anyway. So I wouldn't bother with any
> temporary workarounds that mess with the lua_State or such.

Ok understood.

> For the FFI approach there are two solutions:
>
> a) Do it on the C side: use one of the faster thread-local storage
> (TLS) APIs, i.e. one that doesn't call a function for every TLS
> access. The optimum is a mov reg, fs:[fixed_tls_offset]. How to
> achieve that is different for every OS and C compiler.
>

I have been looking at using thread local storage. However, I don't know enough about how TLS is implemented to work out which API is best.

> b) Classic OO approach: pass a context argument to every function.
> If the context is (say) a pointer or an integer in an immutable
> upvalue then the JIT compiler will inline that as a constant for
> each C call.
>

Thanks - this seems easiest ... I will try this.

I have another question. As I need to call the Lua functions many times from the C code, I am doing this: I call a Lua function to pre-compute values where possible and create a closure with the bare minimum steps. Then I store the closure in the registry by calling:

luaL_ref(L, LUA_REGISTRYINDEX);

Subsequently I execute the closure by calling it from C code - this is done many times ... and the operations performed by the closure are numeric. I am hoping that the closure will be JITed the first time and then execute at native speed later ... is that a correct assumption?

Regards
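
For reference, here is a rough C-side sketch of the two options quoted above, as I understand them. It is only an illustration under my own assumptions, not code from the product: the Env struct and the names env_bind, scale_tls, scale_ctx and demo_sum are all hypothetical. Whether the TLS read really compiles down to a single segment-relative load depends on the OS, the compiler and the TLS model in use, so that part would need checking per platform.

```c
#include <stddef.h>

/* Hypothetical per-instance environment; the fields are illustrative only. */
typedef struct Env {
    double scale;   /* some per-instance setting */
    long   calls;   /* how many times a helper used this environment */
} Env;

/* Option (a): one context pointer per thread, kept in thread-local storage.
 * _Thread_local is the C11 spelling; older GCC/Clang accept __thread and
 * MSVC uses __declspec(thread). */
static _Thread_local Env *current_env = NULL;

/* Bind an environment to the calling thread. */
void env_bind(Env *env)
{
    current_env = env;
}

/* Helper that finds its environment implicitly through TLS.
 * Assumes env_bind() was called on this thread beforehand. */
double scale_tls(double x)
{
    current_env->calls++;
    return x * current_env->scale;
}

/* Option (b): the same helper with the environment passed explicitly,
 * so no TLS lookup is needed at all. */
double scale_ctx(Env *env, double x)
{
    env->calls++;
    return x * env->scale;
}

/* Small driver that exercises both variants against separate environments. */
double demo_sum(void)
{
    Env a = { 2.0, 0 };
    Env b = { 3.0, 0 };

    env_bind(&a);
    double total = scale_tls(1.5);   /* uses a via TLS: 1.5 * 2.0 */
    total += scale_ctx(&b, 2.0);     /* uses b explicitly: 2.0 * 3.0 */
    return total;
}
```

With option (b), the Lua side would hold the Env pointer in an upvalue and pass it as the first argument on every FFI call, which is the case Mike says the JIT compiler can treat as a constant.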