Re: Luajit struct vs table performance.

  • From: Evan Wies <evan@xxxxxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 03 Apr 2014 11:30:16 -0400

Earlier this year, I converted a particular program from using a Lua-table-of-tables to a FFI-based HashMap of FFI structs. I actually did it because of memory limitations, not for performance gains. The running time for the job was reduced by about 30%, shaving about 3 minutes off a 10 minute run. But the best part was that I could then run my program on a full data set, rather than sharding it.


I did not dive into why there was this improvement; I'm posting this anecdote just in response to your query. It was not a micro-benchmark but a "real-world script", and I was not rigorous about measurement (I just ran 'time' on each run). However, in my specific case I relieved a lot of strain from the GC because I started "managing" the objects -- going from hundreds of thousands of GC objects to fewer than 1000. So I'm sure that contributed to the improvement. Compared to your case, my structs were filled with numbers (mostly doubles and uint64_t), so there wasn't much conversion going on.

Your case is much different from mine because you want pointers to strings. You are correct -- you will definitely need to anchor your cdata string values, keeping them alive at least as long as the pointer members that reference them. If you are going to constantly create cdata strings, copy Lua strings into them, and then stash those pointers in your structs, the whole thing will be a lot more complicated and probably slower. If you are actually creating one big character buffer and then setting a/b/c/d to memory locations within that buffer, that's a different story.
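To make the anchoring point concrete, here is a minimal sketch (my own illustration, not from your code -- all names are made up). A Lua string converted to a const char* is only valid while the Lua string itself is kept alive somewhere, so you keep a reference to it in a Lua-side table:

```lua
-- Hedged sketch: keeping Lua strings alive while a struct holds const char*
-- pointers into them. 'item_t', 'anchors', and 'set_a' are illustrative names.
local ffi = require("ffi")

ffi.cdef[[
typedef struct {
  const char* a;
  int e;
} item_t;
]]

local item = ffi.new("item_t")
local anchors = {}            -- holds the Lua strings so the GC won't collect them

local function set_a(s)
  anchors[#anchors + 1] = s   -- anchor: must outlive item.a
  item.a = s                  -- Lua string converts to const char*; valid only while anchored
end

set_a("hello")
item.e = 42
```

If you drop the entry from 'anchors' (or it goes out of scope) before you stop using item.a, the pointer dangles.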

Another option, if you have bounded strings, is to do something like this:
struct Foo {
  char a[16];
  char b[16];
  int e;
};
You could then make a big array of Foo's. The gotcha with this is that initialization of aggregates is NYI. To get around this, I create one "seed" object, then construct new aggregates by copying the seed (a struct copy, which is JIT-compiled). There is more memory-writing going on here, but at least it is JIT-compiled and "works for me".

Example:
local Foo_t = ffi.typeof("struct Foo") -- Foo defined above via ffi.cdef
local Foo_init = Foo_t() -- NYI (initialization of aggregates), so this runs interpreted; it is zero-initialized

local bar = Foo_t( Foo_init ) -- This can be JIT-compiled because it is a struct copy (there are caveats to this too, I think)
ffi.copy( bar.a, 'text' )   -- TODO: be aware of buffer overflow
bar.e = 1234
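Putting the two ideas together -- the seed object and the "big array of Foo's" -- a sketch of the preallocated-array version might look like this (element count and fill values are made up for illustration):

```lua
-- Hedged sketch: preallocate an array of struct Foo and initialize each
-- element by copying a zero-filled seed, as described above.
local ffi = require("ffi")

ffi.cdef[[
struct Foo {
  char a[16];
  char b[16];
  int e;
};
]]

local Foo_t = ffi.typeof("struct Foo")
local Foo_init = Foo_t()                 -- NYI, runs interpreted, but only once

local N = 100000                         -- illustrative size
local arr = ffi.new("struct Foo[?]", N)  -- one allocation for all N structs
for i = 0, N - 1 do
  arr[i] = Foo_init                      -- struct copy; JIT-compiles
  ffi.copy(arr[i].a, "text")             -- 5 bytes incl. NUL, fits in char[16];
                                         -- longer strings would overflow
  arr[i].e = i
end
```

Because the strings live in the fixed-size char arrays inside each struct, there is nothing to anchor here, unlike the const char* case.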

Hope that helps,
Evan



On 04/03/2014 09:13 AM, Jordan Henderson wrote:
Hi,
I am wondering if the overhead of using a Lua table (hash lookups etc.) is greater than the overhead of converting Lua types to C types. I know I could benchmark this myself, but I thought it might be easier to post here.
For instance, if I was to store 5 strings as such:
local t = {}
t.a = "asdf"
t.b = "asdf"
t.c = "asdf"
t.d = "asdf"
t.e = 1234

versus a fixed ffi struct:
ffi.cdef[[
typedef struct {
const char* a;
const char* b;
const char* c;
const char* d;
int e;
} t;
]]
... (create the struct/set members etc).

Which one, in theory, is the better option, assuming no other members are needed? Would there be any significant speed improvement when creating/using a large number of these struct objects versus tables with lookup overhead (even with the Lua-type-to-C-type conversion overhead)? In the end, I am hoping to identify the option with the best performance. I could preallocate an array of these structs beforehand to avoid many small allocations as well.

One other thing - I am wondering what the lifetime of struct members is in relation to, say, Lua strings. I assume the raw char* set in an FFI struct member points directly to memory managed by Lua's GC, and is therefore only good as long as the original variable is still reachable?

Thanks ahead of time - these questions have been nagging me for a while now.


