Re: Luajit struct vs table performance.

  • From: Evan Wies <evan@xxxxxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 03 Apr 2014 11:30:16 -0400

Earlier this year, I converted a particular program from using a Lua-table-of-tables to a FFI-based HashMap of FFI structs. I actually did it because of memory limitations, not for performance gains. The running time for the job was reduced by about 30%, shaving about 3 minutes off a 10 minute run. But the best part was that I could then run my program on a full data set, rather than sharding it.


I did not dive into why there was this improvement; I'm posting this anecdote just in response to your query. It was not a micro-benchmark but a "real-world script", and I was not rigorous about measurement (I just ran 'time' on each run). However, in my specific case I relieved a lot of strain from the GC because I started "managing" the objects -- going from hundreds of thousands of GC objects to fewer than 1000. So I'm sure that contributed to the improvement. Compared to your case, my structs were filled with numbers (mostly doubles and uint64_t), so there wasn't much conversion going on.

Your case is much different from mine because you want pointers to strings. You are correct -- you will definitely need to anchor your cdata string values, keeping them alive at least as long as the pointer members that reference them. If you are going to constantly create cdata strings, copy Lua strings into them, and then stash those pointers in your structs, the whole thing will be a lot more complicated and probably slower. If you are actually creating one big character buffer and then setting a/b/c/d to memory locations within that buffer, that's a different story.
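To make the anchoring point concrete, here is a minimal sketch (my own illustration, not from your code -- all names are made up). A Lua string converted to a const char* is only valid while the Lua string itself is kept alive somewhere, so you keep a reference to it in a Lua-side table:

```lua
-- Hedged sketch: keeping Lua strings alive while a struct holds const char*
-- pointers into them. 'item_t', 'anchors', and 'set_a' are illustrative names.
local ffi = require("ffi")

ffi.cdef[[
typedef struct {
  const char* a;
  int e;
} item_t;
]]

local item = ffi.new("item_t")
local anchors = {}            -- holds the Lua strings so the GC won't collect them

local function set_a(s)
  anchors[#anchors + 1] = s   -- anchor: must outlive item.a
  item.a = s                  -- Lua string converts to const char*; valid only while anchored
end

set_a("hello")
item.e = 42
```

If you drop the entry from 'anchors' (or it goes out of scope) before you stop using item.a, the pointer dangles.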

Another option, if you have bounded strings, is to do something like this:
struct Foo {
  char a[16];
  char b[16];
  int e;
};
You could then make a big array of Foo's. The gotcha with this is that initialization of aggregates is NYI. To get around this, I create one "seed" object, then construct new aggregates by copying the seed (a struct copy, which is JIT-compiled). There is more memory-writing going on here, but at least it is JIT-compiled and "works for me".

Example:
local Foo_t = ffi.typeof("struct Foo") -- Foo defined above via ffi.cdef
local Foo_init = Foo_t() -- NYI (initialization of aggregates), so this runs interpreted; it is zero-initialized

local bar = Foo_t( Foo_init ) -- This can be JIT-compiled because it is a struct copy (there are caveats to this too, I think)
ffi.copy( bar.a, 'text' )   -- TODO: be aware of buffer overflow
bar.e = 1234
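Putting the two ideas together -- the seed object and the "big array of Foo's" -- a sketch of the preallocated-array version might look like this (element count and fill values are made up for illustration):

```lua
-- Hedged sketch: preallocate an array of struct Foo and initialize each
-- element by copying a zero-filled seed, as described above.
local ffi = require("ffi")

ffi.cdef[[
struct Foo {
  char a[16];
  char b[16];
  int e;
};
]]

local Foo_t = ffi.typeof("struct Foo")
local Foo_init = Foo_t()                 -- NYI, runs interpreted, but only once

local N = 100000                         -- illustrative size
local arr = ffi.new("struct Foo[?]", N)  -- one allocation for all N structs
for i = 0, N - 1 do
  arr[i] = Foo_init                      -- struct copy; JIT-compiles
  ffi.copy(arr[i].a, "text")             -- 5 bytes incl. NUL, fits in char[16];
                                         -- longer strings would overflow
  arr[i].e = i
end
```

Because the strings live in the fixed-size char arrays inside each struct, there is nothing to anchor here, unlike the const char* case.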

Hope that helps,
Evan



On 04/03/2014 09:13 AM, Jordan Henderson wrote:
Hi,
I am wondering if the overhead of using a Lua table (hash lookups etc.) is greater than the overhead of converting Lua types to C types. I know I could benchmark this myself, but I thought it might be easier to post here.
For instance, if I was to store 5 strings as such:
local t = {}
t.a = "asdf"
t.b = "asdf"
t.c = "asdf"
t.d = "asdf"
t.e = 1234

versus a fixed ffi struct:
ffi.cdef[[
typedef struct {
const char* a;
const char* b;
const char* c;
const char* d;
int e;
} t;
]]
... (create the struct/set members etc).

Which one, in theory, is the better option, assuming no other members are needed? Would there be any significant speed improvement when creating/using a large number of these struct objects versus tables with lookup overhead (even with the Lua-type-to-C-type conversion overhead)? In the end, I am hoping to identify the option with the best performance. I could preallocate an array of these structs beforehand to avoid many small allocations as well.

One other thing - I am wondering what the lifetime of struct members is in relation to, say, Lua strings. I assume the raw char* set in an FFI struct member points directly to memory managed by Lua's GC, and is therefore only good as long as the original variable is still reachable?

Thanks ahead of time - these questions have been nagging me for a while now.


