Thanks for the great information, Mike!

On 21 February 2015 at 22:14, Mike Pall <mike-1502@xxxxxxxxxx> wrote:

> Well, the FFI store to the box is certainly better.
>
> But then ... the hoistable overhead for FFI types is a bit higher
> (type checks etc.). If the iteration count is low or the code is
> branchy, the manual FFI boxing approach might be slower. You'll
> have to test it.

Generally our inner loops will average ~50 iterations and look like:

    while packet.receive(link, p) do
       work(p)
    end

and there will be many different work() functions written by different
people with different backgrounds.

I would like to have an idiomatic programming style for writing work()
functions that is easy to understand and tends to have reasonable
performance. This is so that people who are new to Lua can be satisfied
with the performance of their initial programs. It's okay to require an
understanding of trace compilers to achieve peak performance, but it
should be easy to achieve reasonable/adequate performance without this.

I'd say that we can reduce the number of branches and loops inside
work() functions by having common library functions that don't branch
and loop. This may be an interesting programming exercise and a good
opportunity to sharpen our LuaJIT optimization skills.

> > How about the relative efficiency of the two API styles,
> > 'packet.length(p)' vs 'p:length()'?
>
> If it ends up in an inner loop, the overhead for both can be
> eliminated. But, if not ...

Curious: How do we define "inner loop" for practical purposes?

For example, which uses of 'p' would be considered inner loops here?
(One, both, or unknown? Suppose that x() and y() are not branchy.)

    p = ...
    for i = 0, 100 do
       x(p)
       for j = 0, 100 do
          y(p)
       end
    end

> For the functional style, manual caching of the function in an
> immutable upvalue is the fastest way, e.g.:
>
>   local plength = packet.length
>
> And, yes, for this to work, you'll have to write your code in the
> correct top-down declaration order. :-p

:-) Top-down local declarations within modules are definitely a style
that we could adopt. I have not seen the benefit with my own eyes yet.

For manual caching I am concerned about tiny decision overload and
boilerplate. Tiny decision overload as in constantly wondering "is
foo.bar worth caching in this module?" and boilerplate in the sense of
copy/pasting a bunch of lines like these:

    local bit_band, bit_bor, bit_lshift, bit_rshift =
       bit.band, bit.bor, bit.lshift, bit.rshift
    local packet_length, packet_data, packet_free =
       packet.length, packet.data, packet.free
    local link_empty, link_full, link_receive, link_transmit =
       link.empty, link.full, link.receive, link.transmit

It sounds like FFI methods would give us the benefit of local caching
without the syntactic overhead? That sounds tempting.
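
To make the two styles concrete, here is a minimal sketch of what I have in mind. Everything named here is made up for illustration: `struct demo_packet`, its `len` field, and `new_packet` are hypothetical and are not the real packet API. The struct field is called `len` rather than `length` on purpose, because a field and a method with the same name would clash when looked up on the same object. The sketch attaches the methods with `ffi.metatype` when the FFI is available and falls back to an ordinary metatable otherwise. It only shows the syntax; it says nothing about which style is faster outside a compiled loop.

```lua
-- Hypothetical stand-in for the packet module (illustration only).
local packet = {}
function packet.length(p) return p.len end

-- The FFI only exists under LuaJIT, so probe for it.
local has_ffi, ffi = pcall(require, "ffi")

local new_packet
if has_ffi then
   -- Assumed layout, not the real packet struct.
   ffi.cdef[[ struct demo_packet { uint16_t len; uint8_t data[64]; }; ]]
   -- Bind the method table to the C type once, at declaration time.
   local ctype = ffi.metatype("struct demo_packet", { __index = packet })
   new_packet = function (len)
      local p = ctype()
      p.len = len
      return p
   end
else
   -- Plain-Lua equivalent so the sketch also runs without the FFI.
   local mt = { __index = packet }
   new_packet = function (len)
      return setmetatable({ len = len }, mt)
   end
end

-- Style 1: functional call through a manually cached local.
local plength = packet.length

local p = new_packet(60)
local a = plength(p)   -- functional style, cached upvalue
local b = p:length()   -- method style, no per-module caching lines
```

With the method style, the per-module `local packet_length = packet.length` lines disappear, since the lookup goes through the metatable bound to the type. Whether that lookup is as cheap as a cached upvalue when the call does not end up in a compiled inner loop is the part I would still want to measure.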