Re: LuaJIT-friendly API and data structure design

  • From: Luke Gorrie <luke@xxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Sun, 22 Feb 2015 10:35:18 +0100

Thanks for the great information, Mike!

On 21 February 2015 at 22:14, Mike Pall <mike-1502@xxxxxxxxxx> wrote:

> Well, the FFI store to the box is certainly better.
>
> But then ... the hoistable overhead for FFI types is a bit higher
> (type checks etc.). If the iteration count is low or the code is
> branchy, the manual FFI boxing approach might be slower. You'll
> have to test it.
>

Generally our inner loops will average ~50 iterations and look like:

while packet.receive(link, p) do work(p) end

and there will be many different work() functions written by different
people with different backgrounds.

I would like to have an idiomatic programming style for writing work()
functions that is easy to understand and tends to have reasonable
performance. This is so that people who are new to Lua can be satisfied
with the performance of their initial programs. It's okay to require an
understanding of trace compilers to achieve peak performance, but it should
be easy to achieve reasonable/adequate performance without this.

I'd say that we can reduce the number of branches and loops inside work()
functions by having common library functions that don't branch and loop.
This may be an interesting programming exercise and a good opportunity to
sharpen our LuaJIT optimization skills.

> > How about the relative efficiency of the two API styles,
> > 'packet.length(p)' vs 'p:length()'?
>
> If it ends up in an inner loop, the overhead for both can be
> eliminated. But, if not ...
>

Curious: How do we define "inner loop" for practical purposes?

For example, which uses of 'p' would be considered inner loops here? (one,
both, or unknown? suppose that x() and y() are not branchy.)

p = ...
for i = 0, 100 do
  x(p)
  for j = 0, 100 do
    y(p)
  end
end

> For the functional style, manual caching of the function in an
> immutable upvalue is the fastest way, e.g.:
>   local plength = packet.length
> And, yes, for this to work, you'll have to write your code in the
> correct top-down declaration order. :-p
>

:-)

Top-down local declarations within modules is definitely a style that we
could adopt. I have not seen the benefit with my own eyes yet.
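
To make the ordering constraint concrete, here is a minimal sketch in plain
Lua. The 'packet' table, its 'len' field and the work() body are hypothetical
stand-ins, not our real module. The point is only that the cached local
captures whatever value exists at the moment it is declared, so the function
has to be defined above the line that caches it.

```lua
-- Hypothetical stand-in for a module table; not the real packet module.
local packet = {}

function packet.length(p)
  return p.len
end

-- Cache the function in a local AFTER it has been defined. The local is
-- never reassigned, so closures below capture it as an immutable upvalue.
local plength = packet.length

-- A work() function that calls through the cached upvalue instead of
-- doing a table lookup on 'packet' each time.
local function work(p)
  return plength(p) + 1
end

local result = work({ len = 10 })
```

If the 'local plength = packet.length' line were moved above the definition
of packet.length, it would capture nil and work() would fail when called.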

For manual caching I am concerned about tiny decision overload and
boilerplate. Tiny decision overload as in constantly wondering "is foo.bar
worth caching in this module?" and boilerplate in the sense of copy/pasting
a bunch of lines like these:

local bit_band, bit_bor, bit_lshift, bit_rshift =
  bit.band, bit.bor, bit.lshift, bit.rshift
local packet_length, packet_data, packet_free =
  packet.length, packet.data, packet.free
local link_empty, link_full, link_receive, link_transmit =
  link.empty, link.full, link.receive, link.transmit

It sounds like FFI methods would give us the benefit of local caching
without the syntactic overhead? That sounds tempting.
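
For reference, a rough sketch of what the FFI method style could look like,
assuming LuaJIT's ffi.metatype. The struct layout and the method names are
made up for illustration and are not our actual packet definition. As I read
the ffi.metatype documentation, the metatable and its __index table must not
be modified once bound to the ctype, which is presumably what allows the
method lookup to be treated as a constant without a hand-written local.

```lua
-- Requires LuaJIT. All names below are hypothetical.
local ffi = require("ffi")

-- The length field is called 'len' on purpose: struct fields are looked up
-- before __index, so a field named 'length' would shadow a length() method.
ffi.cdef[[
struct sketch_packet {
  uint16_t len;
  uint8_t  data[64];
};
]]

-- Methods live in the __index table of the ctype's metatable.
local methods = {}

function methods.length(p)
  return p.len
end

function methods.payload(p)
  return p.data
end

-- Bind the metatable to the ctype; this association is permanent.
local packet_t = ffi.metatype("struct sketch_packet", { __index = methods })

local p = packet_t()   -- zero-initialized instance
p.len = 42
local n = p:length()   -- method call syntax, no cached local needed
```

Whether this beats manual caching for ~50-iteration loops is exactly the
thing that would still need measuring.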
