LuaJIT-friendly API and data structure design

  • From: Luke Gorrie <luke@xxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 19 Feb 2015 14:54:19 +0100

Howdy!

We have been working on Snabb Switch for quite a while now. This has been
the first brush with LuaJIT for many of us. Here is a very brief summary of
our experience to date and a request for some API design advice.

The main reasons we chose LuaJIT originally were:

1. Easy to learn language.
2. Great FFI.
3. High performance potential.

The plan was always to gradually develop a feel for what to write in Lua vs
what to write in C. The "killer feature" of LuaJIT is that the FFI means we
always have C at our fingertips. Our fast-path can be written in Lua, C, or
both, and switching this around is no big deal.

This has worked out well. The FFI is every bit as good as advertised. Lua
code is running fast and we have not needed to rewrite anything in C yet
(!). I am really impressed with LuaJIT.

We do have to work hard for Lua performance though. The trace-based JIT
requires a different programming style and we early adopters have to invest
a lot of brain cycles in getting the hang of this. The hope is that this
will be much easier for people in the future once we have settled on an
idiomatic programming style that always works well.

(Alex Gall is way ahead of me on this track, I am catching up:
https://groups.google.com/forum/#!topic/snabb-devel/AbBpOWNtjAE)

Lately we have been redesigning some of our core APIs and data structures
to make simple and efficient code easier to write. For example, we removed
reference counts and scatter-gather I/O lists for packet data, which both
required our inner loops to call library functions with unpredictable
branching/looping behavior.

Now I am interested in the line between Lua and FFI data structures.
Currently we have our data in FFI and our code in Lua. This mostly works
well. However, one issue is that we often load FFI pointers into Lua
variables and this can cause heap allocation (boxed pointers) if the
allocation sinking optimizations don't take effect. So I am wondering
whether we should redraw the border between Lua/FFI data structures somehow
to avoid that class of problem.

Here is a fairly accurate sketch of the main Snabb Switch data structures:

    struct packet {
      int flags;
      char data[10240];
    };

    struct freelist {
      int nfree;
      struct packet *[10000];
    }

    struct ring {
      int read, write;
      struct packet *packets[256];
    }

These structures are allocated in FFI memory but almost exclusively
accessed with Lua code. For example, to take a packet from a ring:

    function read (ring)
      ring.read = band(ring.read + 1, 0xff)
      return ring.packets[read]
    end

I start to wonder if this is bad programming style: a Lua function that is
returning a new pointer. This will cause heap allocation unless the caller
is careful to make allocation sinking work. That is quite a burden to put
on the caller -- writing code to ensure that allocations sink is not
trivial. So should we start to think of APIs returning new pointers as a
red flag? If so, what should we do
instead?

One idea here would be to replace 'freelist' and 'ring' with Lua tables.
Then we would heap-allocate the 'struct packet *' FFI pointers once and for
all instead of (potentially) each time we access them.

One downside of this would be increasing the size of the Lua heap. Likely
we would need to limit the number of packets we deal with at one time. That
might be acceptable.

One more downside is that this would not be universal. There are other
situations where we need to use pointers less predictably and where this
technique would not apply. For example, suppose we want to access the TCP
header of a packet with code like:

    function tcp_header (packet)
      return ffi.cast(tcp_t, packet.data + tcp_offset(packet.data))
    end

This pointer arithmetic can also lead to allocation, and it would be nice
to avoid that. The approach that comes to mind is to require a reusable box
(struct tcp *[1]) to be provided:

    function tcp_header (packet, box)
      box[0] = ffi.cast(tcp_t, packet.data + tcp_offset(packet.data))
    end

and maybe this is actually the basis for a general solution. I don't enjoy
this specific "[0]" syntax so much but perhaps we could wrap it up in a Lua
object nicely.

(Again, Alex is already ahead of me on this track, and I am catching up
from first principles.)

Mike and others, what words of wisdom do you have for us? :-)

Cheers,
-Luke

Other related posts: