Luke Gorrie wrote:
> I'm interested in using LuaJIT in a realtime application. In particular I'd
> like to use LuaJIT code instead of iptables-style patterns in a network
> forwarding engine, and I'd like to be able to establish some upper-bounds
> on processing time for my own peace of mind. For example, to be able to be
> confident that rules of a certain basic complexity level would never take
> more than (say) 50us to execute.
>
> Is this a realistic notion?

With soft-realtime you need to specify more parameters: the worst-case
latency under 'usual' operating conditions, the tolerable worst-case
latency under adverse conditions, and the acceptable probability of the
latter happening. And, of course, the often forgotten maximum acceptable
bandwidth consumed by the memory allocator and the garbage collector.

But the real question is: do you want to find these parameters for a
specific implementation (with LuaJIT), or do you have strict bounds for
these numbers and want to shape the implementation to match them?

> I'm guessing that GC is the main issue to be concerned about.

As Thomas already said: avoiding allocations is the simplest recipe. But
the incremental GC in LuaJIT 2.0 (same as Lua 5.1) is not that bad. It
does have some atomic pauses that may be of concern:

- Stacks are traversed atomically -- don't create huge stacks (deep
  recursion).
- Each table is traversed atomically -- don't create huge tables
  (millions of elements). Or consider using FFI structures.
- Tables that hit a write barrier will be remarked atomically -- this is
  usually not an issue, unless they are huge (see above).
- The list of userdata objects is traversed atomically -- don't create
  too many of them. Or consider using FFI cdata.
- Userdata and FFI cdata finalizers may be invoked on any GC checkpoint
  -- don't create long-running finalizer functions.

IMHO it's pretty easy to avoid these issues in your code.
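To make the "avoid allocations, use FFI cdata" advice concrete, here is a minimal sketch of a packet-matching hot path that allocates nothing per packet. The struct layout, field names and the rule itself are purely illustrative, not part of any real forwarding engine:

```lua
local ffi = require("ffi")

-- Hypothetical packet header layout; the fields are illustrative only.
ffi.cdef[[
typedef struct { uint32_t src, dst; uint16_t sport, dport; } hdr_t;
]]

-- Preallocate one scratch header OUTSIDE the hot loop. A cdata object
-- like this is a single, flat GC object: reusing it means the per-packet
-- path allocates nothing, so the incremental GC gains no new work.
local hdr = ffi.new("hdr_t")

-- A rule of "basic complexity": plain field comparisons, no table or
-- string creation, so no GC pressure per packet.
local function match(h)
  return h.dst == 0x0a000001 and h.dport == 80
end
```

Compare this with building a Lua table per packet: each table is a fresh GC object the collector must later trace and sweep, which is exactly the bandwidth cost mentioned above.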
[The planned new GC for LuaJIT 2.1 will eliminate most of these pauses
or try to reduce their impact.]

You can reduce the length of each incremental GC step with the
"setstepmul" parameter. But note that your throughput will suffer if the
value is too low. You really need to measure the GC step duration within
your application, since it depends a lot on the mix of objects, cache
behavior etc.

The builtin allocator is a variant of dlmalloc. I'm sure someone else
has already figured out the worst case pauses this might incur.

The JIT compiler is mostly incremental, too. The recording phase (which
invokes most optimizations on-the-fly) is fully incremental. There are
some non-incremental phases, like the LOOP, SPLIT and SINK optimization
passes (these are pretty fast) and the backend assembler. They are
linearly bounded by the maximum trace size (-Omaxtrace=x). We're talking
about a couple of microseconds, so this shouldn't be too much of a
concern.

I'm leaving out the discussion of OS/cache latencies here, since you
need to take care of these, anyway.

--Mike
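As an illustration of the "measure the GC step duration within your application" advice, here is a sketch using the standard Lua 5.1 `collectgarbage("setstepmul", n)` and `collectgarbage("step", kb)` calls. The workload, step count and timer (`os.clock`, which may be coarse on some platforms) are illustrative assumptions, not a benchmark methodology:

```lua
-- Lower stepmul means less GC work per step (shorter pauses), at the
-- cost of the collector falling behind if set too low. Default is 200.
collectgarbage("setstepmul", 100)

-- Illustrative workload: keep some live objects around so each GC step
-- has marking work to do.
local live = {}
for i = 1, 100000 do live[i] = { i } end

collectgarbage("collect")  -- start measuring from a known state

-- Drive the collector one basic step at a time and record the worst
-- observed step duration.
local worst = 0
for _ = 1, 1000 do
  local start = os.clock()
  collectgarbage("step", 0)  -- perform one basic incremental GC step
  local dt = os.clock() - start
  if dt > worst then worst = dt end
end
print(("worst observed GC step: %.3f ms"):format(worst * 1000))
```

In a real application you would run this measurement inside your actual packet-processing loop, since (as noted above) step duration depends heavily on the mix of objects and cache behavior.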