Re: Compiler load/store barrier; volatile pointer; barriers in general

  • From: Mike Pall <mike-1501@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Wed, 28 Jan 2015 19:12:17 +0100

Luke Gorrie wrote:
> I suppose that a really safe and future proof way to load/store volatile
> values would be with a little C library:
> 
> int peek(int *ptr) { return *ptr; }
> void poke(int *ptr, int value) { *ptr = value; }
> 
> and then call that via FFI?

That's overkill. A regular C call is 'safe' in the sense that its
semantics ('may have any side-effect on memory') will never change.

In the future, there might be some types of C functions that give
the compiler more freedom for cross-call optimizations. But then
their declarations would have to be explicitly tagged with
attributes.

> Can loads be forwarded between loop iterations? That is: could load
> forwarding create an infinite loop if I am polling for a new value in an
> FFI pointer (while ptr[0] == nil do end)?

Sure, it will. Just try it.
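
For comparison, the same hazard exists in plain C: a compiler may keep a
non-volatile load in a register and never re-read memory inside a polling
loop. The following is a minimal sketch in C, not LuaJIT code; the helper
name poll_nonzero and its max_spins bound are made up for illustration. It
shows the usual C remedy, a volatile access, which forces a fresh load on
every iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: spin until *ptr becomes nonzero or until max_spins
 * iterations have passed. Returns the number of iterations spent waiting.
 * The volatile qualifier forbids the compiler from forwarding the load
 * from one iteration to the next, so memory is re-read every time. */
int poll_nonzero(volatile int *ptr, int max_spins)
{
    assert(ptr != NULL);
    int spins = 0;
    while (*ptr == 0 && spins < max_spins)
        spins++;
    return spins;
}
```

The bound only exists so that the sketch terminates when nothing ever
writes to the location; a real polling loop would typically wait on another
thread or a device.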

> > That said, I'm willing to add an ffi.barrier() instruction that
> > would give you finer control and more efficiency in tight polling
> > loops. With an argument that can either be "l", "s", "m" (compiler
> > barrier) or "L", "S", "M" (hardware barrier). The latter would
> > imply the former, of course. Patches welcome!
> 
> Could you sketch how this might be implemented? (I enjoy diving into the
> LuaJIT code to follow some thread or other -- I hope that by making a habit
> of this I will gradually learn how things work.)

- Create a new ffi.barrier() builtin for the interpreter. That one
  could just check for uppercase and always emit the M barrier. No
  compiler barriers necessary in the interpreter. Probably easiest
  to make this an assembler builtin and handle this in the *dasc
  files for each architecture.

- Change XBAR to take a literal argument. This should receive the
  code of the first character in the option string.

- Mark the builtin as recordable. Add the recording routine to the
  JIT compiler and emit the proper XBAR. Treat any unknown barrier
  like an m or M barrier (assuming this is the strongest barrier
  we ever want to support).

- Optionally find all the places in the compiler frontend that
  check for XBAR crossings and relax handling for l/L and s/S XBARs.

- Modify all CPU-specific JIT-compiler backends to emit the
  correct hardware barriers for each XBAR. Nothing to do for pure
  compiler barriers.
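
The option handling in the steps above can be sketched in standalone C.
This is an illustration under stated assumptions, not LuaJIT source: the
names BarrierKind, BarrierSpec, barrier_decode, barrier_order and
barrier_execute are hypothetical, and the mapping of load/store/memory
barriers to C11 acquire/release/seq_cst fences is only an approximation
chosen to make the example runnable.

```c
#include <assert.h>
#include <ctype.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical barrier kinds, named after the option letters l, s, m. */
typedef enum { BAR_LOAD, BAR_STORE, BAR_MEM } BarrierKind;

typedef struct {
    BarrierKind kind;
    int hardware;   /* 1 for uppercase options ("L", "S", "M"), else 0 */
} BarrierSpec;

/* Decode the first character of the option string. Anything that is not
 * l/L or s/S is treated as the strongest kind (m or M, depending on case). */
BarrierSpec barrier_decode(const char *opt)
{
    assert(opt != NULL);
    unsigned char c = (unsigned char)opt[0];
    BarrierSpec spec;
    switch (tolower(c)) {
    case 'l': spec.kind = BAR_LOAD;  break;
    case 's': spec.kind = BAR_STORE; break;
    default:  spec.kind = BAR_MEM;   break;
    }
    spec.hardware = isupper(c) ? 1 : 0;
    return spec;
}

/* Assumed, approximate mapping from barrier kind to a C11 memory order. */
memory_order barrier_order(BarrierKind kind)
{
    switch (kind) {
    case BAR_LOAD:  return memory_order_acquire;
    case BAR_STORE: return memory_order_release;
    default:        return memory_order_seq_cst;
    }
}

/* A hardware barrier uses atomic_thread_fence, which also constrains the
 * compiler; a pure compiler barrier uses atomic_signal_fence, which emits
 * no fence instruction. */
void barrier_execute(BarrierSpec spec)
{
    if (spec.hardware)
        atomic_thread_fence(barrier_order(spec.kind));
    else
        atomic_signal_fence(barrier_order(spec.kind));
}
```

In this sketch a call such as barrier_execute(barrier_decode("M")) issues a
full fence, while barrier_decode("l") only restricts compiler reordering of
loads.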

That said, maybe one should introduce a more general builtin (not
sure how to name this), that allows a wider range of interesting
optimizations. E.g. telling the compiler that the result of a load
is definitely constant.

Or meta-compiler features, such as forcing value specialization
(useful when writing interpreters in Lua). But this shouldn't be
in the ffi namespace then.

> It would be interesting to understand how these kinds of "intrinsics"
> features could be added to LuaJIT though.

At the moment only by adding new builtins, new IR instructions and
modifying all backends.

I've previously (*) mentioned the idea of user-definable
intrinsics for the FFI. The main use case would be vector
instructions (to avoid hardcoding hundreds of instructions). But
the same approach would allow inlining an arbitrary sequence of
machine instructions, too.

(*) http://lua-users.org/lists/lua-l/2011-05/msg00219.html

> There is another related use case: a cache prefetch instruction. I can
> imagine this could be beneficial if it could be made to simply emit one
> PREFETCHNTA instruction but is perhaps more likely to be detrimental if you
> trigger an optimisation barrier for calling it via an FFI function.

Yes, this would be a classic case for an inlined machine
instruction. Similarly, low-level instructions for lock-free
algorithms (e.g. compare+swap) might be useful.
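
To illustrate what such inlined instructions look like from the C side,
here is a small sketch. It is not a LuaJIT API: prefetch_nta, cas_int and
fetch_increment are hypothetical helper names. It assumes a C11 compiler
for <stdatomic.h>, and it uses __builtin_prefetch, a GCC/Clang extension
that on x86 is usually compiled to PREFETCHNTA when the locality argument
is 0 (the exact instruction is up to the compiler and target).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Non-temporal prefetch hint for a read. Falls back to a no-op on
 * compilers without the GCC/Clang builtin. */
void prefetch_nta(const void *addr)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr, 0, 0);   /* rw = 0 (read), locality = 0 */
#else
    (void)addr;
#endif
}

/* Compare+swap: if *obj equals expected, store desired and return true;
 * otherwise leave *obj unchanged and return false. */
bool cas_int(atomic_int *obj, int expected, int desired)
{
    assert(obj != NULL);
    return atomic_compare_exchange_strong(obj, &expected, desired);
}

/* Lock-free increment built from a compare+swap retry loop.
 * Returns the value the counter held before the increment. */
int fetch_increment(atomic_int *counter)
{
    assert(counter != NULL);
    int old = atomic_load(counter);
    /* On failure, old is refreshed with the current value and we retry. */
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
    }
    return old;
}
```

A C compiler typically expands each of these to a handful of instructions
at the call site, which is the property an FFI call cannot offer today.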

--Mike
