Luke Gorrie wrote:
> I suppose that a really safe and future proof way to load/store volatile
> values would be with a little C library:
>
>   int peek(int *ptr) { return *ptr; }
>   void poke(int *ptr, int value) { *ptr = value; }
>
> and then call that via FFI?

That's overkill. A regular C call is 'safe' in the sense that its
semantics ('may have any side-effect on memory') will never change.

In the future, there might be some types of C functions that give
the compiler more freedom for cross-call optimizations. But then
their declarations would have to be explicitly tagged with
attributes.

> Can loads be forwarded between loop iterations? That is: could load
> forwarding create an infinite loop if I am polling for a new value in an
> FFI pointer (while ptr[0] == nil do end)?

Sure, it will. Just try it.

> > That said, I'm willing to add an ffi.barrier() instruction that
> > would give you finer control and more efficiency in tight polling
> > loops. With an argument that can either be "l", "s", "m" (compiler
> > barrier) or "L", "S", "M" (hardware barrier). The latter would
> > imply the former, of course. Patches welcome!
>
> Could you sketch how this might be implemented? (I enjoy diving into the
> LuaJIT code to follow some thread or other -- I hope that by making a habit
> of this I will gradually learn how things work.)

- Create a new ffi.barrier() builtin for the interpreter. That one
  could just check for uppercase and always emit the M barrier. No
  compiler barriers necessary in the interpreter. Probably easiest
  to make this an assembler builtin and handle this in the *dasc
  files for each architecture.
- Change XBAR to take a literal argument. This should receive the
  code of the first character in the option string.
- Mark the builtin as recordable. Add the recording routine to the
  JIT compiler and emit the proper XBAR. Treat any unknown barrier
  like an m or M barrier (assuming this is the strongest barrier we
  ever want to support).
- Optionally find all the places in the compiler frontend that check
  for XBAR crossings and relax handling for l/L and s/S XBARs.
- Modify all CPU-specific JIT-compiler backends to emit the correct
  hardware barriers for each XBAR. Nothing to do for pure compiler
  barriers.

That said, maybe one should introduce a more general builtin (not
sure how to name this), that allows a wider range of interesting
optimizations. E.g. telling the compiler that the result of a load
is definitely constant. Or meta-compiler features, such as forcing
value specialization (useful when writing interpreters in Lua).
But this shouldn't be in the ffi namespace then.

> It would be interesting to understand how these kind of "intrinsics"
> features could be added to LuaJIT though.

At the moment only by adding new builtins, new IR instructions and
modifying all backends.

I've previously (*) mentioned the idea of user-definable
intrinsics for the FFI. The main use case would be vector
instructions (to avoid hardcoding hundreds of instructions). But
the same approach would allow inlining an arbitrary sequence of
machine instructions, too.

(*) http://lua-users.org/lists/lua-l/2011-05/msg00219.html

> There is another related use case: a cache prefetch instruction. I can
> imagine this could be beneficial if it could be made to simply emit one
> PREFETCHNTA instruction but is perhaps more likely to be detrimental if you
> trigger an optimisation barrier for calling it via an FFI function.

Yes, this would be a classic case for an inlined machine
instruction. Similarly, low-level instructions for lock-free
algorithms (e.g. compare+swap) might be useful.

--Mike
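
For readers who want to see the compiler-barrier versus
hardware-barrier distinction in concrete terms, here is a minimal
sketch in plain C11. It is not LuaJIT code and not how ffi.barrier()
would be implemented. The function name barrier() and its return
value are hypothetical, and mapping l/s/m onto acquire/release/seq_cst
fences is an assumption (an approximation that is at least as strong
as a pure load or store barrier). As in the steps above, only the
first character of the option is examined, uppercase implies the
hardware barrier, and any unknown kind is treated like m or M.

```c
#include <ctype.h>
#include <stdatomic.h>

/* Hypothetical helper (not part of LuaJIT): issue a fence selected by the
 * first character of the option string.
 * Returns 1 if a hardware fence was issued, 0 for a compiler-only fence. */
static int barrier(int kind)
{
  /* Assumption: unknown kinds fall back to the strongest ordering. */
  memory_order order = memory_order_seq_cst;

  switch (tolower(kind)) {
  case 'l': order = memory_order_acquire; break;  /* approximates a load barrier */
  case 's': order = memory_order_release; break;  /* approximates a store barrier */
  default: break;                                 /* 'm' or unknown: full barrier */
  }

  if (isupper(kind)) {
    /* Hardware barrier; it also constrains compiler reordering. */
    atomic_thread_fence(order);
    return 1;
  }

  /* Compiler barrier only: no fence instruction is emitted. */
  atomic_signal_fence(order);
  return 0;
}
```

In a polling loop, calling barrier('l') before each re-read would
stop a C compiler from reusing the previously loaded value, without
paying for a hardware fence on every iteration.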