Re: Compiler load/store barrier; volatile pointer; barriers in general

  • From: Mike Pall <mike-1501@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 29 Jan 2015 11:34:43 +0100

Luke Gorrie wrote:
> OK, how about this then for a simple solution today for use outside of
> inner loops?
> 
> void compiler_barrier() {}
> void cpu_barrier() { __sync_synchronize(); } // MFENCE on x86
> 
> and we expect that the JIT will not forward loads or stores across either
> barrier.

That ought to work.
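
For reference, a minimal sketch of how the two helpers could look as a small
shared library loaded through the FFI. The file name, the mailbox struct and
publish() are hypothetical and only there to show a call site. Note the empty
function acts as a barrier only for callers that cannot see its body (such as
the JIT calling it through the FFI); a C compiler that sees the definition in
the same translation unit may inline it away.

```c
/* barriers.c (hypothetical name), built as a shared library, e.g.:
 *   cc -O2 -shared -fPIC barriers.c -o libbarriers.so
 */

/* Hypothetical example type: a payload plus a "ready" flag. */
struct mailbox {
    int payload;
    int ready;
};

/* Empty on purpose: the opaque call is the barrier for the caller. */
void compiler_barrier(void) {}

/* Full memory barrier via the GCC builtin. */
void cpu_barrier(void) { __sync_synchronize(); }

/* Hypothetical usage: write the payload, fence, then set the flag. */
void publish(struct mailbox *m, int value)
{
    m->payload = value;
    cpu_barrier();
    m->ready = 1;
}
```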

> I'm not sure if this is related but I have also had fantasies about being
> able to translate constant-yielding expressions into actual constants in
> the recording step. Like a "memoize this call site" primitive. I am not
> sure how widely applicable this would be though, or how prone to misuse.

IMHO the main problem is that any misuse won't be detected, unless

1. the compiler actually takes advantage of the user-provided information
2. and that piece of information is wrong
3. and some computation actually returns a wrong result due to that.

That'll be tough to debug ...
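
A hand-written C analogue of the failure mode, as a rough sketch only: this is
not the proposed primitive, and config_value, lookup() and memoized_lookup()
are made-up names. The cache assumes lookup() is constant. Nothing complains
when that assumption is wrong; the stale value only shows up if someone
compares results.

```c
/* Hypothetical state that the "constant" expression secretly depends on. */
int config_value = 10;

/* Looks constant to the person adding the memoization, but is not. */
int lookup(void) { return config_value; }

/* Memoized call site: trusts that lookup() always yields the same value. */
int memoized_lookup(void)
{
    static int cached;
    static int have_cached = 0;

    if (!have_cached) {
        cached = lookup();   /* computed once, on first use */
        have_cached = 1;
    }
    return cached;           /* stale if config_value changed since then */
}
```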

> > I've previously (*) mentioned the idea of user-definable
> > intrinsics for the FFI.
> 
> That looks like an awesome feature on the face of it. I wonder in how many
> cases it would actually be preferable to calling a C function that uses
> GCC's existing intrinsics. Just when the overhead of a CALL and compiler
> barrier is excessive?

E.g. calls for vector instructions wouldn't be that useful, only
intrinsics would work. Have a look at the x86 or x64 calling
conventions: the compiler has to spill and restore all vector
registers to/from the stack around each C call.

OTOH the overhead of (say) a locking instruction is considerable,
even in the uncontended case. An intrinsic probably wouldn't make
much of a performance difference there.
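
As a sketch of the "C function that uses GCC's existing intrinsics" route for
such a locking operation: counter_add() is a hypothetical wrapper name, while
__sync_fetch_and_add is the GCC builtin, which typically compiles to a
LOCK-prefixed instruction on x86. The cost of that instruction is paid either
way, so the extra CALL around it should matter little.

```c
#include <stdint.h>

/* Hypothetical wrapper, callable through the FFI: atomically adds n to
 * *counter and returns the value the counter held before the addition. */
uint64_t counter_add(volatile uint64_t *counter, uint64_t n)
{
    return __sync_fetch_and_add(counter, n);
}
```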

--Mike
