Re: Strange performance behaviour

  • From: Mike Pall <mike-1208@xxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Sun, 26 Aug 2012 00:26:15 +0200

Henk Boom wrote:
> I'm not sure if this is something to do with luajit or just something
> about memory access pattern efficiency that I don't know about, but
> either way knowing what's making it happen would help me to find out
> what to avoid :)

This doesn't have much to do with the compiler. You'd observe the
same effect on any modern CPU with memory access disambiguation.

The compiler cannot disambiguate buffer[0] from buffer[n], since n
is not a constant (and specializing to a numeric value is usually
not a win). Ok, so the load of buffer[0] can be forwarded from the
preceding store to buffer[0]. But the load of buffer[n] can
neither be hoisted nor forwarded. This in turn means none of the
stores to buffer[0] can be eliminated, either.

Actually LuaJIT is in no better position than a C compiler and
I bet the code and the resulting behavior is pretty much the same.


Modern CPUs can do better, because they know the actual addresses
involved and their memory disambiguation logic speculates that
buffer[n] does (n = 0) or does not (n = 1) alias buffer[0].

The non-aliasing case (n = 1) is simpler to handle for the CPU,
because buffer[n] is effectively read-only and the load (and the
multiply) can be executed ahead or in parallel to everything else.

For the aliasing case (n = 0), the preceding store to buffer[0]
must be forwarded to the load for buffer[n]. This incurs a
store-to-load forwarding delay, which increases the depth of the
data dependency chain by this delay plus the multiply latency for
each step.

tl;dr: The performance difference you're seeing is a result of
billions of dollars of engineering that went into the CPU next to
your knee.

Related reading:

Jon Stokes
Inside the Machine: An Illustrated Introduction to Microprocessors
and Computer Architecture

[It's a bit pricey and a bit dated (only up to the Core2). Or get
the articles this book is based on from arstechnica.com.]

--Mike

Other related posts: