Re: Docs for allocation sinking optimization

  • From: Geoff Leyland <geoff_leyland@xxxxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Fri, 13 Jul 2012 13:02:46 +1200

On 12/07/2012, at 11:36 AM, Mike Pall wrote:

> Ok, one one hand the example is synthetic

Yes (I meant stupid when I said it) and at that point my question was more "do 
I understand this right?", than "why don't you do this?", however..

> you usually *do*
> access the values you store in an aggregate. E.g. for
>  local t = {expr1, expr2}
> you'll most certainly access t[1] _and_ t[2] afterwards and
> on-trace. So expr1 and expr2 need to be computed, anyway.

Have you collected or seen evidence for this?  I have none, but I have this gut 
feeling (and vague memories of writing OO fluid-property calculating code years 
ago) that there are situations in OO systems where you end up computing a few 
more things than you need.

But to be clear: I have no actual evidence of a system where such an 
optimization would improve a hot path and my gut feelings don't have a good 
track record.

> The compiler folklore tells us that rematerialization is not
> always a win. Usually only constants (already handled) or really
> simple operations with a maximum of one variable input should be
> rematerialized. I.e. ADDOV(i, 5) would qualify.

I'm afraid I don't know exactly what "rematerialization" means here.  Does it 
mean a) that the value *is* used on the hot trace, but has to be spilled before 
it gets to the trace exit and then it's cheaper to recalculate it from an 
unspilled value than it is to reload it to the stack or b) the value is *not* 
used on the hot trace, and can be completely sunk to the exit?

In a) I can see that it would have to be a fairly cheap calculation (add a 
small integer constant?) to be faster than a load from the stack and in b) 
you're trading avoiding computations on the hot path against possibly 
increasing the number of live values.

> Anyone who wants to learn a bit about the compiler internals is
> welcome to add this feature:
> 
> Look how IR_CONV is handled and adapt that to e.g. ADDOV etc.:
> - lj_asm.c    asm_snap_alloc1()
> - lj_opt_sink.c       sink_checkphi()
> - lj_snap.c   lj_snap_replay()
> - lj_snap.c   snap_restoreval()
> - lj_snap.c   snap_restoredata()
> 
> Constants are already moved to the right, so you can check for:
>  if (irref_isk(ir->op2) && (ir->o == IR_ADDOV || ir->o == ...) &&
>      ir_used(IR(ir->op1)))

A rough patch for just ADDOV is attached:
 - I haven't done anything about overflow checking.  First, I couldn't find 
evidence of how to do it without compiling it, second, I thought I'd check the 
direction was roughly right before I went too far.
 - I think you mean ra_used?  I had to define that in a couple of files.
 - snap_restoredata didn't take J as an argument, but IR needed that.
 - my testing was pathetic, but it did generate the code I wanted from the 
earlier example.  It only exercises the snap_restoreval path.  I'll test more 
as I get time.

If this looks roughly right, or you can point out where it's wrong, I'll keep 
chipping away at it.

Cheers,
Geoff



Other related posts: