Re: small script to reproduce bogus trace stitch errors at line 0 with coroutines in latest 2.1

  • From: "Claire Lewis" <claire_lewis@xxxxxxxxxxx>
  • To: <luajit@xxxxxxxxxxxxx>
  • Date: Thu, 30 Apr 2015 08:35:43 +0930

Hi Mike,

Firstly, good news on finding that bug (and also thanks to Elias!)

Just thought I'd add a few random thoughts; apologies if this is completely off-base or useless.


When a trace has to stop at a NYI function, it compiles an exit to
the interpreter with a continuation underneath of it in the stack.
When the function later returns, this continuation either triggers
recording of a new trace that continues the control flow or it
directly jumps to that trace, if already compiled.

* What if it were limited to one such active continuation at a time? The state (interpreter, not coroutine) could remember where that continuation is and, on any future trace flush, invalidate it then and there.

This is a little kludgy, but it may be an easy fix for the current (already-kludgy?) stitching pending a rethink. It may (?) reduce overall performance in some cases, but wouldn't those cases be handled by a subsequent pass? (That assumes said code doesn't get blacklisted for too many compilation attempts, in which case the code could probably use rethinking anyway.)
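To make the idea concrete, here is a toy model in Python rather than anything resembling the actual LuaJIT source. Every name in it (JitState, stitch_exit, flush, resume) is made up for illustration, and the continuation is just a dict standing in for whatever really sits on the stack.

```python
class JitState:
    """Toy per-state JIT model; all names are hypothetical, not LuaJIT's."""

    def __init__(self):
        # At most one live stitch continuation is remembered here.
        self.pending = None

    def stitch_exit(self, trace_id):
        """A trace stops at an NYI function and wants to stitch.

        Returns a continuation record, or None if one is already
        pending (the caller would then simply not stitch).
        """
        if self.pending is not None:
            return None
        cont = {"trace": trace_id, "valid": True}
        self.pending = cont
        return cont

    def flush(self):
        """Trace flush: invalidate the one continuation we know about."""
        if self.pending is not None:
            self.pending["valid"] = False
            self.pending = None

    def resume(self, cont):
        """The NYI function returned.

        Returns the trace to continue from, or None to stay in the
        interpreter because the continuation went stale.
        """
        if self.pending is cont:
            self.pending = None
        return cont["trace"] if cont["valid"] else None
```

The point of the sketch is only that a flush needs to touch a single remembered slot, so no scan is required; the cost is the refused second stitch_exit while one is pending.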

Finding and cleaning up all of these continuations with stale data
would require a full GC scan, which is unfeasible. Invalidating
the data is tricky. Maybe some kind of generation number, which is
incremented after each flush (but it might wrap). Or some
completely different idea ...

* Looking briefly at some structures, it appears the trace number is stored in 16 bits, and perhaps it and the generation number would need to be stored as 2x16 bits in the op fields? Though I'm not sure.

If the limit is 16+16, that wrap may be a real concern. Perhaps rather than a generation number, one could use a 32 bit trace number with a "last flush trace head" value remembered at each flush (i.e. the oldest ID that will still be valid for continuation, obtained by recording what the next trace ID would be at flush time, and not resetting it). This extends that wrap point to 4bn traces rather than 64k generations. There may be some penalty in having to subtract off the head value when indexing into things, though I'm not sure that traces are ever stored in an array.
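A rough Python sketch of that "flush head" scheme, assuming (and it is only an assumption) that traces live in an array indexed by ID. TraceTable and its methods are hypothetical names, and the 32 bit arithmetic is emulated with a mask.

```python
MASK32 = 0xFFFFFFFF  # emulate unsigned 32 bit arithmetic


class TraceTable:
    """Hypothetical trace store: IDs keep counting up across flushes."""

    def __init__(self, first_id=0):
        self.head = first_id & MASK32  # oldest ID still valid
        self.traces = []               # traces[i] holds ID head + i

    def add(self, payload):
        tid = (self.head + len(self.traces)) & MASK32
        self.traces.append(payload)
        return tid

    def flush(self):
        # Remember what the next ID would have been; do not reset to 0.
        self.head = (self.head + len(self.traces)) & MASK32
        self.traces = []

    def lookup(self, tid):
        # Subtracting the head modulo 2**32 keeps this correct even
        # when the ID counter itself has wrapped past zero.
        off = (tid - self.head) & MASK32
        return self.traces[off] if off < len(self.traces) else None
```

Any ID handed out before the last flush lands at an offset beyond the current array and so fails the lookup. A stale ID could only alias a live one after roughly 4bn further traces.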


* Is a GC scan unfeasible in that it can't be done, or that it would be a prohibitive performance penalty?

If it is the latter, what about using a generation sequence, then if that ever wraps, do a full GC scan? It would of course cause a pause at that point, but no more so than repeated trace failures could.
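Sketched in Python, again as a toy with made-up names: each continuation is stamped with the generation current at its creation, a flush bumps the generation, and only a wrap triggers the expensive pass. The conts list merely stands in for whatever a real full GC scan would walk, and the bits parameter exists so the wrap is easy to exercise.

```python
class GenState:
    """Hypothetical model: generation stamp plus full scan on wrap."""

    def __init__(self, bits=16):
        self.mask = (1 << bits) - 1
        self.gen = 0
        self.conts = []      # stand-in for objects a GC scan would visit
        self.full_scans = 0  # how often the expensive path was taken

    def make_cont(self, trace_id):
        cont = {"trace": trace_id, "gen": self.gen, "dead": False}
        self.conts.append(cont)
        return cont

    def flush(self):
        self.gen = (self.gen + 1) & self.mask
        if self.gen == 0:
            # Wrapped: old stamps could now match a new generation,
            # so kill every outstanding continuation explicitly.
            for cont in self.conts:
                cont["dead"] = True
            self.conts.clear()
            self.full_scans += 1

    def is_live(self, cont):
        return not cont["dead"] and cont["gen"] == self.gen
```

With 16 bits the scan would run once per 65536 flushes; all other flushes stay at a single increment.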


* Lastly, does it even need to store a raw trace ID for continuation at all? Could it not store some monotonically increasing continuation ID that could be looked up in a list giving the corresponding trace ID for continuation, and any other details needed? This adds a level of indirection, but I doubt evaluating a continuation is a common or particularly performance-sensitive thing - once it compiles, it all goes away, I assume.

This would then allow invalidation of the continuation list by the trace flush: attempting to close a continuation would simply fail to find it on the list (or find it marked deleted, whatever seems best), clean up and carry on.

Again there is a wrap chance here, but assuming the continuation can be 32 bit, having 4bn continuations is getting pretty pathological. If it hits that, maybe just turn off trace stitching for this state and go home :)
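For what it's worth, the indirection could look something like the following Python toy. ContTable, register, close and the limit parameter are all invented for the sketch, and a dict plays the role of the "list".

```python
class ContTable:
    """Hypothetical continuation-ID table with flush invalidation."""

    def __init__(self, limit=1 << 32):
        self.next_id = 0               # monotonically increasing, never reused
        self.limit = limit             # 2**32 if the ID field is 32 bit
        self.entries = {}              # continuation ID -> trace ID
        self.stitching_enabled = True

    def register(self, trace_id):
        """Hand out a continuation ID, or None if stitching is off."""
        if not self.stitching_enabled:
            return None
        if self.next_id >= self.limit:
            # Out of IDs: turn stitching off for this state and go home.
            self.stitching_enabled = False
            return None
        cid = self.next_id
        self.next_id += 1
        self.entries[cid] = trace_id
        return cid

    def flush(self):
        """Trace flush drops every outstanding continuation at once."""
        self.entries.clear()

    def close(self, cid):
        """Return the trace to continue with, or None if it went stale."""
        return self.entries.pop(cid, None)
```

A stale continuation only ever holds an ID that is no longer in the table, so closing it is a failed lookup rather than a dangling trace reference.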


Anyway, just some random thoughts, hope some of it might prove useful.

Thanks,
- Claire

