Re: Apparent optimiser bug breaks compiled code (2.1.0-alpha)

  • From: Alexander Gall <alexander.gall@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 9 Oct 2014 11:41:46 +0200

On Wed, Oct 8, 2014 at 10:14 PM, Mike Pall <mike-1410@xxxxxxxxxx> wrote:
> Alexander Gall wrote:
>> In the current state, the calculation of the hash of the input
>> data starts to fail when the code is optimized, but strangely enough
>> only if the profiler is enabled with the 'l' option as well:
>
> I've found a problem with fused loads of constants under high
> register pressure.
>
> Thank you for the test case! Fixed in the git repository.

Thanks for the quick fix.

>
>> A remark about the Bloom filter code: it contains loops with a low
>> number of iterations (4 in this example), which is fixed and known
>> when the filter is created. The code uses an automated "loop unroller"
>> to, well, unroll these loops. This hack has sped up processing
>> considerably (avoidance of trace aborts due to loop unrolling by the
>> compiler and improved optimization to eliminate GC overhead). I'd be
>> interested to learn whether that's actually a good thing to do or has
>> any drawbacks (like causing the effect I'm seeing here ;)
>
> Well ... high register pressure, lots of spills etc.

Understood.  I've actually encountered trace aborts due to exhausted spill
slots when I was playing around with this kind of thing.
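
For reference, this is roughly the kind of thing the unroller does (a
simplified sketch, not the actual code; the hash functions here are toy
stand-ins): generate the unrolled body as source text and compile it
once with loadstring().

  local function make_unrolled(hashes)
     local src = { "local hashes = ...", "return function(filter, key)" }
     for i = 1, #hashes do
        -- constant table index, so the compiled code can specialize on it
        src[#src + 1] = string.format("  filter[hashes[%d](key)] = true", i)
     end
     src[#src + 1] = "end"
     return assert(loadstring(table.concat(src, "\n")))(hashes)
  end

  -- e.g. a "set" operation for a filter with 4 (toy) hash functions
  local set = make_unrolled{
     function (k) return k % 13 end,
     function (k) return k % 17 end,
     function (k) return k % 19 end,
     function (k) return k % 23 end,
  }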

>
> But the code isn't optimal, anyway. Too many abstractions, lots of
> tiny details (e.g. that multiply by 16 goes via FP). You should
> always have a close look at the generated IR and assembler code,
> if you're doing performance tuning at that level.

I certainly haven't quite developed an eye for these tiny details yet
(and this is my first project in Lua as well). All of your hints are highly
appreciated.
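
I'll have a closer look with -jdump. For the multiply by 16, comparing
something like the following seems like a reasonable start (whether the
first variant really goes via FP depends on what the compiler can prove
about i):

  local bit = require("bit")

  local function index_mul(i)   return i * 16 end           -- may become an FP multiply
  local function index_shift(i) return bit.lshift(i, 4) end -- stays in the int32 domain

  -- run with: luajit -jdump script.lua
  -- and compare the IR and machine code emitted for the two variants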

In fact, the main concern I currently have for my application is GC overhead.
I need to forward packets at rates of up to 10 Gbps (several hundred thousand
to a few million packets per second) and I fear that GC can cause undesirable
jitter and packet drops. This is backed more by gut feeling than by hard
evidence, though.
I'd be interested in your opinion on this.
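
The two mitigations I'm aware of are keeping bulk data out of the GC heap
via the FFI and driving the incremental collector explicitly from the main
loop. A rough sketch of the pattern (buffer size and step size are
placeholders):

  local ffi = require("ffi")

  -- FFI payloads are not traversed by the collector; only the small
  -- cdata object itself is GC-managed. Allocate once, reuse forever.
  local buf = ffi.new("uint8_t[?]", 65536)

  -- Amortize collection: a small, bounded amount of GC work per
  -- iteration of the main loop, instead of a full cycle hitting in
  -- the middle of a packet burst.
  local function breathe()
     collectgarbage("step", 10)
  end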

I found that I can maintain a fairly high level of abstraction while still
getting enough overall performance. I suppose this coding style tends to generate
more garbage, which makes it more dependent on optimizations by the compiler.
So far, that has worked quite well, but I'm prepared to change things at a
fundamental level as I learn more about the system.
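
As I understand it, allocation sinking is what makes this viable:
short-lived aggregates on a compiled trace can be eliminated entirely.
A toy example of the kind of pattern I mean:

  -- a small abstraction that allocates a fresh table per call
  local function add(a, b) return { x = a.x + b.x, y = a.y + b.y } end

  local acc = { x = 0, y = 0 }
  for i = 1, 1e7 do
     -- on a compiled trace the temporaries can be sunk, so this need
     -- not produce any garbage in the steady state
     acc = add(acc, { x = 1, y = 2 })
  end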

-- 
Alex
