That'd explain it... thanks!

2014-03-18 17:41 GMT+00:00 Mike Pall <mike-1403@xxxxxxxxxx>:
> Daniel Kolesa wrote:
> > As you can see the code is direct rewrite of the Lua inner loop.
> > The performance is however vastly different - getting 1800 MFLOPS
> > rather than 400 something (just like native SciMark).
>
> The code generated for the innermost loop is more or less optimal.
> But for the chosen parameters, it only runs for 5 iterations. Then
> the overhead of the trace that runs around it and forms the outer
> loop dominates. This trace is not very efficient, since it has to
> check lots of assertions, which cannot be hoisted. This is difficult
> to improve, since the trace compiler only has local knowledge and
> can't infer much about that trace.
>
> --Mike