Daniel Kolesa wrote:
> As you can see the code is direct rewrite of the Lua inner loop.
> The performance is however vastly different - getting 1800 MFLOPS
> rather than 400 something (just like native SciMark).

The code generated for the innermost loop is more or less optimal.
But for the chosen parameters, it only runs for 5 iterations. Then
the overhead of the trace that runs around it and forms the outer
loop dominates.

This trace is not very efficient, since it has to check lots of
assertions, which cannot be hoisted. This is difficult to improve,
since the trace compiler only has local knowledge and can't infer
much about that trace.

--Mike
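
For readers following along, here is a minimal sketch of the loop shape discussed above: a short inner loop entered many times from an outer loop. It is not the SciMark kernel from the thread. The function name `row_sums`, the table layout, and the sizes are all hypothetical, chosen only so that the inner loop runs 5 iterations per entry.

```lua
-- Illustrative only: row_sums, the data layout and the sizes are assumptions,
-- not code from the benchmark under discussion.
local function row_sums(rows, inner_len)
  local out = {}
  for i = 1, #rows do          -- outer loop: its per-iteration work is paid once per inner-loop entry
    local row = rows[i]        -- lookup that has to be redone on every outer iteration
    local sum = 0
    for j = 1, inner_len do    -- inner loop: only inner_len iterations each time it is entered
      sum = sum + row[j] * 2
    end
    out[i] = sum
  end
  return out
end

-- Build 1000 rows of 5 elements each, so the inner loop is entered 1000 times.
local rows = {}
for i = 1, 1000 do
  rows[i] = {1, 2, 3, 4, 5}
end

local sums = row_sums(rows, 5)
print(sums[1], #sums)
```

With only 5 inner iterations per entry, whatever the outer loop costs is spread over very little inner work. Making the inner loop longer (where the problem allows it) would presumably shift the balance back toward the well-optimized inner code.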