Hello,

I was recently benchmarking LuaJIT and used SciMark as one of the benchmarks. Performance on most tests is fine (within 20% of gcc -O2, which is very nice), but one test, the sparse matmul test, is considerably slower.

To narrow it down I wrote a minimal implementation of the test and ended up with http://codepad.org/hEJOhpAJ - its performance is the same as the "regular" LuaJIT SciMark rewrite by Mike. I then decided to replace the inner loop with an equivalent C function called through the FFI: put http://codepad.org/gmjl15by into a file called loopc.c and compile it with gcc loopc.c -o loopc.so -O2 -fPIC -shared. As you can see, the C code is a direct rewrite of the Lua inner loop.

The performance, however, is vastly different: about 1800 MFLOPS (just like native SciMark) rather than 400-something with the pure Lua loop.

Can anyone explain this behavior? It's the same on LuaJIT 2.0.3 and LuaJIT 2.1-alpha.

Thanks,
Daniel
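
The inner loop in question is a sparse row dot product. A minimal sketch of what loopc.c might contain follows, assuming SciMark-style compressed-row storage (a values array, a column-index array, and a dense vector). The function name `sparse_row_dot` and its signature are hypothetical and are not taken from the linked code.

```c
/* Sketch only: assumes compressed-row storage as used by SciMark's
 * sparse matmul kernel. Name and signature are hypothetical. */

/* Dot product of one sparse row with the dense vector x.
 * val[i] holds the i-th stored nonzero, col[i] its column index,
 * and [start, end) is the range of stored entries for this row. */
double sparse_row_dot(const double *val, const int *col, const double *x,
                      int start, int end)
{
    double sum = 0.0;
    for (int i = start; i < end; i++) {
        /* Indirect load through col[i], then multiply-accumulate. */
        sum += x[col[i]] * val[i];
    }
    return sum;
}
```

Built with the command given above (gcc loopc.c -o loopc.so -O2 -fPIC -shared), a function of this shape could be declared via LuaJIT's ffi.cdef and called once per matrix row in place of the Lua inner loop.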