Re: m = mat4(1, 2, 3, …) 75x slower than m = mat4(); m.m11 = 1; … (was: Allocation sinking in git HEAD)

  • From: Adam Strzelecki <ono@xxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 5 Jul 2012 00:32:43 +0200

> (...) I've bumped some limits and removed some other restrictions. Both
> variants should now run much faster and at the same speed. Please
> try again with git HEAD.

Yes it works like a charm now, thanks Mike!

> Note that in general the cdata initializer variant mat4(a, b, ...)
> is preferred, because it's slightly easier to optimize for the
> compiler than an allocation + explicit stores.

Yeah, now I can see that, when using doubles instead of floats, the cdata
initializer is slightly quicker than alloc + explicit stores.
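
For anyone following along, here is a minimal sketch of the two variants being compared. The struct layout is an assumption on my side (16 doubles named m11..m44, matching the field name in the subject line); the actual mat4 type from the benchmark may differ.

```lua
local ffi = require("ffi")

-- Hypothetical layout: 16 doubles named m11..m44 (assumed, not the original type).
ffi.cdef[[
typedef struct {
  double m11, m12, m13, m14;
  double m21, m22, m23, m24;
  double m31, m32, m33, m34;
  double m41, m42, m43, m44;
} mat4;
]]
local mat4 = ffi.typeof("mat4")

-- Variant 1: cdata initializer; fields are filled in declaration order.
local a = mat4(1, 2, 3, 4,  5, 6, 7, 8,  9, 10, 11, 12,  13, 14, 15, 16)

-- Variant 2: zero-filled allocation followed by explicit stores.
local b = mat4()
b.m11, b.m12, b.m13, b.m14 = 1, 2, 3, 4
b.m21, b.m22, b.m23, b.m24 = 5, 6, 7, 8
b.m31, b.m32, b.m33, b.m34 = 9, 10, 11, 12
b.m41, b.m42, b.m43, b.m44 = 13, 14, 15, 16
```

Both end up with the same contents; the difference is only in what the compiler has to work with.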

> Also, please note that using 'float' does not improve anything
> (...) All FP computations are done with 'double', so using 'float' for
> storage causes lots of float<->double conversions. These can
> become costly in some contexts. E.g. your example runs around 20%
> faster, if one uses doubles instead of floats for the matrix.

You are absolutely right. I just realised, looking again at the -jdump output,
that I got addsd & mulsd (scalar double SSE instructions) and a lot of
cvtsd2ss. Still, I wish I could find (and I believe I will) vmulpd %ymm...
there :) Probably thanks to future sponsorship.
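
A rough sketch of how the float vs. double storage difference can be checked. The vec4f/vec4d types and the bench helper are hypothetical names made up for illustration, and the loop body is arbitrary; timings will vary by machine, so none are quoted here.

```lua
local ffi = require("ffi")

-- Hypothetical types: same layout, differing only in the storage type.
ffi.cdef[[
typedef struct { float  x, y, z, w; } vec4f;
typedef struct { double x, y, z, w; } vec4d;
]]

-- Runs n multiply-add updates on one field and returns elapsed CPU time
-- plus the final value. All arithmetic is done in double; with float
-- fields every load widens to double and every store narrows back to float.
local function bench(ctype, n)
  local v = ctype(1, 2, 3, 4)
  local t0 = os.clock()
  for i = 1, n do
    v.x = v.x * 1.0000001 + v.y * 1e-7
  end
  return os.clock() - t0, v.x
end

local tf = bench(ffi.typeof("vec4f"), 1e6)
local td = bench(ffi.typeof("vec4d"), 1e6)
print(("float: %.3fs  double: %.3fs"):format(tf, td))
```

Running the same script with -jdump should show whether the narrowing conversions (cvtsd2ss) appear in the float trace.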

Thanks for the instant update.

So far it is the best update to LuaJIT ever.

All best,
-- 
Adam Strzelecki | nanoant.com | twitter.com/nanoant

