Re: strange performance problem

  • From: Laurent Deniau <Laurent.Deniau@xxxxxxx>
  • To: LuaJIT <luajit@xxxxxxxxxxxxx>
  • Date: Wed, 1 Jul 2015 19:39:13 +0000

Hi,

I have made a standalone example that does the same thing and exhibits the same
NYI problem:

time luajit -jv complex_inline.lua 1000000
[ NYI … ]
a= 0.70710677618655+0.70710677618655i b= 0.70710677618655+0.70710677618655i

real 0m1.647s
user 0m1.633s
sys 0m0.007s

I would expect a 50x speed-up if the JIT did not hit the NYI. Does this mean
that, for good performance, one needs to rewrite the cxxx functions of libm in
pure Lua (still an option)? Are gsl-shell or Scilua facing the same performance
problem?
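
To make the pure-Lua option concrete, here is a minimal sketch of what a
replacement for csqrt could look like. The name csqrt_lua is hypothetical (it is
not part of libm, LuaJIT or complex.lua), it works on plain numbers and returns
the real and imaginary parts, and it assumes ordinary finite inputs: unlike
libm, it has no special handling for overflow, NaN, infinities or signed zeros.

```lua
-- Hypothetical helper: principal square root of re + im*i on plain numbers.
-- Assumption: finite, ordinary inputs only (no overflow/NaN/inf/-0 handling).
local sqrt, abs = math.sqrt, math.abs

local function csqrt_lua(re, im)
  if re == 0 and im == 0 then return 0, 0 end
  local m = sqrt(re*re + im*im)        -- modulus |z|
  if re >= 0 then
    local t = sqrt(0.5 * (m + re))     -- real part of the root, t > 0
    return t, im / (2*t)
  else
    local t = sqrt(0.5 * (m - re))     -- magnitude of the imaginary part, t > 0
    return abs(im) / (2*t), (im < 0) and -t or t
  end
end
```

In the module it would presumably be wrapped along the lines of
function M.sqrt (z) return complex(csqrt_lua(z.re, z.im)) end, so that only
plain arithmetic and math.sqrt remain in the loop.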

Laurent.

-----------------------
local ffi = require 'ffi'

-- Complex

local M = {}
local complex

function M.__unm (x)
  return complex(-x.re, -x.im)
end

function M.__add (x, y)
  x, y = complex(x), complex(y)
  return complex(x.re + y.re, x.im + y.im)
end

function M.__sub (x, y)
  x, y = complex(x), complex(y)
  return complex(x.re - y.re, x.im - y.im)
end

function M.__mul (x, y)
  x, y = complex(x), complex(y)
  return complex(x.re*y.re - x.im*y.im, x.re*y.im + x.im*y.re)
end

function M.__div (x, y)
  local r, d
  x, y = complex(x), complex(y)
  -- scale by the larger component of y
  if math.abs(y.re) < math.abs(y.im) then
    r = y.re / y.im
    d = y.re * r + y.im
    return complex((x.re * r + x.im) / d, (x.im * r - x.re) / d)
  else
    r = y.im / y.re
    d = y.im * r + y.re
    return complex((x.im * r + x.re) / d, (x.im - x.re * r) / d)
  end
end

ffi.cdef "complex csqrt(complex);"

M.sqrt = ffi.C.csqrt
M.__index = M

complex = ffi.metatype('complex', M)

-- Example

local a, b = (1+1i)/(math.sqrt(2)+1e-8), 1
local n = arg[1] and tonumber(arg[1]) or 1e8

for i=1,n do
  b = (b * a):sqrt()
end

print('a=', a, 'b=', b)
-----------------------
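
As a sanity check on the scaled division that M.__div is built on (usually
credited to Smith), here is a minimal sketch in plain Lua numbers that compares
it with the textbook formula (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i)/(c^2+d^2). The
names div_smith and div_textbook are made up for this illustration and are not
part of complex.lua.

```lua
-- Hypothetical check on plain numbers: x = a + b*i, y = c + d*i.
local abs = math.abs

-- Smith's scaled division: divide through by the larger component of y.
local function div_smith(a, b, c, d)
  if abs(c) < abs(d) then
    local r = c / d
    local den = c * r + d
    return (a * r + b) / den, (b * r - a) / den
  else
    local r = d / c
    local den = d * r + c
    return (b * r + a) / den, (b - a * r) / den
  end
end

-- Textbook formula: multiply by the conjugate of y.
local function div_textbook(a, b, c, d)
  local den = c*c + d*d
  return (a*c + b*d) / den, (b*c - a*d) / den
end
```

Both functions should agree to rounding error on both branches, for example for
(1+2i)/(3+4i) and (1+2i)/(4+3i).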

On Jul 1, 2015, at 6:48 PM, Laurent Deniau <laurent.deniau@xxxxxxx> wrote:

Hi,

I am facing a strange performance problem with the JIT…

My hand-written complex multiplication is on par with the C version and very
fast (3e8 multiplications per second); so far so good. But when I call an
ffi.C function with a complex number as argument, for example ffi.C.csqrt,
luajit reports:

luajit -jv complex.lua 1000
[TRACE --- complex.lua:11 -- NYI: unsupported C function type at complex.lua:12]
[TRACE --- complex.lua:11 -- NYI: unsupported C function type at complex.lua:12]
[TRACE --- complex.lua:95 -- NYI: return to lower frame at complex.lua:12]
[TRACE --- complex.lua:11 -- NYI: unsupported C function type at complex.lua:12]
[TRACE --- complex.lua:95 -- NYI: return to lower frame at complex.lua:12]
[TRACE --- complex.lua:11 -- NYI: unsupported C function type at complex.lua:12]
[TRACE --- complex.lua:95 -- NYI: return to lower frame at complex.lua:12]
[TRACE --- complex.lua:11 -- NYI: unsupported C function type at complex.lua:12]
[TRACE --- complex.lua:95 -- NYI: return to lower frame at complex.lua:12]

I tried different signatures like
complex csqrt(complex);
double complex csqrt(double complex);
double _Complex csqrt(double _Complex);
but nothing changes.

For 1e8 iterations of b = csqrt(b*a), where a and b are stable, non-zero,
non-NaN complex numbers, the timings are as follows:

- C version is around 3.5 sec
- luajit version is around 3 minutes
- my own RPN interpreter is around 2.7 sec

I must be doing something that luajit dislikes, but even the interpreter should
be faster than this, right?

The code of the complex.lua module (150 lines) can be found here; it simply
mimics the practice found on the net:
https://github.com/MethodicalAcceleratorDesign/MAD/blob/master/lua/complex.lua
The test is here:
https://github.com/MethodicalAcceleratorDesign/MAD/blob/master/lua/tests/complex.lua

I am a bit stuck, so any recommendation for recovering normal speed is more
than welcome. Tracing JITs are strange beasts…

Best,
Laurent.

--
Laurent Deniau http://cern.ch/mad
Accelerators Beam Physics mad@xxxxxxx
CERN, CH-1211 Geneva 23 Tel: +41 (0) 22 767 4647

