Markus Walther wrote: > On Thu, Sep 20, 2012 at 1:35 AM, Geoff Leyland > <geoff_leyland@xxxxxxxxxxx> wrote: > > https://github.com/vection/LuaFFT would appear to be just that. > > That port is pure Lua... It's likely that FFI would help quite a bit > to make it faster. Probably. But the code is inefficient in many ways. > Still, according to FFTW benchmarks, Ooura's FFT is quite a bit faster > than kissfft, so might be a better base for a LuaJIT port: Maybe or maybe not. It looks rather branchy. The following things are needed to get the best speed out for numerical computations with LuaJIT (in order of importance): * Reduce number of unbiased/unpredictable branches. - Heavily biased branches (>95% in one direction) are fine. - Prefer branch-free algorithms. - Use math.min() and math.max(). - Use bit.*, e.g. for conditional index computations. * Use FFI data structures. - Use int32_t, avoid uint32_t data types. - Use double, avoid float data types. - Metamethods are fine, but don't overdose them. * Call C functions only via the FFI. - Avoid calling trivial functions, better rewrite them in Lua. - Avoid callbacks -- use pull-style APIs (read/write) and iterators instead. * Use plain 'for i=start,stop,step do ... end' loops. - Prefer plain array indexing, e.g. 'a[i+2]'. - Avoid pointer arithmetic. * Find the right balance for unrolling. - Avoid inner loops with low iteration count (< 10). - Only unroll loops if the loop body has not too many instructions. - Consider using templates instead of hand-unrolling (see GSL Shell). - You may have to experiment a bit. * Define and call only 'local' (!) functions within a module. * Cache often-used functions from other modules in upvalues. - E.g. local sin = math.sin ... local function foo() return 2*sin(x) end - Don't do this for FFI C functions, cache the namespace instead, e.g. local lib = ffi.load("lib"). * Avoid inventing your own dispatch mechanisms. - Prefer to use built-in mechanisms, e.g. metamethods. * Do not try to second-guess the JIT compiler. - It's perfectly ok to write 'z = x[a+b] + y[a+b]'. - Do not try CSE by hand, e.g. 'local c = a+b'. - It's perfectly ok to write 'a[i][j] = a[i][j] * a[i][j+1]'. - Do not try to cache partial FFI struct/array references (e.g. a[i]) unless they are long-lived (e.g. in a big loop). * Be careful with aliasing, esp. when using multiple arrays. - LuaJIT uses strict type-based disambiguation, but there are limits to this due to C99 conformance. - E.g. in 'x[i] = a[i] + c[i]; y[i] = a[i] + d[i]' the load of a[i] needs to be done twice, because x could alias a. It *does* make sense to use 'do local t = a[i] ... end' here. * Reduce the number of live temporary variables. - Best to initialize on definition, e.g. 'local y = f(x)' - Yes, this means you should interleave this with other code. - Do not hoist variable definitions to the start of a function -- Lua is not JavaScript nor K&R C. - Use 'do local y = f(x) ... end' to bound variable lifetimes. * Do not intersperse expensive or uncompiled operations. - print() is not compiled, use io.write(). - E.g. avoid assert(type(x) == "number", "x is a "..mytype(x)") - The problem is not the assert() or the condition (basically free). The problem is the string concatenation, which has to be executed every time, even if the assertion never fails! - Watch the output of -jv and -jdump. You need to take all of these factors into account before deciding on a certain algorithm. An advanced algorithm, that's fast in theory, may be slower than a simpler algorithm, if the simpler algorithm has much fewer unbiased branches. --Mike