On Thu, Sep 20, 2012 at 5:28 AM, Mike Pall <mike-1209@xxxxxxxxxx> wrote: > Markus Walther wrote: >> On Thu, Sep 20, 2012 at 1:35 AM, Geoff Leyland >> <geoff_leyland@xxxxxxxxxxx> wrote: >> > https://github.com/vection/LuaFFT would appear to be just that. >> >> That port is pure Lua... It's likely that FFI would help quite a bit >> to make it faster. > > Probably. But the code is inefficient in many ways. > >> Still, according to FFTW benchmarks, Ooura's FFT is quite a bit faster >> than kissfft, so might be a better base for a LuaJIT port: > > Maybe or maybe not. It looks rather branchy. > > The following things are needed to get the best speed out for > numerical computations with LuaJIT (in order of importance): > > * Reduce number of unbiased/unpredictable branches. > - Heavily biased branches (>95% in one direction) are fine. > - Prefer branch-free algorithms. > - Use math.min() and math.max(). > - Use bit.*, e.g. for conditional index computations. > > * Use FFI data structures. > - Use int32_t, avoid uint32_t data types. > - Use double, avoid float data types. > - Metamethods are fine, but don't overdose them. > > * Call C functions only via the FFI. > - Avoid calling trivial functions, better rewrite them in Lua. > - Avoid callbacks -- use pull-style APIs (read/write) and > iterators instead. > > * Use plain 'for i=start,stop,step do ... end' loops. > - Prefer plain array indexing, e.g. 'a[i+2]'. > - Avoid pointer arithmetic. > > * Find the right balance for unrolling. > - Avoid inner loops with low iteration count (< 10). > - Only unroll loops if the loop body has not too many instructions. > - Consider using templates instead of hand-unrolling (see GSL Shell). > - You may have to experiment a bit. > > * Define and call only 'local' (!) functions within a module. > > * Cache often-used functions from other modules in upvalues. > - E.g. local sin = math.sin ... local function foo() return 2*sin(x) end > - Don't do this for FFI C functions, cache the namespace > instead, e.g. local lib = ffi.load("lib"). > > * Avoid inventing your own dispatch mechanisms. > - Prefer to use built-in mechanisms, e.g. metamethods. > > * Do not try to second-guess the JIT compiler. > - It's perfectly ok to write 'z = x[a+b] + y[a+b]'. > - Do not try CSE by hand, e.g. 'local c = a+b'. > - It's perfectly ok to write 'a[i][j] = a[i][j] * a[i][j+1]'. > - Do not try to cache partial FFI struct/array references > (e.g. a[i]) unless they are long-lived (e.g. in a big loop). > > * Be careful with aliasing, esp. when using multiple arrays. > - LuaJIT uses strict type-based disambiguation, but there are > limits to this due to C99 conformance. > - E.g. in 'x[i] = a[i] + c[i]; y[i] = a[i] + d[i]' the load of > a[i] needs to be done twice, because x could alias a. > It *does* make sense to use 'do local t = a[i] ... end' here. > > * Reduce the number of live temporary variables. > - Best to initialize on definition, e.g. 'local y = f(x)' > - Yes, this means you should interleave this with other code. > - Do not hoist variable definitions to the start of a > function -- Lua is not JavaScript nor K&R C. > - Use 'do local y = f(x) ... end' to bound variable lifetimes. > > * Do not intersperse expensive or uncompiled operations. > - print() is not compiled, use io.write(). > - E.g. avoid assert(type(x) == "number", "x is a "..mytype(x)") > - The problem is not the assert() or the condition (basically > free). The problem is the string concatenation, which has to > be executed every time, even if the assertion never fails! > - Watch the output of -jv and -jdump. > > You need to take all of these factors into account before deciding > on a certain algorithm. An advanced algorithm, that's fast in > theory, may be slower than a simpler algorithm, if the simpler > algorithm has much fewer unbiased branches. > > --Mike > This is amazingly useful and I am saving this for later. Is this information posted on a website somewhere in a format that's as easy to read and understand as this? /s/ Adam