Re: Tuning numerical computations for LuaJIT (was Re: [ANN] Sci-1.0-beta1)

  • From: Coda Highland <chighland@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 20 Sep 2012 07:04:22 -0700

On Thu, Sep 20, 2012 at 5:28 AM, Mike Pall <mike-1209@xxxxxxxxxx> wrote:
> Markus Walther wrote:
>> On Thu, Sep 20, 2012 at 1:35 AM, Geoff Leyland
>> <geoff_leyland@xxxxxxxxxxx> wrote:
>> > https://github.com/vection/LuaFFT would appear to be just that.
>>
>> That port is pure Lua... It's likely that FFI would help quite a bit
>> to make it faster.
>
> Probably. But the code is inefficient in many ways.
>
>> Still, according to FFTW benchmarks, Ooura's FFT is quite a bit faster
>> than kissfft, so might be a better base for a LuaJIT port:
>
> Maybe or maybe not. It looks rather branchy.
>
> The following things are needed to get the best speed out for
> numerical computations with LuaJIT (in order of importance):
>
> * Reduce number of unbiased/unpredictable branches.
>   - Heavily biased branches (>95% in one direction) are fine.
>   - Prefer branch-free algorithms.
>     - Use math.min() and math.max().
>     - Use bit.*, e.g. for conditional index computations.
>
> * Use FFI data structures.
>   - Use int32_t, avoid uint32_t data types.
>   - Use double, avoid float data types.
>   - Metamethods are fine, but don't overdose them.
>
> * Call C functions only via the FFI.
>   - Avoid calling trivial functions, better rewrite them in Lua.
>   - Avoid callbacks -- use pull-style APIs (read/write) and
>     iterators instead.
>
> * Use plain 'for i=start,stop,step do ... end' loops.
>   - Prefer plain array indexing, e.g. 'a[i+2]'.
>   - Avoid pointer arithmetic.
>
> * Find the right balance for unrolling.
>   - Avoid inner loops with low iteration count (< 10).
>   - Only unroll loops if the loop body has not too many instructions.
>   - Consider using templates instead of hand-unrolling (see GSL Shell).
>   - You may have to experiment a bit.
>
> * Define and call only 'local' (!) functions within a module.
>
> * Cache often-used functions from other modules in upvalues.
>   - E.g. local sin = math.sin ... local function foo() return 2*sin(x) end
>   - Don't do this for FFI C functions, cache the namespace
>     instead, e.g. local lib = ffi.load("lib").
>
> * Avoid inventing your own dispatch mechanisms.
>   - Prefer to use built-in mechanisms, e.g. metamethods.
>
> * Do not try to second-guess the JIT compiler.
>   - It's perfectly ok to write 'z = x[a+b] + y[a+b]'.
>     - Do not try CSE by hand, e.g. 'local c = a+b'.
>   - It's perfectly ok to write 'a[i][j] = a[i][j] * a[i][j+1]'.
>     - Do not try to cache partial FFI struct/array references
>       (e.g. a[i]) unless they are long-lived (e.g. in a big loop).
>
> * Be careful with aliasing, esp. when using multiple arrays.
>   - LuaJIT uses strict type-based disambiguation, but there are
>     limits to this due to C99 conformance.
>   - E.g. in 'x[i] = a[i] + c[i]; y[i] = a[i] + d[i]' the load of
>     a[i] needs to be done twice, because x could alias a.
>     It *does* make sense to use 'do local t = a[i] ... end' here.
>
> * Reduce the number of live temporary variables.
>   - Best to initialize on definition, e.g. 'local y = f(x)'
>     - Yes, this means you should interleave this with other code.
>     - Do not hoist variable definitions to the start of a
>       function -- Lua is not JavaScript nor K&R C.
>   - Use 'do local y = f(x) ... end' to bound variable lifetimes.
>
> * Do not intersperse expensive or uncompiled operations.
>   - print() is not compiled, use io.write().
>   - E.g. avoid assert(type(x) == "number", "x is a "..mytype(x)")
>     - The problem is not the assert() or the condition (basically
>       free). The problem is the string concatenation, which has to
>       be executed every time, even if the assertion never fails!
>   - Watch the output of -jv and -jdump.
>
> You need to take all of these factors into account before deciding
> on a certain algorithm. An advanced algorithm, that's fast in
> theory, may be slower than a simpler algorithm, if the simpler
> algorithm has much fewer unbiased branches.
>
> --Mike
>

This is amazingly useful and I am saving this for later.

Is this information posted on a website somewhere in a format that's
as easy to read and understand as this?

/s/ Adam

Other related posts: