Hi!
I have code that implements the SHA-256 algorithm using the bit.* functions.
The code behaves strangely when benchmarked.
For example, when calculating the SHA-256 of 1 GByte of data:
64-bit LuaJIT completes the task in 19 seconds.
32-bit LuaJIT needs a lot more time (about 400 seconds).
Running with the "-jv" option shows the following message (only on 32-bit LuaJIT):
[TRACE --- sha256.lua:178 -- NYI: PHI shuffling too complex at sha256.lua:179]
This is the inner loop:
177: local a, b, c, d, e, f, g, h = H[1], H[2], H[3], H[4], H[5], H[6], H[7], H[8]
178: for i = 1, 64 do
179:   local z = bxor(ror(e, 6), ror(e, 11), rol(e, 7)) + bxor(band(e, f), band(bnot(e), g)) + h + K[i] + W[i]
180:   h, g, f, e = g, f, e, z + d
181:   d, c, b, a = c, b, a, z + bxor(band(a, b), band(a, c), band(b, c)) + bxor(ror(a, 2), ror(a, 13), rol(a, 10))
182: end
183: H[1], H[2], H[3], H[4] = band(a + H[1], -1), band(b + H[2], -1), band(c + H[3], -1), band(d + H[4], -1)
184: H[5], H[6], H[7], H[8] = band(e + H[5], -1), band(f + H[6], -1), band(g + H[7], -1), band(h + H[8], -1)
I have noticed that inserting the statement "z = band(z, -1)" between
lines #179 and #180 reduces the 32-bit running time from 400 to 120
seconds.
However, this extra band() increases the 64-bit running time from 19 to 21
seconds.
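
To make the two variants easy to compare, here is a minimal self-contained sketch of the loop with and without the extra band(). The K, W and H contents are placeholders (not the real SHA-256 constants, message schedule or state), and the names new_state, store, compress_plain and compress_normalized are made up for this example only. It assumes LuaJIT or a Lua with the BitOp "bit" module.

```lua
-- Sketch only: assumes LuaJIT, or plain Lua with the BitOp "bit" module.
local bit = require("bit")
local band, bxor, bnot = bit.band, bit.bxor, bit.bnot
local ror, rol, tohex = bit.ror, bit.rol, bit.tohex

-- Placeholder inputs: NOT the real SHA-256 constants or message schedule.
local K, W = {}, {}
for i = 1, 64 do
  K[i] = i * 1000003
  W[i] = (65 - i) * 7919
end

-- Placeholder state (hypothetical helper, not the real initial hash values).
local function new_state()
  return { 1, 2, 3, 4, 5, 6, 7, 8 }
end

-- Adds the working variables back into H, normalized to 32 bits.
local function store(H, ...)
  for i = 1, 8 do
    H[i] = band(select(i, ...) + H[i], -1)
  end
end

-- The loop as posted: z is left as an unnormalized sum.
local function compress_plain(H)
  local a, b, c, d, e, f, g, h = H[1], H[2], H[3], H[4], H[5], H[6], H[7], H[8]
  for i = 1, 64 do
    local z = bxor(ror(e, 6), ror(e, 11), rol(e, 7))
            + bxor(band(e, f), band(bnot(e), g)) + h + K[i] + W[i]
    h, g, f, e = g, f, e, z + d
    d, c, b, a = c, b, a, z + bxor(band(a, b), band(a, c), band(b, c))
                            + bxor(ror(a, 2), ror(a, 13), rol(a, 10))
  end
  store(H, a, b, c, d, e, f, g, h)
end

-- Same loop with z normalized to a 32-bit value before it is reused.
local function compress_normalized(H)
  local a, b, c, d, e, f, g, h = H[1], H[2], H[3], H[4], H[5], H[6], H[7], H[8]
  for i = 1, 64 do
    local z = bxor(ror(e, 6), ror(e, 11), rol(e, 7))
            + bxor(band(e, f), band(bnot(e), g)) + h + K[i] + W[i]
    z = band(z, -1) -- the extra statement between lines #179 and #180
    h, g, f, e = g, f, e, z + d
    d, c, b, a = c, b, a, z + bxor(band(a, b), band(a, c), band(b, c))
                            + bxor(ror(a, 2), ror(a, 13), rol(a, 10))
  end
  store(H, a, b, c, d, e, f, g, h)
end

-- Run both variants on the same placeholder state and print the words.
local Hp, Hn = new_state(), new_state()
compress_plain(Hp)
compress_normalized(Hn)
for i = 1, 8 do
  print(tohex(Hp[i]), tohex(Hn[i]))
end
```

As far as I understand, both variants should print the same words, because the bit.* functions only use the low 32 bits of their arguments. Only the running time differs.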
Can someone please explain what limitation is behind the "PHI shuffling
too complex" message?
How should I rewrite line #179 to make this loop fast on both 32-bit
and 64-bit LuaJIT?
-- Egor