Re: Arrays oddities

  • From: demetri <demetri.spanos@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Mon, 31 Mar 2014 03:00:15 -0700

>
> So basically anything in my bytes[5] may be lost if the system needs that
> memory space, right?
> Obviously it will not be the case with bytes[2] because it is within
> bounds.
>

As with C, any access out of allocated bounds is undefined and may cause
a segfault. The fact that it's not blowing up right now is "luck".
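
To make that concrete, here's a minimal sketch of my own (the 5-element
declaration is an assumption, chosen because your bytes[2] is in bounds and
bytes[5] is not; it's not your actual code):

local ffi = require("ffi")
local bytes = ffi.new("uint8_t[5]") -- valid indices are 0..4, unchecked as in C
bytes[2] = 42 -- fine, within bounds
bytes[5] = 99 -- undefined behavior: may appear to work, may segfault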


> Actually, I don't really care about the access time, I just want the
> fastest result as a whole. And even with faster access, the whole code is
> slower. That is odd to me.
> If it is faster for one cycle, 1000 cycles should be faster too.
>

Here's a first step toward understanding this better. Try replacing your
inner loop with this:

for i=0,7281624-1 do
  bytes[i] = i -- or the equivalent write for the table version
end

I find that with this the array version is ~4.2x faster than the table
version (and overall the whole thing goes about 200x faster than your
version). The important point to take from this is that almost all the work
in your example is iterating over the lines and converting them to Lua
numbers (and then converting those into ffi types, since the representable
range for an int is not the same as the representable range for a Lua
number ... though I doubt that's what's causing the 8% discrepancy).
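
For reference, the conversion-heavy shape I'm describing looks roughly like
this (a guess at your loop, since the exact code isn't quoted here, and
"data.txt" is a placeholder filename):

local ffi = require("ffi")
local bytes = ffi.new("int[?]", 7281624)
local i = 0
for line in io.lines("data.txt") do -- each line arrives as a Lua string
  bytes[i] = tonumber(line) -- string -> Lua number -> int conversion
  i = i + 1
end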

So you're not really testing arrays vs tables. You're testing a much more
complex interaction between arrays, iterators, string->number conversion,
and Lua number -> int conversion. The optimizer is tuned for code that is
written with performance in mind (which very rarely involves these kinds of
conversions). One could imagine trying to optimize for cases like this, but
Mike's time is finite and all projects have a finite "conceptual design
budget"; every bit of effort spent optimizing away this kind of 8%
discrepancy is time, code, and complexity not spent on idioms that are
already tuned for high performance.

The important points I recommend taking from this are:

1) It's already reasonably fast given the requirement to scan over the
lines of a file and convert them to numbers (it's ~2x faster than Python
2.7 on my machine).

2) Good news! If you can rework your problem to align more closely with the
use-cases the optimizer is tuned for, you can pick up *much* more
performance (see the sketch after this list).

3) Think twice, then twice again, before diagnosing the root cause of
performance in LuaJIT (or any complex JIT). It's rarely as simple as
"arrays cost X to access while tables cost Y". Even leaving the JIT aside,
questions of cache locality and branch prediction are as likely to matter
in real-world applications as array vs table.
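
As a sketch of what point 2 might look like (my own illustration, assuming
the input holds one non-negative integer per line with a trailing newline,
and "data.txt" is again a placeholder): read the file once and parse the
digits on the ffi side, so the hot loop never touches Lua strings or
tonumber:

local ffi = require("ffi")

local f = assert(io.open("data.txt", "rb"))
local s = f:read("*a")
f:close()

local buf = ffi.cast("const uint8_t *", s) -- view the string as raw bytes
local out = ffi.new("int[?]", 7281624)
local count, acc = 0, 0
for i = 0, #s - 1 do
  local c = buf[i]
  if c >= 48 and c <= 57 then -- ASCII '0'..'9': accumulate the digit
    acc = acc * 10 + (c - 48)
  else -- a newline ends the current number
    out[count] = acc
    count = count + 1
    acc = 0
  end
end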

Demetri
