Re: table.foreach NYI

  • From: Dan Eloff <dan.eloff@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Thu, 14 Nov 2013 17:24:21 -0500

I've done similar things, the problem is like in naive binary search, you
have unpredictable branches which is very expensive on superscalar cpus.
Binary search can be written so as to compile to cmovs, trading pipeline
flushes for pipeline data dependencies. That makes it faster than linear
search even on very small arrays (
http://www.pvk.ca/Blog/2012/07/03/binary-search-star-eliminates-star-branch-mispredictions/
)

I suspect for a dictionary iteration, similar tricks could be employed.
Another strategy might be using many SSE loads (maybe just loading one byte
per slot) compare them to a vector of empty slots, use a _mm_movemask
variant to convert it to a bitvector and then a lookup table or bit scan to
convert it to an index of the next non-empty slot. Then you'd be able to
iterate without branching. It would have to be tested as to whether it
could speed things up.


On Tue, Nov 12, 2013 at 9:24 PM, Finn Wilcox <finnw@xxxxxxxxxxx> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/11/2013 22:39, Mike Pall wrote:
> > but better check if the workaround isn't slower than the
> > interpreter ...
> >
> It's not just about the speed of the next() function.  Splitting the
> trace has other side effects (forcing FFI scalar types to be boxed,
> preventing allocation sinking, redundant int32_t->lua_Number->int32_t
> conversions...)
> And yes you can create a temporary array but that has its own costs -
> Suspend the GC (commit charge) or keep it active (more cache misses)
>
> And as for the speed of the next() function - I assume it is mostly
> due to the branches necessary for probing hash slots?
> Maybe one could do some bit twiddling to probe multiple slots in parallel?
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG/MacGPG2 v2.0.18 (Darwin)
> Comment: GPGTools - http://gpgtools.org
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQEcBAEBAgAGBQJSguLmAAoJEHp3pOlF38H/t6sIAI/p66MGc3p80rOuBgLdPTPU
> vaXhkgwduWRBSFbHLJLvelsO3L9WK6s6NJfeHC+mZalghpq5xTgQ+W6OXqnRRqh0
> 8nmLE/f1cTzDiJLZjK6ejicPUsZqZcsjsFGKZlxAAMISgTrm1EJXZJ7vBLhMH4UT
> 4QnDL9k6JnOw6RlGapIzMfSZr0Sz59Yp27pi7kC/JQsL40jXNKMYSGzGT2muK8Sr
> DRRrt9c8z0oRrMl0N5i+R2K6IIFlaLPGv6xt2/jHSUEn7IOV1xZtqkG2dYHXhUJR
> 7KbLkeFwHHD6KV5lwwLOdViHdSSnpTg0fMIjnyTtS2xQrypM2G3dCcQFjc563o8=
> =GGVf
> -----END PGP SIGNATURE-----
>
>

Other related posts: