I've done similar things, the problem is like in naive binary search, you have unpredictable branches which is very expensive on superscalar cpus. Binary search can be written so as to compile to cmovs, trading pipeline flushes for pipeline data dependencies. That makes it faster than linear search even on very small arrays ( http://www.pvk.ca/Blog/2012/07/03/binary-search-star-eliminates-star-branch-mispredictions/ ) I suspect for a dictionary iteration, similar tricks could be employed. Another strategy might be using many SSE loads (maybe just loading one byte per slot) compare them to a vector of empty slots, use a _mm_movemask variant to convert it to a bitvector and then a lookup table or bit scan to convert it to an index of the next non-empty slot. Then you'd be able to iterate without branching. It would have to be tested as to whether it could speed things up. On Tue, Nov 12, 2013 at 9:24 PM, Finn Wilcox <finnw@xxxxxxxxxxx> wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On 12/11/2013 22:39, Mike Pall wrote: > > but better check if the workaround isn't slower than the > > interpreter ... > > > It's not just about the speed of the next() function. Splitting the > trace has other side effects (forcing FFI scalar types to be boxed, > preventing allocation sinking, redundant int32_t->lua_Number->int32_t > conversions...) > And yes you can create a temporary array but that has its own costs - > Suspend the GC (commit charge) or keep it active (more cache misses) > > And as for the speed of the next() function - I assume it is mostly > due to the branches necessary for probing hash slots? > Maybe one could do some bit twiddling to probe multiple slots in parallel? > -----BEGIN PGP SIGNATURE----- > Version: GnuPG/MacGPG2 v2.0.18 (Darwin) > Comment: GPGTools - http://gpgtools.org > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ > > iQEcBAEBAgAGBQJSguLmAAoJEHp3pOlF38H/t6sIAI/p66MGc3p80rOuBgLdPTPU > vaXhkgwduWRBSFbHLJLvelsO3L9WK6s6NJfeHC+mZalghpq5xTgQ+W6OXqnRRqh0 > 8nmLE/f1cTzDiJLZjK6ejicPUsZqZcsjsFGKZlxAAMISgTrm1EJXZJ7vBLhMH4UT > 4QnDL9k6JnOw6RlGapIzMfSZr0Sz59Yp27pi7kC/JQsL40jXNKMYSGzGT2muK8Sr > DRRrt9c8z0oRrMl0N5i+R2K6IIFlaLPGv6xt2/jHSUEn7IOV1xZtqkG2dYHXhUJR > 7KbLkeFwHHD6KV5lwwLOdViHdSSnpTg0fMIjnyTtS2xQrypM2G3dCcQFjc563o8= > =GGVf > -----END PGP SIGNATURE----- > >