-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On 23/05/2013 00:15, Mike Pall wrote: > Hash table iterations need to scavenge a sparse structure with two > variants. The time is dominated by branch prediction misses. > Inlining the code doesn't magically make these go away. So, yes, > pairs() will always be slow. > What would you recommend as a fast way to represent iterable sets? i.e a data structure that implements these operations: 1. Add an element (typically a string) to a set (skipping if it is already present) 2. Loop over the elements of the set in arbitrary order where #2 happens much more frequently than #1? What I usually do (in LuaJIT, and in similar Java code) is maintain both a hash and an array, adding elements to the array as necessary after using the hash to detect & skip duplicates. Intuitively this feels wasteful (of memory) as it duplicates the same information in the hash and the array. Still, it outperforms the obvious hash-only or array-only versions. I understand that traversing the hash means branching depending whether the pointer in the key slot is null or not (once to skip empty buckets and again when traversing the short linked list associated with each non-empty bucket.) But is there another way to speed up the iteration loop without duplicating the key pointers? -----BEGIN PGP SIGNATURE----- Version: GnuPG/MacGPG2 v2.0.18 (Darwin) Comment: GPGTools - http://gpgtools.org Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJRnWL1AAoJEHp3pOlF38H/g+8H/RQ+MiEyYqm1JpIgpLzjUsXn isnIc15oMQ4AX/T0X+Dlyy6UMJxB27ErYdWHhvwog+MR0GTyzYU0IZ7czcyBwy/h 6kvdx5xgA0KIwm5SSY2gTC9UI9F1/d7Uk448X3nPK/tbasF+39BkAzEHMHUf/SY3 f3CFDsRyKOhxWe/Llfp0JRDL6jeYuQISFxUzpVocGkIU7Nn14d934Og2U4ZWdhg6 zUAAef4ksik4laHYcokkC26iRys5vwLljFTt83axyiOr0+yJqvoBC8towOeaLcfu Goj13hvJ8llbRXJHi0uT8xrxUs0kwcCoVHcVj8B+GHgk8fTTe2ZBKT+cxlBoC1w= =tqUT -----END PGP SIGNATURE-----