If this is more than a casual, off-hand-remark kind of discussion I'll be
happy to take part in it.

The point of this benchmark is to measure the cost of a random lookup in a
hash table of different sizes, with integer keys and a hit probability of
50%. Will the measurement include how different environments handle a)
memory access patterns? Yes. b) instruction pipelining? Yes. Is this
realistic? Yes.

As far as I understand, the reasoning here is around one of two things.

1) The use case. You shouldn't benchmark integers. You shouldn't have a hit
probability of 50%. These are subjective, somewhat arbitrary arguments. The
use cases of a hash table when doing numerical calculations, or handling
large volumes of traffic in a router, are very different from mapping
headers in an HTTP request. Unfortunately I don't have time to cover a very
large range of use cases, and this is enough for me to begin with, but
looking at inserts/deletes, memory usage, different hit ratios and key
types is of course relevant as well, given some time. Random integers are a
very common test case when benchmarking hash tables, though, one reason
being that you could have an equally long, or longer, discussion around
what an appropriate set of 100M string keys would be, and which hash
functions are in turn suitable for that set of keys.

2) This is the wrong way to measure (1). Given the use case in (1), how
would you accurately measure the cost of a random lookup operation? Doing a
large number of these operations and measuring the time is a very common
approach. If you have an approach to suggest that works in fairly similar
ways for the different implementations, I would be very happy to switch to
using it.

From the benchmark so far it seems Google DenseHash (C++) and khash (C) are
on the order of 2x-3x faster than doing lookups in Lua using LuaJIT. Is
this fair? Yes, probably. A serious attempt to show otherwise would be
interesting.
Is it interesting to look at inserts/deletes, memory usage, 5% or 95% hit
ratios, strings, worst cases, and so forth? Yes. Feel free to pitch in, or
give constructive suggestions.

On Tue, Mar 18, 2014 at 3:38 PM, Dan Eloff <dan.eloff@xxxxxxxxx> wrote:

> I agree with Mike. The dominant costs here are an unpredictable branch
> (~17 or more cycles for a branch misprediction, 50% of the time) and the
> cache miss (~100-400 cycles). Typical integer hashing (some bitwise ops
> and maybe a multiply, with a bitwise and for a power-of-2 sized table)
> will cost less than the branch misprediction, and is a rounding error
> against the cache miss.
>
> In fact I suspect that if you can enable huge pages for the Java heap and
> make sure you have enough huge pages allocated on your test machine, then
> Java should take the benchmark, because on the large tables you have not
> just a cache miss but also a TLB miss, which means navigating the 4-level
> page table, which should be roughly 3 cache misses on average (1-4). But
> the benchmark may not show that, because all it's doing is hash lookups
> and it's possible nearly the entire page table is in cache. In a real
> program that won't be the case.
>
> I think if you really want to show hash table implementation differences
> you need to eliminate the other effects. Ditch the branch misprediction,
> and use a lookup set small enough that the elements you want to look up
> will all fit within the L2 cache and the TLB (e.g. look up 512 random
> elements; look them up once before the timed loop, and then time it).
>
>
> On Mon, Mar 17, 2014 at 5:30 AM, Fredrik Widlund <
> fredrik.widlund@xxxxxxxxx> wrote:
>
>>> Sigh, the Lua program doesn't even use local variables in the
>>> inner loop ...
>>
>> Updated the benchmark to declare x local inside the inner loop instead
>> of just before it, which made for a ~10% improvement in small/medium
>> sized hashes.
>>
>>> Oh, and before anyone thinks this really measures hash table
>>> implementation quality: it's just measuring the overhead of branch
>>> mispredictions and cache misses for an atypical use case of hash
>>> tables.
>>
>> Currently it's measuring different hash algorithms' integer lookup
>> performance with a hit rate of 50%. It is a naive benchmark for now,
>> because this is currently sufficient to be able to make a point later
>> on.
>>
>> If I have some extra time I will test different hit rates and key types
>> later on. If you have a reference to a source where statistics around
>> hash table use cases are investigated, I'll use that as a basis.
>>
>>
>> On Sun, Mar 16, 2014 at 11:57 PM, Mike Pall <mike-1403@xxxxxxxxxx> wrote:
>>
>>> Fredrik Widlund wrote:
>>> > Benchmark of LuaJIT tables vs Java HashMap vs C++ unordered_map
>>>
>>> Sigh, the Lua program doesn't even use local variables in the
>>> inner loop ...
>>>
>>> Oh, and before anyone thinks this really measures hash table
>>> implementation quality: it's just measuring the overhead of branch
>>> mispredictions and cache misses for an atypical use case of hash
>>> tables.
>>>
>>> [Typical hash table accesses use a) string keys with b) a high
>>> temporal correlation and c) they usually have a high hit-rate or a
>>> high miss-rate, but it's rarely 50/50.]
>>>
>>> --Mike