Re: LuaJIT benchmark

  • From: Fredrik Widlund <fredrik.widlund@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Tue, 18 Mar 2014 18:29:56 +0100

If this is more than a casual off-hand remark kind of discussion I'll be
happy to take part in it.

The point of this benchmark is to measure the cost of a random lookup in a
hash table of different sizes, with integers, and a hit probability of 50%.

Will the measurement include how different environments handle
a) Memory access patterns - Yes
b) Instruction pipelining patterns - Yes

Is this realistic? Yes.

As far as I understand, the reasoning here is around one of two things.

1) The use case. You shouldn't benchmark integers. You shouldn't have a hit
probability of 50%. These are very subjective, somewhat arbitrary arguments.
The use cases of a hash table when doing numerical calculations, or
handling large volumes of traffic in a router, are very different from
mapping headers in an HTTP request. Unfortunately I don't have time to cover
a very large range of use cases and this is enough for me to begin with,
but looking at inserts/deletes, memory usage, different hit ratios and key
types is of course relevant as well given some time. It is very common to
have random integers being a test case when benchmarking hash tables
though, one reason being that you could have an equally long, or longer,
discussion around what an appropriate set of 100M string keys would be, and
which hash functions would in turn be suitable for that set of keys.

2) This is the wrong way to measure (1). Given the use case in (1) how
would you accurately measure the cost of a random lookup operation? Doing a
large number of these operations and measuring the time is a very common
approach. If you have an approach to suggest that works in fairly similar
ways for the different implementations I would be very happy to switch to
using it.

From the benchmark so far it seems Google DenseHash (C++) and khash (C) are
on the order of 2x-3x faster than doing lookups in Lua using
LuaJIT. Is this fair? Yes, probably. A serious attempt to show otherwise
would be interesting.

Is it interesting to look at inserts/deletes, memory usage, 5% or 95% hit
ratios, strings, worst cases, and so forth? Yes. Feel free to pitch in, or
give constructive suggestions.


On Tue, Mar 18, 2014 at 3:38 PM, Dan Eloff <dan.eloff@xxxxxxxxx> wrote:

> I agree with Mike, the dominant costs here are an unpredictable branch (~17
> or more cycles for a branch misprediction, 50% of the time) and the cache
> miss (~100-400 cycles). Typical integer hashing (some bitwise ops and maybe
> a multiply with a bitwise and for a power of 2 sized table) will be less
> than the branch misprediction cost, and rounding error against the cache
> miss.
>
> In fact I suspect that if you can enable huge pages for the Java heap and
> make sure you have enough huge pages allocated on your test machine, then
> Java should take the benchmark, because on the large tables you have not
> just a cache miss but also a TLB miss, which means navigating the 4 level
> page table which should be roughly 3 cache misses on average (1-4). But the
> benchmark may not show that because all it's doing is hash lookups and it's
> possible nearly the entire page table is in cache. In a real program that
> won't be the case.
>
> I think if you really want to show hash table implementation differences
> you need to eliminate the other effects. Ditch the branch misprediction,
> and use a lookup set small enough that the elements you want to lookup will
> all fit within the L2 cache and the TLB (e.g. lookup up to 512 random
> elements. Look them up once before the timed loop, and then time it.)
>
>
> On Mon, Mar 17, 2014 at 5:30 AM, Fredrik Widlund <
> fredrik.widlund@xxxxxxxxx> wrote:
>
>> Sigh, the Lua program doesn't even use local variables in the
>>> inner loop ...
>>
>>
>> Updated the benchmark to declare x local inside the inner loop
>> instead of just before it, which made for a roughly 10% improvement in
>> small/medium-sized hashes.
>>
>> Oh, and before anyone thinks this really measures hash table
>>> implementation quality: it's just measuring the overhead of branch
>>> mispredictions and cache misses for an atypical use case of hash
>>> tables.
>>
>>
>> Currently it's measuring different hash algorithms' integer lookup
>> performance with a hit rate of 50%. It is a naive benchmark for now because
>> this is currently sufficient to be able to make a point later on.
>>
>> If I have some extra time later on I will test
>> different hit rates and key types. If you have a reference to a source
>> where statistics around hash table use cases are investigated I'll use
>> that as a basis.
>>
>>
>>
>> On Sun, Mar 16, 2014 at 11:57 PM, Mike Pall <mike-1403@xxxxxxxxxx> wrote:
>>
>>> Fredrik Widlund wrote:
>>> > Benchmark of LuaJIT tables vs Java HashMap vs C++ unordered_map
>>>
>>> Sigh, the Lua program doesn't even use local variables in the
>>> inner loop ...
>>>
>>> Oh, and before anyone thinks this really measures hash table
>>> implementation quality: it's just measuring the overhead of branch
>>> mispredictions and cache misses for an atypical use case of hash
>>> tables.
>>>
>>> [Typical hash table accesses use a) string keys with b) a high
>>> temporal correlation and c) they usually have a high hit-rate or a
>>> high miss-rate, but it's rarely 50/50.]
>>>
>>> --Mike
>>>
>>>
>>
>
