Re: LuaJIT benchmark

  • From: Dan Eloff <dan.eloff@xxxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Tue, 18 Mar 2014 09:38:05 -0500

I agree with Mike, the dominant costs here are an unpredictable branch (a
misprediction costs ~17 or more cycles and happens ~50% of the time here)
and the cache miss (~100-400 cycles). Typical integer hashing (some bitwise
ops and maybe a multiply, with a bitwise AND for a power-of-2 sized table)
costs less than the branch misprediction, and is a rounding error next to
the cache miss.
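
For concreteness, a hash along these lines is what I mean (a hypothetical
xor/shift mix, not LuaJIT's actual hash function), written with LuaJIT's
bit module:

  local bit = require("bit")

  -- Hypothetical integer hash for a power-of-2 sized table:
  -- a few xor/shift mixes, then a mask instead of a modulo.
  local function hash_slot(key, size)
    local h = bit.tobit(key)
    h = bit.bxor(h, bit.rshift(h, 16))
    h = bit.bxor(h, bit.lshift(h, 5))
    h = bit.bxor(h, bit.rshift(h, 7))
    return bit.band(h, size - 1)     -- size must be a power of 2
  end

  print(hash_slot(123456789, 512))   -- slot index in [0, 511]

A handful of ALU ops like that is a few cycles at most, well below the
misprediction cost.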

In fact I suspect that if you can enable huge pages for the Java heap and
make sure you have enough huge pages allocated on your test machine, then
Java should take the benchmark, because on the large tables you get not
just a cache miss but also a TLB miss, which means walking the four-level
page table at a cost of roughly 3 additional cache misses on average (1-4).
The benchmark may not show that, though, because all it does is hash
lookups, so it's possible nearly the entire page table stays in cache. In a
real program that won't be the case.
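
If someone wants to try that, the setup on Linux with HotSpot is roughly
the following (a sketch only; the exact page count depends on the heap
size, and HashBenchmark stands in for whatever the benchmark's main class
actually is):

  # Reserve 512 x 2 MB huge pages (1 GB; size this to the Java heap).
  sudo sysctl -w vm.nr_hugepages=512

  # Run with the heap backed by huge pages.
  java -XX:+UseLargePages -Xms1g -Xmx1g HashBenchmark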

I think if you really want to show hash table implementation differences
you need to eliminate the other effects. Ditch the branch misprediction,
and use a lookup set small enough that the elements you want to look up
all fit within the L2 cache and the TLB (e.g. look up at most 512 random
elements: look them up once before the timed loop, and then time it).
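
A rough sketch of what I have in mind (hypothetical, not tied to the
actual benchmark harness):

  local N = 512
  local t, keys = {}, {}
  math.randomseed(42)
  for i = 1, N do
    local k = math.random(1, 2^30)   -- random integer keys
    t[k] = i
    keys[i] = k
  end

  -- Warm-up pass: touch every key once so the slots are in L2 and the TLB.
  local sum = 0
  for i = 1, N do sum = sum + t[keys[i]] end

  -- Timed loop: every lookup hits, so the hit/miss branch is predictable.
  local start = os.clock()
  for rep = 1, 1e5 do
    for i = 1, N do
      sum = sum + t[keys[i]]
    end
  end
  print(os.clock() - start, sum)

With an all-hit working set of 512 elements the remaining cost is mostly
the hashing and probing itself, which is what you actually want to compare
between implementations.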


On Mon, Mar 17, 2014 at 5:30 AM, Fredrik Widlund
<fredrik.widlund@xxxxxxxxx> wrote:

> Sigh, the Lua program doesn't even use local variables in the
>> inner loop ...
>
>
> Updated the benchmark to declare x local inside the inner loop instead of
> just before it, which made for improvements of roughly 10% on small/medium
> sized hashes.
>
> Oh, and before anyone thinks this really measures hash table
>> implementation quality: it's just measuring the overhead of branch
>> mispredictions and cache misses for an atypical use case of hash
>> tables.
>
>
> Currently it's measuring different hash algorithms' integer lookup
> performance with a hit rate of 50%. It is a naive benchmark for now because
> that is sufficient to make a point later on.
>
> If I have some extra time I'll test some more variables later on:
> different hit rates and key types. If you have a reference to a source
> where statistics around hash table use cases are investigated, I'll use
> that as a basis.
>
>
>
> On Sun, Mar 16, 2014 at 11:57 PM, Mike Pall <mike-1403@xxxxxxxxxx> wrote:
>
>> Fredrik Widlund wrote:
>> > Benchmark of LuaJIT tables vs Java HashMap vs C++ unordered_map
>>
>> Sigh, the Lua program doesn't even use local variables in the
>> inner loop ...
>>
>> Oh, and before anyone thinks this really measures hash table
>> implementation quality: it's just measuring the overhead of branch
>> mispredictions and cache misses for an atypical use case of hash
>> tables.
>>
>> [Typical hash table accesses use a) string keys with b) a high
>> temporal correlation and c) they usually have a high hit-rate or a
>> high miss-rate, but it's rarely 50/50.]
>>
>> --Mike
>>
>>
>
