Re: FFI hashtable?

  • From: Luke Gorrie <luke@xxxxxxxx>
  • To: luajit@xxxxxxxxxxxxx
  • Date: Sun, 15 Mar 2015 14:55:35 +0100

On 15 March 2015 at 12:20, Mike Pall <mike-1503@xxxxxxxxxx> wrote:

> Why do you think the majority of accesses are out-of-cache? The
> distribution of network flows is highly skewed in the short term
> (big transfers going on momentarily). Lookups will then hit the
> same memory addresses on every step, which means they are most
> certainly cached.
>

This is true near the edges of the network, e.g. on a home/office router. I
am thinking of something closer to the middle of the network, e.g. the NAT
device of an ISP. The traffic there looks more like an infinite number of
parallel connections to Facebook fetching tiny HTTP objects :).

Generally I want to be conservative, so I am more concerned about worst-case
performance than the best case. If the cache hit rate is higher than I
expect, that is nice, and it will give people comfort to see their
applications running with a lot of spare capacity. I don't want to depend on
it, though. (I have seen what happens when people do.)

Also, if you can batch your lookups, you can interleave the binary
> lookup table steps of multiple lookups to mitigate the load
> latencies. Quite easy to do, since the number of steps is
> constant. Much harder to get right with a variable number of
> probes for a hash table.
>

Interesting idea. I'd love a link if this happens to be in the literature.
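
To check that I understand, here is roughly what I picture, assuming a
fixed-depth binary tree laid out as a flat FFI array (the struct, field
names and constants below are all invented for illustration, and populating
the tree is omitted):

    -- Sketch only: walk BATCH keys through the tree in lock-step so the
    -- cache-miss loads of different lookups overlap instead of serializing.
    local ffi = require("ffi")

    ffi.cdef[[
    typedef struct { uint32_t split; uint32_t value; } node_t;
    ]]

    local DEPTH = 16   -- fixed number of steps per lookup
    local BATCH = 8    -- lookups advanced in parallel

    -- complete binary tree in a flat array: node i has children 2i+1 / 2i+2
    local nodes = ffi.new("node_t[?]", 2^(DEPTH + 1))

    -- keys and results are FFI arrays of length BATCH, indexed from 0
    local function lookup_batch(keys, results)
       local pos = ffi.new("int32_t[?]", BATCH)  -- all walks start at the root
       for step = 1, DEPTH do
          for i = 0, BATCH - 1 do
             local n = nodes[pos[i]]             -- the load we want to overlap
             pos[i] = keys[i] < n.split and 2*pos[i] + 1 or 2*pos[i] + 2
          end
       end
       for i = 0, BATCH - 1 do
          results[i] = nodes[pos[i]].value
       end
    end

The point would be that the nodes[pos[i]] loads for all eight walks can be in
flight at once, since no step of one lookup depends on another.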

I have used cache-conscious trees (Judy arrays) for this problem in the
past. You are right, they work very well.

The reason I am shopping around now is that I want a simple implementation
(Judy isn't) with excellent worst-case performance, and this latter quality
is leading me from trees to hash tables at the moment.

Being conservative, I consider worst-case performance important because it
can mean the difference between implementing a feature at all and declaring
it too expensive to be practical.

And finally, since your primary use case is open networking, the
> collision vulnerability _will_ hit you, eventually. Any of these
> not-fully-crypto-safe-but-still-acceptably-fast hash functions have
> been broken or will be broken over time (see various reports,
> right after the issue was publicized). Do you want to take the
> business risk?
>

I see it as mandatory to have well-defined and practical overload behavior.
Any data structure will fill up eventually, so applications have to be
prepared for a table insertion to fail.

Trees are simple. "Max N elements, insertions beyond this are rejected."

Hash tables are indeed trickier. "Max N buckets and M elements per bucket,
insertions that would violate this are rejected" seems workable to me,
though.
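
As a strawman, I am picturing something along these lines (sizes, field
names and the toy hash below are placeholders, not a real design):

    -- Sketch of the "max N buckets, max M elements per bucket" policy.
    local ffi = require("ffi")
    local bit = require("bit")

    local NBUCKETS = 65536   -- N, power of two so we can mask
    local NSLOTS   = 4       -- M

    ffi.cdef(string.format([[
    typedef struct { uint32_t key; uint32_t value; } slot_t;
    typedef struct { uint32_t count; slot_t slots[%d]; } bucket_t;
    ]], NSLOTS))

    local buckets = ffi.new("bucket_t[?]", NBUCKETS)

    local function bucket_for(key)
       -- toy hash; a real table would want a keyed hash, per your point above
       local h = bit.bxor(key, bit.rshift(key, 16))
       return buckets[bit.band(h, NBUCKETS - 1)]
    end

    -- returns false when the bucket is full: the "insertion rejected"
    -- case that the application has to be prepared to handle
    local function insert(key, value)
       local b = bucket_for(key)
       for i = 0, b.count - 1 do
          if b.slots[i].key == key then b.slots[i].value = value; return true end
       end
       if b.count == NSLOTS then return false end
       b.slots[b.count].key   = key
       b.slots[b.count].value = value
       b.count = b.count + 1
       return true
    end

    local function lookup(key)
       local b = bucket_for(key)
       for i = 0, b.count - 1 do
          if b.slots[i].key == key then return b.slots[i].value end
       end
       return nil                                -- not found
    end

The appeal is that the worst case is bounded at M probes within one bucket,
at the cost of rejecting insertions when a bucket fills up.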

P.S. Regarding prefetching, I do take your point, and I have read the Intel
manuals and guidelines and have a notion of how specific the conditions
must be. I would still really like to have an intrinsic for it. Every few
months I find a "long shot but worth a try" situation where I want to try
prefetching, and so far I am doing this with FFI calls to a gcc builtin
(ugh).
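
For the curious, the trick looks roughly like this (file and function names
invented for the example):

    -- A one-line C shim compiled into a shared library, e.g.
    --
    --   /* prefetch_shim.c */
    --   void prefetch_read(const void *p) { __builtin_prefetch(p, 0, 3); }
    --
    --   $ gcc -O2 -shared -fPIC -o libprefetch.so prefetch_shim.c
    --
    -- and then called through the FFI:
    local ffi = require("ffi")

    ffi.cdef[[ void prefetch_read(const void *p); ]]
    local shim = ffi.load("./libprefetch.so")

    local buf = ffi.new("double[1024]")
    -- hint that we will soon read around this address; since this is a
    -- real call rather than an intrinsic, the call overhead can easily
    -- eat the benefit, hence the "ugh"
    shim.prefetch_read(buf + 512)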

Cheers,
-Luke
