[interfacekit] Re: BHandler::fToken

From: Ingo Weinhold <bonefish@xxxxxxxxxxxxxxx>
To: interfacekit@xxxxxxxxxxxxx
Date: Tue, 11 Mar 2003 19:11:05 +0100 (MET)

On Tue, 11 Mar 2003, Adi Oanca wrote:

> > This might sound strange coming from me, but man, you guys are going into
> > stuff that I don't understand very well. What's a hash table?
>
> A hashtable, maps keys to values.
> it is a list of  key-value associations!
> for example:
>
> struct item{
>     void*    key;
>     void*    value;
> }
> class HashTable;
>
> struct link{
>     item    i;
>     link    *l;
> }
>
> class HashTable{
>     void    Add(key, value);
>     void*  Get(key);
>     ...
> private:
>     link    *list;
> }
>
> they are commonly used for storing and more importantly searching by a known
> key!

As I've learned, a hash table is not a list, but an array. The size of the
array is called the capacity of the hash table. There are basically two
strategies: Open hashing, where the slots of the array are in fact lists,
so that each slot can contain an arbitrary number of the elements. And
closed hashing, where a slot can hold only one element. Thus in this case
the capacity is actually the maximal number of elements the hash table can
contain.

In the general an element does not need to have an inner structure
structure known to the hash table. In C you would work with void*. The
only thing that is needed is a hash function, that computes a so called
hash value, an integer, for a given element. This hash value is used,
modulo capacity, to index the array of elements.

So, as a first approximation, to insert an element into the hash table you
compute its hash value modulo capacity and put it at that index. To look
an element up, you compute the index similarly. Since the hash value
might not be unique -- and even if it is, it may not be modulo capacity --
more than one element may have the same index.

For open hashing this is no problem, since you have a list at the
respective slot. When inserting an element, you add it to the list. For
looking it up, you need to traverse that list.

For closed hashing a new slot need to be found for a new element, when the
one at the computed index is already occupied. There are different
approaches to do that. One can simply increment the index until a free
slot could be found. Or you have a second hash function that you use in
such a case. Or...
I'm not sure, whether there are closed hashing strategies, that can deal
with removal of elements, and, if so, how efficiently.

So in principle a hash table implements a set. You can add, remove and
look up elements in O(1) time, as long as the hash function is good, and
(for open hashing) the number of elements doesn't exceed the capacity
(that much).

There are tons of heuristics for various aspects. E.g. the capacity should
be prime, or the usage (number of elements / capacity) should not exceed a
certain percentage (70-80% ?),... One usually does a so called rehashing
when the usage does exceed its threshold. Then the capacity is for istance
doubled (adjusted to the next prime) and all elements are re-inserted.

Of course a set can be used also as a map: The elements are then
(key, value) pairs and the hash value is computed of the key. So you can
find an element by its key.

CU, Ingo

References:
- [interfacekit] Re: BHandler::fToken
  - From: Marc Flerackers
- [interfacekit] Re: BHandler::fToken
  - From: Ingo Weinhold
- [interfacekit] Re: BHandler::fToken
  - From: DarkWyrm
- [interfacekit] Re: BHandler::fToken
  - From: Adi Oanca

[interfacekit] Re: BHandler::fToken

Other related posts: