On Tue, 11 Mar 2003, Adi Oanca wrote: > > This might sound strange coming from me, but man, you guys are going into > > stuff that I don't understand very well. What's a hash table? > > A hashtable, maps keys to values. > it is a list of key-value associations! > for example: > > struct item{ > void* key; > void* value; > } > class HashTable; > > struct link{ > item i; > link *l; > } > > class HashTable{ > void Add(key, value); > void* Get(key); > ... > private: > link *list; > } > > they are commonly used for storing and more importantly searching by a known > key! As I've learned, a hash table is not a list, but an array. The size of the array is called the capacity of the hash table. There are basically two strategies: Open hashing, where the slots of the array are in fact lists, so that each slot can contain an arbitrary number of the elements. And closed hashing, where a slot can hold only one element. Thus in this case the capacity is actually the maximal number of elements the hash table can contain. In the general an element does not need to have an inner structure structure known to the hash table. In C you would work with void*. The only thing that is needed is a hash function, that computes a so called hash value, an integer, for a given element. This hash value is used, modulo capacity, to index the array of elements. So, as a first approximation, to insert an element into the hash table you compute its hash value modulo capacity and put it at that index. To look an element up, you compute the index similarly. Since the hash value might not be unique -- and even if it is, it may not be modulo capacity -- more than one element may have the same index. For open hashing this is no problem, since you have a list at the respective slot. When inserting an element, you add it to the list. For looking it up, you need to traverse that list. For closed hashing a new slot need to be found for a new element, when the one at the computed index is already occupied. There are different approaches to do that. One can simply increment the index until a free slot could be found. Or you have a second hash function that you use in such a case. Or... I'm not sure, whether there are closed hashing strategies, that can deal with removal of elements, and, if so, how efficiently. So in principle a hash table implements a set. You can add, remove and look up elements in O(1) time, as long as the hash function is good, and (for open hashing) the number of elements doesn't exceed the capacity (that much). There are tons of heuristics for various aspects. E.g. the capacity should be prime, or the usage (number of elements / capacity) should not exceed a certain percentage (70-80% ?),... One usually does a so called rehashing when the usage does exceed its threshold. Then the capacity is for istance doubled (adjusted to the next prime) and all elements are re-inserted. Of course a set can be used also as a map: The elements are then (key, value) pairs and the hash value is computed of the key. So you can find an element by its key. CU, Ingo