Hi all

"John J. Boyer" <john.boyer@xxxxxxxxxxxxxxxxx> writes:

> Thanks to both of you for your questions and comments. I wrote the
> current table functions twenty years ago for brltty, based mostly on
> what I learned in computer science courses. The table does have a fixed
> number of hash "buckets", each of which contains a pointer to a hash
> chain. The pointers are 32-bit integers (actually subscripts for a large
> area that stores the rules), so there can be a lot of them. Currently
> the number (HASHNUM) is 1123. It can be made larger.

For German we have a large number of exceptions to the rules (on the
order of 150k words). The choice of hash function (just the first two
chars) has been a problem: liblouis is very slow when using this table.
I tried increasing the size of the hash table by increasing HASHNUM,
but noticed no effect until I realized that the hash function only uses
the first two chars, so there are many collisions and the lookup is
essentially a sequential search through 150k entries.

I would welcome any change to the hash algorithm that is better geared
towards words from human languages. I'm sure there are ready-made
algorithms on the net that could be used.

According to an article on Stack Exchange[1] it appears that
MurmurHash[2] is a very good hash algorithm. There is an implementation
which is in the public domain[3] (see lines 236-308), i.e. which could
be used by us without any legal problems.

Thanks
Christian

Footnotes:
[1] http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
[2] http://en.wikipedia.org/wiki/MurmurHash
[3] http://www.qdecoder.org/svn/qlibc/trunk/src/utilities/qhash.c?revision=94

-- 
Christian Egli
Swiss Library for the Blind, Visually Impaired and Print Disabled
Grubenstrasse 12, CH-8045 Zürich, Switzerland

For a description of the software, to download it and links to project
pages go to http://www.abilitiessoft.com