Hi all

"John J. Boyer" <john.boyer@xxxxxxxxxxxxxxxxx> writes:

> Thanks to both of you for your questions and comments. I wrote the
> current table functions twenty years ago for brltty, based mostly on
> what I learned in computer science courses. The table does have a fixed
> number of hash "buckets", each of which contains a pointer to a hash
> chain. The pointers are 32-bit integers (actually subscripts for a large
> area that stores the rules), so there can be a lot of them. Currently
> the number (HASHNUM) is 1123. It can be made larger.

For German we have a large number of exceptions to the rules (on the
order of 150k words). The choice of hash function (just the first two
chars) has been a problem: liblouis is very slow when using this table.
I tried increasing the size of the hash table by increasing HASHNUM,
but noticed no effect until I realized that the hash function only uses
the first two chars, so there are many collisions and the lookup is
essentially a sequential search through 150k entries.

I would welcome any change to the hash algorithm that is better geared
towards words from human languages. I'm sure there are ready-made
algorithms on the net that could be used.

According to an article on Stack Exchange[1] it appears that
MurmurHash[2] is a very good hash algorithm. There is an implementation
which is in the public domain[3] (see lines 236-308), i.e. which could
be used by us without any legal problems.

Thanks
Christian

Footnotes:
[1] http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed
[2] http://en.wikipedia.org/wiki/MurmurHash
[3] http://www.qdecoder.org/svn/qlibc/trunk/src/utilities/qhash.c?revision=94

-- 
Christian Egli
Swiss Library for the Blind, Visually Impaired and Print Disabled
Grubenstrasse 12, CH-8045 Zürich, Switzerland

For a description of the software, to download it and links to project
pages go to http://www.abilitiessoft.com