[dokuwiki] Re: search improvements

  • From: Thanos Massias <tm@xxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Thu, 31 Aug 2006 14:45:10 +0300


Chris Smith wrote:
> Hi,
> [snip]
> The analysis has thrown up another factor in search performance, the
> location of the search term(s) within the word index.
> For my test wiki two similar search terms producing similar results, but
> selected from opposite ends of an 11,000+ word index resulted in a
> doubling of the search time.  Guy found similar results for single
> search terms at opposite ends of his ~10,000 word index.  It would seem
> the bigger the wiki, the more words likely to get in the index, the
> slower, on average, searching is likely to be.  Ideas on improving this
> are welcome :-)

You could try splitting the word data, but this will get quite
complicated. For example, you could do some hashing by splitting the word
index into a series of data files depending on, say, the first character
(or more, if we are talking about huge wikis) of the word. Then, given a
search word, you only have to search the relevant word index file.
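
A minimal sketch of that idea in Python, assuming the index is a flat list of words that is scanned linearly. The names shard_key, build_shards and find_word are hypothetical and are not part of DokuWiki; in-memory lists stand in for the separate index files.

```python
from collections import defaultdict


def shard_key(word, prefix_len=1):
    """Return the shard a word belongs to: its first prefix_len characters, lowercased."""
    return word[:prefix_len].lower()


def build_shards(words, prefix_len=1):
    """Split a flat word list into per-prefix lists (stand-ins for separate index files)."""
    shards = defaultdict(list)
    for word in words:
        shards[shard_key(word, prefix_len)].append(word)
    return dict(shards)


def find_word(shards, word, prefix_len=1):
    """Scan only the shard that could hold the word; return (found, words_scanned)."""
    shard = shards.get(shard_key(word, prefix_len), [])
    for scanned, candidate in enumerate(shard, start=1):
        if candidate == word:
            return True, scanned
    return False, len(shard)


# Hypothetical sample data, only to exercise the functions.
index = ["apple", "banana", "zebra", "zoo", "cherry", "avocado"]
shards = build_shards(index)
print(find_word(shards, "zoo"))
```

Raising prefix_len gives more, smaller shards, which is the "or more" case for huge wikis; the price is more files to keep consistent whenever the index is updated.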

For example, the words with Latin characters in my wiki are distributed
as follows:

Starting   Number     Total words divided by number of
letter     of words   words for that starting letter
a            314        14.1
b            176        25.2
c            449         9.9
d            289        15.3
e            215        20.6
f            183        24.2
g            107        41.4
h            110        40.3
i            248        17.9
j             22       201.5
k             34       130.4
l            157        28.2
m            235        18.9
n            107        41.4
o            135        32.8
p            334        13.3
q             20       221.6
r            283        15.7
s            463         9.6
t            252        17.6
u             84        52.8
v             83        53.4
w            102        43.5
x             10       443.2
y             13       340.9
z              7       633.1

The last column indicates what the improvement could be: relatively
modest (about 10x) for words starting with 'c' or 's', but huge (several
hundred times) for words starting with 'x' or 'z'.
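
As a rough check, the last column can be recomputed from the per-letter counts in the table. This is only a sketch: it assumes that lookup cost is proportional to the number of words scanned, and the names counts, total_words and reduction_factor are made up for illustration.

```python
# Per-letter word counts copied from the table above.
counts = {
    "a": 314, "b": 176, "c": 449, "d": 289, "e": 215, "f": 183, "g": 107,
    "h": 110, "i": 248, "j": 22, "k": 34, "l": 157, "m": 235, "n": 107,
    "o": 135, "p": 334, "q": 20, "r": 283, "s": 463, "t": 252, "u": 84,
    "v": 83, "w": 102, "x": 10, "y": 13, "z": 7,
}

total_words = sum(counts.values())


def reduction_factor(letter):
    """How many times fewer words are scanned when only this letter's file is searched."""
    return round(total_words / counts[letter], 1)


for letter in ("c", "s", "x", "z"):
    print(letter, reduction_factor(letter))
```

The factor is the best case for a single search term; a query with several terms would have to open one file per distinct starting letter.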

If you also have numbers and non-Latin alphabets, the improvement goes
even further.

On the other hand, this would be a PITA to code, and I understand that.

--
Best regards,
Thanos Massias
