[dokuwiki] Re: search improvements


Chris Smith wrote:
> Hi,
> 
> [snip]
> 
> The analysis has thrown up another factor in search performance, the
> location of the search term(s) within the word index.
> For my test wiki, two similar search terms producing similar results,
> but selected from opposite ends of an 11,000+ word index, resulted in a
> doubling of the search time.  Guy found similar results for single
> search terms at opposite ends of his ~10,000 word index.  It would seem
> that the bigger the wiki, the more words are likely to get into the
> index, and the slower searching is likely to be on average.  Ideas on
> improving this are welcome :-)
> 

You could try splitting the word data, but this will get quite
complicated. For example, you could do some hashing by splitting the word
index into a series of data files keyed on, say, the first character of
the word (or the first few characters for huge wikis). Then, given a
search word, you only have to search the relevant word index file.


For example, the words in my wiki that start with a Latin letter break
down as follows (4,432 words in total):

(Words: number of words starting with the letter; Total/Words: total
number of words divided by Words)

Letter  Words  Total/Words
------  -----  -----------
a         314         14.1
b         176         25.2
c         449          9.9
d         289         15.3
e         215         20.6
f         183         24.2
g         107         41.4
h         110         40.3
i         248         17.9
j          22        201.5
k          34        130.4
l         157         28.2
m         235         18.9
n         107         41.4
o         135         32.8
p         334         13.3
q          20        221.6
r         283         15.7
s         463          9.6
t         252         17.6
u          84         52.8
v          83         53.4
w         102         43.5
x          10        443.2
y          13        340.9
z           7        633.1

The last column indicates what the improvement could be: relatively small
(about 10x) for words starting with 'c' or 's', but huge (over 400x) for
words starting with 'x' or 'z'.
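The last column can be reproduced with a short Python calculation. It assumes lookup cost is proportional to the number of entries scanned and uses only the Latin-letter counts from the table. `average_speedup` is an added estimate, not from the original post: it assumes search terms start with each letter as often as index words do.

```python
# Word counts per starting letter, copied from the table above.
COUNTS = {
    "a": 314, "b": 176, "c": 449, "d": 289, "e": 215, "f": 183, "g": 107,
    "h": 110, "i": 248, "j": 22, "k": 34, "l": 157, "m": 235, "n": 107,
    "o": 135, "p": 334, "q": 20, "r": 283, "s": 463, "t": 252, "u": 84,
    "v": 83, "w": 102, "x": 10, "y": 13, "z": 7,
}

TOTAL = sum(COUNTS.values())  # 4432 words starting with a Latin letter


def speedup(letter: str) -> float:
    """Full index size divided by bucket size (the table's last column).

    Assumes lookup cost is proportional to the number of entries scanned.
    """
    return TOTAL / COUNTS[letter]


def average_speedup() -> float:
    """Overall speed-up if search terms follow the index's letter distribution.

    A term starting with letter i (probability n_i / TOTAL) scans a bucket
    of n_i entries, so the expected bucket size is sum(n_i ** 2) / TOTAL.
    """
    expected_scan = sum(n * n for n in COUNTS.values()) / TOTAL
    return TOTAL / expected_scan


if __name__ == "__main__":
    for letter in ("c", "s", "x", "z"):
        print(f"{letter}: {speedup(letter):.1f}x")
    print(f"average: {average_speedup():.1f}x")
```

With these counts `average_speedup()` comes out at roughly 16.6x: the large 'a', 'c', 'p' and 's' buckets dominate the average, so the dramatic figures for 'x' and 'z' matter much less for a typical query.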

If the index also contains numbers and words in non-Latin alphabets, the
improvement is even greater, since the total grows while each
Latin-letter bucket stays the same size.

On the other hand, this would be a PITA to code, and I understand that.

--
Best regards,
Thanos Massias
-- 
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
