[dokuwiki] Re: search improvements
- From: Chris Smith <chris@xxxxxxxxxxxxx>
- To: dokuwiki@xxxxxxxxxxxxx
- Date: Thu, 31 Aug 2006 01:58:02 +0100
Hi,
I have been having an off-list conversation with Guy, who has carried
out a good deal more profiling and analysis.
Right now we have three new algorithms, opt1, opt2 and utf8, with
similar performance and a definite improvement over the current
algorithm (orig). utf8 is a little slower than both opt1 and opt2, but
the difference is insignificant compared with the total script running
time. opt1 and opt2 each do better in slightly different circumstances:
opt1 is fractionally faster ( < 5% ) when a page contains 2-3 search
results; the two are about the same when a page contains 1 result; and
opt2 is faster when a page contains 4 or more results, with its
advantage growing as the number of results in the page grows. Again,
these differences are now minor compared with the total script running
time.
utf8 is based on opt2 (which restricts the number of preg_match calls),
so its performance is likely to stay steady regardless of the number of
results within a single page. Unlike opt1 and opt2, however, it builds
its snippets based on characters, not bytes.
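A rough Python illustration of the two ideas above (DokuWiki itself is PHP, and the helper names `snippets` and `byte_snippet` and the `context` width are hypothetical): one regex pass over the page collects every match, in the spirit of opt2's limited preg_match calls, and snippets are cut by character offsets so a multibyte UTF-8 character is never split. This is a sketch under those assumptions, not the patch's actual code.

```python
import re

def snippets(text, term, context=15, max_snippets=3):
    """Hypothetical helper: up to max_snippets character-based snippets."""
    # A single regex pass over the page, rather than one call per snippet.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    matches = list(pattern.finditer(text))[:max_snippets]
    result = []
    for m in matches:
        # Python str offsets count characters (code points), not bytes.
        start = max(0, m.start() - context)
        end = min(len(text), m.end() + context)
        result.append(text[start:end])
    return result

def byte_snippet(text, start, end):
    """Hypothetical helper: cut by byte offsets, as a byte-based snippet would."""
    raw = text.encode("utf-8")[start:end]
    # The cut may land inside a multibyte character; errors="replace"
    # makes the damage visible as U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")

text = "Grüße aus Köln, Grüße!"
print(snippets(text, "grüße", context=3))  # ['Grüße au', 'n, Grüße!']
print(byte_snippet(text, 0, 3))            # 'Gr' plus U+FFFD: 'ü' was cut in half
```

Slicing the byte string at offset 3 splits the two-byte "ü", which is exactly the kind of broken snippet a character-based algorithm avoids.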
My preference would be to go with the utf8 algorithm. The most recent
patch for it fixes a couple of minor problems, adds a workaround for
utf8_substr limitations (bug #891) and makes this algorithm the
default. The other three are still present.
The analysis has thrown up another factor in search performance: the
location of the search term(s) within the word index.
On my test wiki, two similar search terms producing similar results, but
taken from opposite ends of an 11,000+ word index, differed by a factor
of two in search time. Guy found similar results for single search
terms at opposite ends of his ~10,000 word index. It would seem that
the bigger the wiki, the more words are likely to end up in the index,
and so the slower searching is likely to be on average. Ideas on
improving this are welcome :-)
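One plausible reason for the position effect, assuming (not confirmed here) that a term is found by scanning the word index line by line from the top: the work done grows with how far down the list the term sits. The Python sketch below uses a hypothetical `scan_index` helper and a synthetic index to count the comparisons such a scan makes.

```python
def scan_index(words, term):
    """Hypothetical linear scan of a word index (one word per entry).

    Returns (line number of the term or -1, number of comparisons made).
    """
    for comparisons, word in enumerate(words, start=1):
        if word == term:
            return comparisons - 1, comparisons
    # Not found: every entry was compared.
    return -1, len(words)

# A synthetic 11,000-word index, roughly the size mentioned above.
index = [f"word{i}" for i in range(11000)]
print(scan_index(index, "word0"))      # (0, 1): found at the top
print(scan_index(index, "word10999"))  # (10999, 11000): found at the bottom
```

Under this assumption, a term near the end of the index costs thousands of times more comparisons than one near the start, and the cost grows with index size. A hashed lookup (for example a Python dict) would not depend on position, though building it has its own cost.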
Cheers,
Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist