[dokuwiki] Re: search improvements

  • From: Chris Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Thu, 31 Aug 2006 01:58:02 +0100

Hi,

I have been having an off-list conversation with Guy, who has carried out a whole lot more profiling and analysis.

Right now we have three new algorithms, opt1, opt2 and utf8, with similar performance and a definite improvement over the current algorithm (orig). utf8 is a little slower than both opt1 and opt2, but the difference is insignificant beside the total script running time. opt1 and opt2 each do better in slightly different circumstances: they are about the same when a page has 1 search result, opt1 is fractionally better ( < 5% ) when a page has 2-3 results, and opt2 is better when a page has 4 or more, with its advantage growing as the number of results in one page grows. Again, the differences are now minor compared to total script running time.

utf8 is based off opt2 (restricted number of preg_match calls) so its performance is likely to be steady irrespective of the number of results within a single page, but unlike opt1 and opt2 it generates its snippet based on characters not bytes.

My preference would be to run with the utf8 algorithm. The most recent patch fixes a couple of minor problems, adds a workaround for utf8_substr limitations (bug #891) and makes this algorithm the default. The other three are still present.

The analysis has thrown up another factor in search performance: the location of the search term(s) within the word index.

For my test wiki, two similar search terms producing similar results, but selected from opposite ends of an 11,000+ word index, resulted in a doubling of the search time. Guy found similar results for single search terms at opposite ends of his ~10,000 word index. It would seem that the bigger the wiki, the more words are likely to end up in the index, and the slower searching is likely to be on average. Ideas on improving this are welcome :-)
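One way to picture the effect is the Python sketch below. It assumes the word index is searched by a linear scan from the start, which is an assumption made for this illustration and not something confirmed from the DokuWiki code; the function names and the generated index are also made up. Under that assumption the cost of finding a word grows with its position, and a lookup on a sorted list is shown only as one possible direction.

```python
from bisect import bisect_left


def linear_lookup(index, word):
    """Scan from the start; return (position, comparisons made). Hypothetical helper."""
    for comparisons, entry in enumerate(index, start=1):
        if entry == word:
            return comparisons - 1, comparisons
    return -1, len(index)


def sorted_lookup(sorted_index, word):
    """Binary search on a sorted index; return the position or -1. Hypothetical helper."""
    pos = bisect_left(sorted_index, word)
    if pos < len(sorted_index) and sorted_index[pos] == word:
        return pos
    return -1


# Made-up index of 11,000 words; zero padding keeps it in sorted order.
index = [f"word{n:05d}" for n in range(11000)]

first_pos, first_cost = linear_lookup(index, "word00000")  # start of the index
last_pos, last_cost = linear_lookup(index, "word10999")    # end of the index
```

With the linear scan, the word at the start of the index is found after a single comparison and the word at the end only after every entry has been compared. The binary search needs a number of comparisons that grows with the logarithm of the index size, wherever the word sits.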

Cheers,

Chris


-- DokuWiki mailing list - more info at http://wiki.splitbrain.org/wiki:mailinglist
