[dokuwiki] Re: [PATCH] search problems and solutions

  • From: Matthias Grimm <matthiasgrimm@xxxxxxxxxxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Wed, 1 Jun 2005 23:04:54 +0200

On Wed, 1 Jun 2005 20:22:56 +0200
Andreas Gohr <andi@xxxxxxxxxxxxxx> wrote:

> 
> Matthias Grimm <matthiasgrimm@xxxxxxxxxxxxxxxxxxxxx> wrote:
> 
> > 1. The search input accepts more than one word, but more words produce
> > more results. I would expect additional words to narrow the search.
> > I analyzed the code and found that DokuWiki combines the query
> > words with 'OR' instead of 'AND', which is what I expected and how
> > most search engines (e.g. Google) work.
> > 
> > The patch 'search_combination_and.patch' solves the problem and, I
> > think, closes FS#158 as an additional benefit.
> 
> Hmm I like it and will probably add it, but let's think about it first ;-) 
> The former method combined all words into a single regular expression. Your 
> code uses a regexp for each word. So if I use three words I roughly triple 
> the time used on matching. Does anyone know a way of combining the search 
> words into one regexp? Maybe using assertions?

A faster solution would be welcome, but I don't think the patch wastes
much time. Before, the regexp code had to check multiple words against
the text in one pass. Now it checks only one word, but n times. The
number of word comparisons is the same, so I think the only overhead is
the small PHP loop. If someone comes up with a shorter and faster
solution, that would be fine. I'm not a regexp expert.
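
For anyone who wants to experiment with the single-regexp idea, here is a
minimal sketch using one lookahead assertion per word. It is written in
Python's re module, not in DokuWiki's PHP, and build_and_pattern is a
hypothetical helper name, not part of the patch. It matches substrings
rather than whole words, and I have not measured whether it is any faster
than the loop, since each lookahead still scans the text.

```python
import re

def build_and_pattern(words):
    """Build one regexp that matches only if every word occurs in the text."""
    # One positive lookahead per word; each one scans from the start of the text.
    lookaheads = "".join("(?=.*%s)" % re.escape(w) for w in words)
    # DOTALL lets '.' cross newlines; IGNORECASE makes matching case-insensitive.
    return re.compile(lookaheads, re.IGNORECASE | re.DOTALL)

pattern = build_and_pattern(["wiki", "search"])
print(bool(pattern.match("Search the wiki pages")))       # both words present
print(bool(pattern.match("Only the wiki is mentioned")))  # 'search' is missing
```

The pattern has to be applied with match() (anchored at the start of the
text) so that every lookahead sees the whole page.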

A wonderful extension would be the ability to exclude words from the
results. The syntax "-text" could mean only pages that do not contain the
word "text". But this seems like a big change to me, so I have postponed
it for now. If we have a regexp expert among us: how could we handle this?
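
One possible approach, sketched here under assumptions, is a negative
lookahead for each excluded word next to a positive lookahead for each
required word. The example uses Python's re module instead of DokuWiki's
PHP, and build_query_pattern is a hypothetical name, so treat it as an
illustration of the idea and not as a proposed patch.

```python
import re

def build_query_pattern(query):
    """Turn a query like 'wiki -text' into one regexp built from lookaheads."""
    parts = []
    for term in query.split():
        if term.startswith("-") and len(term) > 1:
            # Excluded word: the text must NOT contain it anywhere.
            parts.append("(?!.*%s)" % re.escape(term[1:]))
        else:
            # Required word: the text must contain it somewhere.
            parts.append("(?=.*%s)" % re.escape(term))
    # DOTALL lets '.' cross newlines; IGNORECASE makes matching case-insensitive.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)

pattern = build_query_pattern("wiki -text")
print(bool(pattern.match("A page about the wiki")))     # has 'wiki', no 'text'
print(bool(pattern.match("Some wiki text to exclude"))) # contains 'text'
```

The pattern must be anchored at the start of the page (match(), not
search()), otherwise the negative lookahead could succeed from a position
after the excluded word.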

  Best Regards
    Matthias
-- 
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
