[dokuwiki] Re: Blacklist Performance

  • From: Doogie <wiki@xxxxxxxxx>
  • To: <dokuwiki@xxxxxxxxxxxxx>
  • Date: Wed, 15 Sep 2010 08:45:27 +0200

On Wed, 15 Sep 2010 08:37:49 +0200, Doogie <wiki@xxxxxxxxx> wrote:
> So I think this can easily be optimized, for example by transforming all
> the lines into one big "OR'ed" regular expression:
> /blocked-domain1.com|another-bad-domain.net|and-so-on.de/
> Robert

Oops, I just realized that this is what the code is already doing :-) But I
still think it can be optimized by "anchoring" the regular expression at the
"http" prefix:

old code:

    if(preg_match('#('.join('|',$re).')#si',$text,$match)) {
      return true;
    }

my suggestion: strip the "http://" prefixes from all entries in wordblock.conf
and put a single prefix into the regular expression instead:

    // match the URL prefix once, then an optional subdomain part,
    // then any of the blacklisted domains
    if(preg_match('#https?:\/\/([^\/]*\.)?('.join('|',$re).')#si',$text,$match)) {
      return true;
    }
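A minimal, self-contained sketch of the suggested check, assuming the
wordblock.conf entries have already been read into `$re` with their "http://"
prefixes stripped. The three entries and the `blacklistMatches()` helper are
hypothetical, for illustration only:

```php
<?php
// Hypothetical blacklist entries, "http://" prefixes already stripped.
// Entries are regex fragments, so the dots are escaped.
$re = [
    'blocked-domain1\.com',
    'another-bad-domain\.net',
    'and-so-on\.de',
];

// Hypothetical helper: true if $text links to a blacklisted domain.
// The "http://" or "https://" prefix is matched once, followed by an
// optional subdomain part and then the OR'ed list of blocked domains.
function blacklistMatches(string $text, array $re): bool
{
    $pattern = '#https?://([^/]*\.)?(' . implode('|', $re) . ')#si';
    return preg_match($pattern, $text) === 1;
}

var_dump(blacklistMatches('see http://www.blocked-domain1.com/page', $re)); // bool(true)
var_dump(blacklistMatches('plain text mentioning and-so-on.de', $re));      // bool(false)
```

Note one behavioural difference: because the pattern now requires the
"http"/"https" prefix, a bare domain name without a URL scheme (as in the
second call) is no longer flagged, whereas the old unanchored expression
would have matched it.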

Shouldn't this be much faster when checking the content of a page? At every
position that does not start with "http", the anchored expression fails
almost immediately, instead of trying every blacklisted domain there.
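One way to check the speed question empirically is a rough micro-benchmark.
This sketch uses a synthetic blacklist and synthetic page text (not real
wordblock.conf data), and the `timeMatch()` helper is hypothetical:

```php
<?php
// Rough micro-benchmark sketch: unanchored pattern (old code) versus the
// "http"-anchored pattern (suggestion). All data here is synthetic.

// Hypothetical blacklist: 200 made-up domains, "http://" prefixes already
// stripped and dots escaped, as they would be fed into join('|', $re).
$re = [];
for ($i = 0; $i < 200; $i++) {
    $re[] = 'spam-domain' . $i . '\.example';
}

// Hypothetical page content: plenty of ordinary text and one harmless link.
$text = str_repeat('Some ordinary wiki text without any links. ', 200)
      . 'See http://www.dokuwiki.org/ for details.';

$old = '#(' . implode('|', $re) . ')#si';
$new = '#https?://([^/]*\.)?(' . implode('|', $re) . ')#si';

// Average wall-clock time of one preg_match() call, in milliseconds.
function timeMatch(string $pattern, string $text, int $runs = 50): float
{
    $start = hrtime(true); // nanoseconds, PHP >= 7.3
    for ($i = 0; $i < $runs; $i++) {
        preg_match($pattern, $text);
    }
    return (hrtime(true) - $start) / 1e6 / $runs;
}

printf("old (unanchored): %.3f ms\n", timeMatch($old, $text));
printf("new (anchored):   %.3f ms\n", timeMatch($new, $text));
```

Results will depend on the machine, the PCRE version and whether PCRE's JIT
is enabled. PCRE also applies its own start-of-match optimizations (such as
checking the possible first characters), which may narrow the gap for some
blacklists.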

Yours,
  Robert
-- 
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist
