[dokuwiki] Re: search improvements

  • From: Chris Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Sun, 27 Aug 2006 00:04:52 +0100

Chris Smith wrote:

Maybe ... though I have forgotten my original reasoning - I guess it may have been flawed, here goes...

The context selection amounts are only 50 bytes if I use substr(). If I use utf8_substr() they would be 50 UTF-8 characters. But then, when I come to plug the offset back into preg_match, I don't know the byte amount. That would mean using two utf8_substr() calls, one for the "pre" snippet and one for the "post" snippet, so that I could then run strlen on the match + post snippet to ascertain the new byte amount for the offset.
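
To illustrate the byte/character mismatch, here is a minimal sketch. It does not use DokuWiki's utf8_substr(); first_chars() is a hypothetical stand-in built on PCRE's /u mode, and the sample text and the length of 3 are made up for the example.

```php
<?php
// Hypothetical stand-in for utf8_substr($s, 0, $len): take up to $len
// UTF-8 characters from the start of $s, using PCRE's /u mode.
function first_chars(string $s, int $len): string {
    preg_match('/^.{0,' . $len . '}/su', $s, $m);
    return $m[0];
}

$text = 'naïve café';              // "ï" and "é" take 2 bytes each in UTF-8

$byBytes = substr($text, 0, 3);    // 3 bytes: "na" plus half of "ï"
$byChars = first_chars($text, 3);  // 3 characters: "naï"

// strlen() counts bytes, so it recovers the byte length of a
// character-based snippet, which is what a byte offset needs.
$byteLen = strlen($byChars);       // 4
```

The byte-based cut leaves a broken character at the end of the snippet, while the character-based cut is clean but no longer tells you directly how many bytes to advance.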

It's worse than that. Because I only have a byte offset, I need to convert it into a character offset somehow, in order to work out the position in the string at which the match occurs. That probably means using preg_split rather than preg_match, and utf8_strlen on the first portion of the split. All very, very messy.
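
For what it's worth, one way to get from a byte offset to a character offset is to count the characters in everything before the match. This is only a sketch: it slices with substr() at the byte offset instead of going through preg_split, char_len() is a hypothetical stand-in for utf8_strlen(), and the sample text is made up.

```php
<?php
// Hypothetical stand-in for utf8_strlen(): count UTF-8 characters.
function char_len(string $s): int {
    return preg_match_all('/./su', $s);
}

$text = 'naïve café society';

// With PREG_OFFSET_CAPTURE, $m[0][1] is the match position in bytes,
// even when the pattern uses the /u modifier.
preg_match('/café/u', $text, $m, PREG_OFFSET_CAPTURE);
$byteOffset = $m[0][1];                                  // 7

// A match always starts on a character boundary, so the byte slice
// before it is valid UTF-8 and its characters can be counted.
$charOffset = char_len(substr($text, 0, $byteOffset));   // 6
```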
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
