[dokuwiki] Re: search improvements

  • From: Chris Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Sat, 26 Aug 2006 10:15:05 +0100

Andreas Gohr wrote:

On Sat, 26 Aug 2006 09:21:12 +0100
Chris Smith <chris@xxxxxxxxxxxxx> wrote:

Andreas Gohr wrote:
I noticed the use of some strlen calls there. Are they
used in a UTF-8-safe way, or is it possible that they
split a multibyte char? If that could happen, we should add a check
to strip invalid UTF-8 chars from the beginning and end of the snippet -
this would be a nice addition to the utf-8 lib.
Yes, I think that is the best solution, adjusting the strings to
ensure they always start/end at utf-8 character boundaries. I'll see
what I can come up with.

I just pushed a patch adding a function from Harry's utf8 library to strip bad bytes.

Andi

That may not entirely fix the problem. I'm not certain whether preg_match misbehaves when the offset it is asked to start at doesn't fall on a proper UTF-8 character boundary. I'm working on a fix that adjusts the snippet start and end indexes to the nearest UTF-8 boundary before calling substr and preg_match. That should mean that although I'm still dealing with byte indexes and byte lengths, those numbers will always correspond to UTF-8 character boundaries.
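For anyone curious what "adjust to the nearest UTF-8 boundary" means in practice, here is a rough sketch of the idea (in Python rather than DokuWiki's PHP, and the function name is mine, not from the actual patch). In UTF-8, continuation bytes always match the bit pattern 10xxxxxx, so a byte index can be walked backwards until it no longer points at one:

```python
def snap_to_utf8_boundary(data: bytes, idx: int) -> int:
    """Move a byte index backwards until it lands on a UTF-8
    character boundary, i.e. not on a continuation byte (10xxxxxx)."""
    while 0 < idx < len(data) and (data[idx] & 0xC0) == 0x80:
        idx -= 1
    return idx

# Example: "é" encodes to two bytes (0xC3 0xA9) in UTF-8.
s = "caf\u00e9!".encode("utf-8")     # b'caf\xc3\xa9!'
print(snap_to_utf8_boundary(s, 4))   # 4 points mid-character -> snapped back to 3
print(snap_to_utf8_boundary(s, 5))   # 5 starts '!' -> already a boundary
```

Once both the start and end indexes have been snapped like this, byte-oriented calls such as substr (and a preg_match offset) can never land inside a multibyte character, which is exactly the property the snippet code needs.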

I'll send it through once I have finished checking it.

Cheers,

Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
