[dokuwiki] Re: search improvements
- From: Chris Smith <chris@xxxxxxxxxxxxx>
- To: dokuwiki@xxxxxxxxxxxxx
- Date: Sat, 26 Aug 2006 10:15:05 +0100
Andreas Gohr wrote:
> On Sat, 26 Aug 2006 09:21:12 +0100
> Chris Smith <chris@xxxxxxxxxxxxx> wrote:
>> Andreas Gohr wrote:
>>> I noticed the use of some strlen calls there. Are they
>>> used in a UTF-8 safe way there or would it be possible that they
>>> split a multibyte char? If that could happen we should add a check
>>> to strip invalid UTF-8 chars from beginning and end of the snippet -
>>> this would be a nice addition to the utf-8 lib.
>> Yes, I think that is the best solution, adjusting the strings to
>> ensure they always start/end at utf-8 character boundaries. I'll see
>> what I can come up with.
> I just pushed a patch adding a function from Harry's utf8 library to
> strip bad bytes.
> Andi
That may not entirely fix the problem. I am not certain whether
preg_match will break down if the offset it is given does not fall on a
proper utf8 character boundary. I am working on a fix that adjusts the
snippet start and end indexes to the nearest utf8 boundary before
calling substr and preg_match. That way, although I am still dealing
with byte indexes and byte lengths, those numbers will always
correspond to utf8 character boundaries.
I'll send it through once I have finished checking it.
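To illustrate the idea of snapping byte indexes to character boundaries, here is a rough sketch. It is written in Python for brevity and is not the patch itself. The function names (align_start, align_end, snippet) are made up for this example, and it assumes the input is already valid UTF-8. It also moves the start forward and the end backward, which shrinks the snippet. That is only one way to pick a nearby boundary.

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes always have the bit pattern 10xxxxxx.
    return (byte & 0xC0) == 0x80


def align_start(data: bytes, idx: int) -> int:
    """Move a start index forward until it sits on a character boundary."""
    idx = max(0, min(idx, len(data)))
    while idx < len(data) and is_continuation(data[idx]):
        idx += 1
    return idx


def align_end(data: bytes, idx: int) -> int:
    """Move an exclusive end index backward onto a character boundary."""
    idx = max(0, min(idx, len(data)))
    # An end index is safe when the byte it points at starts a new character
    # (or when it points just past the end of the data).
    while 0 < idx < len(data) and is_continuation(data[idx]):
        idx -= 1
    return idx


def snippet(data: bytes, start: int, end: int) -> bytes:
    """Cut data[start:end] without splitting a multibyte character."""
    s = align_start(data, start)
    e = max(s, align_end(data, end))
    return data[s:e]


if __name__ == "__main__":
    text = "naïve café".encode("utf-8")
    # Byte 3 is the second byte of "ï" and byte 11 is the second byte of "é".
    print(snippet(text, 3, 11).decode("utf-8"))
```

The numbers passed around are still byte offsets, so they can be handed to byte-oriented functions unchanged. The only difference is that they never land in the middle of a character.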
Cheers,
Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
- Follow-Ups:
- [dokuwiki] Re: search improvements
- From: Chris Smith
- [dokuwiki] Re: search improvements
- From: Andreas Gohr
- References:
- [dokuwiki] search improvements
- From: Chris Smith
- [dokuwiki] Re: search improvements
- From: Andreas Gohr
- [dokuwiki] Re: search improvements
- From: Chris Smith
- [dokuwiki] Re: search improvements
- From: Andreas Gohr