[dokuwiki] Re: search improvements

  • From: Andreas Gohr <andi@xxxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Sun, 27 Aug 2006 23:37:22 +0200

On Sat, 26 Aug 2006 22:48:01 +0100
Chris Smith <chris@xxxxxxxxxxxxx> wrote:


> Also this morning when I attacked the problem again, the upshot was:
> - the main problem was aligning the start and end of the context
> snippet with a utf-8 character boundary.

Okay. I'm not sure I understand why this is a problem that can't be
solved simply by stripping broken multibyte sequences from the start
and end of the context. But your clever reindex function already
solves it anyway.
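
To illustrate the stripping approach mentioned above, here is a minimal sketch in Python. It is not DokuWiki code and not the reindex function from this thread. The function name `trim_to_utf8_boundaries` is hypothetical, and the sketch assumes the snippet was cut out of text that is otherwise valid UTF-8.

```python
def trim_to_utf8_boundaries(chunk: bytes) -> bytes:
    """Drop partial UTF-8 sequences at both ends of a byte slice.

    Assumption: chunk was sliced out of valid UTF-8, so only the first
    and last character can be damaged.
    """
    start, end = 0, len(chunk)

    # Leading continuation bytes (bit pattern 10xxxxxx) belong to a
    # character whose lead byte was cut off, so skip them.
    while start < end and (chunk[start] & 0xC0) == 0x80:
        start += 1

    # Walk back from the end to the lead byte of the last character.
    i = end - 1
    while i >= start and (chunk[i] & 0xC0) == 0x80:
        i -= 1

    if i >= start:
        lead = chunk[i]
        # The lead byte tells us how long the sequence should be.
        if lead >= 0xF0:
            need = 4
        elif lead >= 0xE0:
            need = 3
        elif lead >= 0xC0:
            need = 2
        else:
            need = 1
        # If fewer bytes remain than the lead byte announces,
        # the last character is incomplete and is dropped.
        if end - i < need:
            end = i

    return chunk[start:end]
```

For example, slicing bytes 1 to 22 out of the 24-byte encoding of "日本語のテキスト" cuts through the first and last character; the function returns the six intact characters in between.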

> - a secondary problem is in multibyte utf-8 text the number of 
> characters returned in the snippet will be less than 100 - perhaps as 
> low as 33 or even 25 in some alphabets/writing systems.

I don't really think this is a problem. We could increase the context
size a little bit, e.g. to 70 bytes, and we should get enough context
for all languages.
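
The figures of 33 and 25 characters quoted above follow from UTF-8 using 3 or 4 bytes per character in some scripts. A small Python sketch to check the arithmetic for a given byte budget (the helper name `chars_in_byte_budget` is hypothetical and not part of DokuWiki):

```python
def chars_in_byte_budget(text: str, budget: int) -> int:
    """Count how many whole characters of text fit into budget UTF-8 bytes."""
    used = 0
    count = 0
    for ch in text:
        size = len(ch.encode("utf-8"))  # 1 to 4 bytes per character
        if used + size > budget:
            break
        used += size
        count += 1
    return count


# 100-byte budget, one sample character per UTF-8 width
print(chars_in_byte_budget("a" * 200, 100))           # 1 byte each  -> 100
print(chars_in_byte_budget("я" * 200, 100))           # 2 bytes each -> 50
print(chars_in_byte_budget("日" * 200, 100))          # 3 bytes each -> 33
print(chars_in_byte_budget("\U00010330" * 200, 100))  # 4 bytes each -> 25
```

The same helper can be run with any other budget to see how much visible context a given byte count yields per script.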

Andi
