On 9/15/05, Chris Smith <chris@xxxxxxxxxxxxx> wrote:
> > > - if I change my dev install to use '\b' I get the same results as
> > > splitbrain. If I use current boundary setting I get same results as
> > > my reference install.
> >
> > It originally used \b as boundary but this made problems with UTF-8
> > words.
>
> In looking into this I notice the perl flags used by the lexer don't include
> "u" for utf-8. I guess that is deliberate but I thought I'd check to make
> sure :)

Well spotted. Hmmm - that might be something to screen the existing parser
modes for. Without the /u flag (which isn't used at this time), certain regex
behaviour is controlled by your server's locale settings, e.g. what is
regarded as a word character.

It may be possible to add the /u flag (in lexer.php - look for
"perlMatchingFlags") but it might break things, so having the unit tests
there would be important.

I guess there will be very few cases where it's a problem, though. It might
be easier to review the existing modes and the regexes they use.

--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
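To make the \b and /u point above a bit more concrete, here is a rough sketch. It is in Python rather than PHP and is only an analogy: a bytes pattern stands in for a PCRE pattern without /u (the input is seen as raw bytes), and a str pattern stands in for Unicode-aware matching. The sample text and variable names are made up for illustration, and the sketch does not model the locale dependence mentioned above.

```python
import re

# Hypothetical sample text containing one non-ASCII word.
text = "für alle"

# Byte-oriented matching: the pattern sees the raw UTF-8 bytes.
# The two bytes that encode "ü" are not ASCII word characters, so \b
# finds a boundary on either side of them and the word is split.
byte_words = re.findall(rb"\b\w+\b", text.encode("utf-8"))

# Unicode-aware matching: "ü" counts as a word character, so the
# word stays in one piece.
unicode_words = re.findall(r"\b\w+\b", text)

print(byte_words)     # [b'f', b'r', b'alle']
print(unicode_words)  # ['für', 'alle']
```

Running a check like this over the regexes used by the existing parser modes would be one way to spot where a flag change could alter behaviour.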