[dokuwiki] syntax plugin bug? (unicode range regexp)

  • From: Michiel Kamermans <pomax@xxxxxxxxxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Wed, 21 Oct 2009 05:02:20 -0700

Hi

I'm trying to write a syntax plugin to replace ([\x{4E00}-\x{9FFFF}\x{3005}\x{30F6}]+)\(([\x{3040}-\x{30FF}]+)\) with something sensible, but it would appear that this regexp makes the syntax plugin system screw up big time. Even something as simple as:

function connectTo($mode) { $this->Lexer->addSpecialPattern("[\x{4E00}-\x{9FFFF}]+", $mode, 'plugin_myplugin'); }

seems to completely kill off any and all syntax parsing; I just get the plain, unparsed document text instead. Is it possible that the lexer doesn't take unicodeness into account? Could it be that the pattern matching is missing the 'u' pattern modifier (which puts PCRE into UTF-8 mode, so that the pattern and the subject strings are treated as UTF-8 rather than as a series of bytes)?
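
To illustrate the suspicion outside of DokuWiki, here is a minimal standalone sketch using plain preg_match(). It does not touch the DokuWiki lexer, so it only shows how PCRE itself treats the character class with and without the 'u' modifier. The sample text and variable names are made up for the example, and the range is cut at \x{9FFF} (the end of the CJK Unified Ideographs block), which is an assumption about the intended range.

```php
<?php
// Hypothetical sample input: kanji followed by a kana reading in parentheses.
// Written with \u{...} escapes so the file encoding does not matter (PHP 7+).
$text = "\u{6F22}\u{5B57}(\u{304B}\u{3093}\u{3058})";

// Character class from the post, with the upper bound assumed to be \x{9FFF}.
// Single quotes keep the \x{...} escapes intact for PCRE to interpret.
$class = '[\x{4E00}-\x{9FFF}]+';

// Without 'u', PCRE works on bytes and refuses to compile \x{...} values above 0xFF.
// The @ hides the compile warning; preg_match() returns false on error.
$withoutU = @preg_match('/' . $class . '/', $text, $m1);

// With 'u', the pattern and the subject are both treated as UTF-8.
$withU = preg_match('/' . $class . '/u', $text, $m2);

var_dump($withoutU);      // bool(false)
var_dump($withU);         // int(1)
var_dump($m2[0] === "\u{6F22}\u{5B57}"); // bool(true): the two kanji were matched
```

If the lexer compiles its patterns without 'u', a failure like the first call would be consistent with what I'm seeing, but that is a guess on my part and not something I have confirmed in the lexer source.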

- Mike "Pomax" Kamermans
nihongoresources.com
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
