Wondering if Andi / anyone has an opinion on this yet. Some relevant links:

http://annevankesteren.nl/2005/05/unicode
http://diveintomark.org/archives/2004/07/06/nfc
http://cvs.sourceforge.net/viewcvs.py/wikipedia/phase3/includes/normal/ <- MediaWiki's implementation, which they run on all input

Now, I haven't fully grasped the issue; suffice it to say that a single character can be encoded in UTF-8 in more than one way (the longer forms are so-called "overlong" encodings, and only the shortest form is valid per the UTF-8 spec), and byte sequences that look like UTF-8 can also decode to values outside Unicode altogether. For example, a newline character could be represented with:

  0x0A                            << the normal way - same as ASCII
  0xC0 0x8A
  0xE0 0x80 0x8A
  0xF0 0x80 0x80 0x8A
  0xF8 0x80 0x80 0x80 0x8A
  0xFC 0x80 0x80 0x80 0x80 0x8A

What I haven't figured out is how much of an issue this is for an application like DokuWiki. Normalizing such characters to a single representation could help searching, for example. There may also be security issues here that aren't already handled by DokuWiki's utf8_strip (i.e. wherever utf8_strip is not being used) - validation, for example? (The security angle, as I understand it, is that a byte-level check for a character like a newline or a slash won't match its overlong form, but a lenient decoder may still interpret that form as the character.)

Another question is where non-normal-form characters would come from in the first place. Presumably browsers stick to the normal form of something like a newline, but perhaps this is not the case (or it could happen if you copy and paste from another application into your browser).

Anyway - just pondering out loud right now. Any thoughts appreciated.
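To make the overlong-encoding point concrete, here's a quick sketch (Python purely for illustration - this isn't DokuWiki code). It shows that a naive byte-level search for a newline never matches the overlong form, while a strict UTF-8 decoder rejects the sequence outright:

  # Two-byte overlong encoding of U+000A (newline) from the list above.
  overlong_newline = b"\xc0\x8a"

  # A naive byte-level filter looking for 0x0A never sees it...
  print(b"\n" in overlong_newline)           # False

  # ...but a strict decoder refuses the sequence entirely.
  try:
      overlong_newline.decode("utf-8")
  except UnicodeDecodeError as err:
      print("rejected:", err)

A lenient decoder that silently accepted 0xC0 0x8A as a newline is exactly where the security problem would creep in.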
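On the validation question, the cheapest check I can think of is byte-level. The following contains_overlong is a hypothetical helper (not one of DokuWiki's utf8 functions) and only covers two-byte sequences - a real validator, like MediaWiki's normal/ code, has to handle every sequence length:

  def contains_overlong(data: bytes) -> bool:
      # Lead bytes 0xC0 and 0xC1 can only start two-byte sequences
      # encoding code points below 0x80, which already fit in a
      # single byte - so any occurrence is necessarily overlong.
      return any(b in (0xC0, 0xC1) for b in data)

  print(contains_overlong(b"\xc0\x8a"))    # True  - overlong newline
  print(contains_overlong(b"plain\n"))     # False - ordinary ASCII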
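And on the searching point: overlong encodings aside, the same text can also be built from different, all perfectly valid, code point sequences - which is what the NFC article above is about. A small sketch of why normalizing helps search, using Python's standard unicodedata module:

  import unicodedata

  nfc = "\u00e9"       # e-acute as one precomposed code point (U+00E9)
  nfd = "e\u0301"      # 'e' followed by a combining acute accent

  # The two render identically but compare unequal, so a naive
  # search for one form misses the other...
  print(nfc == nfd)                                   # False

  # ...unless both sides are normalized to a single form first.
  print(unicodedata.normalize("NFC", nfd) == nfc)     # True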