[dokuwiki] Re: UTF Normalization

  • From: Chris Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Mon, 27 Mar 2006 19:52:28 +0100

Andreas Gohr wrote:

There may also be security issues here not already handled by
DokuWiki's utf8_strip (i.e. where utf8_strip is not being used).
Validation, for example?

This should be researched. Does anyone know what e.g. PHP or the preg
library does with an ASCII character written as a multi-byte Unicode
sequence? I assume it will not recognize it as a control character,
but I'm not sure about it...
What do you mean by this?
IIRC, many (most) parts of DokuWiki don't use mb-aware or UTF-8-aware functions, relying on byte patterns rather than character counts.
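
For what it's worth, here is a quick sketch of what that means in
practice. It is untested against DokuWiki itself and assumes a
reasonably current PHP with the bundled PCRE and mbstring extensions;
the overlong byte sequence is the classic case of an ASCII character
smuggled in as a multi-byte sequence:

  <?php
  // An overlong (non-shortest-form) encoding of ASCII "." (0x2E).
  // Overlong sequences are invalid UTF-8 by definition.
  $overlong = "\xC0\xAE";

  // Byte-wise preg (no /u modifier): the two bytes are just two bytes,
  // so an ASCII "." in the pattern does not match them.
  var_dump(preg_match('/^\.$/', $overlong));   // int(0)

  // With /u, PCRE first validates the subject as UTF-8; the overlong
  // sequence is rejected outright and the match simply fails.
  var_dump(preg_match('/^\.$/u', $overlong));  // bool(false)

  // The byte/character mismatch described above:
  $s = "na\xC3\xAFve";               // "naïve": 5 characters, 6 bytes
  var_dump(strlen($s));              // int(6) - strlen counts bytes
  var_dump(mb_strlen($s, 'UTF-8'));  // int(5) - mb_strlen counts characters

So, at least with current PHP/PCRE, preg in UTF-8 mode refuses the
overlong form rather than treating it as the ASCII character, but any
code path that skips the /u modifier (or uses strlen and friends)
sees only raw bytes.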
Another question is where non-normal-form characters would come from.
Presumably browsers would stick to the normal form of something like a
newline, but perhaps this is not the case (or it could happen when you
copy and paste from another application into your browser).

As I said, I assume browsers __do__ normalization when
accept-charset=utf-8 is set, but of course this should be tested
and/or looked up in the browser sources.
Yeah, but who knows where the input has come from. If it's exploitable, someone will work out a way to spoof apparent UTF-8 content that isn't UTF-8.
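
To make both points concrete, a minimal sketch (the helper name is
mine, and again this assumes a current PHP with the bundled PCRE): a
composed and a decomposed "é" render identically but are different
byte sequences, and malformed "UTF-8" can be rejected cheaply before
it reaches byte-wise code:

  <?php
  // NFC "é" is the single codepoint U+00E9 (bytes 0xC3 0xA9);
  // NFD "é" is "e" plus combining acute U+0301 (bytes 0x65 0xCC 0x81).
  $nfc = "\xC3\xA9";
  $nfd = "e\xCC\x81";
  var_dump($nfc === $nfd);  // bool(false) - same glyph, different bytes

  // preg_match with /u returns false on malformed UTF-8 (including
  // overlong encodings), so an empty pattern doubles as a validator.
  function is_well_formed_utf8($bytes) {
      return preg_match('//u', $bytes) === 1;
  }
  var_dump(is_well_formed_utf8($nfd));        // bool(true)
  var_dump(is_well_formed_utf8("\xC0\xAE"));  // bool(false)

Until a check like that runs on every input path, assuming the
browser normalizes is a hope, not a guarantee.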

Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
