[dokuwiki] Re: UTF Normalization

  • From: Chris Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Mon, 27 Mar 2006 19:52:28 +0100

Andreas Gohr wrote:

There may also be security issues here not already handled by
DokuWiki's utf8_strip (i.e. where utf8_strip is not being used).
Validation, for example?

This should be researched. Does anyone know what e.g. PHP or the preg
library does with an ASCII character written as a multi-byte Unicode
sequence? I assume it will not recognize it as a control character,
but I'm not sure about it...
What do you mean by this?
IIRC, many (most) parts of DokuWiki don't use mb-aware or UTF-8-aware functions, relying on byte patterns rather than character counts.
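
For what it's worth, here is a quick sketch of what that means in
practice. It is untested against DokuWiki itself and assumes a
reasonably current PHP with the bundled PCRE and mbstring extensions;
the overlong byte sequence is the classic case of an ASCII character
smuggled in as a multi-byte sequence:

  <?php
  // An overlong (non-shortest-form) encoding of ASCII "." (0x2E).
  // Overlong sequences are invalid UTF-8 by definition.
  $overlong = "\xC0\xAE";

  // Byte-wise preg (no /u modifier): the two bytes are just two bytes,
  // so an ASCII "." in the pattern does not match them.
  var_dump(preg_match('/^\.$/', $overlong));   // int(0)

  // With /u, PCRE first validates the subject as UTF-8; the overlong
  // sequence is rejected outright and the match simply fails.
  var_dump(preg_match('/^\.$/u', $overlong));  // bool(false)

  // The byte/character mismatch described above:
  $s = "na\xC3\xAFve";               // "naïve": 5 characters, 6 bytes
  var_dump(strlen($s));              // int(6) - strlen counts bytes
  var_dump(mb_strlen($s, 'UTF-8'));  // int(5) - mb_strlen counts characters

So, at least with current PHP/PCRE, preg in UTF-8 mode refuses the
overlong form rather than treating it as the ASCII character, but any
code path that skips the /u modifier (or uses strlen and friends)
sees only raw bytes.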
Another question is where non-normal-form characters would come from.
Presumably browsers would stick to the normal form of something like a
newline, but perhaps this is not the case (or it could happen when you
copy and paste from another application into your browser).

As I said, I assume browsers __do__ normalization when
accept-charset=utf-8 is set, but of course this should be tested
and/or looked up in the browser sources.
Yeah, but who knows where the input has come from. If it's exploitable, someone will work out a way to spoof apparent UTF-8 content that isn't UTF-8.
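
To make both points concrete, a minimal sketch (the helper name is
mine, and again this assumes a current PHP with the bundled PCRE): a
composed and a decomposed "é" render identically but are different
byte sequences, and malformed "UTF-8" can be rejected cheaply before
it reaches byte-wise code:

  <?php
  // NFC "é" is the single codepoint U+00E9 (bytes 0xC3 0xA9);
  // NFD "é" is "e" plus combining acute U+0301 (bytes 0x65 0xCC 0x81).
  $nfc = "\xC3\xA9";
  $nfd = "e\xCC\x81";
  var_dump($nfc === $nfd);  // bool(false) - same glyph, different bytes

  // preg_match with /u returns false on malformed UTF-8 (including
  // overlong encodings), so an empty pattern doubles as a validator.
  function is_well_formed_utf8($bytes) {
      return preg_match('//u', $bytes) === 1;
  }
  var_dump(is_well_formed_utf8($nfd));        // bool(true)
  var_dump(is_well_formed_utf8("\xC0\xAE"));  // bool(false)

Until a check like that runs on every input path, assuming the
browser normalizes is a hope, not a guarantee.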

Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
