[dokuwiki] Re: UTF Normalization
- From: Chris Smith <chris@xxxxxxxxxxxxx>
- To: dokuwiki@xxxxxxxxxxxxx
- Date: Mon, 27 Mar 2006 19:52:28 +0100
Andreas Gohr wrote:
>> There may also be security issues here, not already handled by
>> dokuwiki's utf8_strip (i.e. where utf8_strip is not being used)?
>> Validation for example?
>
> This should be researched. Does anyone know what e.g. PHP or the preg lib
> does with ASCII written as Unicode codepoint? I assume it will not
> recognize it as a control character but I'm not sure about it...
What do you mean by this?
IIRC, many (most) parts of DokuWiki don't use mb-aware or UTF-8-aware
functions, relying on byte patterns rather than character counts.
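To make the "ASCII written as a Unicode codepoint" case concrete: the bytes C0 8A are an overlong (non-shortest-form) UTF-8 encoding of U+000A, the newline, and RFC 3629 forbids them. A check that scans raw bytes for 0x0A will not see them, but a sloppy decoder that just applies the two-byte bit layout turns them back into a real newline. The sketch below is in Python and assumes Python 3's built-in strict UTF-8 codec; `naive_has_newline` and `lenient_decode_2byte` are hypothetical helpers for illustration, not DokuWiki or PHP functions.

```python
# Overlong two-byte encoding of U+000A (newline): 110_00000 10_001010
OVERLONG_NEWLINE = b"\xc0\x8a"


def naive_has_newline(data: bytes) -> bool:
    """Byte-pattern check of the kind that is not UTF-8 aware."""
    return b"\n" in data


def lenient_decode_2byte(seq: bytes) -> str:
    """Hypothetical sloppy decoder: applies the 2-byte bit layout
    without rejecting overlong forms (lead bytes C0/C1)."""
    lead, cont = seq[0], seq[1]
    if lead & 0xE0 != 0xC0 or cont & 0xC0 != 0x80:
        raise ValueError("not a 2-byte sequence")
    return chr(((lead & 0x1F) << 6) | (cont & 0x3F))


def strict_is_utf8(data: bytes) -> bool:
    """Python's codec follows RFC 3629 and rejects overlong forms."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


payload = b"title" + OVERLONG_NEWLINE + b"injected"
print(naive_has_newline(payload))                      # False: no 0x0A byte present
print(lenient_decode_2byte(OVERLONG_NEWLINE) == "\n")  # True: sloppy decoding yields a newline
print(strict_is_utf8(payload))                         # False: strict decoding rejects it
```

The risk is the combination: a byte-level filter lets the sequence through, and a later lenient decoding step turns it into a control character. Strict validation before any other processing closes that gap.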
>> Another question is where would non-normal form characters come from?
>> Presumably browsers would stick to the normal form of something like a
>> newline, but perhaps this is not the case (or if you copy and paste
>> from another application into your browser).
>
> As I said, I assume browsers __do__ normalization when
> accept-charset=utf-8 is set but of course this should be tested and/or
> looked up in the browser sources.
Yeah, but who knows where the input has come from. If it's exploitable,
someone will work out a way to spoof apparent UTF-8 content that isn't
UTF-8.
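Since the server cannot rely on the browser having done either step, one option is to validate and normalize at the point where input arrives. The Python sketch below assumes NFC as the target form; the thread does not settle which normal form DokuWiki should use. `clean_input` is a hypothetical helper, and the only library calls are `bytes.decode` and `unicodedata.normalize` from the standard library.

```python
import unicodedata


def clean_input(raw: bytes) -> str:
    """Reject anything that is not well-formed UTF-8, then normalize to NFC.

    Assumption: NFC is the desired form. Strict decoding raises
    UnicodeDecodeError on overlong or otherwise invalid byte sequences.
    """
    text = raw.decode("utf-8")
    return unicodedata.normalize("NFC", text)


decomposed = "e\u0301".encode("utf-8")   # 'e' + COMBINING ACUTE ACCENT
precomposed = "\u00e9".encode("utf-8")   # LATIN SMALL LETTER E WITH ACUTE

print(decomposed == precomposed)                            # False: different bytes
print(clean_input(decomposed) == clean_input(precomposed))  # True: same after NFC

try:
    clean_input(b"\xc0\x8a")  # overlong newline
except UnicodeDecodeError:
    print("rejected overlong encoding")
```

Doing the strict decode first matters: normalization only operates on already-decoded text, so it cannot catch malformed bytes on its own.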
Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
- Follow-Ups:
- [dokuwiki] Re: UTF Normalization
- From: Harry Fuecks
- References:
- [dokuwiki] UTF Normalization
- From: Harry Fuecks
- [dokuwiki] Re: UTF Normalization
- From: Andreas Gohr