[dokuwiki] Re: UTF Normalization

On Tue, 21 Mar 2006 16:01:31 +0100
"Harry Fuecks" <hfuecks@xxxxxxxxx> wrote:

> Wondering if Andi / anyone has an opinion on this yet.

> What I haven't figured out is how much of an issue is this for an
> application like dokuwiki?

Good question. I assume all browsers will submit normalized UTF-8 anyway,
so usually text entered the normal way will not need it...

...which leads to
 
> There may also be security issues here, not already handled by
> dokuwiki's utf8_strip (i.e. where utf8_strip is not being used)?
> Validation for example?

This should be researched. Does anyone know what e.g. PHP or the preg lib
does with an ASCII character written as a Unicode codepoint? I assume it
will not recognize it as a control character, but I'm not sure about it...
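
To illustrate the concern, here is a minimal sketch in Python (not PHP,
so it says nothing about what PHP or the preg lib actually do). It
assumes the question is about overlong UTF-8 encodings, i.e. an ASCII
character such as "/" (0x2F) encoded as the two bytes 0xC0 0xAF. The
helper name is_strict_utf8 is hypothetical.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes under Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0x2F is the normal one-byte encoding of "/".
PLAIN_SLASH = b"/"
# 0xC0 0xAF is an overlong two-byte form of the same code point.
# A strict decoder must reject it instead of mapping it to "/".
OVERLONG_SLASH = b"\xc0\xaf"

if __name__ == "__main__":
    for label, raw in (("plain", PLAIN_SLASH), ("overlong", OVERLONG_SLASH)):
        print(label, raw, "valid" if is_strict_utf8(raw) else "rejected")
```

A decoder that accepted the overlong form would let "/" or a control
character slip past a filter that only looks for the one-byte form, which
is why rejecting such input before any validation seems the safer choice.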
 
> Another question is where would non-normal form characters come from?
> Presumably browsers would stick to the normal form of something like a
> newline, but perhaps this is not the case (or if you copy and paste
> from another application into your browser).

As I said, I assume browsers __do__ normalization when
accept-charset=utf-8 is set, but of course this should be tested and/or
looked up in the browser sources.
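
For testing, a small sketch of what "normalized" means here, again in
Python rather than PHP and assuming NFC is the normal form in question
(the thread does not say which form is meant). The helper names to_nfc
and is_nfc are hypothetical.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (composed)."""
    return unicodedata.normalize("NFC", text)


def is_nfc(text: str) -> bool:
    """Return True if text is already in NFC."""
    return to_nfc(text) == text


# The same visible character, written two ways:
composed = "\u00e9"     # single code point U+00E9
decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT

if __name__ == "__main__":
    print(composed == decomposed)          # the raw strings differ
    print(to_nfc(decomposed) == composed)  # equal after normalization
    print(is_nfc(composed), is_nfc(decomposed))
```

Running a check like is_nfc over what different browsers actually submit
(typed text as well as text pasted from other applications) would show
whether the assumption holds.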

Andi
