[haiku-development] Re: B_UNICODE_CONVERSION vs UTF-8
- From: Ingo Weinhold <ingo_weinhold@xxxxxx>
- To: haiku-development@xxxxxxxxxxxxx
- Date: Wed, 18 Mar 2009 12:45:14 +0100
On 2009-03-17 at 20:39:50 [+0100], François Revol <revol@xxxxxxx> wrote:
> B_UNICODE_CONVERSION is actually UCS-2 (or maybe UTF-16, not even
> sure...).
>
> Vision has a special case to handle B_UNICODE_CONVERSION as UTF-8 (it
> just skips calling convert_*_utf8()); however, this lets through invalid
> UTF-8 strings.
>
> IMO we should support a B_UTF8_CONVERSION, rename B_UNICODE_CONVERSION
> to B_UCS2_CONVERSION or whichever, to avoid misunderstanding,
Sounds reasonable.
> and
> allow the use of convert_*_utf8() to also validate, or possibly correct,
> broken strings by converting them from ISO Latin-1 as a fallback (it seems
> ZETA's version does this when it finds broken UTF-8 as input).
Not sure about this. Reporting an error and letting the caller decide what
other encoding to try sounds better to me than hardcoding anything.
Alternatively (or additionally) "lenient"/"do what you can" versions of the
conversion functions could be added.
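
To illustrate the two options, here is a minimal sketch in portable C++. It does
not use the real convert_*_utf8() API; the names is_valid_utf8(),
latin1_to_utf8() and to_utf8_lenient() are hypothetical and exist only for this
example. The strict function merely reports whether the input is well-formed
UTF-8, so the caller can decide which other encoding to try; the lenient one
hardcodes the ISO Latin-1 fallback described in the quoted proposal.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical strict check: true only for well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, surrogates and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s)
{
    std::size_t i = 0;
    const std::size_t n = s.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t len;
        std::uint32_t cp;
        std::uint32_t min;  // smallest code point allowed for this length
        if (c < 0x80) {
            ++i;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            len = 2; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            len = 3; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            len = 4; cp = c & 0x07; min = 0x10000;
        } else {
            return false;  // stray continuation byte or invalid lead byte
        }
        if (i + len > n)
            return false;  // sequence truncated at end of input
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;  // expected a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;  // overlong, out of range, or surrogate
        i += len;
    }
    return true;
}

// Hypothetical helper: every ISO Latin-1 byte maps to the code point
// of the same value, so bytes >= 0x80 become two-byte UTF-8 sequences.
std::string latin1_to_utf8(const std::string& s)
{
    std::string out;
    for (const char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical "do what you can" variant: pass valid UTF-8 through,
// otherwise reinterpret the whole input as ISO Latin-1.
std::string to_utf8_lenient(const std::string& s)
{
    return is_valid_utf8(s) ? s : latin1_to_utf8(s);
}
```

A caller preferring the strict route would call is_valid_utf8() and, on failure,
pick its own fallback encoding; to_utf8_lenient() bakes that choice in.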
CU, Ingo