[haiku-development] Re: B_UNICODE_CONVERSION vs UTF-8

  • From: Ingo Weinhold <ingo_weinhold@xxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Wed, 18 Mar 2009 12:45:14 +0100

On 2009-03-17 at 20:39:50 [+0100], François Revol <revol@xxxxxxx> wrote:
> B_UNICODE_CONVERSION is actually UCS-2 (or maybe UTF-16, not even
> sure...).
> 
> Vision has a special case to handle B_UNICODE_CONVERSION as UTF-8 (it
> just skips calling convert_*_utf8()); however, this lets through invalid
> UTF-8 strings.
> 
> IMO we should support a B_UTF8_CONVERSION, rename B_UNICODE_CONVERSION
> to B_UCS2_CONVERSION or whichever, to avoid misunderstanding,

Sounds reasonable.

> and
> allowing the use of convert_ to also validate or eventually correct
> broken strings by converting them from ISO latin1 as fallback (seems
> ZETA's one does it when it finds broken UTF-8 as input).

Not sure about this. Reporting an error and letting the caller decide what 
other encoding to try sounds better to me than hardcoding anything. 
Alternatively (or additionally) "lenient"/"do what you can" versions of the 
conversion functions could be added.
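
To make the strict vs. lenient distinction concrete, here is a rough sketch in 
plain C++. None of these names exist in the Haiku API: IsValidUtf8(), 
Latin1ToUtf8(), ConvertStrict() and ConvertLenient() are hypothetical helpers 
made up for illustration, and the Latin-1 fallback is only one possible 
policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: true if `s` is well-formed UTF-8. Rejects truncated
// sequences, overlong forms, surrogates and code points above U+10FFFF.
bool IsValidUtf8(const std::string& s)
{
	size_t i = 0;
	while (i < s.size()) {
		unsigned char c = s[i];
		size_t extra;      // number of continuation bytes expected
		uint32_t cp, min;  // decoded code point, smallest legal value
		if (c < 0x80) { i++; continue; }
		else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; min = 0x80; }
		else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; min = 0x800; }
		else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; min = 0x10000; }
		else return false;  // stray continuation byte or invalid lead byte
		if (s.size() - i <= extra)
			return false;   // sequence is cut off at the end of the string
		for (size_t k = 1; k <= extra; k++) {
			unsigned char cc = s[i + k];
			if ((cc & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (cc & 0x3F);
		}
		if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
			return false;
		i += extra + 1;
	}
	return true;
}

// Hypothetical helper: every ISO 8859-1 byte maps to the code point of the
// same value, so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string Latin1ToUtf8(const std::string& s)
{
	std::string out;
	for (unsigned char c : s) {
		if (c < 0x80) {
			out += static_cast<char>(c);
		} else {
			out += static_cast<char>(0xC0 | (c >> 6));
			out += static_cast<char>(0x80 | (c & 0x3F));
		}
	}
	return out;
}

// Strict variant: reports failure and leaves the choice of another
// encoding to the caller.
bool ConvertStrict(const std::string& in, std::string& out)
{
	if (!IsValidUtf8(in))
		return false;
	out = in;
	return true;
}

// Lenient variant: "do what you can". Here that means treating invalid
// input as Latin-1, which is an assumed policy, not a recommendation.
std::string ConvertLenient(const std::string& in)
{
	return IsValidUtf8(in) ? in : Latin1ToUtf8(in);
}
```

With the strict variant a caller that knows more about the source of the 
data can try another encoding when the call fails; the lenient one is for 
callers that just want something displayable.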

CU, Ingo
