[haiku-bugs] Re: [Haiku] #13184: Infinite loop with bash/readline/ICU on non-BMP Unicode characters

From: "jessicah" <trac@xxxxxxxxxxxx>
Date: Wed, 11 Jan 2017 16:35:27 -0000

#13184: Infinite loop with bash/readline/ICU on non-BMP Unicode characters
-------------------------------+----------------------------
   Reporter:  jessicah         |      Owner:  pulkomandy
       Type:  bug              |     Status:  new
   Priority:  normal           |  Milestone:  Unscheduled
  Component:  Kits/Locale Kit  |    Version:  R1/Development
Resolution:                   |   Keywords:
Blocked By:                   |   Blocking:
Has a Patch:  1                |   Platform:  All
-------------------------------+----------------------------

Comment (by jessicah):

Mm, I added some more tracing to my modified version; indeed, using
`ucnv_getNextUChar()` returns the single value 150370 instead, and source
length is still 4.
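
For reference, 150370 is U+24B62, whose UTF-8 encoding is the four bytes
F0 A4 AD A2, which is consistent with a source length of 4. Below is a
minimal stand-alone sketch of that decode, assuming those bytes are the
input (the comment above does not list them). `decode_utf8` is a
hypothetical helper written for illustration; it is not Haiku or ICU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Haiku/ICU code): decode one UTF-8 sequence.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * invalid or truncated. */
static size_t
decode_utf8(const unsigned char* src, size_t length, uint32_t* codePoint)
{
	/* Smallest code point allowed for each sequence length. */
	static const uint32_t kMinimum[] = { 0, 0, 0x80, 0x800, 0x10000 };
	uint32_t value;
	size_t needed;
	size_t i;

	assert(src != NULL && codePoint != NULL);
	if (length == 0)
		return 0;

	if (src[0] < 0x80) {
		value = src[0];
		needed = 1;
	} else if ((src[0] & 0xE0) == 0xC0) {
		value = src[0] & 0x1F;
		needed = 2;
	} else if ((src[0] & 0xF0) == 0xE0) {
		value = src[0] & 0x0F;
		needed = 3;
	} else if ((src[0] & 0xF8) == 0xF0) {
		value = src[0] & 0x07;
		needed = 4;
	} else
		return 0;

	if (length < needed)
		return 0;

	/* Each continuation byte contributes its low 6 bits. */
	for (i = 1; i < needed; i++) {
		if ((src[i] & 0xC0) != 0x80)
			return 0;
		value = (value << 6) | (uint32_t)(src[i] & 0x3F);
	}

	/* Reject overlong forms, surrogates and out-of-range values. */
	if (value < kMinimum[needed] || value > 0x10FFFF
		|| (value >= 0xD800 && value <= 0xDFFF))
		return 0;

	*codePoint = value;
	return needed;
}
```

Unlike `mbrtowc`, this sketch does not distinguish a truncated sequence
from an invalid one; both return 0.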

I'm not sure whether using `ucnv_getNextUChar()` is the right fix here,
though. I don't know if it would behave correctly on invalid sequences, as
`mbrtowc` requires.

Also, I've noticed that `WcharToMultibyte()` first converts wchar_t from
UTF-32 to UTF-16, and then operates on the UTF-16 data for the actual
conversion.

So, if I'm understanding `WcharToMultibyte()` correctly, our wchar_t is
indeed UTF-32, not UTF-16, which means our `MultibyteToWchar()` should
handle UTF-16 surrogate pairs correctly. Seeing how `WcharToMultibyte()`
is implemented, I think I may be able to provide a proper patch :-)
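
To illustrate the UTF-32/UTF-16 relationship discussed above, here is a
minimal sketch of the standard surrogate pair arithmetic. The function
names are hypothetical and this is not the code in `WcharToMultibyte()`
or `MultibyteToWchar()`; it only shows the conversion each direction
would need to perform for a non-BMP character such as 150370 (U+24B62).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: split a non-BMP code point (UTF-32) into a
 * UTF-16 surrogate pair. */
static void
encode_surrogates(uint32_t codePoint, uint16_t* high, uint16_t* low)
{
	uint32_t offset;

	assert(codePoint >= 0x10000 && codePoint <= 0x10FFFF);
	offset = codePoint - 0x10000;
	*high = (uint16_t)(0xD800 + (offset >> 10));	/* top 10 bits */
	*low = (uint16_t)(0xDC00 + (offset & 0x3FF));	/* bottom 10 bits */
}

/* Hypothetical helper: combine a UTF-16 surrogate pair back into a
 * single UTF-32 code point. */
static uint32_t
decode_surrogates(uint16_t high, uint16_t low)
{
	assert(high >= 0xD800 && high <= 0xDBFF);
	assert(low >= 0xDC00 && low <= 0xDFFF);
	return 0x10000
		+ (((uint32_t)(high - 0xD800) << 10) | (uint32_t)(low - 0xDC00));
}
```

With a 32-bit wchar_t, the decoding direction has to emit one combined
value rather than the two surrogate halves.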

--
Ticket URL: <https://dev.haiku-os.org/ticket/13184#comment:8>
Haiku <https://dev.haiku-os.org>
Haiku - the operating system.
