[haiku-bugs] Re: [Haiku] #13184: Infinite loop with bash/readline/ICU on non-BMP Unicode characters

  • From: "jessicah" <trac@xxxxxxxxxxxx>
  • Date: Wed, 11 Jan 2017 16:35:27 -0000

#13184: Infinite loop with bash/readline/ICU on non-BMP Unicode characters
-------------------------------+----------------------------
   Reporter:  jessicah         |      Owner:  pulkomandy
       Type:  bug              |     Status:  new
   Priority:  normal           |  Milestone:  Unscheduled
  Component:  Kits/Locale Kit  |    Version:  R1/Development
 Resolution:                   |   Keywords:
 Blocked By:                   |   Blocking:
Has a Patch:  1                |   Platform:  All
-------------------------------+----------------------------

Comment (by jessicah):

 Mm, I added some more tracing to my modified version; indeed,
 `ucnv_getNextUChar()` instead returns the single value 150370 (U+24B62),
 and the source length is still 4.
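 For reference, 150370 is U+24B62, whose UTF-8 encoding is the four bytes
 F0 A4 AD A2, which matches the source length of 4. The decoding keeps the
 3 payload bits of the lead byte and the 6 payload bits of each
 continuation byte:

```latex
\begin{aligned}
\texttt{F0}\,\texttt{A4}\,\texttt{AD}\,\texttt{A2}
  &\;\to\; \underbrace{000}_{\texttt{F0}\,\&\,\texttt{07}}\;
           \underbrace{100100}_{\texttt{A4}\,\&\,\texttt{3F}}\;
           \underbrace{101101}_{\texttt{AD}\,\&\,\texttt{3F}}\;
           \underbrace{100010}_{\texttt{A2}\,\&\,\texttt{3F}} \\
  &= 0 \cdot 2^{18} + 36 \cdot 2^{12} + 45 \cdot 2^{6} + 34 \\
  &= 147456 + 2880 + 34 = 150370 = \texttt{0x24B62}
\end{aligned}
```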

 I'm not sure using `ucnv_getNextUChar()` is the right fix here, though.
 It may not handle invalid sequences correctly, which `mbrtowc` requires.
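 To make the `mbrtowc` requirement concrete, here is a minimal sketch of a
 hand-rolled UTF-8 decoder that follows the `mbrtowc` return convention:
 it returns the number of bytes consumed, `(size_t)-2` for an incomplete
 sequence and `(size_t)-1` for an invalid one. `utf8_decode_one()` is a
 hypothetical helper for illustration, not existing Haiku or ICU code.
 Unlike `mbrtowc`, it keeps no `mbstate_t` between calls:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Haiku or ICU code): decodes one UTF-8 sequence
 * from s (at most n bytes) into *out, mimicking mbrtowc()'s return values:
 * 0 for the null character, bytes consumed on success, (size_t)-2 if the
 * input ends mid-sequence, (size_t)-1 if the bytes are invalid.
 * Simplification: an incomplete sequence is reported as (size_t)-2 even if
 * the bytes seen so far could never become valid. No mbstate_t is kept, so
 * after (size_t)-2 the caller must retry with more bytes. */
static size_t utf8_decode_one(uint32_t *out, const unsigned char *s, size_t n)
{
    size_t len, i;
    uint32_t cp, min;

    assert(out != NULL && s != NULL); /* unlike mbrtowc, s may not be NULL */

    if (n == 0)
        return (size_t)-2;

    if (s[0] < 0x80) {                        /* ASCII, including NUL */
        *out = s[0];
        return s[0] == 0 ? 0 : 1;
    } else if (s[0] >= 0xC2 && s[0] <= 0xDF) {
        len = 2; cp = s[0] & 0x1F; min = 0x80;
    } else if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        len = 3; cp = s[0] & 0x0F; min = 0x800;
    } else if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        len = 4; cp = s[0] & 0x07; min = 0x10000;
    } else {
        return (size_t)-1; /* stray continuation byte, C0/C1, or F5..FF */
    }

    for (i = 1; i < len; i++) {
        if (i >= n)
            return (size_t)-2;                /* ran out of input */
        if ((s[i] & 0xC0) != 0x80)
            return (size_t)-1;                /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, encoded surrogates and values above U+10FFFF. */
    if (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return (size_t)-1;

    *out = cp;
    return len;
}
```

 With the bytes F0 A4 AD A2 this returns 4 and stores 150370. Whether the
 errors reported by `ucnv_getNextUChar()` map cleanly onto these return
 values is exactly the part that would need checking.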

 Also, I've noticed that in `WcharToMultibyte()` we first convert wchar_t
 from UTF-32 to UTF-16, and then operate on the UTF-16 data for the actual
 conversion.
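 The UTF-32 to UTF-16 step is where a supplementary-plane character turns
 into two code units. Below is a minimal sketch of that step. The name
 `utf32_to_utf16()` is a hypothetical helper, not the actual
 `WcharToMultibyte()` code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes one UTF-32 code point as UTF-16 and returns
 * the number of code units written to out (1 or 2). Code points above
 * U+FFFF become a high/low surrogate pair. */
static size_t utf32_to_utf16(uint32_t cp, uint16_t out[2])
{
    /* Caller must pass a Unicode scalar value (no lone surrogates). */
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                  /* BMP: one code unit */
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high (lead) surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low (trail) surrogate */
    return 2;
}
```

 For 150370 (U+24B62) this yields the surrogate pair D852 DF62. Any code
 that treats each UTF-16 unit as a separate character will therefore see
 two halves instead of one code point.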

 So, if I'm understanding `WcharToMultibyte()` correctly, our wchar_t is
 indeed UTF-32, not UTF-16, which means our `MultibyteToWchar()` should
 combine UTF-16 surrogate pairs into a single UTF-32 value. Seeing how
 `WcharToMultibyte()` is implemented, I think I may be able to provide a
 proper patch :-)
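 `MultibyteToWchar()` would need the reverse direction if ICU hands back
 UTF-16. That means recognizing a high surrogate followed by a low
 surrogate and merging them into one UTF-32 value. Below is a minimal
 sketch. `utf16_to_utf32()` and the surrogate predicates are hypothetical
 helpers, not existing Haiku code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Hypothetical helper: reads one code point from a UTF-16 buffer of
 * `count` units into *out. Returns the number of units consumed (1 or 2),
 * or 0 if the buffer starts with an unpaired surrogate. */
static size_t utf16_to_utf32(const uint16_t *src, size_t count, uint32_t *out)
{
    assert(src != NULL && out != NULL && count > 0);

    if (is_high_surrogate(src[0])) {
        if (count < 2 || !is_low_surrogate(src[1]))
            return 0;                           /* unpaired high surrogate */
        /* 10 bits from each half, offset back above the BMP. */
        *out = 0x10000 + (((uint32_t)(src[0] - 0xD800) << 10)
                          | (uint32_t)(src[1] - 0xDC00));
        return 2;
    }
    if (is_low_surrogate(src[0]))
        return 0;                               /* unpaired low surrogate */

    *out = src[0];                              /* BMP: one unit */
    return 1;
}
```

 Returning 0 for an unpaired surrogate is just one choice for this sketch.
 An `mbrtowc`-style caller would presumably have to map that case onto
 `EILSEQ` and `(size_t)-1`.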

--
Ticket URL: <https://dev.haiku-os.org/ticket/13184#comment:8>
Haiku <https://dev.haiku-os.org>
Haiku - the operating system.
