[haiku-gsoc] Re: Do we ICU or do we not ICU?

  • From: Ingo Weinhold <ingo_weinhold@xxxxxx>
  • To: haiku-gsoc@xxxxxxxxxxxxx
  • Date: Mon, 11 May 2009 23:37:13 +0200

On 2009-05-11 at 22:03:46 [+0200], Oliver Tappe <zooey@xxxxxxxxxxxxxxx> wrote:
> 
> Adrien/PulkoMandy has suggested basing parts of his work on the locale kit
> on ICU (International Components for Unicode, see icu-project.org).
> 
> We (Adrien and I) have already talked a bit about the sheer size of the
> ICU source (58 MB unpacked) being a problem. The resulting libraries (that
> go into the Haiku image) would be something around 15-17 MB. The majority
> of that is taken up by the locale data (collation information, character
> properties, charset conversion mappings, ...).

That's big, but doesn't sound too bad either.

> Before importing such a beast into our repo, I thought it would be a good
> idea to discuss where we'd like to go with ICU. As far as I can remember,
> Adrien wants to use ICU for collation stuff and for the number & date
> formatting & parsing (Adrien: please expand on this, if you can).
> I know that we had already implemented some of that functionality in the
> locale kit, but I can't remember how much was there and what is still
> missing (Axel & Ingo: can you shed some light on this?).

IIRC, the number formatting was intended to work pretty much the same as in 
Java. I.e. there's an interface for number formatting for which one can get 
an instance from the current locale object. This interface is implemented by 
a generic class that can be configured via a pattern string. For most locales 
this class can be used, more complex ones need their own implementation. I 
believe the generic class was partially done, missing a few bits for floating 
point numbers, though.
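
To make the design above a bit more concrete, here is a minimal sketch of
how such an interface, a pattern-configured generic class, and a locale
object could fit together. All names (NumberFormat, GenericNumberFormat,
Locale, GetNumberFormat) are hypothetical and are not the actual locale kit
API. The pattern handling is an assumption too: it only understands integer
patterns in the Java style such as "#,##0" and ignores everything else.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical interface; a locale hands out instances of it.
class NumberFormat {
public:
	virtual ~NumberFormat() {}
	virtual std::string Format(int64_t value) const = 0;
};

// Generic implementation configured via a pattern string. Assumption: only
// integer patterns are supported, where the number of placeholders after
// the last ',' gives the grouping size (3 for "#,##0").
class GenericNumberFormat : public NumberFormat {
public:
	GenericNumberFormat(const std::string& pattern, char groupSeparator)
		: fGroupSize(0), fGroupSeparator(groupSeparator)
	{
		std::string::size_type comma = pattern.rfind(',');
		if (comma != std::string::npos)
			fGroupSize = pattern.size() - comma - 1;
	}

	std::string Format(int64_t value) const override
	{
		bool negative = value < 0;
		// Negate in unsigned arithmetic so the smallest int64_t is safe.
		uint64_t magnitude = negative
			? 0 - static_cast<uint64_t>(value)
			: static_cast<uint64_t>(value);
		std::string digits = std::to_string(magnitude);

		std::string result;
		size_t count = digits.size();
		for (size_t i = 0; i < count; i++) {
			// Insert a separator whenever a full group of digits remains.
			if (i > 0 && fGroupSize > 0 && (count - i) % fGroupSize == 0)
				result += fGroupSeparator;
			result += digits[i];
		}
		return negative ? "-" + result : result;
	}

private:
	size_t	fGroupSize;
	char	fGroupSeparator;
};

// Hypothetical locale object. Simple locales return the generic class;
// a more complex locale would return its own NumberFormat subclass.
class Locale {
public:
	Locale(const std::string& pattern, char groupSeparator)
		: fPattern(pattern), fGroupSeparator(groupSeparator) {}

	std::unique_ptr<NumberFormat> GetNumberFormat() const
	{
		return std::unique_ptr<NumberFormat>(
			new GenericNumberFormat(fPattern, fGroupSeparator));
	}

private:
	std::string	fPattern;
	char		fGroupSeparator;
};
```

With a Locale("#,##0", ',') the formatter returned by GetNumberFormat()
renders 1234567 as 1,234,567. The same pattern with '.' as separator gives
1.234.567. Callers only ever see the NumberFormat interface, which is what
would let a complex locale swap in its own implementation.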

Date formatting should work analogously, but I really don't recall in what 
state the code was left or if there was any at all. Shouldn't be hard to find 
out, though.

> I suppose if we decide to import ICU into our repo, it would make sense to
> use it for required or existing services/APIs, i.e. to implement the POSIX
> locale stuff by means of ICU, to replace the current use of libiconv with
> ICU's respective charset conversion services, to make use of ICU's regex
> engine as a basic service, ...

Definitely. I suppose a prerequisite would be that we sort out our wchar 
support. Before seriously messing with it, it's probably a good idea to 
switch our wchar_t to 32 bit.
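
A small sketch of what the unit width means in practice, assuming the
concern is code points outside the Basic Multilingual Plane (that
motivation is my assumption, it is not spelled out above). EncodeCodePoint
is a hypothetical helper, not an existing API: with 32 bit units every code
point fits into one unit, with 16 bit units anything from U+10000 upwards
has to be split into a UTF-16 surrogate pair.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the code units needed to store one Unicode
// code point, either as a single 32 bit unit or as 16 bit UTF-16 units.
std::vector<uint32_t> EncodeCodePoint(uint32_t codePoint, bool use32BitUnits)
{
	std::vector<uint32_t> units;
	if (use32BitUnits || codePoint < 0x10000) {
		// Fits into one unit as-is.
		units.push_back(codePoint);
	} else {
		// Split into high and low surrogate (UTF-16).
		uint32_t offset = codePoint - 0x10000;
		units.push_back(0xD800 + (offset >> 10));
		units.push_back(0xDC00 + (offset & 0x3FF));
	}
	return units;
}
```

For U+1D11E this yields the two units 0xD834 0xDD1E in the 16 bit case and
the single unit 0x1D11E in the 32 bit case, so per-character wchar
functions can only treat it as one character with the wider type.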

> There are many more features of ICU that could be used by Haiku in the
> future, for instance the text/char iterator classes that could be used by
> BTextView to do proper wordwise navigation and word wrapping.
> There's even a (font-engine agnostic) text layout engine for bringing the
> more complicated scripts on screen.

Sounds nice. :-)

> I guess what I want to do with this mail is to get a discussion started
> about whether using ICU makes sense at all and, if so, which parts are
> *required* for the locale kit and thus should be targeted first?

I really don't know anything about ICU besides that it is a rather complete 
solution to the localization problem. Assuming that those 15-17 MB of 
libraries wouldn't become part of libroot (or be loaded completely) it 
certainly doesn't sound unreasonable to use ICU, if the interfaces it 
provides are OK.

CU, Ingo
