[haiku-gsoc] Re: Do we ICU or do we not ICU?

  • From: Ryan Leavengood <leavengood@xxxxxxxxx>
  • To: haiku-gsoc@xxxxxxxxxxxxx
  • Date: Mon, 11 May 2009 17:04:38 -0400

On Mon, May 11, 2009 at 4:03 PM, Oliver Tappe <zooey@xxxxxxxxxxxxxxx> wrote:
>
> Adrien/PulkoMandy has suggested to base parts of his work on the locale kit
> on ICU (International Components for Unicode, see icu-project.org).

If it helps the discussion, we will also need ICU for WebKit (though
it could probably be factored out if we really did not want ICU on
Haiku). But in my mind it makes sense to reuse all that code for both
WebKit and Haiku i18n, since it is not trivial to write properly.

> We (Adrien and I) have already talked a bit about the sheer size of the
> ICU source (58 MB unpacked) being a problem. The resulting libraries (that go
> into the haiku image) would be something around 15-17 MB. The majority of
> that is taken up by the locale data (collation information, character
> properties, charset conversion mappings, ...).

The source size could be a problem (more about that below), but I
don't think 15-17 MB for the libraries is that bad, considering all
that it contains. I know we want Haiku to be lean and mean, but with
other modern operating systems having MULTI-GIGABYTE installations, I
don't think we should get too concerned about 17 MB.

> Before importing such a beast into our repo, I thought it would be a good
> idea to discuss where we'd like to go with ICU.

I am strongly against importing big external libraries into the Haiku
source tree. There are plenty of tools now for linking external
libraries into an existing source tree. Some options for SVN are
svn:externals or Piston (http://piston.rubyforge.org/). We are surely
not the only project that needs to use a big library like ICU as a
core component but doesn't want to put that code in its own source
control. Let's see what the options are.

Once we figure out something nice, I would suggest setting up some
other components the same way (Mesa for one, and probably several
others). The only things I would not move out are those that are
deeply embedded or heavily modified (AGG and libc come to mind).

> I suppose if we decide to import ICU into our repo, it would make sense to use
> it for required or existing services/APIs, i.e. to implement the POSIX locale
> stuff by means of ICU, to replace the current use of libiconv with ICU's
> respective charset conversion services, to make use of ICU's regex engine as
> a basic service, ...

Like I said, I don't think we need to or should import ICU into the
Haiku source tree. But I still think we could use it for all of the
above. I certainly would like a built-in regex engine (as long as we
can build a nice, friendly Haiku API wrapper for it).

> There are many more features of ICU that could be used by haiku in the
> future, for instance the text/char iterator classes that could be used by
> BTextView to do proper wordwise navigation and word wrapping.
> There's even a (font-engine agnostic) text layout engine for bringing the more
> complicated scripts on screen.

Yes, all of these could definitely be good things to use ICU for.

> I guess what I want to do with this mail is to get a discussion started about
> whether using ICU makes sense at all and, if so, which parts are *required* for
> the locale kit and thus should be targeted first?

If it helps, I was able to compile ICU 2.6 for WebKit back in 2007. I
believe others have done so since, so there may not be much "porting"
required. But obviously the challenge is wrapping all that
functionality in nice Haiku kits. By the way, some tricks may be
required to cross-compile ICU, though that may have been fixed since I
had to compile it. I don't know if Adrien is doing this work from
within Haiku or from a cross-compile setup...

Regards,
Ryan
