[openbeos] Re: [sort of OT, flame-ish] Re: Re: AW: Re: AW: Locale Kit

  • From: "Scott MacMaster" <scott@xxxxxxxxxxxxxxxxxx>
  • To: <openbeos@xxxxxxxxxxxxx>
  • Date: Mon, 22 Dec 2003 21:49:48 -0500

> > I now need to learn Unicode, don't I. I'll have a google.
> > So I guess the question now comes down to whether all
> > the characters and all the accents used in all languages
> > sort in the correct order in Unicode.
> UNICODE is a character encoding system. You're not supposed to learn it
> more than you're supposed to learn how floating point works in ones and
> zeros. There are multiple standards within Unicode, but IIRC the original
> was 16 bits. BeOS makes use of an intermediate standard called UTF-8, where
> ASCII characters remain single-byte, while characters not supported by
> ASCII are represented by two or more bytes. This way you remain sort of
> compatible with ASCII.

I agree Unicode is a character encoding system.  So are ASCII and the other
encoding schemes.  It seems inappropriate to use any of them as a basis for
sorting.  Even ASCII, which focuses on English, can't be used to sort English
properly.  Unless you make some initial checks (folding case, for instance),
you'll find that Z (code 90) comes before a (code 97).
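
To make the Z-before-a point concrete, here is a small sketch in Python
(my choice of language for illustration only; the word list is made up).
It sorts the same words by raw character codes and then with a case-folding
check applied first.

```python
# Made-up sample words; only the capital Z matters for the demonstration.
words = ["apple", "Zebra", "banana"]

# Plain sort compares the numeric codes: ord("Z") is 90, ord("a") is 97,
# so "Zebra" lands in front of "apple".
by_code = sorted(words)

# The "initial check": fold case before comparing.
by_folded = sorted(words, key=str.casefold)

print(by_code)
print(by_folded)
```

The case fold only rescues English; it does nothing for accented letters,
which is why I don't think patching the encoding order is the right approach.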

It seems to me that a proper sorting system should be independent of any
character encoding system in order for it to work well with any language.
By independent, I mean that it doesn't order the characters based on the
number each character is assigned.  It places words that start with a before
words that start with b because it knows a comes before b in the alphabet,
not because 97 is less than 98.
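
As a rough sketch of what "knowing the alphabet" could look like, the Python
below ranks characters by their position in an explicit alphabet string
instead of by their code.  Swedish is my example here because its alphabet
ends in å, ä, ö, while the character codes put ä (U+00E4) before å (U+00E5).
The names SWEDISH_ALPHABET, RANK and sort_key are hypothetical, and this
ignores everything a real language needs beyond single-letter order.

```python
# Swedish alphabet in its own order: a..z, then å, ä, ö.
# Escapes are used so the example does not depend on the file's encoding.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyz\u00e5\u00e4\u00f6"

# Position in the alphabet, not the character code, decides the order.
RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def sort_key(word):
    # Characters outside the alphabet sort after every known letter
    # (an arbitrary choice made for this sketch).
    return [RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in word.lower()]

words = ["\u00e4ra", "zebra", "\u00e5ka"]   # ära, zebra, åka

by_code = sorted(words)                     # ordered by character code
by_alphabet = sorted(words, key=sort_key)   # ordered by the alphabet table

print(by_code)
print(by_alphabet)
```

Sorting by code puts ära before åka; sorting by the table puts åka first,
which is the order the alphabet itself gives.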

What I propose is a set of functions that receive characters in a specified
encoding.  Then, depending on the current language, a function would be
called to order the characters according to that language.  What I envision
is something like a driver.  A language plug-in (i.e. a driver) would be
placed in a language folder (i.e. a driver folder).  The language plug-in
would implement functions that the system could call to get information
about the language, so the user can select the language from a language
preferences window.  The language plug-in would also contain functions to do
sorting, searching, or other useful language-specific operations.  If a
character is passed that doesn't belong to the current language, the
plug-in, depending on the operation, could ignore the character, return an
error, or try to do something with it.
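
Here is a minimal sketch of the plug-in idea, again in Python purely for
illustration.  Every name in it (LanguagePlugin, UnknownPolicy, PLUGINS,
register, available_languages) is hypothetical and not an existing BeOS or
OpenBeOS API.  A dictionary stands in for the language folder, and the three
policies match the three options above for a character the language doesn't
know.

```python
from enum import Enum

class UnknownPolicy(Enum):
    IGNORE = "ignore"      # drop the character
    ERROR = "error"        # report it to the caller
    FALLBACK = "fallback"  # "try to do something": sort it after known letters

class LanguagePlugin:
    """Hypothetical plug-in: one language, described by its alphabet order."""

    def __init__(self, name, alphabet):
        self.name = name   # shown in a language preferences window
        self._rank = {ch: i for i, ch in enumerate(alphabet)}

    def sort_key(self, word, unknown=UnknownPolicy.FALLBACK):
        key = []
        for ch in word.lower():
            if ch in self._rank:
                key.append((0, self._rank[ch]))
            elif unknown is UnknownPolicy.IGNORE:
                continue
            elif unknown is UnknownPolicy.ERROR:
                raise ValueError(f"{ch!r} is not part of {self.name}")
            else:
                # Fallback: after all known letters, in character-code order.
                key.append((1, ord(ch)))
        return key

    def sort(self, words, unknown=UnknownPolicy.FALLBACK):
        return sorted(words, key=lambda w: self.sort_key(w, unknown))

# Stand-in for the language folder: installed plug-ins by name.
PLUGINS = {}

def register(plugin):
    PLUGINS[plugin.name] = plugin

def available_languages():
    # What a preferences window would list.
    return sorted(PLUGINS)

LATIN = "abcdefghijklmnopqrstuvwxyz"
register(LanguagePlugin("English", LATIN))
register(LanguagePlugin("Swedish", LATIN + "\u00e5\u00e4\u00f6"))

words = ["\u00e4ra", "zebra", "\u00e5ka"]   # ära, zebra, åka
print(available_languages())
print(PLUGINS["Swedish"].sort(words))
print(PLUGINS["English"].sort(words, UnknownPolicy.IGNORE))
```

The caller never looks at character codes; it asks whichever plug-in is
current, and the plug-in decides what to do with characters it doesn't own.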

Scott MacMaster
