[openbeos] Re: Sorting & character sets

  • From: "Andrew Bachmann" <shatty@xxxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Mon, 22 Dec 2003 19:14:21 -0800 PST

"Scott MacMaster" <scott@xxxxxxxxxxxxxxxxxx> wrote:
> It seems to me that a proper sorting system should be independent of any
> character encoding system in order for it to work well with any language.
> By independent, I mean that it doesn't order the characters based on the
> number each character is assigned.  It places words that start with a before
> b because it knows a is before b, not because 97 is before 98.

[suggestion snipped]

Character sets are really irrelevant to sorting.  As Scott pointed out, their
code values cannot be used to sort and were never designed for that purpose.
It is also extremely common for a character set to be expanded later, with
the new characters simply appended at the end.
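
To make the point concrete, here is a minimal sketch (plain C++, ASCII only)
contrasting a sort on raw code values with a sort on a separate letter
ranking.  The names sort_by_code, sort_by_letter and letter_rank are made up
for illustration, and letter_rank is only a toy stand-in for a real
collation table:

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Sort on the numbers assigned to the characters: 'Z' (90) comes before 'a' (97).
std::vector<std::string> sort_by_code(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end());
    return words;
}

// Hypothetical ranking: where a letter falls in the alphabet, ignoring case.
// ASCII only; a real table would be far larger and locale dependent.
static int letter_rank(char c)
{
    return std::tolower(static_cast<unsigned char>(c));
}

// Sort on the ranking instead of on the code values.
std::vector<std::string> sort_by_letter(std::vector<std::string> words)
{
    std::sort(words.begin(), words.end(),
        [](const std::string& a, const std::string& b) {
            return std::lexicographical_compare(
                a.begin(), a.end(), b.begin(), b.end(),
                [](char x, char y) { return letter_rank(x) < letter_rank(y); });
        });
    return words;
}
```

Given "apple", "Zebra", "banana", sort_by_code puts "Zebra" first, while
sort_by_letter gives "apple", "banana", "Zebra".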

I think that we do not need to worry about the encoding of the string
because we can do the typical BeOS thing and assume that the string is
UTF-8.  BeOS has functions for converting to UTF-8 and from UTF-8,
and if someone really wants to "fight the system", they can run their
strings through those conversions every time they sort.
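
As a rough illustration of what "running strings through a conversion"
means, here is a self-contained sketch that converts ISO-8859-1 bytes to
UTF-8.  The function latin1_to_utf8 is a hypothetical stand-in written for
this example, not one of the BeOS conversion functions, and it handles only
that one source encoding:

```cpp
#include <string>

// Hypothetical stand-in for a platform conversion routine.
// Every ISO-8859-1 byte maps to the Unicode code point of the same value,
// so bytes >= 0x80 become a two-byte UTF-8 sequence.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                  // ASCII passes through
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

A caller holding Latin-1 data would convert each string this way before
handing it to the sort, and convert back afterwards if needed.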

Earlier someone mentioned that "sometimes you don't care about the
accents".  This is true, and that's why every concrete proposal I've seen
so far includes a mechanism for specifying "how fine you want to cut it".
From the OpenTracker CVS:

enum collator_strengths {
        B_COLLATE_DEFAULT = -1,

        B_COLLATE_PRIMARY = 1,          // e.g.: no diacritical differences, e = é
        B_COLLATE_SECONDARY,            // diacritics are different from their base characters, a != ä
        B_COLLATE_TERTIARY,             // case sensitive comparison

        B_COLLATE_IDENTICAL = 127       // Unicode value
};
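
To show how such strength levels could drive a comparison, here is a toy
sketch.  It is not the OpenTracker collator: Strength, Key, key_for and
compare_at are names invented for this example, and the table covers only
a-z, A-Z, é and ä.  Each character gets a primary weight (the base letter),
a secondary weight (the diacritic) and a tertiary weight (the case), and the
comparison looks only as deep as the requested strength:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical strength levels, mirroring the idea of the enum above.
enum Strength { PRIMARY = 1, SECONDARY, TERTIARY };

struct Key { int primary; int secondary; int tertiary; };

// Toy weight table: base letter, diacritic, case.
static Key key_for(char32_t c)
{
    if (c >= U'a' && c <= U'z') return { int(c - U'a'), 0, 0 };
    if (c >= U'A' && c <= U'Z') return { int(c - U'A'), 0, 1 };
    if (c == U'\u00E9') return { 'e' - 'a', 1, 0 };   // é: base e, with diacritic
    if (c == U'\u00E4') return { 0, 1, 0 };           // ä: base a, with diacritic
    return { 1000 + int(c), 0, 0 };                   // anything else sorts after the letters
}

static int weight(const Key& k, int level)
{
    return level == PRIMARY ? k.primary
         : level == SECONDARY ? k.secondary : k.tertiary;
}

// Returns -1, 0 or 1.  Finer levels are consulted only when all coarser
// levels found the strings equal.
int compare_at(const std::u32string& a, const std::u32string& b, Strength strength)
{
    for (int level = PRIMARY; level <= strength; ++level) {
        std::size_t n = std::min(a.size(), b.size());
        for (std::size_t i = 0; i < n; ++i) {
            int wa = weight(key_for(a[i]), level);
            int wb = weight(key_for(b[i]), level);
            if (wa != wb) return wa < wb ? -1 : 1;
        }
        if (a.size() != b.size()) return a.size() < b.size() ? -1 : 1;
    }
    return 0;
}
```

With this table, "resume" and "résumé" compare equal at PRIMARY and differ
at SECONDARY, while "abc" and "ABC" stay equal until TERTIARY.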

Also, since sorting is related to comparing strings, it may seem that we
are not far off from thinking about queries.  However, I would encourage
people to focus on the single issue of sorting (for this thread anyway ;-) ).
There's no need to complicate matters.  Queries/searches can be dealt with
separately.  If you really want to discuss those, I recommend starting a
separate thread.

