"Scott MacMaster" <scott@xxxxxxxxxxxxxxxxxx> wrote: > It seems to me that a proper sorting system should be independent of any > character encoding system in order for it to work well with any language. > By independent, I mean that it doesn't order the characters based on the > number each character is assigned. It places words that start with a before > b because it knows a is because b not because 97 is before 98. Yes. [suggestion snipped] Character sets are really irrelevant to sorting. As Scott pointed out they can not be used to sort, and are not designed for that purpose either. It is extremely common for character sets to be expanded and characters inserted at the end as well. I think that we do not need to worry about the encoding of the string because we can do the typical beos thing and assume that the string is UTF-8. BeOS has functions for converting to UTF-8 and from UTF-8, and if someone really wants to "fight the system", they can run their strings through those conversions every time they sort. Earlier someone mentioned that "sometimes you don't care about the accents". This is true, and that's why every concrete proposal I've seen so far includes a mechanism for specifying "how fine you want to cut it". From the opentracker cvs: enum collator_strengths { B_COLLATE_DEFAULT = -1, B_COLLATE_PRIMARY = 1, // e.g.: no diacritical differences, e = é B_COLLATE_SECONDARY, // diacritics are different from their base characters, a != ä B_COLLATE_TERTIARY, // case sensitive comparison B_COLLATE_QUATERNARY, B_COLLATE_IDENTICAL = 127 // Unicode value }; Also, since the issue of sorting is related to comparing strings, it may seem that we are not far off from thinking about queries. However, I would encourage people to focus on the one and single issue of sorting. (for this thread anyway ;-) ) There's no need to complicate matters. Queries/searches can be dealt with separately. If you really want to discuss those I recommend you to make a separate thread for it. Andrew