[openbeos] Re: [just On Topic, with no flames] Was: Locale Kit

  • From: "Scott MacMaster" <scott@xxxxxxxxxxxxxxxxxx>
  • To: <openbeos@xxxxxxxxxxxxx>
  • Date: Tue, 23 Dec 2003 23:54:33 -0500

From: <kevin.lawton@xxxxxx>
Scott MacMaster opined:
 <snip>
| It seems to me that a proper sorting system should be independent of any
| character encoding system in order for it to work well with any language.
| By independent, I mean that it doesn't order the characters based on the
| number each character is assigned.  It places words that start with a
| before b because it knows a is before b, not because 97 is before 98.
While that might seem like some ideal worth striving for, it ignores the
fundamental point that computers are essentially numerical machines. They
know nothing of 'a' or 'b' - only numbers and how to perform arithmetic and
logical operations on them. In order for a computer to understand 'a' and
'b' they must each first be assigned a numeric value. There are a few
different operations a computer can use in order to sort numbers into
ascending or descending order, but the concept of subtraction and then
testing the result to be positive or negative is typical. The sort could be
performed on the numbers used to encode the character set, but that is
unlikely to yield desirable results. The most obvious method is to 'tag'
each character with a number which represents its position in the desired
sort order. To do this takes up some processing time, but fortunately modern
machines appear to have this in abundance. If the tag values assigned were
varied according to the sort locale, then this would be one method of
achieving a locale-dependent sort.

I'm aware that computers are essentially numerical machines.  When I said
the language module would place 'a' before 'b' because it knows 'a' comes
first, I meant that it would know this through logic or programming.  While
writing this I was thinking of direct comparisons.  For example,

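void Sort(const char chars[2], char sortedChars[2])
{
    // Direct comparison: the order is hard-coded, not derived from the
    // numeric values of the characters.
    if(chars[0] == 'b' && chars[1] == 'a')
    {
        // Out of order, so emit the pair swapped.
        sortedChars[0] = 'a';
        sortedChars[1] = 'b';
    }
    else
    {
        // Already in order, so copy the input unchanged.
        sortedChars[0] = chars[0];
        sortedChars[1] = chars[1];
    }
}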

Obviously, such an algorithm wouldn't be very efficient.  Using tags, as you
suggested, would probably be more efficient.  There may even be a more
efficient method.  However, that really wasn't my point.  My point is that
sorting shouldn't use the numbers used to encode the characters.
Apparently, you agree.
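
For what it's worth, here is a rough sketch of how the tag idea might look
in C.  It is only an illustration, not anything from the Locale Kit:
BuildRanks and CompareByRank are names I made up, the table assumes a
single-byte character set, and any character missing from the ordering
string is simply given the same "last" tag.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: fill rank[] so that rank[c] is the position of
   character c in the string `order`.  Characters not listed in `order`
   all receive the same tag, one past the last listed position. */
void BuildRanks(const char *order, int rank[256])
{
    size_t n = strlen(order);
    size_t i;

    for (i = 0; i < 256; i++)
        rank[i] = (int)n;                      /* unlisted characters go last */
    for (i = 0; i < n; i++)
        rank[(unsigned char)order[i]] = (int)i;
}

/* Hypothetical helper: compare two strings by their tags, never by the
   numbers used to encode the characters.  Returns a negative, zero, or
   positive value, in the same spirit as strcmp. */
int CompareByRank(const char *a, const char *b, const int rank[256])
{
    while (*a != '\0' && *b != '\0') {
        int d = rank[(unsigned char)*a] - rank[(unsigned char)*b];
        if (d != 0)
            return d;
        a++;
        b++;
    }
    /* If one string is a prefix of the other, the shorter one sorts first. */
    return (*a != '\0') - (*b != '\0');
}
```

With the ordering string "aAbBcC", CompareByRank puts "a" before "B" even
though 'B' (66) has a smaller code than 'a' (97).  Building a second table
from a different ordering string would give a different sort order from
the same comparison routine, which is roughly the locale-dependent sort
described above.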


Later,
Scott MacMaster

