[openbeos] Re: AW: Re: AW: Locale Kit

  • From: Gabe Yoder <gyoder@xxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Tue, 16 Dec 2003 17:22:09 -0500

On Tuesday 16 December 2003 05:36 am, you wrote:
> On 2003-12-15 at 14:05:08 [+0100], you wrote:
> > Pascal Goguey <pascal@xxxxxxxxxx> wrote:
> > > In case of french, if I understood Axel's method correctly,
> > > well, let's take an example: illettré, île, iliaque cannot be
> > > sorted by a plain sort function because î is outside of ASCII,
> > > and therefore greater than any of the other letters. The regular
> > > sort would put île after zythum.
> >
> > Right, that's the ASCII only problem.
> >
> > > So the proposed method (apparently) consists in first stripping
> > > these strings to temporary ascii strings, sorting, and then ordering
> > > the original strings in the same order.
> > >
> > > But there is a logical mistake here. Let's call:
> > > Strip: a function that removes accents and alike.
> > > A_Order : ascii order
> > > F_Order : french dictionary order
> > >
> > > A_Order(strip(s1), strip(s2)) can be deduced from F_Order(s1, s2)
> > > BUT:
> > > F_Order(s1, s2) cannot be deduced from A_Order( strip(s1), strip(s2))
> > >
> > > Here is an example:
> > >
> > > These two words : cote and côte should happen in this sequence.
> > > côte should be after cote.
> > >
> > > If you perform an ASCII sort of the stripped strings, you end up
> > > sorting cote and cote, and since the strings are equal, you cannot
> > > decide which of the original strings comes first. No surprise here,
> > > you lose information by stripping.
> > > It's a good quick approximation, but not a fully working method.
> >
> > It's fully working for many languages, but you can easily extend it to
> > do what you want it to do. The current implementation just translates
> > "à" to "a", for example. It could also do something like:
> >     "a" -> "a0"
> >     "á" -> "a1"
> >     "à" -> "a2"
> >     "â" -> "a3"
> >
> > The current implementation allows comparing strings as is, but can also
> > return a string that represents the sort order, so that two such strings
> > can be compared directly with memcmp() or strcmp().
> > Also, we need to differentiate between the primary and secondary
> > collation level. The primary should not differentiate between "a" and
> > "á" while the secondary should. I will have to recheck about how
> > exactly this is done in other localisation efforts, though (currently,
> > I have implemented the German telephone book order to change the
> > primary level; I am not sure this is correct).
>
> Well, I guess we simply (at least for french) just need to sort depending
> on the "number of changes".
> for instance, to change "côte" to ASCII you have 1 change (ô => o)
> so if you have to compare it to "cote" it should be after.
>
> Now, I guess another problem is to compare "été" with "ètè" (even if the
> second doesn't exist. It's an example).
> Both need 2 changes to become ASCII. Here you have to set an internal
> order which you then use, as there is no real order.
>
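
To make the two-level idea quoted above concrete, here is a rough Python
sketch (not the Locale Kit code). The primary key is the word with accents
stripped; the secondary key records which accent sat on each letter and is
only consulted when the primary keys are equal. The function name and the
accent ranks are my own assumptions; the ranks just mirror the a0/a1/a2/a3
example (none, acute, grave, circumflex).

```python
import unicodedata

# Hypothetical accent ranks, mirroring the a0/a1/a2/a3 example:
# 0 = no accent, 1 = acute, 2 = grave, 3 = circumflex.
ACCENT_RANK = {"\u0301": 1, "\u0300": 2, "\u0302": 3}
OTHER_ACCENT = 9  # assumption: any other combining mark sorts last


def collation_key(word):
    """Return (primary, secondary) so that tuple comparison sorts first by
    the unaccented letters, then by the accents on those letters."""
    primary = []
    secondary = []
    # NFD splits "ô" into "o" followed by a combining circumflex.
    for ch in unicodedata.normalize("NFD", word.lower()):
        if unicodedata.combining(ch):
            if secondary:
                # Record the accent against the preceding base letter.
                secondary[-1] = ACCENT_RANK.get(ch, OTHER_ACCENT)
        else:
            primary.append(ch)
            secondary.append(0)
    return ("".join(primary), tuple(secondary))


words = ["zythum", "île", "côte", "illettré", "cote", "iliaque"]
print(sorted(words, key=collation_key))
```

With this key "cote" and "côte" tie on the primary level and the secondary
level puts "cote" first; "été" and "ètè" are separated the same way, by
whatever internal accent order was picked.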

Spanish provides some interesting problems.  I am not sure how they handle
comparisons with accents (a vowel with an accent is not a different letter,
it just changes which syllable gets the emphasis).  The n with a ~ (I don't
know how to type it correctly) is a different letter from n and should
follow n.  Those quirks are not much different from the quirks discussed for
the other languages.  The weird part is that "LL" and "RR" are each
considered a single letter and follow their single-letter counterparts (so
"llamar" comes after "luz").
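
A rough Python sketch of what that would mean for a sort key, only following
the rules as I described them above (the alphabet table, the function name
and the handling of unknown characters are assumptions, not taken from any
standard or from the Locale Kit): the word is split into letters where "ll"
and "rr" count as one letter each, accented vowels are folded to the plain
vowel, and each letter is replaced by its position in the alphabet.

```python
import unicodedata

# Alphabet as described in this message: "ll" follows "l", "ñ" follows "n",
# "rr" follows "r". This table is an assumption made for illustration.
ALPHABET = (list("abcdefghijkl") + ["ll"] + list("mn") + ["ñ"]
            + list("opqr") + ["rr"] + list("stuvwxyz"))
RANK = {letter: index for index, letter in enumerate(ALPHABET)}

# Accented vowels are not separate letters, so fold them to the base vowel.
FOLD_VOWELS = str.maketrans("áéíóúü", "aeiouu")


def spanish_key(word):
    """Return a tuple of letter ranks, treating "ll" and "rr" as one letter."""
    # NFC keeps "ñ" as a single character so it is not folded to "n".
    text = unicodedata.normalize("NFC", word.lower()).translate(FOLD_VOWELS)
    key = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in ("ll", "rr"):
            key.append(RANK[pair])
            i += 2
        else:
            # Unknown characters sort after the whole alphabet (assumption).
            key.append(RANK.get(text[i], len(ALPHABET)))
            i += 1
    return tuple(key)


words = ["llamar", "luz", "nube", "ñu", "lobo", "perro", "pero", "noche"]
print(sorted(words, key=spanish_key))
```

So the stripping approach still works here, as long as the "letters" are
picked out before they are mapped, instead of mapping one character at a
time.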

