[openbeos] Re: AW: Re: AW: Locale Kit

  • From: Olivier MILLA <methedras@xxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Tue, 16 Dec 2003 11:36:26 +0100

On 2003-12-15 at 14:05:08 [+0100], you wrote:
> Pascal Goguey <pascal@xxxxxxxxxx> wrote:
> > In the case of French, let's take an example: illettré, île and
> > iliaque cannot be sorted by a plain sort function because î is
> > outside of ASCII, and therefore greater than any of the other
> > letters. A regular sort would put île after zythum.
> 
> Right, that's the ASCII-only problem.
> 
> > So, if I understood Axel's method correctly, it consists of first
> > stripping these strings to temporary ASCII strings, sorting those,
> > and then ordering the original strings in the same order.
> > 
> > But there is a logical mistake here. Let's define:
> > Strip: a function that removes accents and the like.
> > A_Order: ASCII order
> > F_Order: French dictionary order
> > 
> > A_Order(strip(s1), strip(s2)) can be deduced from F_Order(s1, s2)
> > BUT:
> > F_Order(s1, s2) cannot be deduced from A_Order(strip(s1), strip(s2))
> > 
> > Here is an example:
> > 
> > The two words cote and côte should appear in that order:
> > côte comes after cote.
> > 
> > If you perform an ASCII sort of the stripped strings, you end up
> > comparing cote with cote, and since the strings are equal, you cannot
> > decide which of the original strings comes first. No surprise here:
> > you lose information by stripping.
> > It's a good quick approximation, but not a fully working method.
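
A minimal Python sketch of the strip-then-sort approach described above, assuming strip() is done by Unicode canonical decomposition (NFD) followed by dropping the combining marks. The names strip_accents and strip_sort are hypothetical and are not Locale Kit API. Because Python's sort is stable, the relative order of cote and côte after sorting simply mirrors the input order, which is the information loss described above.

```python
import unicodedata

def strip_accents(s: str) -> str:
    # Hypothetical strip(): decompose (NFD) and drop combining marks, so "ô" -> "o".
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def strip_sort(words):
    # Sort the original words by their stripped form; equal keys keep input order.
    return sorted(words, key=strip_accents)

words = ["zythum", "île", "illettré", "iliaque"]
print(sorted(words))      # ['iliaque', 'illettré', 'zythum', 'île']
print(strip_sort(words))  # ['île', 'iliaque', 'illettré', 'zythum']

# "cote" and "côte" share the stripped key "cote": the result depends only
# on which one came first in the input, not on any French rule.
print(strip_sort(["côte", "cote"]))  # ['côte', 'cote']
print(strip_sort(["cote", "côte"]))  # ['cote', 'côte']
```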
> 
> It's fully working for many languages, but you can easily extend it to
> do what you want it to do. The current implementation just translates
> "à" to "a", for example. It could also do something like:
>     "a" -> "a0"
>     "á" -> "a1"
>     "à" -> "a2"
>     "â" -> "a3"
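
A Python sketch of that extension: each character becomes its base letter followed by one digit, "0" for an unaccented letter. Applying the same four ranks to every base letter, the diaeresis rank, and "9" for any other mark are hypothetical choices, and digit_key is an illustrative name. For these inputs the keys are plain ASCII, so comparing them behaves like strcmp().

```python
import unicodedata

# Digits from the table above: "a" -> "a0", "á" -> "a1", "à" -> "a2", "â" -> "a3".
# Reusing these ranks for every base letter, plus the diaeresis rank, is an assumption.
ACCENT_DIGIT = {
    "\u0301": "1",  # acute
    "\u0300": "2",  # grave
    "\u0302": "3",  # circumflex
    "\u0308": "4",  # diaeresis (assumed)
}

def digit_key(s: str) -> str:
    out = []
    for ch in s:
        # NFD splits a precomposed letter into its base letter plus combining marks.
        base, *marks = unicodedata.normalize("NFD", ch)
        # Unaccented -> "0"; only the first mark counts; unknown marks -> "9".
        digit = ACCENT_DIGIT.get(marks[0], "9") if marks else "0"
        out.append(base + digit)
    return "".join(out)

print(digit_key("côte"))                        # c0o3t0e0
print(sorted(["côte", "cote"], key=digit_key))  # ['cote', 'côte']
print(sorted(["ab", "áa"], key=digit_key))      # ['ab', 'áa']
```

Note the last line: interleaving the digits lets an accent outrank a later base letter, so "áa" sorts after "ab". That is where the primary/secondary distinction comes in.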
> 
> The current implementation allows comparing strings as is, but can also
> produce a string that represents each string's sort order, so that two
> strings can be compared directly with memcmp() or strcmp().
> Also, we need to differentiate between the primary and secondary
> collation levels. The primary level should not differentiate between "a"
> and "á", while the secondary should. I will have to recheck how exactly
> this is done in other localisation efforts, though (currently, I have
> implemented the German telephone book order as a change to the primary
> level; I am not sure this is correct).

Well, I guess that (at least for French) we simply need to sort by the
"number of changes".
For instance, converting "côte" to ASCII takes 1 change (ô => o),
so when compared to "cote", it should come after it.
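
This rule can be sketched in Python as a two-part sort key: the stripped word first, then the number of changes needed to reach it, so that among words with the same stripped form, fewer changes sort first. Counting one change per combining mark removed after NFD decomposition is an assumption, and strip_and_count is a hypothetical name.

```python
import unicodedata

def strip_and_count(s: str):
    # Return (stripped word, number of accent marks removed to get there).
    decomposed = unicodedata.normalize("NFD", s)
    changes = sum(1 for c in decomposed if unicodedata.combining(c))
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped, changes

print(strip_and_count("côte"))  # ('cote', 1)
print(sorted(["côte", "coté", "cote", "côté"], key=strip_and_count))
# ['cote', 'côte', 'coté', 'côté']
```

Here côte and coté tie at one change each, so their relative order comes only from the input order.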

Now, I guess another problem is comparing "été" with "ètè" (even if the
second doesn't exist; it's just an example).
Both need 2 changes to become ASCII. Here you have to define an internal
order and use it, since there is no natural order.
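
A sketch of the change-count rule extended with such an internal order: after the stripped word and the change count, the accents are compared letter by letter using a fixed ranking. The ranking below (acute < grave < circumflex < diaeresis) and the name accent_key are hypothetical choices, precisely because there is no natural order.

```python
import unicodedata

# Hypothetical internal order of accents: acute < grave < circumflex < diaeresis.
ACCENT_RANK = {"\u0301": 1, "\u0300": 2, "\u0302": 3, "\u0308": 4}

def accent_key(s: str):
    base, ranks = [], []
    for ch in unicodedata.normalize("NFD", s):
        if unicodedata.combining(ch):
            if ranks:
                ranks[-1] = ACCENT_RANK.get(ch, 9)  # unknown marks rank last
        else:
            base.append(ch)
            ranks.append(0)
    changes = sum(1 for r in ranks if r)  # one change per accented letter
    # Compare by: 1) stripped word, 2) number of changes, 3) internal accent order.
    return "".join(base), changes, tuple(ranks)

print(accent_key("été"))                         # ('ete', 2, (1, 0, 1))
print(sorted(["ètè", "été"], key=accent_key))    # ['été', 'ètè']
print(sorted(["côte", "cote"], key=accent_key))  # ['cote', 'côte']
```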

my 0.002 cents,

Olivier
-- 
"A man does what he does because he sees the world as he sees it" A.K
