[openbeos] Re: AW: Re: AW: Locale Kit

  • From: Olivier MILLA <methedras@xxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Tue, 16 Dec 2003 11:36:26 +0100

On 2003-12-15 at 14:05:08 [+0100], you wrote:
> Pascal Goguey <pascal@xxxxxxxxxx> wrote:
> > In case of french, if I understood Axel's method correctly,
> > well, let's take an example: illettré, île, iliaque cannot be
> > sorted by a plain sort function because î is outside of ASCII,
> > and therefore greater than any of the other letters. The regular
> > sort would put île after zythum.
> Right, that's the ASCII only problem.
> > So the proposed method (apparently) consists in first stripping
> > these strings to temporary ASCII strings, sorting them, and then
> > ordering the original strings in the same order.
> > 
> > But there is a logical mistake here. Let's call:
> > Strip: a function that removes accents and the like.
> > A_Order : ascii order
> > F_Order : french dictionary order
> > 
> > A_Order(strip(s1), strip(s2)) can be deduced from F_Order(s1, s2)
> > BUT:
> > F_Order(s1, s2) cannot be deduced from A_Order( strip(s1), strip(s2))
> > 
> > Here is an example:
> > 
> > These two words, cote and côte, should appear in this sequence:
> > côte should come after cote.
> > 
> > If you perform an ASCII sort of the stripped strings, you end up
> > sorting cote and cote, and since the strings are equal, you cannot
> > decide which of the original strings comes first. No surprise here,
> > you lose information by stripping.
> > It's a good quick approximation, but not a fully working method.
> It's fully working for many languages, but you can easily extend it to
> do what you want it to do. The current implementation just translates
> "à" to "a", for example. It could also do something like:
>     "a" -> "a0"
>     "á" -> "a1"
>     "à" -> "a2"
>     "â" -> "a3"
> The current implementation can compare strings as is, but it can also
> return a string that represents the sort order, so that two such
> strings can be compared directly with memcmp() or strcmp().
> Also, we need to differentiate between the primary and secondary
> collation level. The primary should not differentiate between "a" and
> "á" while the secondary should. I will have to recheck how
> exactly this is done in other localisation efforts, though (currently,
> I have implemented the German telephone book order to change the
> primary level; I am not sure this is correct).
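
To make the quoted sort-key idea concrete, here is a minimal sketch in Python. It is not the Locale Kit code; the function name sort_key, the ACCENT_RANK table and the "\x01" separator are assumptions made up for illustration. The accent digits follow the "a0/a1/a2/a3" numbering above, but they are collected at the end of the key, so base letters (primary level) are compared before accents (secondary level). It only handles letters that decompose into a base letter plus combining marks.

```python
import unicodedata

# Hypothetical accent ranks, following the "a0/a1/a2/a3" numbering in the quote.
ACCENT_RANK = {
    "\u0301": "1",  # combining acute
    "\u0300": "2",  # combining grave
    "\u0302": "3",  # combining circumflex
}


def sort_key(word):
    """Build a string whose plain comparison gives a two-level order."""
    primary = []    # base letters with accents stripped
    secondary = []  # one rank digit per base letter ("0" = no accent)
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch):
            # Record the accent against the base letter it follows.
            if secondary:
                secondary[-1] = ACCENT_RANK.get(ch, "9")
        else:
            primary.append(ch)
            secondary.append("0")
    # "\x01" sorts below any letter, so a shorter word still comes first.
    return "".join(primary) + "\x01" + "".join(secondary)


words = ["zythum", "île", "côte", "illettré", "cote", "iliaque"]
print(sorted(words, key=sort_key))
```

Because the key is an ordinary string, two keys can be compared the same way strcmp() would compare them; "cote" and "côte" share the same primary part and are only separated by the trailing digits.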

Well, I guess (at least for French) we simply need to sort by the
"number of changes" when the stripped strings are equal.
For instance, changing "côte" to ASCII takes 1 change (ô => o),
so when compared with "cote" (0 changes) it should come after.

Now, I guess another problem is comparing "été" with "ètè" (even if the
second doesn't exist; it's just an example).
Both need 2 changes to become ASCII. Here you have to define an internal
order and use it, as there is no real order.
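
A rough sketch of that idea in Python, just to show the shape of it. The name change_count_key and the ACCENT_ORDER table are hypothetical, and the order chosen for the accents is arbitrary, which is exactly the "internal order" mentioned above.

```python
import unicodedata

# Arbitrary internal order for accents, used only to break ties (an assumption).
ACCENT_ORDER = {
    "\u0301": 1,  # combining acute
    "\u0300": 2,  # combining grave
    "\u0302": 3,  # combining circumflex
}


def change_count_key(word):
    """Key: stripped word, then number of changes, then the internal order."""
    decomposed = unicodedata.normalize("NFD", word)
    marks = [c for c in decomposed if unicodedata.combining(c)]
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Unknown accents get a high rank so they sort after the known ones.
    tie_break = [ACCENT_ORDER.get(m, 99) for m in marks]
    return (base, len(marks), tie_break)


words = ["ètè", "côte", "été", "cote"]
print(sorted(words, key=change_count_key))
```

Here "cote" (0 changes) comes before "côte" (1 change), and "été" and "ètè", which both need 2 changes, are only separated by the internal accent order.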

my 0.002 cents,

"A man does what he does because he sees the world as he sees it" A.K
