| -----Original Message-----
| From: openbeos-bounce@xxxxxxxxxxxxx
| [mailto:openbeos-bounce@xxxxxxxxxxxxx] On Behalf Of Simon Taylor
| Sent: 20 December 2003 10:40
| To: openbeos@xxxxxxxxxxxxx
| Subject: [openbeos] Re: AW: Re: AW: Locale Kit
|
| > > > Another thing to consider is gender. If I do a search on 'este'
| > > > (this, masculine) I probably would want matches on 'esta' (this,
| > > > feminine). On the other hand 'este' also means east and for that
| > > > meaning it has no gender. Maybe it would be best to have an
| > > > 'ignore gender' checkbox on the search panel. Hopefully, no one
| > > > thinks we're suggesting that people ignore men or women and sues
| > > > us for being discriminatory. :)
| > >
| > > You're mixing completely different topics here. Collation only
| > > refers to character match and order - it (usually, some languages
| > > define a sort order depending on words as well) has nothing to do
| > > with words or anything in that regard.
| > > To implement something like you want, you would need to
| > > morphologically analyze the text, and you'd most often need a
| > > complete lexicon of the language to do that. After this, you would
| > > use the collation services to see if the words match (in their
| > > morphologically reduced form).
| >
| > I am aware that the one algorithm would be significantly more complex
| > but I wouldn't say they are completely different. They are both
| > searches with a set of rules that differ somewhat. When the topic of
| > searching was mentioned my mind first went to the idea of searching
| > for text in a document. However, for other types of searches,
| > possibly searching for a file, a more exact search algorithm may be
| > better. Still, I think it would be somewhat unintuitive to type
| > "este or esta" in the search string if I'm not certain what gender
| > the filename uses.
|
| It would be much more unintuitive and confusing, IMHO, if you query for
| "esta" and "filenamecontainseste" pops up in the results. So much so,
| that it would look like a bug to me.
|
| One of the things I like about using BeOS is that it is obvious exactly
| what is happening. Using Windows (and especially Office) often feels
| like "do one thing (eg paste) and I'll randomly pick exactly how you
| don't want your table formatted for you, rewrap the rest of the
| document, decide that the correct grammar sentence you told me to
| ignore a minute ago is incorrect again, and one or two other things
| that I've never done before". I much prefer interfaces that are
| predictable.
|
| Whether in code, å == a in strcmp-type functions, I don't know. Maybe
| this should just be added to the query thing (in the same way that
| typing "a" creates a formula "[aA]")
|

Don't quite get this - what has gender to do with sort order? There are enough problems with 'international' sorting without adding more! Let's face it, it is only when you can completely ignore gender - make it truly irrelevant - that you have true equality (but I can't imagine why you'd want that).

What appears to be the fly in the ointment in this case is those strange little 'accent' thingies some of the southern European languages seem to need. Are they really necessary, or can they be ignored? And why are they needed anyway, when languages such as English, American and Australian get along fine without them? Unfortunately, ASCII doesn't cater for accents because at the time it was invented Americans didn't use accents (rumour has it that they still don't). It seems that all issues of sorting are going to need either a two-byte character set or a system of 'value tags', unless the accents can be ignored.

In terms of planning for the future - does it really matter?
Languages have a habit of dying out with disuse: the Gaelic languages are gradually dying out from the British Isles, Cornish died a couple of decades ago, only a few sheep now speak Welsh - and a handful of highlanders the old Scots Gaelic ('Gallic' they call it). Across most of the (civilised) world, English has become the dominant language in business and commercial life. Minor languages like French, Spanish and Italian will probably become irrelevant soon. Most of India seems to be learning English so they can pinch our jobs, so in a few decades there'll just be English and Chinese. But the Chinese are learning English, so give it a generation and only English will stand as an important everyday language - leaving Latin for scientific and legal use, and other languages as folk curiosities. Might as well just sort in ASCII and wait patiently while the rest of the world falls into line.
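
P.S. For anyone who wants to see what "ignoring the accents" and the "[aA]" query trick quoted above might look like in practice, here is a rough sketch in Python. It is not BeOS or Locale Kit code: the function names are made up for illustration, and it assumes that dropping Unicode combining marks after decomposition is an acceptable stand-in for "ignore accents", which real collation rules for a given language may well disagree with.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Decompose characters (NFD) and drop the combining marks.

    Hypothetical helper: turns 'å' into 'a' and 'ñ' into 'n'.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def loosely_equal(a: str, b: str) -> bool:
    """Compare two strings while ignoring accents and letter case."""
    return strip_accents(a).casefold() == strip_accents(b).casefold()


def case_insensitive_pattern(term: str) -> str:
    """Expand each cased letter into a two-letter class, e.g. 'a' -> '[aA]'.

    Assumption: the term contains no characters that are special in the
    pattern syntax; nothing is escaped here.
    """
    parts = []
    for ch in term:
        lower, upper = ch.lower(), ch.upper()
        parts.append(f"[{lower}{upper}]" if lower != upper else ch)
    return "".join(parts)
```

Note that this only folds characters; 'este' and 'esta' still compare as different words, which is the predictable behaviour argued for in the quoted text.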