[greenstone_pt] Re: About a multilingual prototype

  • From: rafael.antonio@xxxxxxx
  • To: greenstone_pt@xxxxxxxxxxxxx, Claudia Wanderley <cmwanderley@xxxxxxxxx>
  • Date: Thu, 19 Feb 2009 13:09:39 +0000


To be honest, I do not understand what you want regarding multilingualism.
I know a little about Angola and Cabo Verde, but they seem very different.
Angola has Portuguese as its official language and many dialects, each used 
only within its own community.
Cabo Verde also has Portuguese as its official language, with Crioulo as a 
second language that is sometimes the main language of communication.
Maybe Brazil has some similarities.
If you want to share digital documents (which is what Greenstone does), maybe 
the only common language is Portuguese.
Why this interest in what you call multilingualism?


Quoting Claudia Wanderley <cmwanderley@xxxxxxxxx>: 

> Dear John,
> our subject is multilingualism. You are right. And we are in countries that 
> speak Portuguese. In fact, we reached out to you in order to understand the 
> possibilities for multilingualism in Greenstone.
> There are strong local languages in Portuguese-speaking countries, called 
> national languages or minor languages, with a considerable amount of 
> cultural production. It is also important to say that the lives of these 
> languages don't fit the Western model of "language and literature"; they're 
> not a closed system. The same person in Angola, for instance, can speak four 
> languages in a day, for different activities: at home, in the office, with 
> the family, at the market... So a digital library that could work, 
> metaphorically, like such linguistic practice would do a better job of 
> including our local-language production in the digital world, because it is 
> necessary to be able to shift from one language to another, as the speaker 
> does.
> Could we be both lists? Because we are both of them. We're debating 
> multilingualism, and we're starting to do it in Portuguese-speaking 
> countries. I just think it's important not to "erase" the multilingualism 
> debate from the Portuguese list. Should we open another one? Do you have an 
> ongoing list on multilingualism?
> And yes, it would be wonderful to have specialists for a general discussion 
> on multilingualism. Perfect.
> Best,
> Claudia
> 2009/2/19 John Rose <john.rose1@xxxxxxx[1]>
>  Dear Claudia,
> I am a bit confused. I thought that the subject of this list was to discuss 
> (in Portuguese) the evaluation/improvement, promotion and use of Greenstone 
> in Portuguese speaking countries (including use of local languages in those 
> countries) and to provide help to users with questions/problems.
> If we want to have a general discussion on multilingualism in digital 
> libraries, then perhaps we should have another list for this, in which we 
> would invite participants worldwide who are interested in this problem. I 
> guess that in such a discussion the contributions would probably be in 
> English to ensure maximum mutual understanding.
> Coming back to Chinese (though I am not sure why Nadia has been focusing on 
> this rather than, for example, on Arabic or Russian, which like Chinese are 
> UNESCO languages using non-Latin characters and with fully operational 
> Greenstone interfaces): I don't think that the problem of pinyin versus 
> Chinese ideograms is so fundamentally different from correctly 
> transliterating Arabic or Russian into Latin script. Of course Chinese is 
> more complicated, since there is, I believe, not always a unique mapping 
> between a pinyin phoneme, even with the tone indicated, and the 
> corresponding Chinese ideogram, but some ambiguities exist in almost all 
> transliteration schemes, as does the problem that many scholarly works, 
> especially older ones, use non-standard or alternative transliteration 
> schemes. Greenstone has no special functionality to support double use of a 
> language, in its native character form and in transliterated form. This 
> could be interesting for linguistic scholars, but the vast majority of 
> speakers of a language would want to access information in their native 
> character set, not through transliterated characters. It would technically 
> be possible to provide a pinyin user interface and also to search on 
> metadata and/or full text in pinyin or ideograms or even (I believe, but am 
> not certain) mixed combinations, but I have not seen an example of this sort 
> of specialized linguistic DL application.
> Greenstone is trying to provide, evaluate and maintain the largest number 
> possible of language interfaces. Because of the immense amount of work 
> involved, and the importance of having users take responsibility for deciding 
> which languages to use, all of the language interface work is undertaken by 
> volunteer translators.
> I hope this clarifies things. Perhaps it would be best to move the 
> discussion on Chinese to individual correspondence if you want to proceed? 
> Our Chinese specialist Anna Huang is receiving this message and could 
> perhaps provide any further advice she might have on this specific subject 
> directly to you and Nadia. Best regards, John
> At 02:43 19/02/2009, you wrote:
> Dear all,
> as a linguist who does not understand very well what you're talking about: 
> if we put the Chinese data in pinyin (Nadia, I found the name: the 
> romanization of Mandarin), could it work?
> Meaning, is it possible to build the Chinese data in both systems, pinyin and 
> Chinese ideograms, in a way that they are equivalent for this system? Is the 
> GLI translation facility capable of translating between character sets or, 
> better, is transliteration available?
> Best,
> Claudia
> 2009/2/18 John Rose <john.rose1@xxxxxxx>
>  Dear Nadia,
>  I thought we were supposed to be speaking in Portuguese on this list (except 
> for me) (-:
>  There are 4 different aspects to the language interface: i) the spreadsheets 
> you have for translating the user interface, ii) translations of the metadata 
> names (there is a facility in GLI for translating terms which are not 
> already included in the metadata reference files, which could also be 
> modified if you choose), iii) the language of the metadata, and iv) the 
> language(s) of the documents themselves. All of these can easily be handled 
> for a single language applying to a given collection, and it is also 
> straightforward to separate a collection of documents in several languages 
> into sub-collections (by cross-collection searching or by partitioning the 
> indexes).
>  But right now, I understand, the metadata names in the search boxes will not 
> change to the language of a changed language preference (they will stay in 
> the language in which the collection was built). However, the classifier 
> names will change if you have translated them with the GLI translation 
> facility. I also understand that the former situation will be improved in the 
> next version (v2.82).
>  There is a bug in v2.81 with exploding CDS/ISIS databases, and there is a 
> rather complicated procedure to get around this that I could provide. 
> Otherwise this works fine with v2.80 and will be fixed in the next release 
> (probably already in the nightly snapshot releases if you want to use 
> those). Probably it is the same thing with BibTeX, for which v2.80 should 
> also be fine.
>  Chinese is special in that it does not separate words with spaces. v2.80 
> separates the characters internally so that text searches are possible. 
> v2.81 extends this to searches of metadata content. I'm not surprised that 
> there were problems with v2.73. Please note that this segmentation problem 
> is special to Chinese. Other languages with non-Latin character sets 
> (Arabic, Tamil, etc.) have worked fine before because the words are 
> separated by spaces.
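A minimal sketch of the character-level segmentation described above, written in Python for illustration only. It is not Greenstone's actual code: the function names `segment_cjk` and `tokenize` are hypothetical, and limiting the match to the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is a simplifying assumption.

```python
import re

# Assumption: only the basic CJK Unified Ideographs block is treated as
# "Chinese"; extension blocks and punctuation are ignored in this sketch.
CJK_CHAR = re.compile(r"([\u4e00-\u9fff])")


def segment_cjk(text: str) -> str:
    """Surround every CJK ideograph with spaces, then normalise whitespace.

    After this step a plain whitespace tokenizer sees one token per
    ideograph, while space-separated scripts are left unchanged.
    """
    spaced = CJK_CHAR.sub(r" \1 ", text)
    return " ".join(spaced.split())


def tokenize(text: str) -> list[str]:
    """Split text into index terms using whitespace after segmentation."""
    return segment_cjk(text).split()


if __name__ == "__main__":
    print(tokenize("Greenstone 数字图书馆 2.80"))
```

Indexing one ideograph per term makes text searchable without a word dictionary, at the cost of matching character sequences rather than true words.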
>                                  Bonne continuation, very interesting, 
> waiting for further experiments, John
>  At 20:39 18/02/2009, you wrote:
>  Hi John (and all),
>  Right now I have a small prototype with the languages listed below, mainly
>  from Portuguese-speaking countries.
>  I am at the first step, checking how far we can go with the languages
>  and trying to discover whether we hit a limit. At least for now, the only
>  problem is listing UTF-8 languages with a different alphabet, like Chinese.
>  The idea is to have documents and interfaces in several languages,
>  so that someone who knows only Kaigang would be able to access the system.
>  The next step would be to translate the Dublin Core information for each
>  item, so that someone who speaks Kaigang knows that there is something in
>  Kabuverdianu about the subject he is searching for.
>  I am using Greenstone 2.73 only because I wasn't able to explode some BibTeX
>  data in the latest version (and I was already used to it...). But other
>  versions and applications are welcome. We can exchange experience too.
>  I am attaching screenshots of the title list and the languages list. You can
>  see that the Chinese title is missing, but I am able to do a search
>  in Chinese. (Since it's just a first prototype, please
>  forgive me for the simple interface.)
>  Languages list:
>   Chechewa
>   Forro
>   Ganda
>   Guinea Bissau Creole
>   kabuverdianu
>   Kaigang
>   Kikongo
>   Mandarin
>   Oshiwambo
>  Regards,
>  nadia.
>  [Attachments: titles.JPG, languages.JPG, search_chinese.JPG]
>                  John B. Rose
>                  1 Bis, Rue des Châtre-Sacs
>                  92310 Sèvres
>                  France
>                  Email: <john.rose1@xxxxxxx>
>                          (in case of bounce then send to
>                          <johnrose@xxxxxxxxxxxxxxxxxx>)
> -- 
> Claudia Wanderley
> tel. +55 19 91362441

[1] mailto:john.rose1@xxxxxxx
