[greenstone_pt] Re: About a multilingual prototype

  • From: Claudia Wanderley <cmwanderley@xxxxxxxxx>
  • To: greenstone_pt@xxxxxxxxxxxxx
  • Date: Thu, 19 Feb 2009 09:23:39 -0300

Dear John,
Our subject is multilingualism; you are right. And we are in countries that
speak Portuguese. In fact, we reached out to you in order to understand the
possibilities for multilingualism in Greenstone.

There are strong local languages in Portuguese-speaking countries, called
national languages or minor languages, with a considerable amount of cultural
production. It is also important to say that the life of these languages doesn't
fit the Western model of "language and literature"; they are not a closed
system. The same person in Angola, for instance, may speak four languages in a
day, for different activities: at home, in the office, with the family, at the
market... So a digital library that mirrored this linguistic practice would do
a better job of including what is produced in our local languages in the
digital world, because the user needs to be able to shift from one language to
another, as the speaker does.

Could we be on both lists? Because we belong to both of them. We are arguing for
multilingualism, and we are starting to do it in Portuguese-speaking
countries. I just think it's important not to "erase" the multilingualism debate
from the Portuguese list. Should we open another one? Do you have an ongoing
list on multilingualism?

And yes, it would be wonderful to have specialists for a general discussion
on multilingualism. Perfect.

Best,
Claudia

2009/2/19 John Rose <john.rose1@xxxxxxx>

>  Dear Claudia,
>
> I am a bit confused. I thought that the subject of this list was to discuss
> (in Portuguese) the evaluation/improvement, promotion and use of Greenstone
> in Portuguese speaking countries (including use of local languages in those
> countries) and to provide help to users with questions/problems.
>
> If we want to have a general discussion on multilingualism in digital
> libraries, then perhaps we should have another list for this, in which we
> would invite participants worldwide who are interested in this problem. I
> guess that in such a discussion the contributions would probably be in
> English to ensure maximum mutual understanding.
>
> Coming back to Chinese (though I am not sure why Nadia has been focusing on
> this rather than, for example, on Arabic or Russian, which like Chinese are
> UNESCO languages using non-Latin characters and with fully operational
> Greenstone interfaces): I don't think that the problem of pinyin versus Chinese
> ideograms is so fundamentally different from correctly transliterating
> Arabic or Russian into Latin script (of course Chinese is more complicated
> since there is I believe not always a unique mapping between a pinyin
> phoneme, even with the tone indicated, and the corresponding Chinese
> ideogram, but some ambiguities exist in almost all transliteration schemes -
> as well as the problem that many scholarly works, especially older ones, use
> non-standard or alternative transliteration schemes). Greenstone has no
> special functionality to support double use of a language - in its native
> character form and in transliterated form. This could be interesting for
> linguistic scholars but the vast majority of speakers of a language would
> want to access information in their native character set, not through
> transliterated characters. It would technically be possible to provide a
> pinyin user interface and also to search on metadata and/or full text in
> pinyin or ideograms or even (I believe but not certain) mixed combinations,
> but I have not seen an example of this sort of specialized linguistic DL
> application.
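
The point about pinyin not always mapping to a unique ideogram can be illustrated with a minimal sketch. The table below is a tiny hand-picked sample and not a complete pinyin table; the names `PINYIN_TO_IDEOGRAMS`, `candidates` and `is_ambiguous` are hypothetical and are not part of Greenstone.

```python
# Hand-picked sample only: tone-numbered pinyin syllable -> some ideograms read that way.
# A real table would be far larger; "unique" below means unique within this sample.
PINYIN_TO_IDEOGRAMS = {
    "ma1": ["妈"],
    "ma3": ["马", "码"],
    "shi4": ["是", "事", "市", "世"],
}


def candidates(syllable):
    """Return the ideograms recorded in the sample for a pinyin syllable."""
    return PINYIN_TO_IDEOGRAMS.get(syllable.lower(), [])


def is_ambiguous(syllable):
    """True when the sample lists more than one ideogram for the syllable."""
    return len(candidates(syllable)) > 1


if __name__ == "__main__":
    for syllable in PINYIN_TO_IDEOGRAMS:
        label = "ambiguous" if is_ambiguous(syllable) else "unique in sample"
        print(syllable, candidates(syllable), label)
```

Even with the tone given, a search typed in pinyin would have to be expanded to every candidate ideogram, which is why the two forms cannot simply be treated as equivalent strings.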
>
> Greenstone is trying to provide, evaluate and maintain the largest number
> possible of language interfaces. Because of the immense amount of work
> involved, and the importance of having users take responsibility for
> deciding which languages to use, all of the language interface work is
> undertaken by volunteer translators.
>
> I hope this clarifies things. Perhaps it would be best to move the discussion
> on Chinese to individual correspondence if you want to proceed. Our Chinese
> specialist Anna Huang is receiving this message and could perhaps provide
> any further advice which she might have on this specific subject directly to
> you and Nadia. Best regards, John
>
>
> At 02:43 19/02/2009, you wrote:
>
> Dear all,
> as a linguist who does not understand very well what you're talking about: if
> we put the Chinese data in pinyin (Nadia, I found the name: the romanization
> of Mandarin), could it work?
> That is, is it possible to build the Chinese data in both systems, pinyin
> and Chinese ideograms, in a way that they are equivalent for this system? Is
> the GLI translation facility capable of translating between character systems
> or, better, is transliteration available?
> Best,
> Claudia
>
> 2009/2/18 John Rose <john.rose1@xxxxxxx>
>  Dear Nadia,
>
> I thought we were supposed to be speaking in Portuguese on this list
> (except for me) (-:
>
> There are 4 different aspects to the language interface: i) the
> spreadsheets you have to translate the user interface, ii) translations of
> the metadata names (there is a facility in GLI for translation of terms
> which are not already included in the metadata reference files, which could
> also be modified if you choose) iii) the language of the metadata, and iv)
> the language(s) of the documents themselves. All of these can easily be
> handled for a single language applying to a given collection, and it is also
> straightforward to separate a collection of documents in several languages
> into sub-collections (by cross collection searching or by partitioning the
> indexes).
>
> But right now, I understand, the metadata names in the search boxes will
> not change to the language of a changed language preference (they will stay
> in the language in which the collection was built). However, the classifier
> names will change if you have translated them with the GLI translation
> facility. I also understand that the former situation will be improved in
> the next version (v2.82).
>
> There is a bug in v2.81 with exploding CDS/ISIS databases, and there is a
> rather complicated procedure to get around this that I could provide. Otherwise
> this works fine with v2.80 and will be fixed in the next release (probably
> already in the nightly snapshot releases, if you want to use those). It is
> probably the same thing with BibTeX, for which v2.80 should also be fine.
>
> Chinese is special in that it does not separate words with spaces. v2.80
> separates the characters internally so that text searches are possible. v2.81
> extends this to searches of metadata content. I'm not surprised that there were
> problems with v2.73. Please note that this segmentation problem is specific to
> Chinese. Other languages with non-Latin character sets (Arabic, Tamil, etc.)
> have worked fine before because the words are separated by spaces.
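
The idea of separating Chinese characters so that a space-based indexer can find them can be sketched as follows. This is only an illustration under stated assumptions, not Greenstone's actual implementation: the function names are hypothetical, and only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as Chinese.

```python
def is_cjk(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def separate_cjk(text):
    """Surround each ideograph with spaces; leave other text unchanged."""
    spaced = "".join(f" {ch} " if is_cjk(ch) else ch for ch in text)
    # Collapse the extra whitespace introduced above.
    return " ".join(spaced.split())


def index_terms(text):
    """Terms a whitespace-based indexer would see after separation."""
    return set(separate_cjk(text).lower().split())


def matches(query, document):
    """Simplified search: every query term must occur in the document.

    Order and adjacency of the characters are ignored in this sketch.
    """
    return index_terms(query) <= index_terms(document)


if __name__ == "__main__":
    doc = "Greenstone 数字图书馆"
    print(separate_cjk(doc))
    print(matches("图书", doc))
```

Text in space-separated scripts passes through `separate_cjk` unchanged, which matches the observation that Arabic or Tamil never needed this step.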
>
> Bonne continuation (keep up the good work); this is very interesting, and I
> am waiting for further experiments. John
>
>
> At 20:39 18/02/2009, you wrote:
>  Hi John (and all),
>
> Right now I have a small prototype with the languages listed below, mainly
> from Portuguese-speaking countries.
> I am at the first step, checking how far we can go with the languages
> and trying to discover whether there is a limit. At least for now, the only
> problem is listing UTF-8 languages with a different script, like Chinese.
> The idea is to have documents and interfaces in several languages,
> so that someone who knows only Kaigang would be able to access the system.
> The next step would be to translate the Dublin Core information for each item,
> so that someone who speaks Kaigang knows that there is something in
> kabuverdianu about the subject they are searching for.
>
> I am using Greenstone 2.73 only because I wasn't able to explode some BibTeX
> data in the latest version (and I was already used to it...). But other
> versions and applications are welcome. We can exchange experience too.
>
> I am attaching a screenshot of the title list and of the language list. You
> can see that the Chinese title is missing, but I am able to do a search
> in Chinese. (Since it's just a first prototype, please forgive the simple
> interface.)
>
> Languages list:
>  Chechewa
>  Forro
>  Ganda
>  Guinea Bissau Creole
>  kabuverdianu
>  Kaigang
>  Kikongo
>  Mandarin
>  Oshiwambo
>
>
> Regards,
> nadia.
> [Attachments (image/jpeg): titles.JPG, languages.JPG, search_chinese.JPG]
>
>
>
>                 John B. Rose
>                 1 Bis, Rue des Châtre-Sacs
>                 92310 Sèvres
>                 France
>                 Email: <john.rose1@xxxxxxx>
>                         (in case of bounce then send to 
> <johnrose@xxxxxxxxxxxxxxxxxx>)
>
>
>
>
>
> --
> Claudia Wanderley
> tel. +55 19 91362441



-- 
Claudia Wanderley
tel. +55 19 91362441
