[linuxinindia] IndicComputingNotes: Links, comments from overseas....

  • From: Frederick Noronha <fred@xxxxxxxxxxxxxxx>
  • To: linuxinindia@xxxxxxxxxxxxx
  • Date: Fri, 20 Sep 2002 16:55:50 +0530 (IST)

-----------------------------------------------------------------------
20Sep2002##########IndicComputing Bytes#########################Issue02
-----------------------------------------------------------------------

PEOPLE INTERESTED IN THE FIELD: This is an impressive list of people
working on or interested in Indic computing. It is based on those who
attended the Sept 15-16 Indic-Computing Workshop in Bangalore, or who could
not make it.

If you would like to get in touch with any of them, you can get their
contact details from Tapan S. Parikh <tap2k@xxxxxxxxx>:

Dr. U B Pavanaja (Kannada Ganaka Parishad, Bangalore), Joseph Koshy
(Hewlett-Packard, Bangalore), Brij Sethi (H-P, Bangalore), Sunil Abraham
(Mahiti, Bangalore), RVS Sastry (IISc, Bangalore), C.V. Srinatha Sastry (KGP,
Bangalore), Kalika Bali (Picopeta Simputers, Bangalore), N Anitha (IISc,
Bangalore), Abraham K Mathen (H-P, Bangalore), K Nagarajan (H-P, Bangalore),
Sayamindu Dasgupta (ILUG-Calcutta).

Also on the list are Manoj R Annadurai and Aboo Thanish (Chennai Kavigal),
Dr. Hema Murthy (IIT-Madras Chennai), Rajkumar S (Free Software Foundation,
Kerala), Arun M (FSF, Tiruvananthapuram), Prof Pat Hall (Open University,
London), G Karunakar (Netcore, Mumbai), Tapan Parikh (Mumbai), Venkatesh
Hariharan (IndLinux, Mumbai), G. Nagarjuna (TIFR/FSF-Mumbai), Prakash Advani
(Netcore, Mumbai), Raveesh Gupta (Microsoft, New Delhi), Ravi Kant and
Pankaj Kaushal (Sarai, New Delhi), Mita Radhakrishnan and Tapas Desrousseaux
(Auroville Language Lab, Pondicherry), Ashish Kotamkar (Mithi, Pune), Ravi
Pande (font designer, Pune), Vijay Pratap Singh Aditya (Ahmedabad), Ms Neepa
Shah (Gujarat Vidyapeeth, Ahmedabad), Dr Samir Kelekar (KonkaniNet,
Goa/Bangalore), Susan Uskudarli (Bangalore), Abhas Abhinav and Vikram Singh
(DeepRoot Linux, Bangalore), KSR Anjaneyulu (H-P, Bangalore), Durgesh Rao
(NCST, Mumbai), Narasimha Murthy, TB Dinesh, CS Ramalingam, Naveen and
Suzanne (H-P, Bangalore).

Other members who could not participate, but are interested in/working on
the subject are:

Bala Pillai (Tamil Net, Australia), Manoranjan Kumar Singh (NCST,
Bangalore), CV Radhakrishnan (River Valley Technologies, Kerala), Dr Srinath
Srinivasa (IIIT-B, Bangalore), Dr Vinay L Deshpande (Ncore Technologies,
Bangalore), Prof Swami Manohar (Picopeta Simputers, Bangalore), Dr Sri Ganesh
and Prof A G Ramakrishnan (H-P, Bangalore), Abhijit Das (IISc-Bangalore),
Swayandipta Pal Chaudhuri (Perl Mongers, Calcutta), Vinay Chhajalani
(Webduniya, Indore), Suresh Babu (INAPP Thiruvananthapuram), Baiju M
(FSF-Tvm), Keyur Shroff (NCST Mumbai), Srinath Shanbag (NCST Mumbai), Dr.
Pushpak Bhattacharya (IIT-Bombay, Mumbai), Osama Manzar (4Cplus.net New
Delhi), Aman Grewal (CHiPS Raipur), M K Saravanan (Centre for Singapore
Internet Research), Frank Pohlmann, Mahesh Pai, Edward Cherlin, Owen Taylor,
Eric Mader, Gaspar Sinai (Yudit), Deborah W Anderson (Script Encoding
Initiative), Free Standards Group, Asmus Freytag and Joseph Becker and
Kenneth Whistler (Unicode), Prof Ken Keniston (MIT), Supreet (Sarai).
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

INDIAN TONGUES, NOT AVAILABLE: Dulce Felix <dulce@xxxxxxxxxxxxxxxxxxxxx> of
http://www.cityradio.nu offers website submission to Japanese, Chinese,
German and Hispanic search engines, among others.

Felix says: "Please note that at this point we do not provide website
promotion services in any of the Indian languages." Chinese, Korean and
Japanese are among the Asian languages offered.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

COMMENT FROM KOLKATA: In a discussion on the Linux-Bangalore non-tech list
<linux-bangalore-non-tech@xxxxxxxxxxxxxxx>, P.K. Sharma
<pksharma@xxxxxxxxxxxxxxx> of Calcutta made a point worth noting.

Responding to a report on the recent Bangalore Indic-Computing meet, he
wrote: "I find this info quite useful. In Calcutta we are working on
bringing Bengali into Linux. A member claims success in it too!..."
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

LINKS TO GETTEXT AND EMACS INTERNATIONALIZATION: Richard Stallman
<rms@xxxxxxx>, founder of the Free Software Foundation (FSF), responded to a
query about whom to contact regarding the internationalization of GNU/Linux
(especially into Indian languages). He wrote: "The maintainer of GNU Gettext
is haible@xxxxxxxx. handa@xxxxxxxxx works on internationalization of Emacs."
Perhaps we should contact such people more regularly, to keep our concerns
on their radar.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

NOTES FROM AN AMAZING LINGUIST IN THE US: Edward Cherlin
<edward@xxxxxxxxxxxxxxxx> creates international, multilingual Web sites, and
is active in internationalization standards and implementation. He is based
in Cupertino, CA 95014.

He offered some interesting comments: 

ON FONTS AND TOOLS: Responding to Dr Pavanaja's point that PfaEdit can
create only glyph sets, and cannot make an Opentype font with embedded tables
for glyph substitution, glyph positioning, distance, etc., Cherlin says:
"Right. However, it is open source, so adding the ability to write Opentype
tables should be straightforward. See also GOTE (GNU OpenType Editor),
currently described as 'rather alpha'."

"There are other commercial font editors. Fontlab 4.0 from Tiro Typeworks
can create Opentype fonts, and in a future version will be able to handle
non-BMP character codes," he says.

Graphite is a good toolkit for rendering. "Graphite's developers are trying
to revive it, perhaps in combination with Pango, which has joined Li18nux,
which has joined the Free Standards Group," he adds.
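What Indic text needs from a font is a set of tables that map sequences of glyphs to other glyphs. As a rough illustration only, here is a toy Python model of one kind of OpenType glyph substitution: a ligature lookup that turns KA + VIRAMA + SSA into a single conjunct glyph. The glyph names (ka, ka_ssa, etc.) and table contents are made up, and a real shaping engine (Pango, for example) also reorders and positions glyphs, which this sketch ignores.

```python
# Toy model of an OpenType-style ligature substitution (GSUB) lookup.
# All glyph names and table contents are hypothetical.

# Character -> nominal glyph, like a font's character map.
CMAP = {
    "\u0915": "ka",      # DEVANAGARI LETTER KA
    "\u0924": "ta",      # DEVANAGARI LETTER TA
    "\u0937": "ssa",     # DEVANAGARI LETTER SSA
    "\u094D": "virama",  # DEVANAGARI SIGN VIRAMA
}

# Glyph sequence -> single conjunct glyph. Which conjuncts exist is a
# property of the font, so another font might leave this table empty.
LIGATURES = {
    ("ka", "virama", "ssa"): "ka_ssa",
}


def shape(text: str, ligatures: dict) -> list:
    """Map characters to glyphs, then apply ligatures, longest match first."""
    glyphs = [CMAP[ch] for ch in text]
    longest = max((len(seq) for seq in ligatures), default=0)
    out = []
    i = 0
    while i < len(glyphs):
        # Try sequences of decreasing length (at least 2 glyphs) at position i.
        for n in range(min(longest, len(glyphs) - i), 1, -1):
            seq = tuple(glyphs[i:i + n])
            if seq in ligatures:
                out.append(ligatures[seq])
                i += n
                break
        else:
            # No ligature starts here: keep the nominal glyph.
            out.append(glyphs[i])
            i += 1
    return out


print(shape("\u0924\u0915\u094D\u0937", LIGATURES))  # ['ta', 'ka_ssa']
print(shape("\u0915\u094D\u0937", {}))               # ['ka', 'virama', 'ssa']
```

Passing an empty ligature table mimics a font that lacks the conjunct: the same three characters then come out as three separate glyphs, with the virama visible.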

WITH AN OPENTYPE FONT AND RENDERING MECHANISM, WRITING A KEYBOARD DRIVER IS
QUITE EASY: Says Dr Cherlin: "Right. For Unix, it is a matter of looking up
the correct codes to enter into a text file. Mac is more work, and Windows
requires membership in MSDN to handle keyboard layouts completely.
Tavultesoft Keyman is a free program to create keyboard layouts, but it
operates at a different level from Microsoft's own keyboards."
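To make "looking up the correct codes" concrete: a keyboard layout is essentially a table from keys to Unicode code points. Below is a minimal Python sketch. The key assignments are hypothetical (not InScript or any other standard layout), and the printed "key U+XXXX NAME" lines are not the syntax of any real keymap file. The keyboard produces only letters and the virama; drawing the conjunct is left to the font and renderer.

```python
import unicodedata

# Hypothetical layout: Latin key -> Unicode code point.
# Key assignments are illustrative, not a standard Indian layout.
LAYOUT = {
    "k": 0x0915,  # DEVANAGARI LETTER KA
    "t": 0x0924,  # DEVANAGARI LETTER TA
    "S": 0x0937,  # DEVANAGARI LETTER SSA
    "a": 0x093E,  # DEVANAGARI VOWEL SIGN AA
    "d": 0x094D,  # DEVANAGARI SIGN VIRAMA
}


def type_keys(keys: str) -> str:
    """Turn a sequence of keystrokes into Unicode text."""
    return "".join(chr(LAYOUT[k]) for k in keys)


def layout_table() -> str:
    """List the layout as 'key U+XXXX NAME' lines (a made-up format)."""
    return "\n".join(
        f"{key} U+{cp:04X} {unicodedata.name(chr(cp))}"
        for key, cp in LAYOUT.items()
    )


print(layout_table())
# Three keystrokes give three code points; the font decides whether
# they are drawn as the conjunct KSSA.
print([f"U+{ord(c):04X}" for c in type_keys("kdS")])  # ['U+0915', 'U+094D', 'U+0937']
```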

SORTING TEXT: "Text to be sorted must go through several steps before
strings can be compared. UTR #10 [the Unicode Collation Algorithm] discusses
preprocessing, normalization, array formation, and forming sort keys. There
is also consideration of 'override mechanisms (tailoring) for creating
language-specific orderings'," says Cherlin.
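The steps Cherlin lists can be sketched in a few lines of Python. This is a toy version, not the real algorithm: the weight table is invented (real implementations use the Default Unicode Collation Element Table plus language tailorings), and only two levels are modelled. It shows why normalization comes first: DEVANAGARI LETTER QA (U+0958) decomposes to KA + NUKTA, so with the nukta given only a secondary weight, a word starting with QA sorts beside words starting with KA, instead of after every plain consonant as raw code-point order would put it.

```python
import unicodedata

# Invented (primary, secondary) weights. Real implementations use the
# Default Unicode Collation Element Table (DUCET) plus tailorings.
WEIGHTS = {
    "\u0915": (10, 0),  # DEVANAGARI LETTER KA
    "\u0916": (20, 0),  # DEVANAGARI LETTER KHA
    "\u0917": (30, 0),  # DEVANAGARI LETTER GA
    "\u093C": (0, 1),   # DEVANAGARI SIGN NUKTA: secondary difference only
}


def sort_key(s: str) -> tuple:
    # 1. Normalize: NFD splits QA (U+0958) into KA + NUKTA.
    nfd = unicodedata.normalize("NFD", s)
    # 2. Array formation: one collation element per character.
    elements = [WEIGHTS[ch] for ch in nfd]
    # 3. Sort key: non-zero primaries, a 0 level separator, then non-zero
    #    secondaries, so secondaries only matter when primaries tie.
    primaries = tuple(p for p, _ in elements if p)
    secondaries = tuple(sec for _, sec in elements if sec)
    return primaries + (0,) + secondaries


words = ["\u0915\u0917", "\u0958\u0915", "\u0915\u0915"]
print(sorted(words))                # raw code-point order
print(sorted(words, key=sort_key))  # toy collation order
```

Tailoring, in this picture, amounts to swapping in a different weight table for a particular language.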

Dr Cherlin has written a market research study, "Non-Latin Font Technology
and Markets" (1990), and in 1994, wrote and published a study, "The
Worldwide Impact of the Unicode Character Set Standard". He is in the
process of taking over maintenance of the Unicode HOWTO for Linux from Bruno
Haible.

The languages he has learnt over the years include: Hebrew at the
synagogue, starting at age eight; a year of Latin in eighth grade; French
and Russian in high school; Swahili and a little Chinese in an after-school
club; more French and Russian in college; Korean in the Peace Corps;
Japanese in Japan; a little Pali and Sanskrit in his Buddhist training;
Chinese at Durham University in the UK; APL from his father; Tolkien's
Dwarvish and Elvish, Classical Greek (Euclid), Yiddish, Spanish, German, and
a little Italian and Portuguese on his own; the invented language Loglan,
also on his own; the invented language Lojban with the Logical Language
Group; various Slavic languages plus Georgian and Armenian with the
Slavyanka Russian Chorus; and tabla bols in both Devanagari and Arabic
script. Amazing! He is currently helping Tex Texin with his Compelling
Unicode Demo, contributing Yiddish, Cherokee, Azeri, and Burmese examples.

Says he: "If I had time, I would look at Farsi next, particularly the
astronomical and mathematical works of Omar Khayyam, and of course his
poetry, too. But for now I am sticking with writing systems rather than
languages. I am creating a Unicode APL font, and prodding people to do the
necessary Indic and South Asian Opentype fonts and rendering so that
everyone else can get on with the real work." He's available for consulting
contracts, or even a full-time job.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

MORE FROM CHERLIN: Here is the situation, as he describes it:
"India has 18 official languages written in 10 different alphabets:
Devanagari (used for Hindi, Marathi, and others), Bengali, Gurmukhi
(Punjabi), Gujarati, Oriya, Malayalam, Kannada, Tamil, Telugu, and Latin
(English). In addition, more than 800 other languages spoken in India do not
have official status. Mandrake Linux, one popular distribution, includes
keyboards and fonts for Bengali, Devanagari, Gujarati, Gurmukhi, and Tamil,
five of the nine Indic writing systems. Unfortunately, many applications do
not accept these characters, and those that accept them may not handle them
correctly."
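Each of the nine Indic scripts in that list has its own Unicode block, so an application can at least tell which scripts a piece of text uses. A small Python sketch follows; the block ranges are the standard Unicode ones, while the function names are just for illustration. A block is only an approximation of a script (Bengali text, for instance, uses the danda from the Devanagari block), and recognizing the characters is the easy part: handling them correctly is where applications fall short.

```python
# Unicode blocks of the nine Indic scripts listed above.
INDIC_BLOCKS = [
    (0x0900, 0x097F, "Devanagari"),
    (0x0980, 0x09FF, "Bengali"),
    (0x0A00, 0x0A7F, "Gurmukhi"),
    (0x0A80, 0x0AFF, "Gujarati"),
    (0x0B00, 0x0B7F, "Oriya"),
    (0x0B80, 0x0BFF, "Tamil"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x0C80, 0x0CFF, "Kannada"),
    (0x0D00, 0x0D7F, "Malayalam"),
]


def script_of(ch: str):
    """Return the Indic block containing ch, or None for anything else."""
    cp = ord(ch)
    for first, last, name in INDIC_BLOCKS:
        if first <= cp <= last:
            return name
    return None


def scripts_in(text: str) -> set:
    """Collect the Indic blocks used anywhere in text."""
    return {name for name in map(script_of, text) if name is not None}


# KA in Devanagari, Bengali and Tamil, plus Latin letters and spaces.
print(sorted(scripts_in("\u0915 \u0995 \u0B95 abc")))  # ['Bengali', 'Devanagari', 'Tamil']
```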
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

WHAT'S THE REAL DIFFICULTY?: Edward Cherlin of Web for Humans, an
international Web development company based in Cupertino, California, says,
"The problems of rendering each standard Indic script are reasonably well
understood, and will be solved soon in Pango. The real difficulty is with
languages that have never been written, or are written in non-standard
variants of the official scripts. The only organization I know of that has
been working seriously on this problem is Summer Institute of Linguistics
(SIL), and their work is stalled for lack of funds." Cherlin is active in
Unicode, Li18nux, Pango, Free Standards Group, and other organizations
working on Indic and other unsupported writing systems, especially on the
problem of getting all of the interested parties into contact with each
other.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

A NOTE ON PANGO AND LI18NUX: Owen Taylor <otaylor@xxxxxxxxxx> is
the founder of the Pango project. Li18nux is working on standards for
keyboard input, among other things, in conjunction with the Linux Standard
Base of the Free Standards Group. They have focused first on Input Methods
for Chinese, Japanese, and Korean, but when Pango's Indic support is
complete they will extend their standard to include it.

At the toolkit level, Gtk and Qt are the most widely used toolkits, which
helps. Gtk already has a good framework through the Pango project, and basic
support for Indian languages. Qt now also has Unicode-level support for all
languages, but its rendering is not yet ready. However, Pango is independent
of Gtk, and can be used with Qt or any other software.

GNU, Li18nux and Pango are focusing on Opentype, which is the only 
font format that provides the glyph mapping tables needed to support 
Indic conjuncts. GOTE, the GNU OpenType Editor, will be the essential 
tool for this effort when it is completed.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

WinXP AND APPLE: Cherlin explains other issues too. As he puts it: "Some
argue today that only Microsoft's WinXP has any kind of Indian language
support worth speaking about, even though Apple has provided Indian language
kits for many years.

"There is confusion about Unicode support for Indic writing systems, since
Unicode does not provide character codes for conjunct glyphs. Many in India
still think that this is a design flaw in Unicode, whereas the Unicode
designers argue that it is a necessary design decision so that we can escape
from the current broken Indic rendering techniques.

"The set of conjuncts that is needed is not determined solely by the writing
system and language. It is font-specific, and can therefore only be
supported by font glyphs, not character encoding. Unfortunately, PostScript
and TrueType fonts do not support the correct mapping tables, and the
problem can only be solved with Opentype fonts.

"In contrast, rendering Indic scripts using PostScript or TrueType fonts
requires encoding the conjuncts directly in the text stream, rather than the
letters composing them, and requires non-standard software to translate
between the sequence of letters from the keyboard and the sequence of
conjunct characters in a non-standard font. The result is text that cannot
be sorted and searched properly, where spelling and grammar checkers cannot
operate. It is hard on users to have to wait so long for proper support of
Indic scripts through Unicode, but the results are guaranteed to justify the
delay."
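Cherlin's argument can be seen in a few lines of Python. In Unicode, the conjunct KSSA is stored as the letters that make it up (KA, VIRAMA, SSA), so a search for KA finds it. The sketch contrasts this with a hypothetical glyph-based legacy encoding in which the conjunct has its own code; Private Use Area code points stand in for font-specific codes here, and no real font encoding is implied. In that encoding the same search fails.

```python
import unicodedata

# Unicode stores the conjunct KSSA as the letters that compose it.
KSSA = "\u0915\u094D\u0937"
KA = "\u0915"

# Hypothetical glyph-based legacy encoding: the conjunct gets its own
# code. Private Use Area code points stand in for font-specific codes.
LEGACY_CODES = {
    KSSA: "\uE001",  # listed first so the conjunct is replaced before KA
    KA: "\uE002",
    "\u0937": "\uE003",
}


def to_legacy(text: str) -> str:
    """Re-encode text the way a glyph-based font encoding would."""
    for seq, code in LEGACY_CODES.items():
        text = text.replace(seq, code)
    return text


print([unicodedata.name(ch) for ch in KSSA])
# ['DEVANAGARI LETTER KA', 'DEVANAGARI SIGN VIRAMA', 'DEVANAGARI LETTER SSA']
print(KA in KSSA)                           # True: search finds KA
print(LEGACY_CODES[KA] in to_legacy(KSSA))  # False: KA is hidden in the glyph code
```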
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

TEX USERS' CONTRIBUTION: The Indian TeX Users Group now has a project to
fund font designers in all the Indian languages who are ready to write fonts
and donate them to TUGIndia under the GPL. It has thus secured 'Keli', a
Malayalam font family in various weights and shapes, written by Hashim and
released under the GPL. "We do hope to get more fonts in other languages to fill up the gaps.
We hope to use the savings generated with TUG2002 (to be held in India in
September 2002) exclusively for this purpose," says Radhakrishnan in
Thiruvananthapuram. Maybe these friends should get in touch with Pango and
Li18nux.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

UNITYPE GLOBAL OFFICE, AN ADD-ON TO MS-OFFICE: Cherlin suggests, for those
who can afford it, Unitype Global Office, an add-on to Microsoft Office
which supports Hindi, Marathi, Nepali, Sanskrit, Punjabi, Gujarati, Bengali,
Assamese, Tamil, Telugu, Maldivian, Kannada, Malayalam, Urdu, Pashto, Dari,
and many other languages. See http://www.unitype.com/globaloffice.htm.
Although it uses non-Unicode encoded fonts and a non-standard rendering
engine, Global Office and Microsoft Office together are capable of writing
Unicode files that can be viewed correctly with Opentype Indic fonts when
they become available.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

HARDLY ANY GPL-ed: SIL's Fonts in Cyberspace pages at
http://www.sil.org/computing/fonts/ and Alan Wood's Unicode Resources at
http://www.hclrss.demon.co.uk/unicode/fontsbyrange.html both list fonts for
every major writing system, but hardly any are GPL-ed. This is about to
change, according to Cherlin. "Several projects and numerous individuals are
working on Free Unicode fonts, now that commercial Opentype font editors
such as Tiro Typeworks Fontlab 4.0 are available. Finishing the GNU OpenType
Editor (GOTE) will speed things up much more."
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

SOME INTERESTING LINKS: 
Pango http://www.pango.org
Graphite http://www.sil.org/computing/graphite/
Li18nux http://www.li18nux.org
Free Standards Group http://www.freestandards.org/
Mandrake http://www.mandrake.com
-----------------------------------------------------------------------
Compiled in public interest from material on the Net by:
-----------------------------------------------------------------------
Frederick Noronha * Freelance Journalist * Goa * India 832.409490 / 409783
BYTESFORALL www.bytesforall.org  * GNU-LINUX http://linuxinindia.pitas.com
Email fred@xxxxxxxxxxxxxxx * Mobile +9822 122436 (Goa) * Saligao Goa India
Writing with a difference... on what makes *the* difference