[gameprogrammer] AW: Re: recognize the correct language from a stream of data

  • From: Julian Kücklich <julian@xxxxxxxxxxxx>
  • To: <gameprogrammer@xxxxxxxxxxxxx>
  • Date: Fri, 16 Jul 2004 12:28:13 +0100

I think what you might need is a part-of-speech tagger (see e.g.
http://www-nlp.stanford.edu/links/statnlp.html).

I am not an expert in computational linguistics, but AFAIK languages can be
differentiated by their characteristic patterns of word length,
consonant:vowel proportions, consonant combinations (in English 'w' and 'r'
can be combined, as in 'wreck', while this is rare in German),
letter position (in English, 'w' can be at the end of a word, as in 'slow',
which is again rare in German), etc.

If enough of this data can be gathered in the time available, this kind of
pattern analysis should do the trick.
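
A minimal sketch of what such a pattern analysis could look like in Python. It only illustrates the features listed above (word length, consonant:vowel ratio, letter pairs, word-final letters); the function names, the vowel set and the overlap score are my own assumptions, and the reference texts would have to be real samples of each language, far longer than a sentence.

```python
import re
from collections import Counter

# Assumption: a rough vowel set for English/German; other languages need their own.
VOWELS = set("aeiouyäöü")


def text_features(text):
    """Collect simple surface statistics for one text sample."""
    # [^\W\d_]+ matches runs of letters, including non-ASCII ones such as 'ü'.
    words = re.findall(r"[^\W\d_]+", text.lower())
    letters = [ch for word in words for ch in word]
    vowels = sum(ch in VOWELS for ch in letters)
    consonants = len(letters) - vowels
    return {
        "mean_word_length": len(letters) / len(words) if words else 0.0,
        "consonant_vowel_ratio": consonants / vowels if vowels else 0.0,
        # Letter pairs inside words, e.g. 'wr' in 'wreck'.
        "bigrams": Counter(w[i:i + 2] for w in words for i in range(len(w) - 1)),
        # Last letter of each word, e.g. 'w' in 'slow'.
        "final_letters": Counter(w[-1] for w in words),
    }


def bigram_overlap(sample, reference):
    """Fraction of the sample's letter pairs that also occur in the reference."""
    total = sum(sample.values())
    if total == 0:
        return 0.0
    return sum(n for pair, n in sample.items() if pair in reference) / total


def guess_language(text, reference_texts):
    """Return the best-scoring language name and all scores (hypothetical helper)."""
    sample = text_features(text)["bigrams"]
    scores = {
        lang: bigram_overlap(sample, text_features(ref)["bigrams"])
        for lang, ref in reference_texts.items()
    }
    return max(scores, key=scores.get), scores
```

Calling guess_language(incoming_text, {"english": english_sample, "german": german_sample}) returns the language whose sample shares the most letter pairs with the input. The other features could be compared in the same way once enough data has been gathered.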

The 'most common words' method might be problematic, as similar languages
like German and English share many identically spelled words (e.g. die,
boot, kind, hand).
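
To make the problem concrete, here is a small Python sketch of the word tally that also reports which words were counted for more than one language. The word lists are tiny, hand-picked illustrations (an assumption, not real frequency data), and the function name is hypothetical.

```python
import re

# Assumption: toy word lists; real ones would come from corpus frequency counts.
WORD_LISTS = {
    "english": {"the", "of", "on", "in", "die", "boot", "kind", "hand"},
    "german": {"der", "die", "das", "und", "in", "boot", "kind", "hand"},
}


def tally_common_words(text, word_lists):
    """Count list hits per language and collect words that hit several lists."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: 0 for lang in word_lists}
    ambiguous = set()
    for word in words:
        hits = [lang for lang, vocab in word_lists.items() if word in vocab]
        for lang in hits:
            scores[lang] += 1
        if len(hits) > 1:
            # The word is spelled the same in more than one language.
            ambiguous.add(word)
    return scores, ambiguous
```

For a short input such as "die hand in the boot", almost every hit is shared between the two lists, so the scores end up close together and say little about the actual language.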

                                        julian raul kücklich

                                        60 iona villas
                                        glasnevin, dublin 9
                                        republic of ireland

                                        +353  1 700 8289 (day)
                                        +353  1 850 0924 (evening)
                                        +353 85 707 6224 (mobile)

VENARI LAVARI LUDERE                    http://www.playability.de
RIDERE OCCEST VIVERE                    http://particlestream.motime.com

> -----Original Message-----
> From: gameprogrammer-bounce@xxxxxxxxxxxxx [mailto:gameprogrammer-bounce@xxxxxxxxxxxxx] On Behalf Of Tri M. Dang
> Sent: Friday, 16 July 2004 12:10
> To: gameprogrammer@xxxxxxxxxxxxx
> Subject: [gameprogrammer] Re: recognize the correct language from a stream of data
>
> I am working on a project that takes data from a server, which feeds me a
> stream of data that could be in any language (European languages, Japanese,
> Middle Eastern languages), and attempts to display it in the correct font
> for that language.
>
> The thing is, Japanese, let's say, doesn't use Roman characters. That makes
> it hard to compare.
>
> Any ideas? Thanks.
>
> Alan Wolfe <atrix2@xxxxxxx> wrote:
> Oh man, what a task...
>
> where are you getting this data streamed in from?
>
> i'd think what you would want to do is find the most common words from the
> languages you want to check for (like the, of, on, in for english maybe,
> o, en, es for spanish etc?) and just tally it up to see what language scores
> the highest.
>
> ----- Original Message -----
> From: "Tri M. Dang"
> To:
> Sent: Thursday, July 15, 2004 5:46 PM
> Subject: [gameprogrammer] recognize the correct language from a stream of data
>
> > Hi,
> >
> > Does anyone have any suggestions on how to recognize the correct language
> > (national language) from an incoming stream of data? (could be any
> > language: English, Japanese, ...)
> >
> > Any suggestion or link is welcome.
> >
> > TD.
> >
> > ---------------------
> > To unsubscribe go to http://gameprogrammer.com/mailinglist.html





