[bksvol-discuss] Monica's OCR results with bilingual pages RE: Re: friends don't let friends date Jason

  • From: "Maria Kristic" <maria6289@xxxxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Wed, 25 Jun 2008 23:54:10 -0400

Hi Monica,

I'm not asking this to imply in any way that you're not doing something
right, but rather because I like to submit the best quality scans I can and
so would like to know whether I should be expecting this. I'm wondering,
when you've dealt with these pages where the intermixed English/Spanish
hasn't come out correctly on the page, and you've tried re-scanning in vain,
are you recognizing these pages with both English and Spanish modules
selected for simultaneous recognition by the OCR engine in question? Also,
if so, when you say that the page came out a mess, can you elaborate? Were
there diacritical marks on some words which were clearly English but which
the OCR engine recognized as Spanish, or could you tell that some of the
Spanish text was still being recognized as English by the way it came
out, or was there simply a lot of junk in the text rather than the correct
words? In these E. K. Barber books I'm working on, there are French,
Italian, and Portuguese words intermixed with English on quite a few pages.
When I went through the words with Ranked Spelling, it flagged all of the
foreign words as not being in the English dictionary. I've then gone back
and rescanned these pages, with all of the appropriate language modules
selected (I've done it using both FineReader 8 and ScanSoft 15--I'm using
Kurzweil 1000 v11), optimizing if need be, and the pages have come out
really well, with accents and all, in terms of both languages found on the
page (I've read them through with my Braille display). I can
speak/read/write Spanish, so I can skim the Italian and Portuguese and make
a judgment call as to whether it looks right, but I've still put in the
non-English parts, especially the French ones since I don't know that
language, into FreeTranslation.com, and they seem to make sense in terms of
the context of the English, which also confirms to me that the pages
turned out well. I started becoming confident in the OCR's abilities and
decided not to check some of the pages right on the spot--I'm going to read
them through anyway--but your message below is making me think that maybe I
should? That's why I ask about your language settings and results. Thanks for
your help. BTW, something a bit positive has come of all this language
correcting; I'm seeing that a couple of French words are indeed slightly
similar to Spanish, and I'm learning a bit of its grammar as well thanks to
FreeTranslation, <smile>.

Thanks,
Maria
Skype: MariaKristic
AIM: MCKristic
Email/MSN: maria6289@xxxxxxxxxxxxx
Google Talk: Maria.Kristic@xxxxxxxxx
Yahoo Messenger: mariakristic@xxxxxxxxx

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Monica Willyard
Sent: Wednesday, June 25, 2008 6:23 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: friends don't let friends date Jason

...

There are some pages where the OCR software gets mixed up by
English and Spanish on the same page and ends up making a mess. I have
had to ask my daughter to type in pages like this because no amount of
rescanning makes it better.

...

-- 
Monica Willyard
Visit my blog at http://www.scannersguild.com

