[bksvol-discuss] Re: txt page breaks redux

  • From: Cindy <popularplace@xxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Tue, 28 Dec 2004 19:40:17 -0800 (PST)

Guido,

That's very interesting. I, too, have the 1660, but of
course I don't have Kurzweil. I think it came with
FineReader, but I'm not sure.

If I used Image instead of OCR, how would I convert
the images to OCR, or can I not without Kurzweil? I
thought I tried once when I scanned with Image on by
mistake, but I may be wrong.

The computer and scanner are for me like a car -- I
can drive it but I understand little about how it
works, except for all the information you and others
here share. (shamefaced smile)

Cindy

--- Guido Corona <guidoc@xxxxxxxxxx> wrote:

> Cindy, I am using a lowly EPSON 1660 with the Kurzweil 9.02 software.
> As I scan at 400 DPI in grayscale, the OCR engine would take a lot
> longer to recognize if I scanned and recognized pages simultaneously.
> So I scan images only. Then I submit the entire stack of images for
> recognition at the end. Recognition will take anything from 20 minutes
> to 2 or even 3 hours, depending on the length of the book and the
> degree of difficulty encountered in the recognition process. No skin
> off my back though, I can do a lot of other things during that time:
> my attention is not needed.
> 
> As I recall, books with an error rate of 0.7% or less are deemed
> excellent. Between 0.7% and 1.5% they are 'good'. Submissions with an
> error rate above 1.5% are deemed fair.
> 
> But the reviewer has the opportunity to exercise judgment and override
> the system evaluation, up or down. If a book seems to have holes, with
> frequent missing or corrupted words, I nuke it, no matter what the
> system evaluates, and add a comment for the administrator.
> If the book has a bunch of 'the' misspelled as 'die', I try to fix them
> one at a time, unless that turns out to be a thankless task because it
> is just one aspect of a much bigger problem.
> So Cindy, I think you know my opinion by now.
> 
> Guido
> 
> Guido Dante Corona
> IBM Accessibility Center,  Austin Tx.
> Research Division,
> Phone:  512. 838. 9735.
> Email: guidoc@xxxxxxxxxxx
> Web:  http://www.ibm.com/able
> 
> 
> 
> 
> Cindy <popularplace@xxxxxxxxx> 
> Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
> 12/28/2004 06:24 PM
> Please respond to
> bksvol-discuss
> 
> 
> To
> bksvol-discuss@xxxxxxxxxxxxx
> cc
> 
> Subject
> [bksvol-discuss] Re: txt page breaks redux
> 
> 
> 
> 
> 
> 
> Guido,
> 
> You must have a wonderful scanner!!!! There's no way I can scan a book
> that quickly, since I have to scan a page at a time, and can convert
> 8 - 12 pages at a time.
> 
> Anyway, since so many other people are happier scanning, I'm leaving
> that to them -- but unless everybody submits in rtf format, that still
> leaves the problem of hard page breaks in txt books, which apparently
> some of you can put in and others of us cannot. My solution, as things
> stand now, is to download a txt file, reject it, and re-submit it as an
> rtf file -- which means someone else will then have to validate it.
> 
> But you bring up something I've been wondering about: Should a book
> that is spell-checked only, with garbage removed, be approved as being
> in excellent condition, or good? What about scanning errors that pass
> the spell-check, like "be" for "he," the number one for capital I,
> "lie" for "the," etc.? And words that are missing completely from
> sentences? I know the excellent rating allows for some errors, but how
> many before it becomes Good instead of Excellent? I recently worked on
> a book that had a lot of missing words. I would suspect that the
> omissions wouldn't have made much difference to the reader, and I
> suppose that in the cases of the other examples I gave any reader could
> make changes as he/she read, but I wonder if it wouldn't be better for
> books that haven't been read and corrected by the validator to have a
> Good rating and leave the Excellent for books that have been done more
> carefully.
> 
> Cindy
> --- Guido Corona <guidoc@xxxxxxxxxx> wrote:
> 
> > I know this will sound so dreadfully heartless, no Charitable Seasonal
> > spirit and all the rest. But it takes a grand total of just 1 hour and
> > 5 minutes to scan an entire 450 page paperback book, page breaks, font
> > info, and all the rest. Then it takes about 90 minutes to do some basic
> > cleanup, and finally an average of a couple of hours to spellcheck it.
> > 
> > I really do not understand why we are even bothering to discuss
> > salvage operations for DOA submissions, when the culling ax and a
> > quick rescan is the only merciful course of action for most of these
> > runts.
> > 
> > Guido Dante Corona
> > IBM Accessibility Center,  Austin Tx.
> > Research Division,
> > Phone:  512. 838. 9735.
> > Email: guidoc@xxxxxxxxxxx
> > Web:  http://www.ibm.com/able
> > 
> > 
> > 
> > 
> > "Marissa Mika" <Marissa.M@xxxxxxxxxxxx> 
> > Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
> > 12/28/2004 05:35 PM
> > Please respond to
> > bksvol-discuss
> > 
> > 
> > To
> > <bksvol-discuss@xxxxxxxxxxxxx>
> > cc
> > 
> > Subject
> > [bksvol-discuss] Re: txt page breaks redux
> > 
> > 
> > 
> > 
> > 
> > 
> > Hi Cindy, 
> > 
> > We're still working on it. (Gotta love consensus,
> > huh?) Look for a
> > message from me by the end of the week. 
> > 
> > Did everyone have a good Christmas? 
> > 
> > Marissa 
> > 
> > -----Original Message-----
> > From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> > [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On
> > Behalf Of Cindy
> > Sent: Wednesday, December 22, 2004 9:21 PM
> > To: bksvol-discuss@xxxxxxxxxxxxx
> > Subject: [bksvol-discuss] txt page breaks redux
> > 
> > Hi, Marissa,
> > 
> > Thanks for the new list.
> > 
> > Is there any word yet on what to do with txt files, or whether they
> > will be accepted without hard breaks, with spaces and page numbers
> > instead? That doesn't prevent the breaks Word puts in in the wrong
> > places, but by adding line spaces or changing font the file can
> > probably be made to coincide with the book.
> > 
> > When I finish Johnny Tremain I'm thinking of fixing one of those
> > troublesome romances, since I found a copy. As things stand now, I
> > think the best thing for me to do is to reject the txt file and submit
> > a new rtf file with page breaks.
> > 
> > Cindy
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> 
> 
> 
> 
> 



                
