[bksvol-discuss] Re: txt page breaks redux

  • From: Guido Corona <guidoc@xxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Tue, 28 Dec 2004 19:03:39 -0600

By the way, if I scanned at 300 DPI instead of 400 DPI, the same 450-page 
paperback scanning job would take just over 45 minutes.
I do not use any document feeder. I just scan 2 pages per scan in 
continuous scanning mode with a 5-second delay between each scan.
The same strategy should be possible with the mainstream ABBYY FineReader 
Professional 7.0 software used by our very own revered Donna Smith, the 
Divine ABBYY Scanning Mistress!
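
A rough worked version of the timing arithmetic, using only figures from this thread (450 pages, 2 pages per scan, 1 hour 5 minutes at 400 DPI, just over 45 minutes at 300 DPI, a 5-second delay). It is a sketch, not a measurement: the function name is made up for illustration, and it assumes the 5-second delay is counted inside the quoted totals.

```python
# Figures quoted in this thread; the per-scan breakdown is derived, not measured.
PAGES = 450
PAGES_PER_SCAN = 2
DELAY_S = 5  # pause between scans; ASSUMED to be included in the quoted totals


def seconds_per_scan(total_minutes, pages=PAGES, pages_per_scan=PAGES_PER_SCAN):
    """Average wall-clock seconds per scan for a whole-book scanning job."""
    scans = -(-pages // pages_per_scan)  # ceiling division: 450 pages -> 225 scans
    return total_minutes * 60 / scans


for dpi, minutes in ((400, 65), (300, 45)):
    per_scan = seconds_per_scan(minutes)
    print(f"{dpi} DPI: {per_scan:.1f} s per scan, "
          f"about {per_scan - DELAY_S:.1f} s of it spent scanning")
```

On these numbers, dropping from 400 DPI to 300 DPI saves roughly five seconds on each two-page scan.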
 
Guido

Guido Dante Corona
IBM Accessibility Center,  Austin Tx.
Research Division,
Phone:  512. 838. 9735.
Email: guidoc@xxxxxxxxxxx
Web:  http://www.ibm.com/able




Guido Corona/Austin/IBM@IBMUS
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
12/28/2004 06:50 PM
Please respond to: bksvol-discuss
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: txt page breaks redux

Cindy, I am using a lowly EPSON 1660 with the Kurzweil 9.02 software. As 
I scan at 400 DPI in grayscale, the OCR engine would take a lot longer to 
recognize if I scanned and recognized pages simultaneously. So I scan 
images only. Then I submit the entire stack of images for recognition at 
the end. Recognition will take anything from 20 minutes to 2 or even 3 
hours, depending on the length of the book and the degree of difficulty 
encountered in the recognition process. No skin off my back though, I can 
do a lot of other things during that time: my attention is not needed. 

As I recall, books with an error rate of 0.7% or less are deemed 
excellent. 
Between 0.7% and 1.5% they are 'good'. Submissions with an error rate 
above 1.5% are deemed fair. 

But the reviewer has the opportunity to exercise 
judgment and override the system evaluation, up or down.  If a book seems 
to have holes with frequent missing or corrupted words,  I nuke it,  no 
matter what the system evaluates, and add a comment for the administrator. 
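
The thresholds recalled above, together with the reviewer's override, could be sketched roughly as follows. The function and parameter names are hypothetical, the cut-offs are only the ones recalled in this message ("as I recall"), and this is not the actual evaluation code used by the system.

```python
def rate_submission(error_rate_pct, reviewer_override=None):
    """Map an error rate (in percent) to a quality label.

    Thresholds are the ones recalled in this thread, not an official spec:
      <= 0.7%          -> "excellent"
      0.7% up to 1.5%  -> "good"
      above 1.5%       -> "fair"
    A reviewer's judgment, if given, replaces the computed label.
    """
    if reviewer_override is not None:
        return reviewer_override
    if error_rate_pct <= 0.7:
        return "excellent"
    if error_rate_pct <= 1.5:
        return "good"
    return "fair"


print(rate_submission(0.5))
print(rate_submission(1.2))
print(rate_submission(0.5, reviewer_override="rejected"))
```

The override is a plain argument here because the message describes it as a human decision that can move the rating up or down regardless of the number.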

If the book has a bunch of 'the' misread as 'die', I try to fix them one 
at a time, unless that becomes a thankless task because it turns out 
to be just one aspect of a much bigger problem. 
So Cindy,  I think you know my opinion by now. 

Guido 




Cindy <popularplace@xxxxxxxxx>
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
12/28/2004 06:24 PM
Please respond to: bksvol-discuss
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: txt page breaks redux

Guido,

You must have a wonderful scanner!!!! There's no way I
can scan a book that quickly, since I have to scan a
page at a time, and can convert only 8 - 12 pages at a
time. 

Anyway, since so many other people are happier
scanning, I'm leaving that to them -- but unless
everybody submits in rtf format, that still leaves the
problem of hard page breaks in txt books, which
apparently some of you can put in and others of us
cannot. My solution, as things stand now, is to
download a txt file, reject it, and re-submit it as an
rtf file -- which means someone else will then have to
validate it.
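
For anyone wanting to check whether a txt file carries hard page breaks before deciding to reject it, here is a minimal sketch. It ASSUMES that a hard page break in a plain-text file is the form feed character (U+000C), which is the usual ASCII convention but is not confirmed anywhere in this thread; the function names are hypothetical.

```python
FORM_FEED = "\f"  # U+000C; ASSUMED to be what "hard page break" means in a txt file


def has_hard_breaks(text):
    """True if the text contains at least one form feed."""
    return FORM_FEED in text


def split_pages(text):
    """Split plain text into pages at each form feed."""
    return text.split(FORM_FEED)


sample = "Chapter 1 text\fChapter 1 continues\fChapter 2 text"
print(has_hard_breaks(sample), len(split_pages(sample)))
```

Comparing the page count from `split_pages` with the printed book's page count is one quick way to see whether the breaks survived.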

But you bring up something I've been wondering about:
Should a book that is spell-checked only, and garbage
removed, be approved as being in excellent condition,
or good? What about scanning errors that pass the
spell-check, like "be" for "he," the number one for
capital I, "lie" for "the," etc.? And words that are
missing completely from sentences? I know the
excellent rating allows for some errors, but how many
before it becomes Good instead of Excellent? I
recently worked in a book that had a lot of missing
words. I would suspect that the omissions wouldn't
have made much difference to the reader, and I suppose
that in the cases of the other examples I gave any
reader could make changes as he/she read, but I wonder
if it wouldn't be better for books that haven't been
read and corrected by the validator to have a Good
rating and leave the Excellent for books that have
been done more carefully.
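
Errors like these pass a spell-check because the wrong word is itself a real word, so the most a script can do is point a human at candidates. The sketch below flags only the confusions named in this thread ("be"/"he", 1/I, "lie"/"the", "die"/"the"). The table and function name are illustrative, and most hits on a word like "be" will be false alarms that the validator has to judge in context.

```python
import re

# Real-word confusions mentioned in this thread: scanned token -> likely intended word.
CONFUSIONS = {"die": "the", "lie": "the", "be": "he", "1": "I"}

# A token is a run of letters or a run of digits.
TOKEN = re.compile(r"[A-Za-z]+|\d+")


def flag_suspects(text):
    """Return (line_no, token, suggestion) for each token worth a human look."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            token = match.group()
            suggestion = CONFUSIONS.get(token.lower())
            if suggestion:
                hits.append((line_no, token, suggestion))
    return hits


for hit in flag_suspects("Then be said 1 saw die cat.\nAll fine here."):
    print(hit)
```

Missing words cannot be found this way at all; they still need a read-through against the printed book.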

Cindy
--- Guido Corona <guidoc@xxxxxxxxxx> wrote:

> I know this will sound so dreadfully heartless,  no
> Charitable Seasonal 
> spirit and all the rest.  But it takes a grand total
> of just 1 hour and 5 
> minutes to scan an entire 450 page paperback book, 
> page breaks, font 
> info, and all the rest.  Then it takes about 90
> minutes to do some basic 
> cleanup, and finally an average of a couple of hours
> to spellcheck it.
> 
> I really do not understand why we are even bothering
> to discuss salvage 
> operations for DOA submissions,  when the culling ax
> and a quick rescan is 
> the only merciful course of action for most of these
> runts.
> 
> 
> 
> 
> 
> "Marissa Mika" <Marissa.M@xxxxxxxxxxxx>
> Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
> 12/28/2004 05:35 PM
> Please respond to: bksvol-discuss
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Subject: [bksvol-discuss] Re: txt page breaks redux
> 
> Hi Cindy, 
> 
> We're still working on it. (Gotta love consensus,
> huh?) Look for a
> message from me by the end of the week. 
> 
> Did everyone have a good Christmas? 
> 
> Marissa 
> 
> -----Original Message-----
> From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On
> Behalf Of Cindy
> Sent: Wednesday, December 22, 2004 9:21 PM
> To: bksvol-discuss@xxxxxxxxxxxxx
> Subject: [bksvol-discuss] txt page breaks redux
> 
> Hi, Marissa,
> 
> Thanks for the new list.
> 
> Is there any word yet on what to do with txt files
> or
> if they will be accepted without hard breaks, with
> spaces and page numbers instead? That doesn't
> prevent
> the breaks Word puts in in the wrong places, but by
> adding line spaces or changing font the file can
> probably be made to coincide with the book.
> 
> When I finish Johnny Tremain I'm thinking of fixing
> one of those troublesome romances, since I found a
> copy. As things stand now, I think  the best thing
> for
> me to do is to reject the txt file and submit a new
> rtf file with page breaks.
> 
> Cindy
> 
> 
> 
> 
> 
> 
> 
> 
> 

