[bksvol-discuss] Re: txt page breaks redux

  • From: "Louise" <lougou@xxxxxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Wed, 29 Dec 2004 09:55:37 -0600

Guido, After you scan a book to image and are ready to recognize, do you just 
go to the images folder, select the first image and then do a select all or 
shift Control End to select all of the images?  I've never done it this way 
before, but it's worth a try.




  ----- Original Message ----- 
  From: Guido Corona 
  To: bksvol-discuss@xxxxxxxxxxxxx 
  Sent: Tuesday, December 28, 2004 7:03 PM
  Subject: [bksvol-discuss] Re: txt page breaks redux



  By the way,  if I scanned at 300 DPI instead of 400 DPI,  the same 450-page 
paperback scanning job would take just over 45 minutes. 
  I do not use any document feeder.  I just scan 2 pages per scan in continuous 
scanning mode with a 5-second delay between scans. 
  The same strategy should be possible with the mainstream ABBYY FineReader 
Professional 7.0 software used by our very own revered Donna Smith,  the Divine 
ABBYY Scanning Mistress! 
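
  A rough check of those timings, as a minimal Python sketch. It assumes the 450 pages are scanned two at a time (225 scans), that the quoted totals (about 45 minutes at 300 DPI, and the 1 hour 5 minutes reported for 400 DPI elsewhere in this thread) already include the 5-second delay, and that the split between scanning and delay is as simple as shown. The function name is made up for illustration.

```python
import math


def seconds_per_scan(total_minutes: float, pages: int, pages_per_scan: int = 2) -> float:
    """Average wall-clock seconds spent per scan, delay included (assumption)."""
    scans = math.ceil(pages / pages_per_scan)  # 450 pages, 2 per scan -> 225 scans
    return total_minutes * 60 / scans


DELAY_SECONDS = 5  # pause between scans, as stated in the message

for dpi, minutes in ((300, 45), (400, 65)):
    per_scan = seconds_per_scan(minutes, pages=450)
    # What is left after the delay is a rough estimate of the scanner's own time.
    print(f"{dpi} DPI: {per_scan:.1f} s per scan, "
          f"about {per_scan - DELAY_SECONDS:.1f} s of actual scanning")
```

  Under those assumptions, the higher resolution costs a few extra seconds on every scan, which adds up to roughly 20 minutes over the whole book.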
    
  Guido 

  Guido Dante Corona
  IBM Accessibility Center,  Austin Tx.
  Research Division,
  Phone:  512. 838. 9735.
  Email: guidoc@xxxxxxxxxxx
  Web:  http://www.ibm.com/able



  From: Guido Corona/Austin/IBM@IBMUS
  Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
  Sent: 12/28/2004 06:50 PM
  Please respond to: bksvol-discuss
  To: bksvol-discuss@xxxxxxxxxxxxx
  Subject: [bksvol-discuss] Re: txt page breaks redux


  Cindy,  I am using a lowly EPSON 1660 with the Kurzweil 9.02 software.  As I 
scan at 400 DPI in grayscale,  the OCR engine would take a lot longer to 
recognize if I scanned and recognized pages simultaneously.  So I scan images 
only.  Then I submit the entire stack of images for recognition at the end.  
Recognition will take anything from 20 minutes to 2 or even 3 hours,  
depending on the length of the book and the degree of difficulty encountered in 
the recognition process.  No skin off my back though,  I can do a lot of other 
things during that time:  my attention is not needed. 

  As I recall,  books with an error rate of 0.7% or less are deemed 
excellent. 
  Between 0.7% and 1.5% are 'good'.  Submissions with an error rate above 1.5% 
are deemed fair. 
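
  Those thresholds can be sketched in a few lines of Python. This is only an illustration of the figures as recalled above, not the system's actual evaluation code: the function name is hypothetical, and measuring the rate as errors per word is an assumption, since the message does not say how the rate is computed.

```python
def rate_submission(error_count: int, word_count: int) -> str:
    """Map an error rate to a quality label using the recalled thresholds.

    Assumption: the rate is errors per word. Integer arithmetic keeps the
    0.7% and 1.5% boundaries exact.
    """
    if word_count <= 0:
        raise ValueError("word_count must be positive")
    if error_count < 0:
        raise ValueError("error_count must not be negative")
    if error_count * 1000 <= 7 * word_count:   # 0.7% or less
        return "excellent"
    if error_count * 1000 <= 15 * word_count:  # above 0.7%, up to 1.5%
        return "good"
    return "fair"                              # above 1.5%


print(rate_submission(500, 90_000))    # about 0.56% of words in error
print(rate_submission(1_000, 90_000))  # about 1.11%
print(rate_submission(2_000, 90_000))  # about 2.22%
```

  A label produced this way is only a starting point; it says nothing about missing words or errors that pass a spell-check.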

  But the reviewer has the opportunity to exercise 
  judgment and override the system evaluation, up or down.  If a book seems to 
have holes with frequent missing or corrupted words,  I nuke it,  no matter 
what the system evaluates, and add a comment for the administrator. 
  If the book has a bunch of 'the' misspelled as 'die' I try to fix them one at 
a time,  unless I find that to be a thankless task when that turns out to be 
just one aspect of a much bigger problem. 
  So Cindy,  I think you know my opinion by now. 

  Guido 



  From: Cindy <popularplace@xxxxxxxxx>
  Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
  Sent: 12/28/2004 06:24 PM
  Please respond to: bksvol-discuss
  To: bksvol-discuss@xxxxxxxxxxxxx
  Subject: [bksvol-discuss] Re: txt page breaks redux


  Guido,

  You must have a wonderful scanner!!!! There's no way I
  can scan a book that quickly, since I have to scan a
  page at a time, and can convert 8 - 12 pages at a
  time. 

  Anyway, since so many other people are happier
  scanning, I'm leaving that to them -- but unless
  everybody submits in rtf format, that still leaves the
  problem of hard page breaks in txt books, which
  apparently some of you can put in and others of us
  cannot. My solution, as things stand now, is to
  download a txt file, reject it, and re-submit it as an
  rtf file -- which means someone else will then have to
  validate it.

  But you bring up something I've been wondering about:
  Should a book that is spell-checked only, and garbage
  removed, be approved as being in excellent condition,
  or good? What about scanning errors that pass the
  spell-check, like "be" for "he," the number one for
  capital I, lie for the, etc. And words that are
  missing completely from sentences. I know the
  excellent rating allows for some errors, but how many
  before it becomes Good instead of Excellent? I
  recently worked in a book that had a lot of missing
  words. I would suspect that the omissions wouldn't
  have made much difference to the reader, and I suppose
  that in the cases of the other examples I gave any
  reader could make changes as he/she read, but I wonder
  if it wouldn't be better for books that haven't been
  read and corrected by the validator to have a Good
  rating and leave the Excellent for books that have
  been done more carefully.

  Cindy
  --- Guido Corona <guidoc@xxxxxxxxxx> wrote:

  > I know this will sound so dreadfully heartless,  no
  > Charitable Seasonal 
  > spirit and all the rest.  But it takes a grand total
  > of just 1 hour and 5 
  > minutes to scan an entire 450 page paperback book, 
  > page breaks, font 
  > info, and all the rest.  Then it takes about 90
  > minutes to do some basic 
  > cleanup, and finally an average of a couple of hours
  > to spellcheck it.
  > 
  > I really do not understand why we are even bothering
  > to discuss salvage 
  > operations for DOA submissions,  when the culling ax
  > and a quick rescan is 
  > the only merciful course of action for most of these
  > runts.
  > 
  > Guido Dante Corona
  > IBM Accessibility Center,  Austin Tx.
  > Research Division,
  > Phone:  512. 838. 9735.
  > Email: guidoc@xxxxxxxxxxx
  > Web:  http://www.ibm.com/able
  > 
  > 
  > 
  > 
  > "Marissa Mika" <Marissa.M@xxxxxxxxxxxx> 
  > Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
  > 12/28/2004 05:35 PM
  > Please respond to
  > bksvol-discuss
  > 
  > 
  > To
  > <bksvol-discuss@xxxxxxxxxxxxx>
  > cc
  > 
  > Subject
  > [bksvol-discuss] Re: txt page breaks redux
  > 
  > 
  > 
  > 
  > 
  > 
  > Hi Cindy, 
  > 
  > We're still working on it. (Gotta love consensus,
  > huh?) Look for a
  > message from me by the end of the week. 
  > 
  > Did everyone have a good Christmas? 
  > 
  > Marissa 
  > 
  > -----Original Message-----
  > From: bksvol-discuss-bounce@xxxxxxxxxxxxx
  > [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On
  > Behalf Of Cindy
  > Sent: Wednesday, December 22, 2004 9:21 PM
  > To: bksvol-discuss@xxxxxxxxxxxxx
  > Subject: [bksvol-discuss] txt page breaks redux
  > 
  > Hi, Marissa,
  > 
  > Thanks for the new list.
  > 
  > Is there any word yet on what to do with txt files
  > or
  > if they will be accepted without hard breaks, with
  > spaces and page numbers instead? That doesn't
  > prevent
  > the breaks Word puts in in the wrong places, but by
  > adding line spaces or changing font the file can
  > probably be made to coincide with the book.
  > 
  > When I finish Johnny Tremain I'm thinking of fixing
  > one of those troublesome romances, since I found a
  > copy. As things stand now, I think  the best thing
  > for
  > me to do is to reject the txt file and submit a new
  > rtf file with page breaks.
  > 
  > Cindy
  > 
  > 
  > 
  > 
  >  
  > 
  > 
  > 
  > 




