[bksvol-discuss] Re: My preferred complete scanning and proofreading process

From: Cindy Rosenthal <popularplace@xxxxxxxxx>
To: bksvol-discuss@xxxxxxxxxxxxx
Date: Sun, 12 Apr 2009 11:39:08 -0700 (PDT)
Mayrie, I do what you do, too, except that I do the italics and bold as I 
proofread. The only other thing that I do is to cut the front pages, i.e., 
cover, title page and copyright page, and paste them on a blank so they won't be 
changed to Times New Roman. Then when I've finished changing the rest of the book 
I paste those pages back where they belong.

Cindy
WISH LIST (CALLED REQUESTED ADDITIONS TO THE BOOKSHARE COLLECTION) IS AVAILABLE
AT
http://www.friendsofbookshare.org/wish_list/wish_list.htm
www.lljfm.net/bookshare/home.htm

A LIST OF BOOKS CURRENTLY BEING SCANNED IS AVAILABLE AT 
http://www.friendsofbookshare.org/
www.lljfm.net/bookshare/home.htm


--- On Sun, 2/8/09, mayrierenae@xxxxxxxxx <mayrierenae@xxxxxxxxx> wrote:

> From: mayrierenae@xxxxxxxxx <mayrierenae@xxxxxxxxx>
> Subject: [bksvol-discuss] My preferred complete scanning and proofreading 
> process
> To: bksvol-discuss@xxxxxxxxxxxxx
> Date: Sunday, February 8, 2009, 12:29 PM
> Hi guys,
> 
>     Below is a post that I sent to this list some time last year when
> someone asked me how I prepare a book for inclusion in the Bookshare
> collection.  This is only one way of achieving as clean a copy of a book as
> I know how to create.  I'm positive that there are several ways of achieving
> the same results.  There always are, in my experience, when computers are
> concerned. This is just what I do.
> 
>     I will say that since I wrote this post, I have altered one part of
> this process.  The part of the process that I have altered has to do with
> standardizing font.  When I standardize font, I make all of the font Times
> New Roman point size 12.  I am careful to preserve all bold and italics.  I
> also enlarge section or chapter titles to a font size of 16.  That way they
> are easier to spot for folks using vision to read.  Preserving bold and
> italics also is helpful for sighted readers.
> 
>     The rest has not changed.  The steps below include what I do to scan
> and to proofread a book.
> 
>     I hope this helps some folks.  Remember, this is just my preferred
> method and similar results can probably be achieved by other methods.
> 
> See below.
> 
> Mayrie
> 
>   
>     Okay, I did everything that I do to a book to prepare it for
> submission to Bookshare today short of reading it and documented the time it
> took.  The total minus actually reading the book was four and a half hours.
> One hour of that was spent in recognition that I did while eating and doing
> laundry.  Probably not necessary to count that, laugh.  But I included it
> anyway. So, here you go.
> 
> The book had 292 pages including back cover, jacket flaps, preliminary
> pages, and, of course, the text of the book. I'll tell you in general what I
> did, and how long it took, then elaborate on the particular process. As has
> been said before, not everyone's process is the same, and there are probably
> at least three ways to achieve any given result.  This is what I did with
> this particular book, and my process might vary slightly from book to book,
> but here is what I did.
> 
> I was using Kurzweil 1000, and one of the find and replace parts can easily
> be done in Microsoft Word.  In Kurzweil the paragraph mark is represented by
> \n.  In Word, that character is represented in the find and replace dialogue
> by ^p.  That might help folks validating using Word instead of Kurzweil.
> 
> 1.
> Scan took 90 minutes
> I am using an OpticBook 3600 scanner in single-page mode.  Scanner settings
> are as follows:
> Scan to images, automatic page orientation, gray-scale data, resolution at
> 300 DPI.
> Recognition settings were:
> Column identification disabled, one page recognized per scan, speckle
> removal disabled, text quality is normal, partial columns kept, suspicious
> regions kept, blank pages kept, recognition engine is FineReader 8.0,
> English will be recognized.
> Reading settings:
> Line endings will be ignored by the editor and tables will not be
> identified.
> I do not identify tables in straight fiction because junk sometimes scans as
> a table and is more of a pain to remove that way, more time consuming.  I
> have to know when I'll need table recognition so I can enable it.
> While scanning to images, I am always reading another book that I have run
> through this process to catch errors that ranked spelling didn't.
> 
> 2.
> Recognize images took 1 hour.
> I do this when off eating, or doing laundry, or sleeping, something that
> doesn't require my computer to be doing anything else.
> This time may vary a lot depending upon how hardy your computer is, or how
> lame mine is.
> 
> 3.
> Save the file under the name of the book. No time taken.
> 
> 4.
> Clean up preliminary pages and confirm accurate page count: 15 minutes
> Label: [From The Back Cover] [From The Front Flap] [From The Back Flap]
> [This Page is blank.] if any blank pages exist. Read through all preliminary
> pages and correct all scannos.
> Determine where the publisher thought page one should go and set an
> operator defined page number there as page 1.
> Check that the last page in the book is numbered properly, telling you that
> you do not have any missing or duplicated pages. If the numbers don't match,
> either rescan and insert pages that you missed, or delete duplicated pages.
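
If the last page number does not match, it can help to know which pages are missing or doubled before rescanning. The sketch below is an illustration only and not a Kurzweil feature: it assumes you have somehow collected the printed page numbers in scan order into a list, and the names `check_page_numbers`, `found` and `last_page` are hypothetical.

```python
from collections import Counter


def check_page_numbers(found, last_page):
    """Return (missing, duplicated) page numbers for pages 1..last_page."""
    counts = Counter(found)
    # A page that never appears was skipped during scanning.
    missing = [n for n in range(1, last_page + 1) if counts[n] == 0]
    # A page that appears more than once was scanned twice.
    duplicated = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicated


# Example: page 3 was skipped and page 2 was scanned twice.
missing, duplicated = check_page_numbers([1, 2, 2, 4, 5], last_page=5)
print("rescan:", missing, "delete extra copies of:", duplicated)
```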
> 
> 5.
> Remove headers, protect chapter headings, number and label any blank pages,
> get rid of end-of-line hyphens, and ensure that blank lines at the tops of
> pages will be preserved: 30 minutes.
> Protect all chapter headings by placing the page number followed by a blank
> line above the chapter heading.
> Remove all headers.  Do this only after protecting chapter headings, as very
> often the absence of a running header is the only indication of where a
> poorly scanned chapter heading should go.
> Page down through the document numbering and labeling all blank pages, and
> looking at the first word on each page to be sure that it is a complete
> word, and reconnect hyphenated words on one page.
> On each page beginning with a lower case letter, insert a space before that
> initial lower case letter.  This will help later.
> 
> 6.
> Insert page numbers at the tops of all pages: 30 minutes.
> Delete all page numbers at the bottom of pages.  These don't always scan at
> all, so can't be counted upon to be there in the page numbering for DAISY
> navigation, and especially in the HTML of the Bookshare final copy in the
> collection.
> Insert page numbers at the tops of all pages not already numbered above
> chapter headings, followed by two carriage returns.
> Remove all extra blank lines by using the find and replace dialogue as
> follows:
> In the "find" box type \n\n\n\n\n\n (\n is the character string that will
> search for a carriage return.)
> In the "replace" box type \n\n
> Do this again with the replace box remaining the same, but with five, then
> four, then three carriage return symbols each successive time in the "find"
> box.  This will get rid of all instances of more than one blank line between
> any blocks of text, or between page numbers and chapter headings or text on
> a page.
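
The four find and replace passes above can be sketched in Python for a book saved as plain text. This is an illustration under assumptions, not the Kurzweil dialogue itself: it assumes every carriage return is a single "\n" character, and both function names are made up. The second function is an alternative that is not in the post, using one regular expression in place of the four passes.

```python
import re


def collapse_in_passes(text: str) -> str:
    # Mirror the dialogue: search for 6, then 5, 4 and 3 carriage returns,
    # replacing each match with two (that is, one blank line).
    for run_length in (6, 5, 4, 3):
        text = text.replace("\n" * run_length, "\n\n")
    return text


def collapse_with_regex(text: str) -> str:
    # Alternative, not in the post: any run of 3 or more becomes exactly two.
    return re.sub(r"\n{3,}", "\n\n", text)


page = "12\n\n\n\n\nChapter One\n\n\nIt was late."
print(collapse_in_passes(page) == collapse_with_regex(page))  # True
```

One difference worth knowing: a very long run of blank lines (dozens in a row) can survive a single set of four passes, in which case the passes need repeating, while the regular expression handles any length in one go.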
>  
> 7.
> Remove any extra carriage returns inadvertently inserted by the OCR: 5
> minutes.
> This involves using the find and replace command 27 times.
> In the find box type " " (That is quotation mark followed by space followed
> by quotation mark.)
> In the replace box type "\n"
> This will separate any paragraphs between speakers that might not have been
> separated by the OCR program. This does happen regularly.
> Now you are going to look for paragraph marks that shouldn't be there.
> You will do this with each letter of the alphabet in lower case.
> In the find box type \na (That is backslash followed immediately by the lower
> case letters n and a.)
> In the replace box type space a, that is, hit the space bar followed
> immediately by the lower case letter a.
> Replace all.
> Inserting a space at the tops of pages before each occurring lower case
> letter allows your carefully inserted blank lines between page numbers and
> text on the page to be preserved now.
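
The 27 replacements in this step (one for the quotation marks, then one per lower case letter) can be sketched as a short Python loop. It is a rough illustration only: it assumes plain text with straight quotation marks and "\n" for each carriage return, and the function name `fix_paragraph_breaks` is hypothetical.

```python
import string


def fix_paragraph_breaks(text: str) -> str:
    # Replacement 1: quote, space, quote becomes quote, carriage return, quote,
    # splitting two speakers that the OCR ran together.
    text = text.replace('" "', '"\n"')
    # Replacements 2 to 27: a carriage return directly followed by a lower case
    # letter is treated as a stray break and turned back into a space.
    for letter in string.ascii_lowercase:
        text = text.replace("\n" + letter, " " + letter)
    return text


sample = '"Hi." "Hello," she\nsaid.\n\n12\n\n continued from the last page.'
print(fix_paragraph_breaks(sample))
```

In the sample, the page that starts with " continued" keeps its blank line after the page number 12, because the space added in step 5 stops the "\nc" search from matching there.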
> 
> 8.
> Run ranked spelling: This took 20 minutes with this book.
> I started out with a 99.28% accuracy rating.
> Correct all scannos as ranked spelling or the spell checker finds them.
> 
> 9.
> At this point I read the book and correct any errors that the spell checker
> or ranked spelling didn't find.  Hopefully I catch them all.
> 
> 10.
> Convert to RTF and close the file. No time taken.
> 
> 11.
> In Microsoft Word, protect page numbers and page breaks, standardize fonts
> and margins, and convert em dashes to double hyphens: 5 minutes. (This is a
> generous estimate of how much time it takes.)
> Open the file in Microsoft Word.
> Standardize font and justify margins.
> If validating someone else's submission, make sure that there are no smart
> quotes in the document and that all quotation marks are standard
> quotes.  OpenBook tends to produce inaccurate quotation marks in my
> experience.
> Protect page numbers and page breaks by using the find and replace dialogue
> as follows:
> In the find box type: ^m
> In the replace box type: ^p^m^p
> Replace all.
> Convert em dashes to double hyphens by using the find and replace dialogue
> as follows:
> In the find box type: ^+
> In the replace box type: -- (That is two hyphens or two dashes, depending
> upon what you call that key to the right of the zero on the number row.)
> Save the file.
> NOW YOU'RE DONE!
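
For readers curious what the Word replacements in step 11 amount to, here is a minimal Python sketch on plain text. It is not a substitute for doing the work in Word and it rests on assumptions: that a manual page break (^m) shows up as a form feed character, that a paragraph mark (^p) is "\n", and that the em dash (^+) is U+2014. The function names are made up for illustration.

```python
PAGE_BREAK = "\f"   # assumption: Word's ^m appears as a form feed in plain text
EM_DASH = "\u2014"  # the character Word matches with ^+

# Curly quotation marks and apostrophes mapped to their straight equivalents.
SMART_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"',
                              "\u2018": "'", "\u2019": "'"})


def straighten_quotes(text: str) -> str:
    # Turn smart quotes into standard quotes.
    return text.translate(SMART_QUOTES)


def protect_breaks_and_dashes(text: str) -> str:
    # ^m -> ^p^m^p: put a paragraph mark on each side of every page break.
    text = text.replace(PAGE_BREAK, "\n" + PAGE_BREAK + "\n")
    # ^+ -> --: replace each em dash with two hyphens.
    return text.replace(EM_DASH, "--")


sample = "the end.\f2\n\n\u201cWait\u2014what?\u201d"
print(repr(protect_breaks_and_dashes(straighten_quotes(sample))))
```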
> 
>  To unsubscribe from this list send a blank Email to
> bksvol-discuss-request@xxxxxxxxxxxxx
> put the word 'unsubscribe' by itself in the subject
> line.  To get a list of available commands, put the
> word 'help' by itself in the subject line.