Mayrie, I do what you do, too, except that I do the italics and bold as I proofread. The only other thing that I do is to cut the front pages, i.e., cover, title page and copyright page, and paste them into a blank document so they won't be changed to Times New Roman. Then, when I've finished changing the rest of the book, I paste those pages back where they belong.

Cindy

WISH LIST (CALLED REQUESTED ADDITIONS TO THE BOOKSHARE COLLECTION) IS AVAILABLE AT
http://www.friendsofbookshare.org/wish_list/wish_list.htm
www.lljfm.net/bookshare/home.htm

A LIST OF BOOKS CURRENTLY BEING SCANNED IS AVAILABLE AT
http://www.friendsofbookshare.org/
www.lljfm.net/bookshare/home.htm

--- On Sun, 2/8/09, mayrierenae@xxxxxxxxx <mayrierenae@xxxxxxxxx> wrote:

> From: mayrierenae@xxxxxxxxx <mayrierenae@xxxxxxxxx>
> Subject: [bksvol-discuss] My preferred complete scanning and proofreading process
> To: bksvol-discuss@xxxxxxxxxxxxx
> Date: Sunday, February 8, 2009, 12:29 PM
>
> Hi guys,
>
> Below is a post that I sent to this list some time last year when someone asked me how I prepare a book for inclusion in the Bookshare collection. This is only one way of achieving as clean a copy of a book as I know how to create. I'm positive that there are several ways of achieving the same results. There always are, in my experience, when computers are concerned. This is just what I do.
>
> I will say that since I wrote this post, I have altered one part of this process. The part that I have altered has to do with standardizing font. When I standardize font, I make all of the font Times New Roman, point size 12. I am careful to preserve all bold and italics. I also enlarge section or chapter titles to font size 16. That way they are easier to spot for folks using vision to read. Preserving bold and italics is also helpful for sighted readers.
>
> The rest has not changed. The steps below include what I do to scan and to proofread a book.
>
> I hope this helps some folks. Remember, this is just my preferred method, and similar results can probably be achieved by other methods.
>
> See below.
>
> Mayrie
>
>
> Okay, I did everything that I do to a book to prepare it for submission to Bookshare today, short of reading it, and documented the time it took. The total, minus actually reading the book, was four and a half hours. One hour of that was spent in recognition, which I did while eating and doing laundry. Probably not necessary to count that, laugh. But I included it anyway. So, here you go.
>
> The book had 292 pages including back cover, jacket flaps, preliminary pages, and, of course, the text of the book. I'll tell you in general what I did and how long it took, then elaborate on the particular process. As has been said before, not everyone's process is the same, and there are probably at least three ways to achieve any given result. This is what I did with this particular book, and my process might vary slightly from book to book, but here is what I did.
>
> I was using Kurzweil 1000, and one of the find and replace parts can easily be done in Microsoft Word. In Kurzweil the paragraph mark is represented by \n. In Word, that character is represented in the find and replace dialogue by ^p. That might help folks validating using Word instead of Kurzweil.
>
> 1.
> Scan took 90 minutes
> I am using an OpticBook 3600 scanner in single-page mode. Scanner settings are as follows:
> Scan to images, automatic page orientation, gray-scale data, resolution at 300 DPI.
> Recognition settings were:
> Column identification disabled, one page recognized per scan, speckle removal disabled, text quality is normal, partial columns kept, suspicious regions kept, blank pages kept, recognition engine is FineReader 8.0, English will be recognized.
> Reading settings:
> Line endings will be ignored by the editor and tables will not be identified.
> I do not identify tables in straight fiction because junk sometimes scans as a table and is more of a pain to remove that way, more time consuming. I have to know when I'll need table recognition so I can enable it.
> While scanning to images, I am always reading another book that I have run through this process to catch errors that ranked spelling didn't.
>
> 2.
> Recognize images took 1 hour.
> I do this when off eating, or doing laundry, or sleeping, something that doesn't require my computer to be doing anything else.
> This time may vary a lot depending upon how hardy your computer is, or how lame mine is.
>
> 3.
> Save the file under the name of the book. No time taken.
>
> 4.
> Clean up preliminary pages and confirm accurate page count: 15 minutes
> Label: [From The Back Cover] [From The Front Flap] [From The Back Flap] [This Page is blank.] if any blank pages exist. Read through all preliminary pages and correct all scannos.
> Determine where the publisher thought page one should go and set an operator-defined page number there as page 1.
> Check that the last page in the book is numbered properly, telling you that you do not have any missing or duplicated pages. If the numbers don't match, either rescan and insert pages that you missed, or delete duplicated pages.
>
> 5.
> Remove headers, protect chapter headings, number and label any blank pages, get rid of end-of-line hyphens, and ensure that blank lines at the tops of pages will be preserved: 30 minutes.
> Protect all chapter headings by placing the page number followed by a blank line above the chapter heading.
> Remove all headers. Do this only after protecting chapter headings, as very often the absence of a running header is the only indication of where a poorly scanned chapter heading should go.
> Page down through the document, numbering and labeling all blank pages and looking at the first word on each page to be sure that it is a complete word, and reconnect hyphenated words on one page.
> On each page beginning with a lower case letter, insert a space before that initial lower case letter. This will help later.
>
> 6.
> Insert page numbers at the tops of all pages: 30 minutes.
> Delete all page numbers at the bottom of pages. These don't always scan at all, so can't be counted upon to be there in the page numbering for DAISY navigation, and especially in the HTML of the Bookshare final copy in the collection.
> Insert page numbers at the tops of all pages not already numbered above chapter headings, followed by two carriage returns.
> Remove all extra blank lines by using the find and replace dialogue as follows:
> In the "find" box insert \n\n\n\n\n\n (\n is the character string that will search for a carriage return.)
> In the replace box type \n\n
> Do this with the replace box remaining the same, but with five, then four, then three carriage return symbols each successive time in the "find" box. This will get rid of all instances of more than one blank line between any blocks of text, or between page numbers and chapter headings or text on a page.
>
> 7.
> Remove any extra carriage returns inadvertently inserted by the OCR: 5 minutes.
> This involves using the find and replace command 27 times.
> In the find box type " " (That is quotation mark followed by space followed by quotation mark.)
> In the replace box type "\n"
> This will separate any paragraphs between speakers that might not have been separated by the OCR program. This does happen regularly.
> Now you are going to look for paragraph marks that shouldn't be there.
> You will do this with each letter of the alphabet in lower case.
> In the find box type \na (That is backslash followed immediately by the lower case letters n and a)
> In the replace box type space a, that is, hit the space bar followed immediately by the lower case letter a
> Replace all.
> Repeat with each remaining letter, b through z.
> Inserting a space at the tops of pages before each occurring lower case letter allows your carefully inserted blank lines between page numbers and text on the page to be preserved now.
>
> 8.
> Run ranked spelling: This took 20 minutes with this book.
> I started out with a 99.28% accuracy rating.
> Correct all scannos as ranked spelling or the spell checker finds them.
>
> 9.
> At this point I read the book and correct any errors that the spell checker or ranked spelling didn't find. Hopefully I catch them all.
>
> 10.
> Convert to RTF and close the file. No time taken.
>
> 11.
> In Microsoft Word, protect page numbers and page breaks, standardize fonts and margins, and convert em dashes to double hyphens: 5 minutes. (This is a generous estimate of how much time it takes.)
> Open the file in Microsoft Word.
> Standardize font and justify margins.
> If validating someone else's submission, make sure that there are no smart quotes in the document and that all quotation marks are standard quotes. OpenBook tends to produce inaccurate quotation marks in my experience.
> Protect page numbers and page breaks by using the find and replace dialogue as follows:
> In the find box type: ^m
> In the replace box type: ^p^m^p
> Replace all.
> Convert em dashes to double hyphens by using the find and replace dialogue as follows:
> In the find box type: ^+
> In the replace box type: -- (That is two hyphens or two dashes, depending upon what you call that key to the right of the zero on the number row.)
> Save the file.
> NOW YOU'RE DONE!
>
> To unsubscribe from this list send a blank Email to bksvol-discuss-request@xxxxxxxxxxxxx put the word 'unsubscribe' by itself in the subject line.
> To get a list of available commands, put the word 'help' by itself in the subject line.
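
For anyone who would rather script the find and replace passes from steps 6, 7 and 11 than type them into a dialogue, here is a minimal sketch in Python. It is not part of Mayrie's process, which uses Kurzweil 1000 and Word. It assumes the recognized book has been exported as plain text with "\n" line endings and straight quotation marks, and every function name in it is made up for illustration. A single regular expression stands in for the repeated six/five/four/three carriage return searches, and another stands in for the 26 letter-by-letter searches.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Step 6: leave at most one blank line between blocks of text."""
    # Any run of three or more newlines becomes exactly two newlines.
    return re.sub(r"\n{3,}", "\n\n", text)


def split_dialogue(text: str) -> str:
    """Step 7, first search: quote, space, quote becomes quote, newline, quote."""
    return text.replace('" "', '"\n"')


def rejoin_broken_paragraphs(text: str) -> str:
    """Step 7, letter searches: newline + lower case letter becomes space + letter."""
    # A newline followed by a space is not matched, so the space inserted
    # before lower case words at the tops of pages (step 5) keeps the blank
    # line under each page number intact.
    return re.sub(r"\n([a-z])", r" \1", text)


def em_dashes_to_hyphens(text: str) -> str:
    """Step 11: replace each em dash (U+2014) with two hyphens."""
    return text.replace("\u2014", "--")


def clean_text(text: str) -> str:
    """Apply the passes in the order the post describes them."""
    for step in (
        collapse_blank_lines,
        split_dialogue,
        rejoin_broken_paragraphs,
        em_dashes_to_hyphens,
    ):
        text = step(text)
    return text


if __name__ == "__main__":
    # Hypothetical two-page sample: page numbers 12 and 13, a stray line
    # break, two speakers on one line, and an em dash.
    sample = '12\n\n\n\n\nShe went\nhome. "Hi." "Bye."\n\n13\n\n and so on\u2014done'
    print(clean_text(sample))
```

Keeping each pass as its own function makes it easy to run only the ones you want, or to check the result of one pass before moving on to the next, much as you would with Replace All in the dialogue.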