[bksvol-discuss] Re: My Complete Proofreading Process

  • From: "Mayrie ReNae" <mayrierenae@xxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Mon, 16 Mar 2009 10:27:41 -0700

Hi Alyssa,

        Operator-defined page numbers are only announced, not printed in
the text.  And getting rid of blank lines the way that I do does not get rid
of the kinds of things that you are talking about, just blank lines in
profusion.  I leave the white space, just reduce it to one blank line, not,
say, six, if that is what the OCR thought it saw.

        However, a word of warning.  If you want to preserve white space
between sections in a book that will be in the Bookshare collection, you'll
need to put this * * * on the blank line, or the stripper will get rid of
the blank line. The stripper gets rid of tabs, extra spaces, and blank lines
after it rescues the page numbers.  

Mayrie

 

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Alyssa
Sent: Monday, March 16, 2009 9:02 AM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: My Complete Proofreading Process

I will be saving this list, but I do have two questions!

First of all, can't you give operator-defined page numbers throughout the
entire file so that you do not have to manually place them at the top of
each page?

Secondly, if you deleted some blank lines, wouldn't that take them away from
areas in the book that need them? To give you an example of what I am
referring to, in a book I am proofing right now, there are blank lines
between different settings within the same chapter. A character may be in
the park, and the next line, the book is referring to totally different
characters in another location. Does this make sense?


 
-Alyssa

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of
mayrierenae@xxxxxxxxx
Sent: Sunday, March 15, 2009 6:48 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] My Complete Proofreading Process

MY COMPLETE PROOFREADING PROCESS

        Okay, I did everything that I do to a book to prepare it for
submission to Bookshare today short of reading it and documented the time it
took.  (Someone originally wanted to know how long I spend.) The total minus
actually reading the book was four and a half hours.  One hour of that was
spent in recognition that I did while eating and doing laundry.  Probably
not necessary to count that, laugh.  But I included it anyway. So, here you
go.

The book had 292 pages including back cover, jacket flaps, preliminary
pages, and, of course, the text of the book. I'll tell you in general what I
did, and how long it took, then elaborate on the particular process. As has
been said before, not everyone's process is the same, and there are probably
at least three ways to achieve any given result.  This is what I did with
this particular book, and my process might vary slightly from book to book,
but here is what I did.

I was using Kurzweil 1000, and one of the find and replace parts can easily
be done in Microsoft Word.  In Kurzweil the paragraph mark is represented by
\n.  In Word, that character is represented in the find and replace dialogue
by ^p.  That might help folks validating using Word instead of Kurzweil.

1.
Scan took 90 minutes
I am using an OpticBook 3600 scanner in single-page mode.  Scanner settings
are as follows:
Scan to images, automatic page orientation, gray-scale data, resolution at
300 DPI.
Recognition settings were:
Column identification disabled, one page recognized per scan, speckle
removal disabled, text quality is normal, partial columns kept, suspicious
regions kept, blank pages kept, recognition engine is FineReader 8.0,
English will be recognized.
Reading settings:
Line endings will be ignored by the editor and tables will not be
identified.
I do not identify tables in straight fiction because junk sometimes scans as
a table and is more of a pain to remove that way, more time consuming.  I
have to know when I'll need table recognition so I can enable it.
While scanning to images, I am always reading another book that I have run
through this process to catch errors that ranked spelling didn't.

2.
Recognize images took 1 hour.
I do this when off eating, or doing laundry, or sleeping, something that
doesn't require my computer to be doing anything else.
This time may vary a lot depending upon how hardy your computer is, or how
lame mine is.

3.
Save the file under the name of the book. No time taken.

4.
Clean up preliminary pages and confirm accurate page count: 15 minutes
Label: [From The Back Cover] [From The Front Flap] [From The Back Flap] [This
Page is blank.] if any blank pages exist. Read through all preliminary pages
and correct all scannos.
Determine where the publisher thought page one should go and set an
operator-defined page number there as page 1.
Check that the last page in the book is numbered properly, telling you that
you do not have any missing or duplicated pages. If the numbers don't match,
either rescan and insert pages that you missed, or delete duplicated pages. 

5.
Remove headers, protect chapter headings, number and label any blank pages,
get rid of end-of-line hyphens, and ensure that blank lines at the tops of
pages will be preserved: 30 minutes.
Protect all chapter headings by placing the page number followed by a blank
line above the chapter heading.  
Remove all headers.  Do this only after protecting chapter headings, as very
often the absence of a running header is the only indication of where a
poorly scanned chapter heading should go.
Page down through the document numbering and labeling all blank pages, and
looking at the first word on each page to be sure that it is a complete
word, and reconnect hyphenated words on one page.
On each page beginning with a lower case letter, insert a space before that
initial lower case letter.  This will help later.

6.
Insert page numbers at the tops of all pages: 30 minutes.
Delete all page numbers at the bottom of pages.  These don't always scan at
all, so can't be counted upon to be there in the page numbering for daisy
navigation, and especially in the html of the Bookshare final copy in the
collection.
Insert page numbers at the tops of all pages not already numbered above
chapter headings followed by two carriage returns.
Remove all extra blank lines by using the find and replace dialogue as
follows:
In the "find" box insert \n\n\n\n\n\n (\n is the character string that will
search for a carriage return.) In the replace box type \n\n (two carriage
return symbols). Do this with the replace box remaining the same, but with
five, then four, then three carriage return symbols each successive time in
the "find" box.  This will get rid of all instances of more than one blank
line between any blocks of text, or between page numbers and chapter
headings or text on a page.
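
For anyone who would rather script this cleanup than run it by hand, here is
a minimal Python sketch of the same idea. It is an illustration only, not
part of the Kurzweil process described above: the function name
collapse_blank_lines is made up, the text is assumed to be a plain string
with \n line endings, and one regular expression stands in for the four
separate find-and-replace passes.

```python
import re


def collapse_blank_lines(text: str) -> str:
    """Reduce any run of three or more newlines to exactly two.

    Two newlines in a row mean one blank line, so every gap between
    blocks of text ends up with a single blank line at most.
    (Hypothetical helper; assumes plain text with \\n line endings.)
    """
    return re.sub(r"\n{3,}", "\n\n", text)


if __name__ == "__main__":
    sample = "12\n\n\n\n\n\nChapter One\n\nIt was a dark night."
    print(repr(collapse_blank_lines(sample)))
```

Single blank lines and ordinary line breaks are left alone; only runs of two
or more blank lines are reduced.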
 
7.
Remove any extra carriage returns inadvertently inserted by the OCR: 5
minutes.
This involves using the find and replace command 27 times.
In the find box type " " (That is quotation mark followed by space followed
by quotation mark.)
In the replace box type "\n"
This will separate any paragraphs between speakers that might not have been
separated by the OCR program. This does happen regularly.
Now you are going to look for paragraph marks that shouldn't be there.
You will do this with each letter of the alphabet in lower case.
In the find box type \na (That is backslash followed immediately by the lower
case letters n and a.) In the replace box type space a, that is, hit the
space bar followed immediately by the lower case letter a. Replace all.
Inserting a space at the tops of pages before each occurring lower case
letter allows your carefully inserted blank lines between page numbers and
text on the page to be preserved now.
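
The 27 replacements in this step can also be sketched in Python. This is a
rough illustration under assumptions, not the Kurzweil procedure itself: the
function names split_dialogue and join_broken_paragraphs are hypothetical,
and the text is assumed to be a plain string with \n line endings and
straight quotation marks.

```python
import string


def split_dialogue(text: str) -> str:
    """Replace quote-space-quote with quote-newline-quote (1 replacement)."""
    return text.replace('" "', '"\n"')


def join_broken_paragraphs(text: str) -> str:
    """Replace a newline directly followed by a lower case letter with a
    space and that letter, once per letter of the alphabet (26 replacements).
    """
    for letter in string.ascii_lowercase:
        text = text.replace("\n" + letter, " " + letter)
    return text


if __name__ == "__main__":
    page = '"Hi." "Hello."\nThe dog ran\nacross the yard.\n\n13\n\n and kept going.'
    print(repr(join_broken_paragraphs(split_dialogue(page))))
```

In the sample, the page that begins with " and" keeps its blank line after
the page number, because the newline is followed by a space rather than a
lower case letter. That is the reason for inserting the space in step 5.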

8.  Make sure that all ellipses are three periods with no space between them.
If this is not done, the ellipsis will not be represented properly in
braille.
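
A possible way to script this check, sketched in Python. The function name
normalize_ellipses is made up for illustration, and two things here are
assumptions rather than anything stated above: that the spaced form looks
like ". . ." (periods separated by spaces or tabs), and that a single-character
ellipsis (Unicode U+2026) may also turn up and should become three periods.

```python
import re


def normalize_ellipses(text: str) -> str:
    """Rewrite spaced or single-character ellipses as three plain periods.

    Hypothetical helper. Only exactly three periods separated by spaces
    or tabs are touched; other punctuation is left for manual review.
    """
    # ". . ." (spaces or tabs between the periods) becomes "..."
    text = re.sub(r"\.(?:[ \t]+\.){2}", "...", text)
    # Assumption: the one-character ellipsis should also become "..."
    return text.replace("\u2026", "...")


if __name__ == "__main__":
    print(normalize_ellipses("Wait . . . what was that?"))
```

Anything less regular than these two forms is better fixed by hand while
reading.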

9.
Run ranked spelling: This took 20 minutes with this book.
I started out with a 99.28% accuracy rating.
Correct all scannos as ranked spelling or the spell checker finds them.

10.
At this point I read the book and correct any errors that the spell checker
or ranked spelling didn't find.  Hopefully I catch them all.

11.
Convert to rtf and close the file. No time taken.

12.
In Microsoft Word, protect page numbers and page breaks, standardize fonts
and margins, and convert em dashes to double hyphens: 5 minutes. (This is a
generous estimate of how much time it took.)
Open the file in Microsoft Word.
Standardize font and justify margins.
Make sure, if validating someone else's submission, that there are no smart
quotes in the document, making sure that all quotation marks are standard
quotes.  OpenBook tends to produce inaccurate quotation marks in my
experience.
Protect page numbers and page breaks by using the find and replace dialogue
as follows:
In the find box type: ^m
In the replace type: ^p^m^p
Replace all.
Convert em dashes to double hyphens by using the find and replace dialogue
as follows:
In the find box type: ^+
In the Replace box type: -- (That is two hyphens or two dashes, depending
upon what you call that key to the right of the zero on the number row.)
Save the file.
NOW YOU'RE DONE!
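
For reference, the Word replacements in step 12 can be approximated on plain
text in Python. This is only a sketch under stated assumptions: the function
names are hypothetical, the text is assumed to be a plain string rather than
an rtf file, and page breaks are assumed to appear as form feed characters
(\f), which may not be true of every export. It does not touch fonts or
margins.

```python
# Curly quotation marks and their straight equivalents.
SMART_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark / curly apostrophe
}


def straighten_quotes(text: str) -> str:
    """Replace smart quotes with standard straight quotes."""
    return text.translate(str.maketrans(SMART_QUOTES))


def em_dashes_to_hyphens(text: str) -> str:
    """Replace each em dash with two hyphens (Word: find ^+, replace --)."""
    return text.replace("\u2014", "--")


def protect_page_breaks(text: str) -> str:
    """Put a paragraph mark on each side of a page break.

    Mirrors Word's find ^m, replace ^p^m^p, assuming page breaks are
    form feed characters in the plain text.
    """
    return text.replace("\f", "\n\f\n")


if __name__ == "__main__":
    sample = "\u201cWait\u2014stop,\u201d she said.\fChapter Two"
    cleaned = protect_page_breaks(em_dashes_to_hyphens(straighten_quotes(sample)))
    print(repr(cleaned))
```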

 To unsubscribe from this list send a blank Email to
bksvol-discuss-request@xxxxxxxxxxxxx
put the word 'unsubscribe' by itself in the subject line.  To get a list of
available commands, put the word 'help' by itself in the subject line.
