[bksvol-discuss] Re: Editing books -- some techniques

  • From: Guido Corona <guidoc@xxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Thu, 30 Sep 2004 11:08:52 -0500

Lisa, as Donna already said,  it takes no more than 90 minutes to scan 
even a relatively sizable book of 400 pages from front to back.  Fixing a 
book in bad shape is an open-ended proposition which can take from just a 
couple of hours to. . . the sky is the limit. 

Having said that, when performing quality checks on a book I go through 
the following steps.  Please note that not every step is applicable to 
every book.


0.  Do not remove blank pages beyond the front matter.  Blank pages count 
as pages and their removal can screw up the page integrity check below.

1.  Page Integrity Check:

This essentially means attempting to match the page number at the top or 
bottom of recognized pages with the pages reported by your text processing 
program.  This is especially easy in Kurzweil 8 or 9.

a.  Find in the book what should be page 1.  Bear in mind that sometimes 
you need to guess a little because you may not find a page number for page 
1. 
b.  Assign this page to be page 1 in the document. In Kurzweil 9.0 you can 
do this in navigation/set user page number.
Now page numbers at the top or bottom of each page should match what is 
reported by Kurzweil.
c.  Go to page 20.  In Kurzweil 9 you can go to a specific page from the 
navigation menu.
d.  Does the page number reported by Kurzweil match the page at the 
bottom/top of the page?  If pages match, repeat the check on page 40, 80, 
100, etc... until the end of the book.
If there is a mismatch,  you have either missing pages or duplicate pages.

e.  If Kurzweil reports you are on page 20 but the page number at the 
top/bottom of the page is smaller than that, e.g. 18,  you have some 
duplicate pages between pages 1 and 20.  Go back 10 pages and check again. 
If pages match now, the duplicates must be between pages 10 and 20.  If 
pages still do not match, the problem is between pages 1 and 10.  Split the 
difference again in the direction of the problem pages until you find the 
duplicates.  This is called a binary search technique.  Its strength is 
that you can find what you are looking for in very few steps. 
Eventually you will discover where you have a double copy of two pages. 
Delete the first copy of the duplicate pages.


f.  If instead, when looking at page 20,  you find that the page number at 
the top/bottom of the page is 22,  this means that you are missing two 
pages.  Use the binary search technique outlined above to find which pages 
are missing.  If you have a copy of the book, use scan/insert scan in 
Kurzweil to scan and insert the missing pages in the correct spot.  If you 
do not have a copy of the book, make a note of the missing pages.  If at 
the end of your check you find lots of missing pages, reject the book 
without feeling guilty:  as Donna said,  submitters are responsible for 
ensuring their books are complete,  not reviewers like you.

g.  Do not worry: the above sounds complicated, but in reality page 
integrity checks run very quickly even for very long books.
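
For anyone who likes to see the binary search spelled out, here is a 
minimal Python sketch of the idea in steps d through f.  It is not 
something Kurzweil provides.  It assumes you have a list called printed 
(a made-up name) holding the page number printed on each document page, 
that every page carries a number, and that once the numbers stop matching 
they stay mismatched.  The function names are hypothetical too.

```python
def first_mismatch(printed):
    """Return the first document page whose printed number differs.

    printed[i] is the number printed on document page i + 1.
    Returns None when the last page matches, i.e. nothing looks wrong.
    Assumption: pages match up to some point and mismatch from there on.
    """
    if not printed or printed[-1] == len(printed):
        return None
    lo, hi = 1, len(printed)          # the answer lies in lo..hi
    while lo < hi:
        mid = (lo + hi) // 2          # split the difference
        if printed[mid - 1] == mid:
            lo = mid + 1              # still fine here, look later
        else:
            hi = mid                  # already wrong here, look earlier
    return lo


def diagnose(printed):
    """Say whether pages look duplicated or missing, and where to look."""
    page = first_mismatch(printed)
    if page is None:
        return "pages match"
    offset = printed[-1] - len(printed)
    kind = "missing" if offset > 0 else "duplicate"
    return f"{abs(offset)} {kind} page(s), first mismatch at document page {page}"


# Pages 13 and 14 scanned twice in a 30-page book.
duplicated = list(range(1, 15)) + [13, 14] + list(range(15, 31))
# Pages 12 and 13 never scanned.
missing = list(range(1, 12)) + list(range(14, 31))
print(diagnose(duplicated))
print(diagnose(missing))
```

By hand you do exactly the same thing, except that each probe is a trip 
to the go-to-page command instead of a list lookup.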
 

2. Chapter header check

This does not apply to every book, because not every book has clear 
chapter headers.  Typically you search for the word chapter in the whole 
book and ensure that there are no terrible spelling mistakes in the 
chapter title.  It is best to put the word chapter and the title of each 
chapter on separate lines, e.g. lines 2 and 3 of the page.  This is to 
keep the automatic Bookshare header stripper from removing chapter 
headers.
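
If you work from a plain-text export of the book rather than inside 
Kurzweil, the search for the word chapter can be sketched in Python as 
below.  This is only an illustration: the function name chapter_lines is 
made up, and a header that was itself misrecognized (say "Chaptcr") will 
not be found this way.

```python
import re


def chapter_lines(text):
    """List (line number, line) for every line containing the word 'chapter'."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if re.search(r"\bchapter\b", line, flags=re.IGNORECASE):
            hits.append((number, line.strip()))
    return hits


sample = "CHAPTER 1\nThe Beginning\nLater in this chapter we meet Ann.\n"
for number, line in chapter_lines(sample):
    print(number, line)
```

Each hit still needs a human look to decide whether it is a real header 
and whether its title is spelled correctly.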
 
3. Page header:

In some cases you can strip all page headers from a book using the 
automatic header stripper in Kurzweil.  You will find this utility in 
file/utilities/remove page headers.  For best results use the option 
called 'carefully'.
Automatic header removal tools do not catch every header.  After automatic 
removal I usually page through the whole book to ensure that all page 
headers have been removed correctly.  Remove manually those headers that 
are still there.
Note that in most cases it is perfectly OK to remove page numbers after 
you have ensured that the book is complete without pages missing or 
duplicated.  But for academic type books it is best to leave page numbers 
in the book for easier student reference.  In this case you should not use 
automated header strippers, but remove headers manually and leave page 
numbers in the book.

4. Remove tab chars:

In Kurzweil you can search for tab characters by typing \t in the search 
field in edit/search.
Remove tab characters manually.  You will very often find there are junk 
characters immediately to the left or the right of the tab.  Remove those 
manually as well, or replace them with the appropriate character if 
required in a misspelled word.
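
Outside Kurzweil, the same hunt can be sketched in Python on a plain-text 
export.  The helper below is hypothetical (tab_contexts is my own name); 
it only lists each tab with a few characters on either side so you can 
spot the junk next to it.  It does not decide what to delete.

```python
def tab_contexts(text, width=10):
    """Return (position, text before, text after) for every tab character."""
    contexts = []
    start = text.find("\t")
    while start != -1:
        left = text[max(0, start - width):start]
        right = text[start + 1:start + 1 + width]
        contexts.append((start, left, right))
        start = text.find("\t", start + 1)
    return contexts


sample = "He said~\t^hello"
for position, left, right in tab_contexts(sample, width=3):
    print(position, repr(left), repr(right))
```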
 

5. Remove / correct junk chars:

Junk chars are those that usually have no business being in a book.  I 
usually go through most special symbols generated by the keyboard.  I 
search methodically for every occurrence of each in the book and take 
appropriate action:  remove or replace with the correct character.
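
To decide which symbols are worth searching for first, a quick count can 
help.  The Python sketch below is an assumption-laden illustration: the 
name symbol_counts and the list of "expected" punctuation are mine, and 
you would adjust that list for each book.

```python
from collections import Counter
import string


def symbol_counts(text, expected=".,;:!?'\"-()"):
    """Count keyboard symbols that are not ordinary punctuation."""
    suspects = set(string.punctuation) - set(expected)
    return Counter(ch for ch in text if ch in suspects)


sample = "Hello, world! ~ He^llo | wor|d."
for symbol, count in symbol_counts(sample).most_common():
    print(symbol, count)
```

A symbol with a high count is not necessarily junk (a math book is full of 
plus signs), so each one still deserves a look in context.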
 
6. Check for odd combinations of digit 1, like 111 instead of "I'll":
Also check for the single character word 1.  In many cases this should be 
replaced manually with the word "I".
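
As a rough Python sketch of this search on exported text: the pattern 
below flags any run of the digit 1 that stands alone as a word.  The 
pattern and the name lone_ones are my own choices, and the results are 
only candidates, since a genuine 1, 11 or 111 is flagged as well.

```python
import re

# A run of 1s with a word boundary on each side, e.g. "1", "11", "111".
LONE_ONES = re.compile(r"\b1+\b")


def lone_ones(text):
    """Return (position, match) for each stand-alone run of the digit 1."""
    return [(m.start(), m.group()) for m in LONE_ONES.finditer(text)]


sample = "1 think 111 be there by 10."
print(lone_ones(sample))
```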

7. Remove/fix single-letter words:
In most cases single chars b through z are not valid words.  Search for 
each and take appropriate action case by case.  Note that for 'i' 
you should search for lower case only.
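
The same search can be sketched in Python for exported text.  The pattern 
is my own reading of the rule above: letters b through z in either case, 
except upper case I, and ignoring letters that sit next to an apostrophe 
(so the t in "don't" is skipped).  The name stray_letters is hypothetical.

```python
import re

# One letter b-z (upper case I excluded) with no letter, digit or
# apostrophe directly before or after it.
STRAY = re.compile(r"(?<![\w'’])[b-zB-HJ-Z](?![\w'’])")


def stray_letters(text):
    """Return (position, letter) for each suspicious single-letter word."""
    return [(m.start(), m.group()) for m in STRAY.finditer(text)]


sample = "i went to see b movie; I don't know, x marks"
print(stray_letters(sample))
```

Some hits will be legitimate, such as initials or list labels, which is 
why each one needs a decision rather than a blanket replace.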

8.  Spell check:

Not much to say about this step,  except it may take several hours per 
book.

Guido

Guido D. Corona
IBM Accessibility Center,  Austin Tx.
IBM Research,
Phone:  (512) 838-9735
Email: guidoc@xxxxxxxxxxx

Visit my weekly Accessibility WebLog at:
http://www-3.ibm.com/able/weblog/corona_weblog.html





"Lisa Leonardi" <lml5280@xxxxxxxxxxx> 
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
09/29/2004 09:15 PM
Please respond to bksvol-discuss
To: <bksvol-discuss@xxxxxxxxxxxxx>
cc:
Subject: [bksvol-discuss] Editing books

Hi, my name is Lisa and I am a new volunteer.  I've noticed that many of 
the books that I download for validation seem to be of fair quality.  I 
would like to edit them to improve this.  Does anyone have any tips on 
doing this? It seems a bit time-consuming to read the entire book to check 
for errors.  Any suggestions?
