[bksvol-discuss] Quality checks procedure -- was Re: Re: Self-validation

  • From: Guido Corona <guidoc@xxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Fri, 18 Jun 2004 13:55:53 -0500

I will not get into the self-validating debate.  But I suggest that both 
scanners and validators go through a set of quality checks that are as 
mechanized as possible, to avoid the 'my book' vs. 'someone else's book' 
problem.  I have found that the following set of checks tends to produce 
rather high-quality results:


1.  Page Integrity Check:  sample every 20 pages.

2. Chapter header check:  works best if the book uses the word 'chapter' 
or something else to search for.

3. Page header check: lately I have been stripping headers manually and 
leaving only page numbers.  Definitely tedious, as I do it on each and 
every page. 

4.  During the previous step I also check the first word of each page. 
If the word looks like a fragment, I merge it with the last word on the 
previous page if appropriate.

5.  Search for short dash followed by space, and short dash followed by 
newline.  These will let you find all sorts of words that were split at 
end of lines or at end of pages and can be repaired.
 
6.  Search for the tab char (\t in k1k).  This is most often a junk char, 
especially abundant at the beginning and end of lines.  You will 
frequently find it associated with other junk chars, or with single 
letters that had no business being there.  Manually remove each 
occurrence of these nasty clusters as appropriate.
 
7.  Junk char hunt.  Look for junk chars available from the keyboard. 
Start from the top left of the keyboard and work your way down to the 
bottom right.  Remove or repair manually as required.
Jim Pardee also suggested we keep a file containing those chars that are 
not keyboardable:  we can copy/paste them in the find dialogue to search 
for them in the document.

9.  Look for whole words consisting of digit '1'.  In many cases you 
should change them to 'I'.  Sometimes they are to be deleted.  Do each 
change manually as appropriate.

10.  Look for digit 1 followed by an apostrophe:  in most cases that 
should be changed to I followed by an apostrophe.

11.  Look for an apostrophe followed by digit 1.  In most cases that is 
part of a '11, which should become an 'll.

12.  Do a mass replacement of two consecutive single quotes with one 
double quote.

13. Remove/fix single-letter words:  start with 'b'.  The search should be 
case-insensitive, except for 'i', which should be searched in lower case 
only.  Delete or repair each occurrence manually as appropriate.  Be 
careful, you may be deleting someone's middle initial.

14. Spell check:  This step should remove most residual problems, except 
for some scanos that have generated valid English words. 
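
For anyone who would rather script some of these searches than type them 
into the find dialogue one at a time, here is a rough sketch in Python 
covering steps 5, 6, 9, 10, 11, 12 and 13.  The names CHECKS, 
find_suspects and fix_quotes and the exact patterns are my own 
assumptions for illustration; they are not part of k1k or any Bookshare 
tool.  It only reports suspects with their line numbers, and every fix 
except the quote replacement in step 12 is still meant to be done by hand.

```python
import re

# Illustrative patterns for a few of the numbered checks above.
# The names and exact regexes are assumptions, not part of any tool.
CHECKS = [
    ("split-word dash", re.compile(r"-[ \n]")),           # step 5
    ("tab", re.compile(r"\t")),                           # step 6
    ("lone 1", re.compile(r"(?<![\w'])1(?![\w'])")),      # step 9
    ("1 before apostrophe", re.compile(r"1'")),           # step 10
    ("apostrophe before 1", re.compile(r"'1")),           # step 11
    # step 13: single letters other than a/A and capital I,
    # skipping letters glued to an apostrophe (don't, it's)
    ("single letter", re.compile(r"(?<![\w'])[b-zB-HJ-Z](?![\w'])")),
]


def find_suspects(text):
    """Return sorted (line_number, check_name, line_text) tuples to review."""
    lines = text.split("\n")
    hits = []
    for name, pattern in CHECKS:
        for match in pattern.finditer(text):
            # Line numbers are 1-based, counted from newlines before the match.
            line_no = text.count("\n", 0, match.start()) + 1
            hits.append((line_no, name, lines[line_no - 1]))
    return sorted(hits)


def fix_quotes(text):
    """Step 12: replace two consecutive single quotes with one double quote."""
    return text.replace("''", '"')
```

Run find_suspects on the plain text of the book and walk through the 
list it returns.  The single-letter pattern will still flag middle 
initials, so treat every hit as something to look at, not something to 
delete.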

Hope this helps.

Guido D. Corona
IBM Accessibility Center,  Austin Tx.
IBM Research,
Phone:  (512) 838-9735
Email: guidoc@xxxxxxxxxxx

Visit my weekly Accessibility WebLog at:
http://www-3.ibm.com/able/weblog/corona_weblog.html





Nolan Crabb <aa3go@xxxxxxxxxxx> 
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
06/18/2004 12:23 PM
Please respond to
bksvol-discuss


To
bksvol-discuss@xxxxxxxxxxxxx
cc

Subject
[bksvol-discuss] Re: Self-validation






Rui warned against the urge to self validate.  I completely concur!

I've made a living editing my work and that of others for years.  The 
harsh truth is, your own errors are much easier to miss, even if you've 
let that book sit up there and cool its digital heels for days.  I guess 
the urge to self validate is a natural one, since people get submission 
credits and such that will help them pay for next year's subscription.  I 
have a Kate Wilhelm mystery that's been up there for some time now, and I 
want very much to just validate the thing and get the credits and more 
importantly, get it up there so others can enjoy it.  But I won't.  I'm 
too aware of how easy it is to skip errors in things you've either 
written or read.  I think the checks and balances that exist here--the 
ones that encourage others to validate what you've submitted--are the way 
it should work.  I realize others will challenge my position, suggesting 
that self validation is absolutely the only way some of the more esoteric 
titles will get approved.  I disagree.  The first book I ever validated 
was a Christian romance--decidedly not, not, not something I would 
normally want to read under any circumstances.  Oddly enough, that's 
precisely the reason I chose it.  I figured the material would be so new 
and different to me that I'd be more prone to catch errors.  That book 
entered the Bookshare system with a "good" rating presumably provided by 
the submitter.  I spent some time with the book, but today it carries an 
"excellent" rating, and it's now part of the collection.

Please try not to misinterpret this, folks.  I don't use it as an example 
to demonstrate how amazing I am.  Very nearly all of you have been at the 
submission and validation end of this far longer than have I, and you're 
doubtless the ultimate experts, having forgotten more in a day than I 
will learn in years.  I just find self validation a little scary, 
especially in light of rather strong messages lately which have called 
for higher quality scans and validations.  There's no doubt we achieve 
higher quality validations if we don't do them ourselves.

The quarterly magazine I edit goes through no fewer than four different 
edits before it ever sees the inside of our subscribers' mailboxes.  I'm 
not advocating for absolute rigid perfection; we are volunteers, after 
all, who have lives.  But self validation is an excellent way to increase 
the number of potential errors introduced into the system.

So that I don't totally come across here as being the loud mouthed whiner 
on the list, here's a little proposal:  If you have a book that's been up 
there quite a while, I'll take yours and validate it, regardless of the 
subject or whatever, if you take mine and get it approved.  It's called 
"The Casebook of Constance and Charlie Vol. 1," and it's 614 pages, so 
I'm sure that's discouraged more than one person from taking it.  
Obviously, this is one of those first-come first-accepted challenges. 
<smile>

Again, I'm not desirous of offending any here.  But in light of recent 
messages that have called for higher standards in terms of better quality 
scans and better validations, redoubling our resolve to let others 
validate our work is probably one good way to ensure the increased 
quality of the collection.

Best Regards,

Nolan, who is donning his fire-retardant e-mail-reading suit in 
preparation for all that indignant mail from self validators :-)


