I will not get into the self-validating debate. But I suggest that both scanners and validators go through a set of quality checks that are as mechanized as possible, to avoid the 'my book' vs. 'someone else's book' problem. I have found that the following set of checks tends to produce rather high-quality results:

1. Page integrity check: sample every 20 pages.
2. Chapter header check: works best if the book uses the word 'chapter' or some other consistent term to search for.
3. Page header check: lately I have been stripping headers manually and leaving only page numbers. Definitely tedious, as I do it on each and every page.
4. During the previous step I also check the first word of each page. If the word looks like a fragment, I merge it with the last word on the previous page if appropriate.
5. Search for a short dash followed by a space, and a short dash followed by a newline. These will let you find all sorts of words that were split at the ends of lines or pages and can be repaired.
6. Search for the tab char (\t in k1k). This is most often a junk char, especially abundant at the beginning and end of lines. You will frequently find it associated with other junk chars, or single letters that had no business being there. Manually remove each occurrence of these clustered nasty things as appropriate.
7. Junk char hunt. Look for junk chars available from the keyboard. Start from the top left of the keyboard and work your way down to the bottom right. Remove or repair manually as required. Jim Pardee also suggested we keep a file containing those chars that are not keyboardable: we can copy/paste them into the find dialogue to search for them in the document.
9. Look for whole words consisting of the digit '1'. In many cases you should change them to 'I'. Sometimes they are to be deleted. Make each change manually as appropriate.
10. Look for the digit 1 followed by an apostrophe: in most cases that should be changed to I followed by an apostrophe.
11. Look for an apostrophe followed by the digit 1. In most cases that is part of a '11, which should become an 'll.
12. Do a mass replacement of two consecutive single quotes with one double quote.
13. Remove/fix single-letter words: start with 'b'. The search should be case insensitive, except for 'i', which should be searched in lower case only. Delete or repair each occurrence manually as appropriate. Be careful, you may be deleting someone's middle initial.
14. Spell check: this step should remove most residual problems, except for some scanos that have generated valid English words.

Hope this helps.

Guido D. Corona
IBM Accessibility Center, Austin Tx.
IBM Research, Phone: (512) 838-9735
Email: guidoc@xxxxxxxxxxx
Visit my weekly Accessibility WebLog at: http://www-3.ibm.com/able/weblog/corona_weblog.html

Nolan Crabb <aa3go@xxxxxxxxxxx>
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
06/18/2004 12:23 PM
Please respond to bksvol-discuss

To: bksvol-discuss@xxxxxxxxxxxxx
cc:
Subject: [bksvol-discuss] Re: Self-validation

Rui warned against the urge to self validate. I completely concur! I've made a living editing my work and that of others for years. The harsh truth is, your own errors are much easier to miss, even if you've let that book sit up there and cool its digital heels for days.

I guess the urge to self validate is a natural one, since people get submission credits and such that will help them pay for next year's subscription. I have a Kate Wilhelm mystery that's been up there for some time now, and I want very much to just validate the thing and get the credits and, more importantly, get it up there so others can enjoy it. But I won't. I'm too aware of how easy it is to skip errors in things you've either written or read. I think the checks and balances that exist here--the ones that encourage others to validate what you've submitted--are the way it should work.
I realize others will challenge my position, suggesting that self validation is absolutely the only way some of the more esoteric titles will get approved. I disagree. The first book I ever validated was a Christian romance--decidedly not, not, not something I would normally want to read under any circumstances. Oddly enough, that's precisely the reason I chose it. I figured the material would be so new and different to me that I'd be more prone to catch errors. That book entered the Bookshare system with a "good" rating, presumably provided by the submitter. I spent some time with the book; today it carries an "excellent" rating, and it's now part of the collection.

Please try not to misinterpret this, folks. I don't use it as an example to demonstrate how amazing I am. Very nearly all of you have been at the submission and validation end of this far longer than I have, and you're doubtless the ultimate experts, having forgotten more in a day than I will learn in years. I just find self validation a little scary, especially in light of rather strong messages lately which have called for higher quality scans and validations. There's no doubt we achieve higher quality validations if we don't do them ourselves. The quarterly magazine I edit goes through no fewer than four different edits before it ever sees the inside of our subscribers' mailboxes. I'm not advocating absolute rigid perfection; we are volunteers, after all, who have lives. But self validation is an excellent way to increase the number of potential errors entering the system.

So that I don't totally come across here as being the loud-mouthed whiner on the list, here's a little proposal: If you have a book that's been up there quite a while, I'll take yours and validate it, regardless of the subject or whatever, if you take mine and get it approved. It's called "The Casebook of Constance and Charlie Vol. 1," and it's 614 pages, so I'm sure that's discouraged more than one person from taking it.
Obviously, this is one of those first-come first-accepted challenges. <smile>

Again, I'm not desirous of offending anyone here. But in light of recent messages that have called for higher standards in terms of better quality scans and better validations, redoubling our resolve to let others validate our work is probably one good way to ensure the increased quality of the collection.

Best Regards,

Nolan, who is donning his fire-retardant e-mail-reading suit in preparation for all that indignant mail from self validators :-)
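
For anyone who would rather script the search-based checks in Guido's list at the top of this message (the dash, tab, digit-1, doubled-quote and single-letter searches), here is a minimal Python sketch. It is not anyone's actual tool: the function names and the exact regular expressions are assumptions, just one reading of those steps. It only reports suspects with their line numbers, since the list says to repair each occurrence by hand; the one mass change is the doubled single quote replacement.

```python
import re

# Hypothetical patterns, one per search-based check in the list above.
SUSPECT_PATTERNS = {
    # Short dash followed by a space or a newline (split words).
    "hyphen_break": re.compile(r"-[ \n]"),
    # Tab characters, which are usually scanning junk.
    "tab": re.compile(r"\t"),
    # A whole word that is just the digit 1 (often a misread 'I').
    "lone_one": re.compile(r"(?<![\w'])1(?![\w'])"),
    # Digit 1 followed by an apostrophe (often I').
    "one_apostrophe": re.compile(r"\b1'"),
    # Apostrophe followed by digit 1 (often part of '11 for 'll).
    "apostrophe_one": re.compile(r"'1"),
    # Two single quotes in a row (candidates for one double quote).
    "double_single_quote": re.compile(r"''"),
    # Single-letter words from b onward, any case, except that 'i'
    # is matched in lower case only; 'a' and 'I' are left alone.
    "single_letter": re.compile(r"(?<![\w'])(?:[b-hj-zB-HJ-Z]|i)(?![\w'])"),
}


def find_scan_suspects(text):
    """Return {check name: [(line number, matched text), ...]}."""
    hits = {}
    for name, pattern in SUSPECT_PATTERNS.items():
        found = []
        for match in pattern.finditer(text):
            # Line numbers are 1-based, counted from newlines before the match.
            line_no = text.count("\n", 0, match.start()) + 1
            found.append((line_no, match.group()))
        hits[name] = found
    return hits


def replace_double_single_quotes(text):
    """Mass-replace two single quotes with one double quote."""
    return text.replace("''", '"')


if __name__ == "__main__":
    sample = "1 think the gar-\nden is q fine.\tHe said ''I'11 go'' and 1'm sure."
    for check, found in find_scan_suspects(sample).items():
        print(check, found)
```

Every hit still needs a human look: a lone 'Q' may be a middle initial, and a dash before a space may be legitimate punctuation.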