[bksvol-discuss] Re: Become A Black Belt Submitter

  • From: "Jackie McBride" <abletec@xxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Fri, 15 Aug 2008 07:12:32 -0700

Chella, sorry--I obviously misunderstood your question. The concept of
resolution is software independent. DPI stands for "dots per inch" &
is a measure of spatial density, i.e., the number of dots that occur
in an inch of the particular image. We talk about this when we discuss
printers, scanners, & video, so, as you can see, this is not a concept
tied to any one piece of software. Text is
usually best scanned at 300 DPI, though if the text is small, we often
crank the resolution up to 400 DPI to see if we can't get a better
image. Some folks mistakenly think that the higher the resolution, the
better. With color photos & such, that tends to be true, but not with

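To put rough numbers on the DPI idea, here is a small Python sketch. The page size (US letter, 8.5 x 11 inches) & the function names are my own assumptions for illustration, not anything from Kurzweil or Openbook. It just multiplies inches by dots per inch to show how many dots a scanner has to capture at 300 versus 400 DPI.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def total_pixels(width_in, height_in, dpi):
    """Return the total number of dots captured for the page."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px


if __name__ == "__main__":
    # Assumed page size: US letter, 8.5 x 11 inches.
    for dpi in (300, 400):
        w, h = pixel_dimensions(8.5, 11, dpi)
        print(f"{dpi} DPI: {w} x {h} pixels ({w * h:,} total)")
    # 300 DPI: 2550 x 3300 pixels (8,415,000 total)
    # 400 DPI: 3400 x 4400 pixels (14,960,000 total)
```

Going from 300 to 400 DPI is about 1.78 times as many dots for the same page, which is one reason the higher setting scans & recognizes more slowly.
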
I do hope that clarifies, ask again if not, & sorry I initially
misunderstood what you were asking.

On 8/15/08, james.homme@xxxxxxxxxxxx <james.homme@xxxxxxxxxxxx> wrote:
> Hi,
> Where is the stuff Pratik wrote about this?
> Thanks.
> Jim
> James D Homme, Usability Engineering, Highmark Inc.,
> james.homme@xxxxxxxxxxxx, 412-544-1810
> "The difference between those who get what they wish for and those who
> don't is action. Therefore, every action you take is a complete
> success,regardless of the results." -- Jerrold Mundis
> Highmark internal only: For usability and accessibility:
> http://highwire.highmark.com/sites/iwov/hwt093/
> From: "EVAN REESE" <mentat3@verizon.net>
> Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxg
> To: bksvol-discuss@xxxxxxxxxxxxx
> Date: 08/14/2008 07:27 PM
> Subject: [bksvol-discuss] Re: Become A Black Belt Submitter
> Please respond to: bksvol-discuss@freelists.org
> Thanks for sending this up. This is all very useful stuff.
> I do use Scan Repeatedly, and just hit the Cancel key twice if I get a
> confidence number below the threshold - which on my K1000 is set to 98.7%.
> If I can go twenty or fifty pages without getting a page below that number,
> then it saves me from having to hit the F9 key twenty or fifty times.
> I also use autocorrection, but haven't compared a scan with and without it,
> so I cannot take sides in that debate.
> According to Pratik's excellent monograph on getting the best recognition
> of mass market paperbacks, he wrote that grayscale and 400 dots per inch
> can sometimes produce better results than static optimized. So your point
> here about grayscale is a good one, but increasing the resolution from 300
> to 400, especially for poor quality print such as you'd get with cheap
> paperbacks can give even better recognition sometimes. Of course,
> increasing the resolution from the usual 300 will also slow down the scan
> and the recognition; but the extra time invested up front is very likely to
> be more than offset by the time saved cleaning up the scan afterward.
> I have scanned the same material with Suspicious Regions kept and ignored,
> and it can really make a difference in the amount of junk you get. So this
> is another good point you make here.
> Thanks again.
> Evan
>  ----- Original Message -----
>  From: Monica Willyard
>  To: Bookshare Volunteers
>  Sent: Thursday, August 14, 2008 6:19 PM
>  Subject: [bksvol-discuss] Become A Black Belt Submitter
>  Hi, everyone. I wrote an email about getting really clear scans for one of
>  our volunteers, and it occurred to me that someone on this list might
>  benefit from it. It's a little on the long side. I hope something in it
>  will help you. If I've said anything confusing, please ask me about it. I
>  know many of you have done a lot of scanning, so I'm focusing on things
>  that may not have occurred to you. I'll call them my top ten scanning
>  tips. (grin) They work from my experience, and you may find that you need
>  to experiment to find something that works well for you. Also, I use
>  Kurzweil for scanning. Openbook users may find some of this to be useful,
>  but some of it won't apply. I do have Openbook 7 and used it for several
>  years. So I'll do my best to help you translate these to Openbook if
>  that's what you need.
>  I got a lot of these ideas from volunteers I've been fortunate enough to
>  work with over the past 2 years. Jim Baugh, Louise, Pratik, Jake, Scott,
>  Shelley, and Gerald taught me so much about good scanning. Thanks guys.
>  (smile) You rock!
>  1. Start with some solid settings in Kurzweil that will work most of the
>  time. You may  know your way around Kurzweil well. I don't know if you've
>  thought to work on these settings though since they're not obvious. Under
>  the settings menu, in the general tab, make sure that your confidence
>  threshold is set to at least 98.5. Why? Kurzweil defaults to 95 percent,
>  and that means that it optimizes scans for a lower level of accuracy. That
>  means you won't get the best results from optimization. That also means
>  more clean-up on the backside, and that's a pain in the neck. The other
>  setting in general that you may want to turn on if you have some disk
>  space is the option to keep scanned images. This feature lets you
>  re-recognize pages if they have issues. Sometimes just changing something
>  like detect columns will make that page come out right without you having
>  to totally rescan the page. Once you've read through the book, Kurzweil
>  will let you remove the scanned images from the book to reduce the file
>  size.
>  There are three final settings that you may find useful for scanning most
>  fiction. These work well for me, especially with library books. They're
>  all under the recognition tab. Column identification should be enabled.
>  Partial columns should be ignored, and suspicious regions should be
>  ignored. This flies in the face of what Nick has recommended on the
>  Kurzweil list, so I'd better explain. When scanning books, it's somewhat
>  common to get a shadow from the spine of the book. It often makes a narrow
>  column of a tab character and a random group of numbers or letters. If you
>  turn off column identification, these random letters are mingled with the
>  regular text. Turning on the column detection separates this garbage from
>  the text, and ignoring partial columns and suspicious regions removes it
>  during OCR. If a page needs column detection turned off due to a table,
>  and you have retained images of the scanned page, you can easily change
>  the recognition settings and just re-recognize the page from the scanned
>  image. Do you see how this could save you time and hassle?
>  Once you have settings you like, save them as default so you can start
>  scanning without worrying about them each time you start Kurzweil.
>  2. Prepare your book for scanning, and you'll get better results from the
>  start. Before you begin to scan a book, run your fingers lightly through
>  the pages to remove any possible ink, dust, or other particles that may be
>  on the pages. If the book is a library book, flip through the book in
>  sections of about fifteen pages or so, gently pressing your fingers along
>  the inner spine to encourage the book to lie flat. If the book belongs to
>  you, especially if it's a paperback, flip through sections as with a
>  library book, but bend the book back so that its outer covers almost
>  touch. You're giving your book some flexibility stretches while not
>  breaking its spine. This is especially important for thick books or
>  two-page scanning mode and will keep you from having to push down as hard
>  on books while you scan.
>  3. Optimize and verify settings for your book. Before scanning a book,
>  open to the center and use the optimize feature. The Kurzweil staff says
>  that optimization should be used in one-page mode so it can get the best
>  idea of how the print works in your book. Scan four or five pages after
>  optimization to determine if any adjustments in settings need to be made.
>  Kurzweil does a fairly good job picking the optimal settings to scan a
>  particular book unless the print quality is exceptionally bad. If you're
>  planning to scan in two-page mode, you can turn this back on once you're
>  finished with optimization.
>  4. When in doubt, go for grey-scale. Grey-scale is the best and most
>  reliable thing to try when optimization doesn't produce the quality that
>  you need. Try grey-scale with brightness of around 65 and a resolution of
>  300 DPI. It's really great for scanning mass market paperbacks. Grey-scale
>  will make your scans slower, and its scanned images are larger than those
>  made with static thresholding. It gives the best page representation
>  though, compared to other forms of thresholding. If you're using a Canon
>  or Visioneer scanner, grey-scale will save your bacon! (grin) Please note
>  that Openbook 7 doesn't implement grey-scale correctly, so automatic
>  contrast is probably your best choice.
>  5. Catch bad scans as they happen. There is a friendly debate among
>  submitters about whether to scan in batches or to scan pages and recognize
>  them one at a time. There are pros and cons on both sides. I do a sort of
>  modified batch style. I scan a book while on the phone or doing something
>  else but don't use the scan repeatedly feature for one reason. I want to
>  catch badly scanned pages as they happen. It saves me from hunting for a
>  page to rescan it later. So I scan a page and let my scan recognize while
>  I'm turning to the next page. I wait for Kurzweil to tell me its
>  confidence number. I make this really easy because I've turned off the
>  progress messages for Kurzweil's scanning and recognition and have it set
>  to play a chime when scanning and recognition are finished. So if Kurzweil
>  says something, it's the confidence number letting me know that the page
>  scanned below the accuracy threshold I've set. If the statistics say 97
>  percent confidence level or less, rescan the page to try for a better
>  scan. Otherwise, you will have to struggle with many errors on the page.
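The rescan rule in tip 5 can be sketched in a few lines of Python. This is only an illustration: the confidence numbers are made up, the function name is hypothetical, and nothing here talks to Kurzweil. It assumes you have jotted down the confidence figure for each page and want a list of the pages at or below the 97 percent mark.

```python
def pages_to_rescan(confidences, threshold=97.0):
    """Return 1-based page numbers whose confidence is at or below threshold."""
    return [
        page
        for page, confidence in enumerate(confidences, start=1)
        if confidence <= threshold
    ]


if __name__ == "__main__":
    # Made-up confidence figures for four pages.
    sample = [99.1, 96.4, 98.7, 97.0]
    print(pages_to_rescan(sample))  # [2, 4]
```

Raising the threshold argument (say to 98.7) simply flags more pages, which trades extra rescanning now for less clean-up later.
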
>  6. Your scanner needs TLC too. Books can be dirty or dusty sometimes. Mass
>  market paperbacks can leave a residue of ink dust on your scanner. Keep
>  the scanner glass clean by using a dry, lint-free cloth. Never use
>  anything wet like an alcohol pad or baby wipe. That will create little
>  bubbles under the scanner glass and will cause problems in future scans.
>  7. When scanning a book, do a spot check every 15 or 20 pages. Look at the
>  last page or two of the file to make sure the settings are still producing
>  accurate results.
>  8. After doing a scan, run rank spelling. It will let you see your
>  spelling errors and will put them in the order of their prevalence in your
>  scan. If you find some words that Kurzweil doesn't know, you may want to
>  add them to your word list so they won't be flagged in future scans. I
>  don't do this for proper names unless it's a name that will keep cropping
>  up in future books. I do add words that are valid but that Kurzweil
>  doesn't have in its internal word list. You'll find that doing this over
>  time helps Kurzweil do a better job for you when you're cleaning up your
>  scans.
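For anyone curious what "ordering errors by prevalence" looks like outside Kurzweil, here is a rough Python sketch. It is not how Kurzweil's rank spelling is implemented (the thread doesn't say); the function name and the tiny word list are assumptions for illustration. It counts every word that isn't in a known-word set and lists the most frequent ones first.

```python
import re
from collections import Counter


def rank_unknown_words(text, known_words):
    """Return (word, count) pairs for words not in known_words, most common first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(word for word in words if word not in known_words)
    return counts.most_common()


if __name__ == "__main__":
    # Tiny made-up word list and sample text.
    known = {"saw", "and", "the"}
    sample = "Diey saw diey and the tiiing"
    print(rank_unknown_words(sample, known))  # [('diey', 2), ('tiiing', 1)]
```

Adding a valid word to the known set makes it drop out of the ranking, which mirrors the effect of adding words to your word list over time.
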
>  9. Keep the de-speckle setting turned off for most books. You may need it
>  with hardcover books because they sometimes have a text decoration on the
>  pages. Otherwise, de-speckle can interfere with OCR and actually cause
>  more errors than it solves.
>  10. Whether to use auto-corrections when scanning is another topic where
>  there is debate. I believe it can be a good thing if used carefully.
>  I should note that Gerald has pointed out that Openbook has some
>  auto-corrections that cause problems with books and should be fixed by
>  users of that program. Kurzweil seems to do a good job for me, and it
>  makes my work easier. I loaded up a bunch of my older scans that have been
>  lurking on my hard drive for over a decade and ran auto-correction on them.
>  What an improvement! I might actually get to submit some of them now. Here
>  are a few auto-corrections I have added to my Kurzweil list.
>  dirough for through
>  diough for though
>  diought for thought
>  diey for they
>  diere for there
>  dieir for their
>  cornpany for company
>  cornfortable for comfortable
>  tiiing for thing
>  rnany for many
>  anydiing for anything
>  If you use Openbook, you may want to remove a few of the corrections in
>  its default list. I regularly find these in books scanned in Openbook and
>  have to fix them as I read.
>  modem for modern
>  torn for tom
>  glock for clock
>  morn for mom
>  bum for burn
>  corn for com
>  That last one causes problems for anyone scanning Star Trek books because
>  Kirk presses his corn badge to talk to the ship. (grin) If a word like
>  command is hyphenated between two pages, you get corn-mand. Meanwhile,
>  Batman dials into the internet with his modern, tries to stop a crook
>  named torn from shooting him with a clock, and puts the dirty burn in
>  cuffs until mom-ing. See how auto-corrections can go wrong if you're not
>  careful?
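The difference between a safe auto-correction and a risky one can be shown with a short Python sketch. This is an outside illustration on plain text, not Kurzweil's or Openbook's actual correction engine; the function name is hypothetical and it only handles lowercase spellings. It replaces whole words only, using a few entries from the list above.

```python
import re

# A few corrections from the list above (misrecognized -> intended).
CORRECTIONS = {
    "dirough": "through",
    "diey": "they",
    "cornpany": "company",
    "rnany": "many",
    "anydiing": "anything",
}


def apply_corrections(text, corrections):
    """Replace whole words only, leaving longer words that contain a key alone."""
    alternatives = "|".join(map(re.escape, corrections))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(lambda match: corrections[match.group(1)], text)


if __name__ == "__main__":
    print(apply_corrections("diey left the cornpany", CORRECTIONS))
    # they left the company

    # A rule whose key is itself a real word fires on correct text too.
    risky = {"com": "corn"}
    print(apply_corrections("Kirk presses his com badge", risky))
    # Kirk presses his corn badge
```

The entries in CORRECTIONS are safe because none of the keys is a real English word. The risky rule shows why a key that is a real word ("com", "modem", "bum") ends up damaging text that was recognized correctly.
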
>  Whew! We've made it to the end. (grin) I hope some of this makes your
>  scans easier to work with. It'll give you a foundation to start from
>  anyhow. Clean-up tips will be another email and will take some thought.
>  I'm better at doing than explaining things. I do have a system I use
>  though. I just haven't really written it down. Anyone got a cold Dr.
>  Pepper to share?
>  --
>  Monica Willyard
>  To unsubscribe from this list send a blank Email to
> bksvol-discuss-request@xxxxxxxxxxxxx
> put the word 'unsubscribe' by itself in the subject line.  To get a list of
> available commands, put the word 'help' by itself in the subject line.

Change the world--1 deed at a time
Jackie McBride
Check out my homepage at:
& please join my fight against breast cancer
