[bksvol-discuss] Re: Become A Black Belt Submitter

  • From: "Paula and James Muysenberg" <outofsightlife@xxxxxxxxx>
  • To: "Bkvol" <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Fri, 15 Aug 2008 21:08:29 -0500

Hi, Julia,

    I was sure this setting was still somewhere in Version 11, but I just
looked and can't find it. I searched the manual, and also found nothing
about it. I wonder if the program somehow doesn't need it anymore.

Paula

----- Original Message ----- 
From: "Julia Kulak" <julia.kulak@xxxxxxxxxxxx>
To: <bksvol-discuss@xxxxxxxxxxxxx>
Sent: Friday, August 15, 2008 8:18 PM
Subject: [bksvol-discuss] Re: Become A Black Belt Submitter


> Hi, I think Kurzweil eliminated one setting in version 11: there doesn't
> appear to be a setting for recognizing light text on a dark
> background. Will this mess up the book? Should I downgrade to version 10
> for this feature, or is there an equivalent setting in version 11?
> Julia
> ----- Original Message ----- 
> From: "EVAN REESE" <mentat3@xxxxxxxxxxx>
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Sent: Friday, August 15, 2008 2:40 PM
> Subject: [bksvol-discuss] Re: Become A Black Belt Submitter
>
>
> > Hi Jim, I have a personal copy; but you, and anyone else here, can find
> > the article on Jake's site at:
> >
> > http://www.jbrownell.com/bks/tip.asp?id=29
> >
> > Evan
> >
> > ----- Original Message ----- 
> > From: <james.homme@xxxxxxxxxxxx>
> > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > Sent: Friday, August 15, 2008 7:49 AM
> > Subject: [bksvol-discuss] Re: Become A Black Belt Submitter
> >
> >
> >> Hi,
> >> Where is the stuff Pratik wrote about this?
> >>
> >> Thanks.
> >>
> >> Jim
> >>
> >> James D Homme, Usability Engineering, Highmark Inc.,
> >> james.homme@xxxxxxxxxxxx, 412-544-1810
> >>
> >> "The difference between those who get what they wish for and those who
> >> don't is action. Therefore, every action you take is a complete
> >> success,regardless of the results." -- Jerrold Mundis
> >> Highmark internal only: For usability and accessibility:
> >> http://highwire.highmark.com/sites/iwov/hwt093/
> >>
> >>
> >>
> >> From: "EVAN REESE" <mentat3@verizon.net>
> >> Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxg
> >> To: bksvol-discuss@xxxxxxxxxxxxx
> >> Sent: 08/14/2008 07:27 PM
> >> Subject: [bksvol-discuss] Re: Become A Black Belt Submitter
> >> Please respond to: bksvol-discuss@freelists.org
> >>
> >> Thanks for sending this up. This is all very useful stuff.
> >>
> >> I do use Scan Repeatedly, and just hit the Cancel key twice if I get a
> >> confidence number below the threshold, which on my K1000 is set to
> >> 98.7%. If I can go twenty or fifty pages without getting a page below
> >> that number, then it saves me from having to hit the F9 key twenty or
> >> fifty times.
> >>
> >> I also use autocorrection, but haven't compared a scan with and without
> >> it, so I cannot take sides in that debate.
> >>
> >> Pratik's excellent monograph on getting the best recognition of mass
> >> market paperbacks says that grayscale and 400 dots per inch can
> >> sometimes produce better results than static optimized. So your point
> >> here about grayscale is a good one, but increasing the resolution from
> >> 300 to 400, especially for poor quality print such as you'd get with
> >> cheap paperbacks, can give even better recognition sometimes. Of course,
> >> increasing the resolution from the usual 300 will also slow down the
> >> scan and the recognition; but the extra time invested up front is very
> >> likely to be more than offset by the time saved cleaning up the scan
> >> afterward.
> >>
> >> I have scanned the same material with Suspicious Regions kept and
> >> ignored, and it can really make a difference in the amount of junk you
> >> get. So this is another good point you make here.
> >>
> >> Thanks again.
> >>
> >> Evan
> >>
> >> ----- Original Message -----
> >> From: Monica Willyard
> >> To: Bookshare Volunteers
> >> Sent: Thursday, August 14, 2008 6:19 PM
> >> Subject: [bksvol-discuss] Become A Black Belt Submitter
> >>
> >> Hi, everyone. I wrote an email about getting really clear scans for one
> >> of our volunteers, and it occurred to me that someone on this list might
> >> benefit from it. It's a little on the long side. I hope something in it
> >> will help you. If I've said anything confusing, please ask me about it.
> >> I know many of you have done a lot of scanning, so I'm focusing on
> >> things that may not have occurred to you. I'll call them my top ten
> >> scanning tips. (grin) They work in my experience, and you may find
> >> that you need to experiment to find something that works well for you.
> >> Also, I use Kurzweil for scanning. Openbook users may find some of this
> >> to be useful, but some of it won't apply. I do have Openbook 7 and used
> >> it for several years. So I'll do my best to help you translate these to
> >> Openbook if that's what you need.
> >>
> >> I got a lot of these ideas from volunteers I've been fortunate enough to
> >> work with over the past 2 years. Jim Baugh, Louise, Pratik, Jake, Scott,
> >> Shelley, and Gerald taught me so much about good scanning. Thanks, guys.
> >> (smile) You rock!
> >>
> >> 1. Start with some solid settings in Kurzweil that will work most of the
> >> time. You may know your way around Kurzweil well. I don't know if you've
> >> thought to work on these settings though, since they're not obvious.
> >> Under the settings menu, in the general tab, make sure that your
> >> confidence threshold is set to at least 98.5. Why? Kurzweil defaults to
> >> 95 percent, and that means that it optimizes scans for a lower level of
> >> accuracy. That means you won't get the best results from optimization.
> >> That also means more clean-up on the backside, and that's a pain in the
> >> neck. The other setting in general that you may want to turn on, if you
> >> have some disk space, is the option to keep scanned images. This feature
> >> lets you re-recognize pages if they have issues. Sometimes just changing
> >> something like detect columns will make that page come out right without
> >> you having to totally rescan the page. Once you've read through the
> >> book, Kurzweil will let you remove the scanned images from the book to
> >> reduce the file size.
> >>
> >> There are three final settings that you may find useful for scanning
> >> most fiction. These work well for me, especially with library books.
> >> They're all under the recognition tab. Column identification should be
> >> enabled. Partial columns should be ignored, and suspicious regions
> >> should be ignored. This flies in the face of what Nick has recommended
> >> on the Kurzweil list, so I'd better explain. When scanning books, it's
> >> somewhat common to get a shadow from the spine of the book. It often
> >> makes a narrow column of a tab character and a random group of numbers
> >> or letters. If you turn off column identification, these random letters
> >> are mingled with the regular text. Turning on the column detection
> >> separates this garbage from the text, and ignoring partial columns and
> >> suspicious regions removes it during OCR. If a page needs column
> >> detection turned off due to a table, and you have retained images of the
> >> scanned page, you can easily change the recognition settings and just
> >> re-recognize the page from the scanned image. Do you see how this could
> >> save you time and hassle?
> >>
> >> Once you have settings you like, save them as default so you can start
> >> scanning without worrying about them each time you start Kurzweil.
> >>
> >> 2. Prepare your book for scanning, and you'll get better results from
> >> the start. Before you begin to scan a book, run your fingers lightly
> >> through the pages to remove any possible ink, dust, or other particles
> >> that may be on the pages. If the book is a library book, flip through
> >> the book in sections of about fifteen pages or so, gently pressing your
> >> fingers along the inner spine to encourage the book to lie flat. If the
> >> book belongs to you, especially if it's a paperback, flip through
> >> sections as with a library book, but bend the book back so that its
> >> outer covers almost touch. You're giving your book some flexibility
> >> stretches while not breaking its spine. This is especially important for
> >> thick books or two-page scanning mode and will keep you from having to
> >> push down as hard on books while you scan.
> >>
> >> 3. Optimize and verify settings for your book. Before scanning a book,
> >> open to the center and use the optimize feature. The Kurzweil staff says
> >> that optimization should be used in one-page mode so it can get the best
> >> idea of how the print works in your book. Scan four or five pages after
> >> optimization to determine if any adjustments in settings need to be
> >> made. Kurzweil does a fairly good job picking the optimal settings to
> >> scan a particular book unless the print quality is exceptionally bad. If
> >> you're planning to scan in two-page mode, you can turn two-page mode
> >> back on once you're finished with optimization.
> >>
> >> 4. When in doubt, go for grey-scale. Grey-scale is the best and most
> >> reliable thing to try when optimization doesn't produce the quality that
> >> you need. Try grey-scale with brightness of around 65 and a resolution
> >> of 300 DPI. It's really great for scanning mass market paperbacks.
> >> Grey-scale will make your scans slower, and its scanned images are
> >> larger than those made with static thresholding. It gives the best page
> >> representation though, compared to other forms of thresholding. If
> >> you're using a Canon or Visioneer scanner, grey-scale will save your
> >> bacon! (grin) Please note that Openbook 7 doesn't implement grey-scale
> >> correctly, so automatic contrast is probably your best choice there.
> >>
> >> 5. Catch bad scans as they happen. There is a friendly debate among
> >> submitters about whether to scan in batches or to scan pages and
> >> recognize them one at a time. There are pros and cons on both sides. I
> >> do a sort of modified batch style. I scan a book while on the phone or
> >> doing something else but don't use the scan repeatedly feature for one
> >> reason. I want to catch badly scanned pages as they happen. It saves me
> >> from hunting for a page to rescan it later. So I scan a page and let my
> >> scan recognize while I'm turning to the next page. I wait for Kurzweil
> >> to tell me its confidence number. I make this really easy because I've
> >> turned off the progress messages for Kurzweil's scanning and recognition
> >> and have it set to play a chime when scanning and recognition are
> >> finished. So if Kurzweil says something, it's the confidence number
> >> letting me know that the page scanned below the accuracy threshold I've
> >> set. If the statistics say 97 percent confidence level or less, rescan
> >> the page to try for a better scan. Otherwise, you will have to struggle
> >> with many errors on the page.
> >>
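The rescan rule in tip 5 can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: Kurzweil does not expose
anything like this as an API that I know of, the function name and the
sample scores are hypothetical, and the only number taken from the tip is
the "97 percent or less" cutoff.

```python
# Hypothetical helper; not a Kurzweil API. The cutoff comes from tip 5:
# "97 percent confidence level or less" means the page should be rescanned.
RESCAN_AT_OR_BELOW = 97.0


def pages_to_rescan(confidences):
    """Return 1-based page numbers whose confidence is at or below the cutoff."""
    return [
        page
        for page, score in enumerate(confidences, start=1)
        if score <= RESCAN_AT_OR_BELOW
    ]


# Made-up confidence numbers for five pages, in scanning order.
sample_scores = [99.1, 96.4, 98.7, 97.0, 99.8]
print(pages_to_rescan(sample_scores))
```

Checking page by page as you scan amounts to running this test on each new
number as it is announced, instead of collecting the whole list first.
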
> >> 6. Your scanner needs TLC too. Books can be dirty or dusty sometimes.
> >> Mass market paperbacks can leave a residue of ink dust on your scanner.
> >> Keep the scanner glass clean by using a dry, lint-free cloth. Never use
> >> anything wet like an alcohol pad or baby wipe. That will create little
> >> bubbles under the scanner glass and will cause problems in future scans.
> >>
> >> 7. When scanning a book, do a spot check every 15 or 20 pages. Look at
> >> the last page or two of the file to make sure the settings are still
> >> producing accurate results.
> >>
> >> 8. After doing a scan, run rank spelling. It will let you see your
> >> spelling errors and will put them in the order of their prevalence in
> >> your scan. If you find some words that Kurzweil doesn't know, you may
> >> want to add them to your word list so they won't be flagged in future
> >> scans. I don't do this for proper names unless it's a name that will
> >> keep cropping up in future books. I do add words that are valid but that
> >> Kurzweil doesn't have in its internal word list. You'll find that doing
> >> this over time helps Kurzweil do a better job for you when you're
> >> cleaning up your scans.
> >>
> >> 9. Keep the de-speckle setting turned off for most books. You may need
> >> it with hardcover books because they sometimes have a text decoration on
> >> the pages. Otherwise, de-speckle can interfere with OCR and actually
> >> cause more errors than it solves.
> >>
> >> 10. Whether to use auto-corrections when scanning is another issue
> >> where there is debate. I believe it can be a good thing if used
> >> carefully. I should note that Gerald has pointed out that Openbook has
> >> some auto-corrections that cause problems with books and should be fixed
> >> by users of that program. Kurzweil seems to do a good job for me, and it
> >> makes my work easier. I loaded up a bunch of my older scans that have
> >> been lurking on my hard drive for over a decade and ran auto-correction
> >> on them. What an improvement! I might actually get to submit some of
> >> them now. Here are a few auto-corrections I have added to my Kurzweil
> >> list.
> >>
> >> dirough for through
> >> diough for though
> >> diought for thought
> >> diey for they
> >> diere for there
> >> dieir for their
> >> cornpany for company
> >> cornfortable for comfortable
> >> tiiing for thing
> >> rnany for many
> >> anydiing for anything
> >>
> >>
> >> If you use Openbook, you may want to remove a few of the corrections in
> >> its default list. I regularly find these in books scanned in Openbook
> >> and have to fix them as I read.
> >>
> >> modem for modern
> >> tom for torn
> >> glock for clock
> >> morn for mom
> >> bum for burn
> >> com for corn
> >>
> >> That last one causes problems for anyone scanning Star Trek books
> >> because Kirk presses his corn badge to talk to the ship. (grin) If a
> >> word like command is hyphenated between two pages, you get corn-mand.
> >> Meanwhile, Batman dials into the internet with his modern, tries to stop
> >> a crook named torn from shooting him with a clock, and puts the dirty
> >> burn in cuffs until mom-ing. See how auto-corrections can go wrong if
> >> you're not careful?
> >>
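As a rough illustration of why the Kurzweil entries above are safe while the
Openbook ones are not, here is a minimal Python sketch of whole-word
auto-correction. It is not how Kurzweil or Openbook work internally; the
names CORRECTIONS and autocorrect are made up for the example. The table
holds only the misreadings from the Kurzweil list, none of which is a real
English word, so a correctly scanned word is never rewritten.

```python
import re

# Misreading -> intended word, copied from the Kurzweil list in tip 10.
# None of the keys is a real word, which is what makes them safe to replace.
CORRECTIONS = {
    "dirough": "through",
    "diough": "though",
    "diought": "thought",
    "diey": "they",
    "diere": "there",
    "dieir": "their",
    "cornpany": "company",
    "cornfortable": "comfortable",
    "tiiing": "thing",
    "rnany": "many",
    "anydiing": "anything",
}

# \b keeps the match to whole words, so "corn" inside "acorn" is never touched.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in CORRECTIONS) + r")\b",
    re.IGNORECASE,
)


def autocorrect(text):
    """Replace known OCR misreadings, preserving a leading capital letter."""

    def fix(match):
        word = match.group(0)
        fixed = CORRECTIONS[word.lower()]
        return fixed.capitalize() if word[0].isupper() else fixed

    return _PATTERN.sub(fix, text)


print(autocorrect("Diey walked dirough the cornpany gate."))
```

A table entry whose key is itself a real word, such as modem or bum, would
fire on correctly scanned text too, which is exactly the Batman problem
described above.
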
> >> Whew! We've made it to the end. (grin) I hope some of this makes your
> >> scans easier to work with. It'll give you a foundation to start from
> >> anyhow. Clean-up tips will be another email and will take some thought.
> >> I'm better at doing than explaining things. I do have a system I use
> >> though. I just haven't really written it down. Anyone got a cold Dr.
> >> Pepper to share?
> >>
> >> --
> >> Monica Willyard
> >>
> >>
> >>
> >> To unsubscribe from this list send a blank Email to
> >> bksvol-discuss-request@xxxxxxxxxxxxx
> >> put the word 'unsubscribe' by itself in the subject line.  To get a list
> >> of available commands, put the word 'help' by itself in the subject
> >> line.
> >>