[bookshare-discuss] Re: Better Scans?

  • From: "Mary Otten" <maryotten@xxxxxxxxxxxxx>
  • To: "bookshare-discuss@xxxxxxxxxxxxx" <bookshare-discuss@xxxxxxxxxxxxx>
  • Date: Sun, 09 May 2004 21:48:32 -0500

Hi Allison,
Fine Engine is generally regarded as the better of the two OCR engines that are 
included with K1000. But there are times when the other engine, rtk, does 
better, because it doesn't put so many junk characters in. So if 
you get a book where you're finding that no matter what you do, you still get 
plenty of junk characters, feel free to experiment and change to rtk and see 
what that does. On a suite of varying documents, Fine Engine 
tests out better, but there are definitely times when rtk will prove a better 
choice. 
Ranked spelling is one of the options you will find in the tools menu of K1000. 
After you do a scan, you can use ranked spelling and see a list of what the 
program thinks are mistakes, and those which occur most 
frequently will show up first. It will also give you a percentage of "correct" 
words. So you might start out, say, with a percentage of words it thinks are 
correct at 99.38. And then you have this list of words that you can 
look through, and choices, such as ignore all or add to word list or replace 
all. It will offer you a suggested replacement, and you can accept or type your 
own. You can also see some context for the first occurrence of 
the word it thinks is an error. Sometimes, errors are perfectly correct words 
that just aren't in its dictionary. You can also go into the regular spell 
checker and set some things in there, like telling it to ignore words that 
start with upper case letters, if you want, or words with numbers in them, or 
words that are all upper case. Save any changes you make in your settings 
files, and from now on, when you use ranked spelling or the spell 
checker, you won't see those types of words marked as errors. 
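For anyone who finds it easier to see the idea spelled out, here is a rough sketch in Python of what a ranked spelling pass does. This is not K1000's actual code and K1000 is not scripted this way. The function name, the option names, and the tiny word list are all made up for illustration, and counting skipped words as "correct" is my assumption.

```python
import re
from collections import Counter


def ranked_spelling(text, dictionary,
                    ignore_capitalized=False,
                    ignore_with_digits=False,
                    ignore_all_caps=False):
    """Return (suspects ranked by frequency, percent of words judged correct).

    Hypothetical illustration only: 'dictionary' is any set of lowercase
    words, and the three flags mimic the spell checker options described
    in the message above.
    """
    words = re.findall(r"[A-Za-z0-9']+", text)
    suspects = Counter()
    for word in words:
        # Words matching an enabled "ignore" option are never flagged.
        if ignore_all_caps and word.isupper():
            continue
        if ignore_capitalized and word[0].isupper():
            continue
        if ignore_with_digits and any(ch.isdigit() for ch in word):
            continue
        if word.lower() not in dictionary:
            suspects[word] += 1
    if not words:
        return [], 100.0
    flagged = sum(suspects.values())
    percent_correct = 100.0 * (len(words) - flagged) / len(words)
    # most_common() puts the most frequent suspects first.
    return suspects.most_common(), percent_correct


word_list = {"the", "cat", "sat", "on", "mat", "and", "dog", "saw"}
page = "the cat sat on tbe mat and tbe dog saw tbe c4t"
ranked, percent = ranked_spelling(page, word_list)
for word, count in ranked:
    print(word, count)
print(round(percent, 2))
```

The most frequent suspect comes out at the top of the list, which is why fixing the first few entries with "replace all" tends to clear up the most errors for the least effort.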
As for brightness settings, that really depends on the scanner you use. 
Settings that work well on a given book with my scanner might not do so well 
with yours, and vice versa. I tend to start with mass market 
paperbacks somewhere around 60 or a bit higher and then experiment and see if I 
need to adjust one way or the other, as print in those is often kind of dark. 
Scan a few pages from various portions of the book at a 
given setting and run ranked spelling against that, then see what sorts of 
errors you get. And adjust brightness up or down accordingly. Or you can use 
the scanning optimization wizard, which is one of the items in the 
"scan" menu. I have not found it to be particularly helpful. But I think I'm in 
the minority on that score. Aside from brightness, there is also scanner 
resolution, which you can adjust, or thresholding settings of dynamic 
or gray scale which can be tried. I know Guido has said he often gets good 
results with gray scale and 400 dots per inch resolution, whereas I've found 
that those settings rarely if ever help. So you can see that it 
really depends on your particular scanner and the books or other materials you 
scan. 
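If you want to keep track of your experiments, the trial-and-error approach above can be sketched roughly like this in Python. It assumes you have saved the recognized text of a few sample pages at each brightness setting yourself. The function names, the sample text, and the word list are hypothetical, and the score is just a stand-in for the percentage that ranked spelling reports.

```python
import re


def percent_in_word_list(text, word_list):
    """Percentage of words in 'text' that appear in 'word_list'."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    good = sum(1 for word in words if word in word_list)
    return 100.0 * good / len(words)


def compare_settings(samples, word_list):
    """Score each brightness setting and return (best setting, all scores).

    'samples' maps a brightness value to a list of page texts scanned
    at that value (hypothetical data you collect by hand).
    """
    scores = {}
    for setting, pages in samples.items():
        scores[setting] = percent_in_word_list(" ".join(pages), word_list)
    best = max(scores, key=scores.get)
    return best, scores


word_list = {"the", "quick", "brown", "fox"}
samples = {
    50: ["tbe qu1ck brown fox"],   # too dark: junk characters creep in
    60: ["the quick brown fox"],
    70: ["the quick brovvn f0x"],  # too light: letters break apart
}
best, scores = compare_settings(samples, word_list)
print(best, scores)
```

Using pages from different parts of the book matters here, since a setting that suits the first chapter may not suit pages where the print is darker or the paper has yellowed.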
But it's funny how we get pickier as this technology improves. I started with 
Open Book back in 1994 or 1995. And I thought it was pretty amazing to be able 
to go and get readable output from pretty much any book I 
could fit on the scanner. Now, with much better OCR technology, I find myself 
being much more critical of things I scan. I guess we're never satisfied.
Mary


