[bksvol-discuss] Re: junk characters

  • From: "Monica Cortada" <mcortada@xxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Thu, 25 Jan 2007 06:57:27 -0500

I also use FineReader, but with different settings. With paperbacks I'll set
the resolution at 600.  If I'm using the automatic document feeder, I'll do
everything at 600 if I'm not in a hurry.  Also, when you start getting blobs
on pages in the same place, it's probably time to clean the glass on the

Also, I let FineReader automatically set the brightness.  If the paper is
dark and you need more contrast, changing the document type from black and
white to greyscale or color might help.  

Also, do you let it automatically Analyze Page Layout?  It sets the OCR area
very close to the print margins, thereby eliminating a lot of junk
characters.  I also use the ABBYY interface instead of the TWAIN interface.

One trick I use to get rid of non-text junk like random vertical bars after
sending the document to Word is to save it as .rtf, open it in Kurzweil
1000, then save it as .rtf in Kurzweil.  When I open it in Word again, the
junk graphic elements are all gone because Kurzweil will strip the graphics
while preserving the text formatting.  
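
For stray characters that survive into plain text, here is a rough Python
sketch of the kind of cleanup that could be scripted. It is not something
FineReader or Kurzweil does. The list of "junk" characters is my assumption,
based on the 1's, i's, j's, brackets, and vertical bars mentioned in this
thread, and the names JUNK_TOKEN, clean_line, and clean_text are made up for
illustration. It only looks at the start and end of each line, and it
collapses runs of spaces inside a line.

```python
import re

# A token counts as junk if it is one or two characters long and made only of
# characters that OCR tends to invent from specks and page edges.
# This character set is an assumption, not a FineReader setting.
JUNK_TOKEN = re.compile(r"^[1ijl|\[\]{}()]{1,2}$")


def clean_line(line):
    """Strip trailing white space and drop junk tokens at the line edges."""
    tokens = line.split()
    # Only the start and end of the line are touched; junk in the middle
    # of a line is left alone for a human to judge.
    while tokens and JUNK_TOKEN.match(tokens[0]):
        tokens.pop(0)
    while tokens and JUNK_TOKEN.match(tokens[-1]):
        tokens.pop()
    return " ".join(tokens)


def clean_text(text):
    """Apply clean_line to every line, keeping the line breaks."""
    return "\n".join(clean_line(line) for line in text.split("\n"))
```

A heuristic like this will also eat a legitimate "1" at the end of a line
(say, "Chapter 1"), so compare the result with the original before keeping
it.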

FineReader does a few other things when going to Word that enhance the
visual layout of documents but are not needed for us.  For example, extra
white space after something like a chapter number is controlled by "Space
Before" in the Paragraph section of the Format menu.  Headings, page
numbers, and text that's set apart from the main body are often set up as
separate columns and/or section breaks. Changing or eliminating columns can
result in really garbled text. Many editing changes in Word are automatically
applied only to the current "section," so a problem you just fixed by
selecting the whole document mysteriously appears again.  

None of these oddities are difficult to fix once the cause is puzzled out.
I'm not sure there isn't a fairy living in the OCR engine, though, because
its recognition is like magic. 

M in M (Monica in Maryland)

--- Begin Message ---
  • From: "GenePoole" <captinlogic@xxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Wed, 24 Jan 2007 02:14:09 -0500
Is there a decent way of eliminating or greatly reducing junk characters
during a scan? It seems no matter how flat the book is on the scanner, I
always get 1's and i's and j's and brackets in the oddest places. And big
chunks of white space at the ends of lines. Ideas? Thanks. Oh, I'm using
FineReader 8 with 300 resolution and manual brightness adjustment.

--- End Message ---

