[bksvol-discuss] Re: linebreaks where not wanted

  • From: "Gerald Hovas" <GeraldHovas@xxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Mon, 12 Jun 2006 09:15:18 -0500

E,

I'm not sure, but I think some, and possibly most, of it is due to not
having pages on a flatbed straight enough for the OCR to recognize the lines
properly. Of course, it's also possible that the OCR software has a bit of
trouble consistently handling the spacing between lines in some books. I've
also caught FineReader adding a line break before a number in quite a few
instances, so there may be a bug in it that needs fixing, at least in
version 7. When I run across a page in one of my scans with a lot of broken
lines, rescanning it usually fixes the problem, so I think it's mainly due
to the page being skewed a bit more than the OCR software can tolerate. Note
that I said rescan, not rerecognize: rerecognizing the page in K-1000 can't
change the amount of skew in the image.

BTW, a couple of months back I wrote a tip on using Word to search for
these types of problems. Usually 10 extra minutes of searching will find
all, or nearly all, of the instances where this happens and let you fix
them. Maybe you can adapt it to K-1000, but I think this is one instance
where Word is the better tool.
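For anyone who'd rather script the search than do it by hand, here is a rough
Python sketch that flags lines matching the three symptoms described in the
original question. It is not the Word tip mentioned above; the function name
find_suspect_breaks and the punctuation heuristics are assumptions for
illustration, and it will produce some false positives (for example,
legitimate "pre- and post-" constructions).

```python
import re

# Heuristic checks (assumptions, not taken from any existing tip or tool):
#  - "hyphen and space" inside a word, e.g. "well- known"
#  - a word hyphenated at the end of a line, continued lowercase on the next
#  - a line ending without closing punctuation, next line starting lowercase
#    (catches breaks in the middle of a sentence or an unhyphenated word)
SENTENCE_END = '.!?:;"\')'


def find_suspect_breaks(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, reason) pairs for lines that look broken."""
    suspects = []
    lines = text.splitlines()
    for i, line in enumerate(lines):
        stripped = line.rstrip()
        nxt = lines[i + 1].lstrip() if i + 1 < len(lines) else ""
        if re.search(r"[a-z]- [a-z]", stripped):
            suspects.append((i + 1, "hyphen and space inside a word"))
        # Skip blank lines and the last line: there is nothing to join them to.
        if not stripped or not nxt:
            continue
        if re.search(r"[a-z]-$", stripped) and nxt[0].islower():
            suspects.append((i + 1, "hyphenated word split across lines"))
        elif stripped[-1] not in SENTENCE_END and nxt[0].islower():
            suspects.append((i + 1, "line break in the middle of a sentence"))
    return suspects


if __name__ == "__main__":
    sample = (
        "The scanner could not recog-\n"
        "nize the line.\n"
        "It was well- known that the\n"
        "book was skewed.\n"
        "A new paragraph.\n"
    )
    for number, reason in find_suspect_breaks(sample):
        print(number, reason)
```

Blank lines are treated as paragraph breaks and never flagged, so a real
paragraph boundary won't show up as a false hit.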

If someone consistently has this issue with their scans, I'd suggest they
make sure they're following good scanning practices, such as keeping the
surface of the scanner clean, holding the book down flat while scanning,
and making sure the book is straight on the glass. In my experience, this
can go a long way toward improving scans. But of course you knew that,
although perhaps some of the newer volunteers didn't.

HTH

Gerald

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of E.
Sent: Monday, June 12, 2006 4:11 AM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] linebreaks where not wanted

Speaking of linebreaks, I notice a lot of books have one of the following
issues:
   Linebreak in the middle of a word
   Linebreak, hyphen, and space in the middle of a word
   Linebreak in the middle of a sentence

Does anybody know of situations that tend to cause this? For example, are
certain scanning software packages prone to these issues? Does anybody know
of settings that reduce how often this happens?

E.

 To unsubscribe from this list send a blank Email to
bksvol-discuss-request@xxxxxxxxxxxxx
put the word 'unsubscribe' by itself in the subject line.  To get a list of
available commands, put the word 'help' by itself in the subject line.