[bksvol-discuss] Re: Scans with Spaces in the Middle of Words

  • From: Bud Schwab <budschwab@xxxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Thu, 06 Jul 2006 11:07:45 -0700

Hi Stephen,
That makes my head swim trying to understand that clearly. I'll play around with it and see what I come up with. I can also have my wife look at it to see where the spaces show up, whether in the middle of a line or at the end.
Thanks.


Bud
At 10:32 AM 7/6/2006, you wrote:
Hi Bud.

What do you mean by "end of a line"? The lines in the editor window of Kurzweil 1000 do not correspond to the original lines of the scanned document unless you set the Line Endings setting to "Respected". Its default value is "Ignored". Another approach is to set the reading unit to "line" and then move forward by unit with F8, move backward with F6, or reread the current unit with F7. The line specified by a reading unit is typically the original line of the scanned document.

Stephen

At 11:17 AM 7/6/2006, you wrote:
I get quite a few of those breaks in the middle of a word, and they're not necessarily at the end of a line. The next time I find one, I will do the same page over with each engine and see what happens. If it's not the engine, I don't know what it could be.

Bud

At 07:54 AM 7/6/2006, you wrote:
One version of Kurzweil 1000 had, for a while, a problem when using ScanSoft as the recognition engine that sounds like it would produce the kind of defects you are describing. When an end of line hyphen was removed, a space was left in its place. This was fixed quite a while ago in a patch.
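To make the defect concrete, here is a minimal Python sketch of joining a word that was hyphenated across a line break. It is only an illustration, not how Kurzweil 1000 or ScanSoft actually works; the function names are made up for this example. The "buggy" variant leaves a space where the hyphen was, which produces the kind of split word being discussed.

```python
import re

# Hypothetical pattern: a letter, a hyphen at the end of a line, then a letter
# at the start of the next line.
LINE_END_HYPHEN = re.compile(r"([A-Za-z])-[ \t]*\n[ \t]*([A-Za-z])")

def join_line_end_hyphens(text):
    """Join words split by an end-of-line hyphen, leaving no space behind."""
    return LINE_END_HYPHEN.sub(r"\1\2", text)

def join_line_end_hyphens_buggy(text):
    """Same idea, but a space is left in place of the hyphen (the defect)."""
    return LINE_END_HYPHEN.sub(r"\1 \2", text)

sample = "a recog-\nnition engine"
print(join_line_end_hyphens(sample))
print(join_line_end_hyphens_buggy(sample))
```

Note that this simple rule also removes the hyphen from genuine compounds that happen to break at a line end, such as "well-known", so real software would need a dictionary check or similar.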

Stephen

At 04:15 PM 7/2/2006, you wrote:
I've seen problems with spaces inside of words before. From what I was told, the submitter was using ScanSoft. Unless someone knows of a book with this problem that was not scanned using ScanSoft, it's possible that this is a bug in that OCR software, or a bug in an earlier version if you happen to have a recent version that doesn't have the problem.

Whether or not it's related to a specific piece of OCR software, it's possible that the word was split at the end of a line in the printed book and the OCR software did not recognize the hyphen at the end of the first half. In that case the software would not treat the two halves as a single word split by the end of a line.

HTH

Gerald

----------
From: bksvol-discuss-bounce@xxxxxxxxxxxxx [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Bud Schwab
Sent: Sunday, July 02, 2006 10:07 AM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: Scans with Spaces in the Middle of Words


I have the same trouble. Hope somebody comes up with a solution or at least an explanation. I'll be watching.

Bud

At 05:11 AM 7/2/2006, you wrote:

I'm validating a book now that has a random space in the middle of words, perhaps four or five times a page. The spell checker will catch most of these, but if the space appears in such a place that the characters before and after the space both form words, I'll never know.
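For the case the spell checker misses, where both halves are real words, one possible approach is to flag any pair of adjacent words that also forms a known word when joined. The Python sketch below is only a rough illustration: the function name and the tiny vocabulary are made up for this example, and a real check would need a full word list. It only produces candidates for a person to review, since pairs like "in to" and "into" can both be correct.

```python
import string

def find_split_word_candidates(text, vocabulary):
    """Return (left, right, joined) for adjacent words whose join is a known word."""
    vocab = {word.lower() for word in vocabulary}
    tokens = text.split()
    candidates = []
    for left, right in zip(tokens, tokens[1:]):
        # Ignore trailing punctuation on the second word, e.g. "get," or "get."
        right_core = right.rstrip(string.punctuation)
        # Skip pairs separated by punctuation or containing digits.
        if not (left.isalpha() and right_core.isalpha()):
            continue
        joined = (left + right_core).lower()
        if joined in vocab:
            candidates.append((left, right_core, joined))
    return candidates

# Tiny hypothetical vocabulary, just for the demonstration.
vocabulary = {"do", "not", "for", "get", "forget", "the", "book"}
print(find_split_word_candidates("Do not for get the book.", vocabulary))
```

Each reported pair still has to be checked in context, but that is a much shorter list than reading the whole book through.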

I suspect there's no way around this but reading the book through, which I may not have time to do.

Any idea what causes the OCR to do something like that?

Just curious,

Lora



__________ NOD32 1.1616 (20060622) Information __________

This message was checked by NOD32 antivirus system.
http://www.eset.com

Bud Schwab
W 6 Z Y P
Malibu, California

To unsubscribe from this list send a blank Email to
bksvol-discuss-request@xxxxxxxxxxxxx
put the word 'unsubscribe' by itself in the subject line. To get a list of available commands, put the word 'help' by itself in the subject line.

