[bksvol-discuss] Re: Scans with Spaces in the Middle of Words

  • From: "Tiffany H. Jessen" <tjessen@xxxxxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Thu, 06 Jul 2006 12:09:48 -0400

I am not arguing with anyone, but don't forget: even though the breaks don't look like they're at the end of lines now, they may in fact be at the end of lines in the print copy. There is a setting somewhere, which I can't think of offhand, that tells Kurzweil to retain or ignore the original formatting.

----- Original Message ----- From: "Bud Schwab" <budschwab@xxxxxxxxxxx>
To: <bksvol-discuss@xxxxxxxxxxxxx>
Sent: Thursday, July 06, 2006 11:17 AM
Subject: [bksvol-discuss] Re: Scans with Spaces in the Middle of Words



I get quite a few of those breaks in the middle of a word, and they're not necessarily at the end of a line. The next time I find one, I will redo the same page with each engine and see what happens. If it's not the engine, I don't know what it could be.

Bud

At 07:54 AM 7/6/2006, you wrote:
For a while, one version of Kurzweil 1000 had a problem, when using ScanSoft as the recognition engine, that sounds like it would produce the kind of defects you are describing: when an end-of-line hyphen was removed, a space was left in its place. This was fixed in a patch quite a while ago.

Stephen
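To make the mechanism described above concrete, here is a minimal Python sketch. It is not Kurzweil's or ScanSoft's actual code: the functions `dehyphenate_buggy` and `dehyphenate` are hypothetical, and the sample sentence is invented. It shows how replacing an end-of-line hyphen with a space leaves a word split in two, while dropping both the hyphen and the line break rejoins it.

```python
import re


def dehyphenate_buggy(text: str) -> str:
    # Hypothetical reproduction of the reported defect: the end-of-line
    # hyphen and the line break are replaced by a space, so the word
    # stays split ("recog nized").
    return re.sub(r"-\n", " ", text)


def dehyphenate(text: str) -> str:
    # Join a word broken by an end-of-line hyphen: drop the hyphen and the
    # line break between two word characters so the halves become one word.
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


page = "The scanner recog-\nnized the page."
print(dehyphenate_buggy(page))  # The scanner recog nized the page.
print(dehyphenate(page))        # The scanner recognized the page.
```

One caveat of the simple join: a genuinely hyphenated compound that happens to break at its own hyphen (for example "well-" at the end of a line followed by "known") would come out as "wellknown", so real OCR software has to make a judgment call there.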

At 04:15 PM 7/2/2006, you wrote:
I've seen problems with spaces inside words before. From what I was told, the submitter was using ScanSoft. Unless someone knows of a book with this problem that was not scanned with ScanSoft, it's possible that it's a bug in that OCR software (or, if your own recent version doesn't show the problem, a bug in an earlier version).

Whether or not it's related to a specific piece of OCR software, it's possible that the word was split at the end of a line in the printed book and the OCR software didn't recognize the hyphen after the first half, so the result no longer looks like a word broken across a line.

HTH

Gerald

----------
From: bksvol-discuss-bounce@xxxxxxxxxxxxx [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Bud Schwab
Sent: Sunday, July 02, 2006 10:07 AM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: Scans with Spaces in the Middle of Words


I have the same trouble. I hope somebody comes up with a solution, or at least an explanation. I'll be watching.

Bud

At 05:11 AM 7/2/2006, you wrote:

I'm validating a book now that has a random space in the middle of words, perhaps four or five times a page. The spell checker will catch most of these, but if the space appears in such a place that the characters before and after the space both form words, I'll never know.

I suspect there's no way around this but reading the book through, which I may not have time to do.
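The case where both halves of a split are real words could be partly automated. Below is a minimal Python sketch, assuming you have a word list to check against: it flags each pair of adjacent words whose concatenation is itself a dictionary word. The function `suspicious_splits` and the tiny `WORDS` set are hypothetical stand-ins for a real dictionary file. It will also flag legitimate phrases such as "any one" or "can not", so its output is a list of candidates to check by eye, not a list of confirmed errors.

```python
import re

# Hypothetical tiny word list; in practice you would load a full dictionary file.
WORDS = {"she", "left", "the", "note", "book", "notebook",
         "in", "side", "inside", "car"}


def suspicious_splits(text: str, dictionary: set) -> list:
    """Return adjacent word pairs whose concatenation is a dictionary word."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    # Compare each word with the one that follows it.
    for left, right in zip(words, words[1:]):
        if (left + right).lower() in dictionary:
            hits.append((left, right))
    return hits


print(suspicious_splits("She left the note book in side the car.", WORDS))
# → [('note', 'book'), ('in', 'side')]
```

Splits that leave a non-word fragment (such as "recog nized") are already caught by the spell checker, so this check is meant to complement it rather than replace it.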

Any idea what causes the OCR to do something like that?

Just curious,

Lora



__________ NOD32 1.1616 (20060622) Information __________

This message was checked by NOD32 antivirus system.
http://www.eset.com

Bud Schwab
W 6 Z Y P
Malibu, California

To unsubscribe from this list send a blank Email to
bksvol-discuss-request@xxxxxxxxxxxxx
put the word 'unsubscribe' by itself in the subject line. To get a list of available commands, put the word 'help' by itself in the subject line.





