[bksvol-discuss] Re: text quality

  • From: Guido Corona <guidoc@xxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Thu, 29 Apr 2004 15:51:00 -0500

Yes, the ranked spelling feature will yield a spelling accuracy rating 
similar to the error rate used by Bookshare. The recognition stats simply 
indicate how laborious it was for the OCR engine to complete the job.

Guido
 

Guido D. Corona
IBM Accessibility Center,  Austin Tx.
IBM Research,
Phone:  (512) 838-9735
Email: guidoc@xxxxxxxxxxx

Visit my weekly Accessibility WebLog at:
http://www-3.ibm.com/able/weblog/corona_weblog.html





"Pratik Patel" <pratikp1@xxxxxxxxx> 
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
04/29/2004 03:41 PM
Please respond to: bksvol-discuss
To: <bksvol-discuss@xxxxxxxxxxxxx>
Subject: [bksvol-discuss] Re: text quality

Paul and Noel,
 
Since you've both raised the same questions, let me ask something in 
return.  How do you characterize your results?  Do you generally gauge the 
accuracy of your results by the self-reported recognition stats, or by 
looking at the number of spelling mistakes per page when you change 
various settings?  The recognition stats presented by Kurzweil are often 
misleading.  What we are given is often taken directly from the OCR 
engine in question.  With those types of self-reports, there is always a 
question of accuracy and reliability.  There is also a question of 
validity.  I have experimented with this a bit and have found that even 
with the same settings, if you scan the same page several times, the 
supplied stats are different each time, in some cases by a large margin.  
However, when you look at actual accuracy, the results from different 
scans of the same page with the same settings are quite consistent.  Even 
when the Optimize Scanning feature comes up with different settings for 
the same page on repeated runs, the accuracy is not often affected.  
Grayscale at 400 DPI does make a large difference.  Even when I use 
Optimize Scanning, I make sure that at the end I compare the results 
against a scan using grayscale at 400 DPI.
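
The comparison by spelling mistakes per page described above can be sketched in a few lines of Python. This is only an illustration and not something Kurzweil provides: the function names, the tiny word list, and the sample page text are all assumptions made up for the example, and a real check would need a proper dictionary.

```python
import re

def spelling_error_rate(page_text, known_words):
    """Return the fraction of words on a page that are not in known_words."""
    # Hypothetical tokenizer: runs of letters and apostrophes, lowercased.
    words = re.findall(r"[a-z']+", page_text.lower())
    if not words:
        return 0.0
    misses = sum(1 for w in words if w not in known_words)
    return misses / len(words)

def best_scan(scans, known_words):
    """Return the settings label whose scan text has the lowest error rate."""
    return min(scans, key=lambda label: spelling_error_rate(scans[label], known_words))

# Made-up example: the same page scanned with two different settings.
known_words = {"the", "cat", "sat", "on", "mat"}
scans = {
    "black-and-white 300 DPI": "The cst sat on tbe mat",
    "grayscale 400 DPI": "The cat sat on the mat",
}
winner = best_scan(scans, known_words)
```

Because the rate is computed from the recognized text itself, it does not depend on the engine's self-reported stats, which is the point of the comparison.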
 
Pratik
 
 
Pratik Patel 
Managing Director 
CUNY Assistive Technology Services 
the City University of New York 
(718) 997-3775 
ppatel@xxxxxx 
-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx 
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Edwards, Paul
Sent: Thursday, April 29, 2004 4:25 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: text quality

Actually, I knew that.  So of course, the question becomes: if it makes no 
difference, why do different values turn up when you do "optimize scan"? 
Perhaps there is a Kurzweil guru lurking.  Would you care to emerge from 
the lurk and answer the question?
 
Paul
 
 
Paul Edwards, Director
Access Services, North Campus
Phone: (305) 237-1146
Fax: (305) 237-1831
TTY: (305) 237-1413
Email: pedwards@xxxxxxxx
home email: edwpaul@xxxxxxxxxxx 
-----Original Message-----
From: Guido Corona [mailto:guidoc@xxxxxxxxxx]
Sent: Thursday, April 29, 2004 4:02 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: text quality


Paul, I also use grayscale at 400 DPI most of the time with Kurzweil 8.0. 
If you find it rather slow, scan images only, then turn on pure 
recognition before going to sleep. Your book will be ready when you wake 
in the morning, no matter how large it is. By the way, with grayscale, 
brightness makes no difference. 

Guido 



"Edwards, Paul" <pedwards@xxxxxxxx> 
Sent by: bksvol-discuss-bounce@xxxxxxxxxxxxx
04/29/2004 02:42 PM
Please respond to: bksvol-discuss
To: <bksvol-discuss@xxxxxxxxxxxxx>
Subject: [bksvol-discuss] Re: text quality

This is a difficult issue.  I take the approach of carefully checking the 
first few pages at the beginning of a scan.  If there are errors I can 
adjust for, I do that.  I also rescan pages whose value in Kurzweil comes 
back lower than ninety.  I do not tend to rescan pages that score ninety 
to ninety-five because I usually cannot make much of a difference, and we 
are often dealing with a mangled heading or something.

However, I recently scanned a hardcover book which should have scanned 
like a dream and came out as pure dreck.

I have found that optimizing scanning is, for the most part, worth doing. 
The results do not always make me happy, in that I am now scanning a book 
using gray scale and sixty, which takes forever to scan.  By the way, it 
is Legends II, edited by Robert Silverberg.

Paul



-----Original Message-----
From: Kellie Hartmann [mailto:kellhart@xxxxxxxxxx]
Sent: Thursday, April 29, 2004 1:04 AM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] text quality


Hi all.
Even with the wonderful new scanning software available there are a few
kinds of things that are very difficult to get a good scan from. For
example, linguistics books are often very graphical in nature and contain
symbols that the OCR packages don't recognize; things like r-underring and
turned V etc. Also some cheap paperbacks do have places where they seem to
be blurred. I scanned a novel that I was assigned to read in French class,
and when I found illegible passages I tried rescanning them. I rescanned
several times changing various settings, but certain passages absolutely
refused to scan. I don't really plan to submit it to Bookshare anyway, but
I would prefer this scan, with a couple of blurred lines every 20 pages or
so, to no scan. I'm able to use this in class with no problems, so in my
opinion this is far better than nothing. Finally, I have another French
book which has very glossy pages and lots of flashy graphical design.
Again, even with a lot of work on experimenting with different settings my
results were not encouraging. This I definitely won't submit to Bookshare
because I can't get it in good enough shape; the effort required would be
far beyond the benefits. I agree that careless scanning is unreasonable,
and think that validating is important. It always takes me much longer to
validate something than to scan it because I read the whole book and fix
every error that can possibly be fixed. Not every validator is going to do
that, and certain books, such as enormous textbooks, really would require
a great investment in time to proof thoroughly. So it isn't realistic to
expect every book to be flawless. What I would really like eventually, and
I know this isn't realistic either, would be to have all the fair-quality
books rescanned.
Kellie



