atw: Tip: Recognizing Text Using OCR in Acrobat (Was: Converting PDF to Word)

This is getting away from Christine's original query, but last week I was
given four PDFs that looked as if they had been created by a 'scanner to
PDF' process. They had come from old hardcopy manuals so each page in the
PDF was a single bitmap image. My colleague didn't need to make the
document editable but he did want to be able to search for text strings
using Acrobat Find.

I used the following procedure in Acrobat Standard 7.0 to create a
searchable copy of each manual:
1. Open the PDF in Acrobat.
2. Select Document > Recognize Text Using OCR > Start.
3. Select Pages: All pages.
   You can experiment with changing the conversion settings.
   (I didn't because the defaults seemed to work well enough.)
4. Click OK to start scanning.
   Wait till all the pages have been scanned.
5. Save the new searchable PDF under a new name.

Notes:
- The OCR engine appears to do a good job, but you should warn
  readers that it may not be perfect. For example, you can test
  the conversion by searching for a word that you know is in the
  book, but you won't know whether Find missed another instance of
  that word because the OCR engine misread one of the letters.
- A 64-page (4 MB) manual took about 6 minutes to scan.
- A 330-page (25 MB) manual took about 45 minutes to scan.
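
One rough way to check for the missed-instance problem in the first
note is to look for words that are nearly, but not exactly, the word
you searched for. The sketch below is not part of the Acrobat procedure.
It assumes you have already got the recognized text into a plain string
by some other means, and the function name near_misses, the sample text
and the 0.75 similarity cutoff are all made up for illustration. It uses
Python's standard difflib module:

```python
import difflib
import re


def near_misses(text, word, cutoff=0.75):
    """Return words in `text` that resemble `word` but are not exact matches.

    Hypothetical helper: a near miss may be an OCR misread (e.g. 'prlnter')
    or just a legitimate variant (e.g. 'printers'), so review the results
    by hand.
    """
    # Collect the distinct lower-cased words in the recognized text.
    tokens = set(re.findall(r"[A-Za-z0-9]+", text.lower()))
    target = word.lower()
    # Exact hits are what Find already reports, so leave them out.
    tokens.discard(target)
    # difflib ranks candidates by similarity ratio; keep those above cutoff.
    return sorted(difflib.get_close_matches(target, tokens, n=20, cutoff=cutoff))


# Made-up sample text containing one misread of "printer".
sample = "Install the printer. The prlnter driver is on the disk. Printers vary."
print(near_misses(sample, "printer"))
```

Anything it reports still needs a human to decide whether it is a real
OCR error, and a badly garbled word will fall below the cutoff and be
missed, so treat it as a spot check rather than a guarantee.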
---
Stuart Burnfield
Information Developer
IBM Australia Development Laboratory (ADL), Perth

Phone: +61 8 9261 8719 (xtn 18719; Tie-Line: 701 8719)

Warren said:
> Yeah but the result is often a NIGHTMARE!
> I have had 40 to 100 pages of images saved as a Word document. All
> the converter did was turn pages into jpg's.
