Now available at http://www.EmpowermentZone.com/pdf2ocr.zip

Following up on a tip from Ken Perry about the open source Tesseract project at Google, I have tried to use this OCR engine to build a free program for producing accessible text from an image-based PDF. Such files are created by scanning equipment, or by software printer drivers that save only a picture of the text without the actual characters. This makes them inaccessible to most PDF viewing utilities, which extract text but do not perform OCR on images. I could not find an existing Windows solution on the web, but I did get useful ideas from Linux-oriented ones.

What I am calling PDF2OCR combines Tesseract from http://code.google.com/p/tesseract-ocr with the GhostScript interpreter from http://ghostscript.com. GhostScript creates a .tif file from the .pdf file of interest, and then Tesseract creates a .txt file from that.

The current implementation is a simple batch file, pdf2ocr.bat, with the following syntax on the command line:

pdf2ocr SourceRootName

where SourceRootName is the name of a PDF file without the .pdf extension. This produces a text file with the same name except for a .txt extension. The PDF name can include a directory path, but not embedded spaces. For example,

pdf2ocr c:\temp\test

produces c:\temp\test.txt

I am seeking feedback on this initial test version. In particular, I want to be sure it works on computers that have not run the GhostScript installation program for Windows. The archive contains an image-based PDF for testing called debate.pdf (the legal agreement between the Bush and Kerry campaigns concerning Presidential debates). Please understand that Tesseract is not the best OCR engine available, though it is generally considered the best free one.

Installation consists of unzipping the pdf2ocr.zip archive to a directory, e.g., one called C:\PDF2OCR. The target directory will contain many files that I gathered from subdirectories of an installed GhostScript directory tree.
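For anyone who wants to experiment with the two-step conversion described above outside a batch file, here is a minimal Python sketch. It is not the contents of pdf2ocr.bat. It assumes the GhostScript console executable is named gswin32c and that tesseract is on the PATH. The TIFF device (tiffg4) and the 300 dpi resolution are my own guesses, as is the assumption that your Tesseract build reads multi-page TIFF files. The helper names build_commands and pdf2ocr are hypothetical.

```python
import subprocess
import sys

# Assumed executable names: gswin32c is the Windows console build of
# GhostScript; tesseract is assumed to be on the PATH (e.g. in C:\PDF2OCR).
GS_EXE = "gswin32c"
TESS_EXE = "tesseract"


def build_commands(source_root, gs_exe=GS_EXE, tess_exe=TESS_EXE):
    """Return the GhostScript and Tesseract argument lists for one PDF.

    source_root is the PDF name without its .pdf extension, as in
    pdf2ocr.bat. The batch file does not support embedded spaces,
    so this sketch rejects such names too.
    """
    if " " in source_root:
        raise ValueError("SourceRootName must not contain spaces")
    pdf = source_root + ".pdf"
    tif = source_root + ".tif"
    # Step 1: render the PDF pages into a TIFF image.
    # Device tiffg4 (1-bit, fax compression) and -r300 are assumptions.
    gs_cmd = [gs_exe, "-dNOPAUSE", "-dBATCH", "-dSAFER",
              "-sDEVICE=tiffg4", "-r300",
              "-sOutputFile=" + tif, pdf]
    # Step 2: OCR the TIFF; Tesseract appends .txt to the output base name.
    tess_cmd = [tess_exe, tif, source_root]
    return gs_cmd, tess_cmd


def pdf2ocr(source_root):
    """Run both steps in order and return the name of the text file."""
    for cmd in build_commands(source_root):
        subprocess.run(cmd, check=True)  # stop if either tool fails
    return source_root + ".txt"


# Only run the conversion when exactly one argument is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    print(pdf2ocr(sys.argv[1]))
```

Saved under a hypothetical name such as pdf2ocr_sketch.py, it would be run as "python pdf2ocr_sketch.py c:\temp\test" and should leave c:\temp\test.txt behind, just as the batch file does. Passing argument lists to subprocess avoids shell quoting, so lifting the no-spaces restriction would only require removing the check.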
It will also contain one subdirectory called tessdata, which the tesseract.exe program requires for language support. I have distributed only the English files, but other languages are available from the Google site.

In order to run the batch file from any directory, you can add the PDF2OCR directory to the path of a console session with a command like the following:

set path=c:\pdf2ocr;%path%

To add it to the path for every console session, use the Environment Variables button on the Advanced tab of the System applet in Control Panel.

The pdf2ocr.zip download is large, about 14 megs, so it will probably remain a stand-alone project rather than being bundled with other applications I develop. Feel free to enhance it in the spirit of open source development!

Jamal

__________
View the list's information and change your settings at //www.freelists.org/list/programmingblind