[guispeak] PDF to TXT 3.0 released

  • From: Jamal Mazrui <empower@xxxxxxxxx>
  • To: guispeak@xxxxxxxxxxxxx, program-l@xxxxxxxxxxxxx, programmingblind@xxxxxxxxxxxxx
  • Date: Sun, 25 May 2008 10:09:45 -0400 (EDT)

Now available at

A few years after version 2.1, I have updated the program with two
substantive enhancements that broaden the range of PDFs from which text
can be obtained.  If a PDF is locked with a password that you know, type
it in the edit box that has been added to the main dialog.  If the PDF is
primarily an image format without textual characters, e.g., the result of
a scan, mark the new checkbox so that optical character recognition (OCR)
is performed instead of the usual text extraction techniques.  OCR is
done with Google's Tesseract engine, which is currently the best free OCR
available.

Note that OCR should be used as a last resort, since it takes much longer
and is more error-prone.  Essentially, PDF to TXT now incorporates the
PDF2OCR package, which has been available at
The download size of the new installer is much larger, about 22 megabytes,
in exchange for the additional OCR capability.
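
For programmers on the list, the "text extraction first, OCR as a last
resort" idea can be sketched roughly as below.  This is only an
illustration and not the actual code of PDF to TXT.  It assumes the
pdftotext command line tool (from Xpdf or Poppler) as the extractor, whose
-upw option supplies a user password.  The function names, the 20
character threshold, and the automatic fallback are hypothetical; in the
program itself, OCR is chosen with the checkbox, which the force_ocr
parameter stands in for.

```python
from typing import Callable, List, Optional, Tuple


def build_pdftotext_command(pdf_path: str, txt_path: str,
                            password: Optional[str] = None) -> List[str]:
    """Build an argument list for the pdftotext command line tool.

    Assumption: pdftotext is the extractor. Its -upw option passes the
    user password of a locked PDF.
    """
    cmd = ["pdftotext", "-layout"]
    if password:
        cmd += ["-upw", password]
    cmd += [pdf_path, txt_path]
    return cmd


def needs_ocr(text: str, min_chars: int = 20) -> bool:
    """Guess whether a PDF is image-only from its extracted text.

    The threshold is a hypothetical value: a scanned PDF usually yields
    little or nothing besides whitespace and form feeds.
    """
    visible = "".join(text.split())  # drop all whitespace, including form feeds
    return len(visible) < min_chars


def convert(pdf_path: str,
            extract_text: Callable[[str], str],
            run_ocr: Callable[[str], str],
            force_ocr: bool = False) -> Tuple[str, str]:
    """Return (text, method), trying the fast extraction before OCR.

    extract_text and run_ocr are supplied by the caller, e.g. wrappers
    around pdftotext and Tesseract. force_ocr models the OCR checkbox.
    """
    if not force_ocr:
        text = extract_text(pdf_path)
        if not needs_ocr(text):
            return text, "text"
    # Slow and more error-prone, so it is only reached as a last resort.
    return run_ocr(pdf_path), "ocr"
```

Passing the extractor and the OCR step in as functions keeps the decision
logic separate from whichever tools are actually installed.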

The program's batch conversion features work with the latest enhancements.
Thus, all the PDFs in a directory, or all those on a web page, may be
processed with a single command if they share the same password or image

