[guispeak] Announcing PDF2HTM

  • From: Jamal Mazrui <empower@xxxxxxxxx>
  • To: ProgrammingBlind@xxxxxxxxxxxxx, Program-L@xxxxxxxxxxxxx, GUISpeak@xxxxxxxxxxxxx
  • Date: Sun, 25 Jan 2009 13:49:35 -0500 (EST)


Version 1.0
January 25, 2009
Copyright 2009 by Jamal Mazrui
GPL License

PDF2HTM is a command-line utility that converts one or more files from PDF
to HTML format.  The syntax is
pdf2htm.exe SourcePDF
where the parameter is either a single file name or a wildcard spec that
matches several files, such as *.pdf.  Enclose it in quotes if it contains
a space.  Each resulting HTML file has the same name as its source except
for a .htm extension.
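The file-handling rules above (one file name or a wildcard spec, output named after the source with a .htm extension) can be sketched as follows.  This is only an illustration under stated assumptions, not the code in pdf2htm.py: the function names expand_sources and output_name are hypothetical, and the sketch prints the planned conversions instead of calling PDFMiner.  Expanding the wildcard inside the program matters on Windows, because cmd.exe passes a spec like *.pdf through to the program unexpanded.

```python
import glob
import os
import sys


def expand_sources(spec):
    """Expand a file name or wildcard spec (e.g. *.pdf) into a sorted list of paths.

    A plain file name that exists matches itself; a spec that matches
    nothing yields an empty list.  (Hypothetical helper, for illustration.)
    """
    return sorted(glob.glob(spec))


def output_name(pdf_path):
    """Return the HTML file name: same path and base name, .htm extension."""
    base, _ext = os.path.splitext(pdf_path)
    return base + ".htm"


if __name__ == "__main__":
    # Each command-line argument may be a file name or a wildcard spec.
    for spec in sys.argv[1:]:
        for src in expand_sources(spec):
            # The real tool would run the PDFMiner conversion here.
            print(src, "->", output_name(src))
```

For example, running the sketch with the argument "My Docs\*.pdf" would list each matching PDF alongside the .htm name it would be converted to; the quotes keep the space in the folder name from splitting the argument.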

This was built with Python 2.5 and the packages PDFMiner and py2exe.  The
top-level script, pdf2htm.py, is an adaptation of the PDFMiner tool called
pdf2txt.py.  The batch file, RunSetup.bat, runs the py2exe script,
setup.py, to create the stand-alone executable, pdf2htm.exe.
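For readers unfamiliar with py2exe, a minimal build script of the kind described might look like the fragment below.  This is an assumption about the general shape of setup.py, not a copy of the file shipped with PDF2HTM, which may set additional options.  With py2exe, such a script is typically run with the command python setup.py py2exe, which is presumably what RunSetup.bat does, and the executable is written to a dist subfolder by default.

```python
# setup.py: a minimal py2exe build script (assumed shape; the actual file may differ).
# Requires Python 2.x on Windows with py2exe installed.
from distutils.core import setup
import py2exe  # importing py2exe registers the "py2exe" command with distutils

# console=[...] builds a console-mode executable named after the script: pdf2htm.exe
setup(console=["pdf2htm.py"])
```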

All aspects of the HTML format are determined by the underlying PDFMiner
routines.  Visual aspects such as fonts are present, but structural
aspects such as headings do not seem to be converted, unfortunately.
Other programmers interested in this project may wish to work on improving
HTML structure.

