Re: PDF generated by LaTeX

  • From: "black ares" <matematicianu2003@xxxxxxxxxxx>
  • To: <programmingblind@xxxxxxxxxxxxx>
  • Date: Sat, 6 Nov 2010 16:09:31 +0200

Use a good OCR program, like ABBYY FineReader.

----- Original Message ----- From: "QuentinC" <webmaster@xxxxxxxxxxxx>
To: <programmingblind@xxxxxxxxxxxxx>
Sent: Saturday, November 06, 2010 4:10 PM
Subject: PDF generated by LaTeX


Hello everybody,
I know this is not really programming, but since this list is followed by blind people, programmers and possibly students (perhaps all at the same time), I think somebody may have the ultimate solution, or at least some tricks.

Many of my professors at school, many documents on the web, and now more and more electronic books use LaTeX to generate beautiful PDF documents. Unfortunately, I haven't found any good way to read them comfortably.

First of all, I hate Adobe Reader because it is very slow, buggy, and does not work well with JAWS. I have the habit of always converting any PDF I open in it to text, using the appropriate command in Adobe Reader's File menu. That way, the second time I want to read the document, I just have a text file: it is much smaller and faster, the find command works properly, my computer doesn't take minutes to respond, etc. Adobe Reader is really not a good program if you are not reading a fully accessible PDF (and I have rarely seen truly accessible PDFs; in fact only one, from a web accessibility certification company in France).

PDFs that are not too complicated, even if they are not accessible in the WCAG sense, normally convert to text reasonably well. However, this does not work for PDFs generated by LaTeX. Both in Adobe Reader and in the converted text file, reading is horrible. There are only one or two words per line, many words are broken in the middle, page numbers, code snippets and math formulas appear in the middle of plain text or away from where they should be, some sentences are mixed up, etc. The worst is with Beamer slides, where letters are grouped in twos or threes but are completely mixed up and down the page, without forming any readable word.

I also tried the pdf2txt program available on one of the mailing list's grab-bag sites. It is a little better, but there are still broken words, headings aren't recognizable and aren't even on a line of their own, and many things are out of place.
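
For the symptoms described above (one or two words per line, words broken in the middle, stray page numbers), part of the cleanup can be done after conversion. Here is a minimal Python sketch, not a tested solution. It assumes that a broken word ends its line with a hyphen, that a line containing only digits is a page number, and that blank lines separate paragraphs. The function name reflow is made up for illustration. It will not fix jumbled reading order or misplaced formulas, and it may wrongly glue real hyphenated compounds together.

```python
import re

# Assumption: a line holding nothing but 1 to 4 digits is a stray page number.
PAGE_NUMBER = re.compile(r"^\d{1,4}$")


def reflow(text: str) -> str:
    """Merge short lines into paragraphs and rejoin hyphen-broken words."""
    paragraphs = []  # finished paragraphs
    current = ""     # paragraph currently being built
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if PAGE_NUMBER.match(line):
            continue  # drop the assumed page number
        if not line:
            # Assumption: a blank line ends the current paragraph.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        if current.endswith("-"):
            # Assumption: a trailing hyphen marks a word split across lines.
            current = current[:-1] + line
        elif current:
            current += " " + line
        else:
            current = line
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

To try it, read the converted text file into a string, pass it to reflow, and write the result to a new file.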

Does anyone have tricks for reading PDFs generated by LaTeX more easily? For converting them into comfortably readable text? Or, even better, for converting them into a comfortably readable format that keeps the text semantics, i.e. the ability to navigate through headings, links, etc.?

Bonus question, which can turn into a debate if you wish: why is a strongly semantics-oriented format like LaTeX unable to produce accessible PDFs carrying the same semantics? I think it really should, because LaTeX is basically like HTML or XML: the syntax differs, but the semantic basics are identical. Is Adobe to blame for blocking, protecting or preventing something in that area?

Thank you for your answers, and have a nice weekend.

__________
View the list's information and change your settings at //www.freelists.org/list/programmingblind


