In general, EdSharp does not do file conversions itself; it calls a free, external converter, either an executable or a COM server. Converters are associated with file extensions and are defined in the Import and Export sections of EdSharp.ini. If only one converter is defined for an extension, EdSharp chooses it automatically. Otherwise, EdSharp prompts with a list of converters to choose from.
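As a rough illustration of that selection rule only (this is not EdSharp's actual code; the table, the converter names other than PdfToText, and the prompt callback are all hypothetical, and the real definitions live in EdSharp.ini in a format not shown here):

```python
# Hypothetical in-memory table standing in for the Import section of EdSharp.ini.
# Only the ".pdf" -> PdfToText association comes from this message; the rest is made up.
IMPORT_CONVERTERS = {
    ".pdf": ["PdfToText"],
    ".xyz": ["ConverterA", "ConverterB"],
}


def choose_converter(extension, converters, prompt):
    """Pick a converter for a file extension.

    Returns None if no converter is defined, the single converter if exactly
    one is defined, and otherwise whatever the prompt callback selects.
    """
    candidates = converters.get(extension.lower(), [])
    if not candidates:
        return None
    if len(candidates) == 1:
        # Only one choice: use it without asking.
        return candidates[0]
    # Several choices: let the user pick from the list.
    return prompt(candidates)


def pick_first(candidates):
    """Stand-in for an interactive prompt; always takes the first entry."""
    return candidates[0]


print(choose_converter(".pdf", IMPORT_CONVERTERS, pick_first))
```

With a table like this, a .pdf file never triggers the prompt, because only one converter is listed for that extension.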
Currently, the only converter defined for the .pdf extension is the PdfToText utility, available at
http://www.foolabs.com/xpdf/download.html
Thanks for making me aware of that weakness in its conversions. Unfortunately, I have not found a free PDF conversion utility that consistently outperforms all others. I find that this one tends to do better than others at inferring proper reading order, but worse than others at formatting the text.
In terms of the PDF conversion features most important to you, you may wish to compare results with PDF2TXT, available at
http://EmpowermentZone.com/p2tsetup.exe
Try both the regular conversion, and the Extra HTML option (Alt+X).
FileDir uses yet another conversion utility called GetText, which converts other formats to text as well as PDF.
http://EmpowermentZone.com/dirsetup.exe
Jamal
On 9/27/2010 1:21 PM, Alex Hall wrote:
Pretty sure. When I open the email to which the pdf is attached in gmail, I can download it or have google translate it to html for me. If I choose to have it translated, the resulting page looks how I expect, with all new lines where they should be. If I download the file and open it in edsharp, single new lines are gone.
On 9/27/10, Homme, James <james.homme@xxxxxxxxxxxx> wrote:
Hi,
I'm not trying to defend EdSharp, but are you sure that the problem is not with the PDF?
Thanks.
Jim
Jim Homme, Usability Services, Phone: 412-544-1810. Skype: jim.homme
Internal recipients, Read my accessibility blog. Discuss accessibility here. Accessibility Wiki: Breaking news and accessibility advice
-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Alex Hall
Sent: Monday, September 27, 2010 12:39 PM
To: programmingblind
Subject: edSharp pdf converter and new lines
Hi all, mostly Jamal:
I am wondering if new lines in pdf files could be handled better? When I convert a pdf with edsharp, any single return is turned into a space. If there are two hard returns or more in a row, they are preserved, but one return is lost. This can get frustrating with documents containing text to be parsed, questions to be answered, and other types of structured or semi-structured text that is not purely for reading, and a couple of my professors love pdf files so I get a lot of non reading ones.
Thanks.
--
Have a great day,
Alex (msg sent from GMail website)
mehgcap@xxxxxxxxx; http://www.facebook.com/mehgcap
__________
View the list's information and change your settings at //www.freelists.org/list/programmingblind
This e-mail and any attachments to it are confidential and are intended solely for use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender immediately and then delete it. If you are not the intended recipient, you must not keep, use, disclose, copy or distribute this e-mail without the author's prior permission. The views expressed in this e-mail message do not necessarily represent the views of Highmark Inc., its subsidiaries, or affiliates.