Hello,

Here is a message from Jim Fruchterman which was sent to the volunteer list.

Lisa brought to my attention some of the discussion about PDF happening on one of the Bookshare.org lists. I just gave a presentation with Adobe at CSUN, and have posted the presentation below. The short story is that we should be able to turn tagged PDF into DAISY very easily, and that untagged PDF is still a big problem. Some of the books we're getting from publishers are coming in as tagged PDF, so this is an attractive approach for us.

Jim

Accessible PDF to DAISY/NIMAS Conversion
Jim Fruchterman, Benetech
Andres Gonzalez, Mike Wirth, Adobe Systems Inc.
March, 2007

* Bookshare and the PDF/XML Need
* Adobe's Acrobat Accessibility Work
* Technology Demonstration

I. Bookshare and the PDF/XML Need

Bookshare.org
A library of digital text

* 31,000 accessible XML books
* 95% come from volunteer scanning
* Repository is growing at a rate of 400-500 books a month
* Increasing numbers coming directly from publishers and authors
  * Better quality than scanned
  * International permissions
  * Format conversion challenge

DAISY format
Digital Audio-based Information System

* The DAISY XML standard is our core format
* Books are read on a computer using synthetic speech
* NISO/DAISY 3.0 XML specification enables text-based navigation by units such as page numbers and paragraphs
* Think of it as a web page (HTML) plus a couple of extra tags (page numbers, chapters)
* NIMAS (the new K-12 accessible textbook standard in the U.S.) is based on DAISY

Need to convert PDF to XML
Transforming visual to accessible

* Most publishers are able to create PDF of their books
* Goal is a smooth transformation from accessible PDF to DAISY
* Turning PDF books and documents into highly accessible DAISY

PDF Structure and Accessibility
Tagged PDF

* Became part of the PDF specs in Acrobat 5, motivated by:
  * eBooks
  * Accessibility
* Adds document structure and logical order to PDF:
  * Pages, paragraphs, tables
  * Reading order
* Can be semantically rich
* Preserves PDF visual fidelity and portability

Creating Tagged PDF
Tagging PDF

* MakeAccessible
  * Automatically adds tags to an existing untagged PDF
* TouchUp
  * Allows authors to add and correct tagging
* Accessibility checker
  * Checks for common tagging problems and provides suggestions on how to fix them
* PDFMaker
  * Creates tagged PDF from other authoring applications

Tagged PDF to DAISY Conversion
PDDOM

* XML DOM-like representation of PDF
* Provides programmatic access to the tag structure of the PDF file
* Cornerstone for Acrobat's:
  * Assistive technologies support
  * PDF conversion to XML

Tagged PDF to DAISY Conversion
Acrobat SaveAsXML relies on PDDOM

* Scriptable XML parsing engine to produce different types of XML-based outputs
* Had to be extended to produce DAISY
  * To include page numbers
  * Layout and formatting information
* XML post-processing of SaveAsXML output produces DAISY

Demonstration of Current Prototype Technology
Demo of a real eBook from a major publisher

* Starting point: a tagged PDF novel
  * Taliesin by Stephen Lawhead
* Using Acrobat, Save As XML
  * We have modified it to save DAISY tags
  * Converts PDF tags into equivalent XML
* Next, need to add key metadata
  * For DAISY, four main fields
* Finally, create the DAISY ebook
  * Show it in gh Player

Conclusion

* PDF to XML technology works well on most tagged PDF documents
* Needs to be wrapped into a user-friendly package
  * Web-based service for schools and qualified users
  * Tool for Acrobat users
* Need to complete work and testing
* Complex books (textbooks) will still take human intervention to create fully compliant NIMAS
* Existence of this capability should serve to drive increased creation of accessible PDF

Find out More

Adobe Accessibility: http://www.adobe.com/accessibility
Bookshare.org: http://www.bookshare.org/

Mike Wirth, Lisa Friendly
mwirth@xxxxxxxxx  650-644-3420  Lisa.f@xxxxxxxxxxxx

Andres Gonzalez, Jim Fruchterman
andgonza@xxxxxxxxx  jim@xxxxxxxxxxxx
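
As a rough illustration of the post-processing step described in the slides (PDF tags converted into equivalent XML, page numbers carried over, metadata added), here is a minimal Python sketch. It is not the Acrobat SaveAsXML pipeline. The input layout, the PageBreak element, the TAG_MAP table and the metadata field name are all assumptions made for illustration, and the output borrows only a few DTBook-style element names rather than forming a complete, valid DAISY file. The slides do not name the four metadata fields, so the one used here is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from PDF structure tag names to DTBook-style elements.
TAG_MAP = {"Sect": "level1", "H1": "h1", "P": "p"}


def _convert_children(src_parent, out_parent):
    """Recursively copy mapped elements from the source tree to the output tree."""
    for child in src_parent:
        if child.tag == "PageBreak":
            # Hypothetical page marker: becomes a pagenum element.
            pagenum = ET.SubElement(out_parent, "pagenum")
            pagenum.text = child.get("number")
            continue
        target = TAG_MAP.get(child.tag)
        if target is None:
            # Unmapped tag: keep its children, drop the wrapper.
            # (The wrapper's own text is discarded in this sketch.)
            _convert_children(child, out_parent)
            continue
        out = ET.SubElement(out_parent, target)
        out.text = (child.text or "").strip() or None
        _convert_children(child, out)


def to_dtbook(source_xml, metadata):
    """Build a DTBook-style tree from a simplified tagged-structure XML string."""
    src = ET.fromstring(source_xml)
    dtbook = ET.Element("dtbook")
    head = ET.SubElement(dtbook, "head")
    for name, content in metadata.items():
        ET.SubElement(head, "meta", {"name": name, "content": content})
    book = ET.SubElement(dtbook, "book")
    body = ET.SubElement(book, "bodymatter")
    _convert_children(src, body)
    return dtbook


# Made-up input standing in for the XML saved from a tagged PDF.
SAMPLE = (
    "<Document>"
    "<Sect><H1>Chapter One</H1>"
    '<PageBreak number="1"/>'
    "<P>First paragraph.</P>"
    "</Sect>"
    "</Document>"
)

book = to_dtbook(SAMPLE, {"dc:Title": "Example Title"})
print(ET.tostring(book, encoding="unicode"))
```

A table-driven mapping keeps the tag correspondence in one place, which matters because complex books are expected to need human correction of exactly that correspondence.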