[bookshare-discuss] FW: [bksvol-discuss] Bookshare.org and PDF

  • From: "John Glass" <John.G@xxxxxxxxxxxx>
  • To: <bookshare-discuss@xxxxxxxxxxxxx>
  • Date: Mon, 26 Mar 2007 16:52:22 -0700

 Hello,

 

Here is a message from Jim Fruchterman which was sent to the volunteer
list.

 

Lisa brought to my attention some of the discussion about PDF happening on
one of the Bookshare.org lists.  I just gave a presentation with Adobe
at CSUN, and have posted the presentation below.

 

The short story is that we should be able to turn tagged PDF into DAISY
very easily, and that untagged PDF is still a big problem.  Some of
the books we're getting from publishers are coming in as tagged PDF, so
this is an attractive approach for us.

 

Jim

 


Accessible PDF to DAISY/NIMAS Conversion 


Jim Fruchterman, Benetech


Andres Gonzalez, Mike Wirth, Adobe Systems Inc.


March, 2007


 


*            Bookshare and the PDF/XML Need 


*            Adobe's Acrobat Accessibility Work


*            Technology Demonstration


 


I. Bookshare and the PDF/XML Need


Bookshare.org
          A library of digital text


*   31,000 accessible XML Books


*   95% come from volunteer scanning


*     Repository is growing at a rate of 400-500 books a month


*   Increasing numbers coming directly from publishers and authors


*     Better quality than scanned


*     International permissions


*     Format conversion challenge


 


DAISY format
          Digital Audio-based Information System


*    The DAISY XML standard is our core format


*    Books are read on a computer using synthetic speech


*    The NISO/DAISY 3.0 XML specification enables text-based navigation
by units such as page numbers and paragraphs


*    Think of it as a web page (HTML) plus a couple of extra tags (page
numbers, chapters)


*    NIMAS (the new K-12 accessible textbook standard in the U.S.) is
based on DAISY
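
To make the "web page plus a couple of extra tags" idea concrete, here is a
minimal sketch in Python. It assumes a simplified, namespace-free fragment
that uses DTBook-style element names (level1, h1, p, pagenum); real DAISY 3
files carry a namespace, a head section, and much more. The function name
navigation_points is hypothetical and is not part of any DAISY tool.

```python
import xml.etree.ElementTree as ET

# Simplified fragment in the style of DAISY 3 (DTBook) markup.
# Assumption: no XML namespace and no <head>, to keep the sketch short.
SAMPLE = """\
<dtbook>
  <book>
    <bodymatter>
      <level1>
        <h1>Chapter One</h1>
        <pagenum>1</pagenum>
        <p>First paragraph.</p>
        <pagenum>2</pagenum>
        <p>Second paragraph.</p>
      </level1>
      <level1>
        <h1>Chapter Two</h1>
        <pagenum>3</pagenum>
        <p>Third paragraph.</p>
      </level1>
    </bodymatter>
  </book>
</dtbook>
"""


def navigation_points(xml_text):
    """Return (kind, text) pairs for headings and page markers in document order."""
    root = ET.fromstring(xml_text)
    points = []
    for elem in root.iter():  # iter() walks the tree depth-first, in document order
        if elem.tag == "h1":
            points.append(("chapter", elem.text.strip()))
        elif elem.tag == "pagenum":
            points.append(("page", elem.text.strip()))
    return points


if __name__ == "__main__":
    for kind, text in navigation_points(SAMPLE):
        print(kind, text)
```

A reading program could use a list like this to jump by chapter or by page,
which is the kind of navigation the extra tags make possible.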


Need to convert PDF to XML
          Transforming visual to accessible


*   Most publishers are able to create PDFs of their books


*   Goal is a smooth transformation from accessible PDF to DAISY


*    Turning PDF books and documents into highly accessible DAISY



PDF Structure and Accessibility
         Tagged PDF


*    Became part of the PDF specs in Acrobat 5, motivated by:


*     eBooks


*     Accessibility


*    Adds document structure and logical order to PDF:


*     Pages, paragraphs, tables


*     Reading order


*    Can be semantically rich


*    Preserves PDF visual fidelity and portability


Creating Tagged PDF
            Tagging PDF


*   MakeAccessible 


*     Automatically adds tags to an existing untagged PDF


*   TouchUp


*     Allows authors to add and correct tagging


*   Accessibility checker


*     Checks for common tagging problems and suggests how to fix them


*   PDFMaker


*     Creates tagged PDF from other authoring applications.


 


Tagged PDF to DAISY Conversion
            PDDOM


*   XML DOM-like representation of PDF


*   Provides programmatic access to the tag structure of the PDF file


*   Cornerstone for Acrobat's


*     Assistive technologies support


*     PDF conversion to XML
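
As a rough illustration of what "DOM-like" access to a tag structure means,
the sketch below walks a small tag tree depth-first, which yields the
elements in logical reading order. The TagNode class and reading_order
function are hypothetical stand-ins written for this example; they are not
the PDDOM API, whose actual interface is not described here. The tag names
(Document, H1, P, Table, TR, TD) are standard tagged PDF structure types.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class TagNode:
    """Hypothetical stand-in for one node of a PDF tag tree (not the PDDOM API)."""
    tag: str
    text: str = ""
    children: List["TagNode"] = field(default_factory=list)


def reading_order(node: TagNode, depth: int = 0) -> Iterator[Tuple[int, str, str]]:
    """Yield (depth, tag, text) for the node and its descendants, parents first."""
    yield depth, node.tag, node.text
    for child in node.children:
        yield from reading_order(child, depth + 1)


# A tiny example tree: a heading, a paragraph, and a one-cell table.
doc = TagNode("Document", children=[
    TagNode("H1", "Chapter One"),
    TagNode("P", "First paragraph."),
    TagNode("Table", children=[
        TagNode("TR", children=[TagNode("TD", "cell")]),
    ]),
])

if __name__ == "__main__":
    for depth, tag, text in reading_order(doc):
        print("  " * depth + tag, text)
```

Both a screen reader and an XML exporter can be built on a traversal like
this, which is presumably why one tree serves both purposes.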


Tagged PDF to DAISY Conversion
            Acrobat SaveAsXML relies on PDDOM




*   Scriptable XML parsing engine to produce different types of
XML-based outputs 


*   Had to be extended to produce DAISY 


*     To include page numbers


*      Layout and formatting information


*     XML post-processing of SaveAsXML output produces DAISY 
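
The post-processing step might look roughly like the Python sketch below.
Everything about the input is an assumption made for illustration: the
actual SaveAsXML output format is not shown in this presentation, so the
sketch invents an export whose elements are named after PDF structure types
(Document, Sect, H1, P) and carry a hypothetical "page" attribute. The
output uses DTBook-style element names. TAG_MAP and to_dtbook are
hypothetical names, not part of Acrobat.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from PDF structure types to DTBook-style element names.
TAG_MAP = {"Sect": "level1", "H1": "h1", "P": "p"}

# Invented export format: the "page" attribute is an assumption of this sketch.
SAMPLE_EXPORT = """\
<Document>
  <Sect>
    <H1 page="1">Chapter One</H1>
    <P page="1">First paragraph.</P>
    <P page="2">Second paragraph.</P>
  </Sect>
</Document>
"""


def to_dtbook(export_xml):
    """Rename mapped tags and emit a <pagenum> whenever the page changes."""
    src = ET.fromstring(export_xml)
    dtbook = ET.Element("dtbook")
    body = ET.SubElement(ET.SubElement(dtbook, "book"), "bodymatter")
    last_page = None

    def walk(node, out_parent):
        nonlocal last_page
        for child in node:
            page = child.get("page")
            if page is not None and page != last_page:
                # First content seen on a new page: mark the page boundary.
                ET.SubElement(out_parent, "pagenum").text = page
                last_page = page
            new_tag = TAG_MAP.get(child.tag)
            if new_tag is None:
                # Unmapped tag: drop the wrapper (and its own text), keep children.
                walk(child, out_parent)
                continue
            out = ET.SubElement(out_parent, new_tag)
            out.text = (child.text or "").strip() or None
            walk(child, out)

    walk(src, body)
    return dtbook


if __name__ == "__main__":
    print(ET.tostring(to_dtbook(SAMPLE_EXPORT), encoding="unicode"))
```

The sketch ignores inline markup and trailing text between elements, which
is part of why complex books would need more than a simple tag mapping.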


 


Demonstration of Current Prototype Technology
                Demo of a real eBook from a major publisher


*    Starting point: A tagged PDF novel


*     Taliesin by Stephen Lawhead


*    Using Acrobat, Save As XML


*     We have modified it to save DAISY tags


*     Converts PDF tags into equivalent XML


*    Next, need to add key metadata


*     For DAISY, four main fields


*    Finally, create the DAISY ebook


*    Show it in gh Player


Conclusion
               


*     PDF to XML Technology works well on most tagged PDF documents


*     Needs to be wrapped into a user-friendly package


*     Web-based service for schools and qualified users


*     Tool for Acrobat users


*     Need to complete work and testing


*     Complex books (textbooks) will still take human intervention to
create fully compliant NIMAS


*     Existence of this capability should serve to drive increased
creation of accessible PDF


Find out More


Adobe Accessibility: www.adobe.com/accessibility <http://www.adobe.com/accessibility>

Bookshare.org: www.bookshare.org <http://www.bookshare.org/>


 


Mike Wirth                                   Lisa Friendly
mwirth@xxxxxxxxx <mailto:mwirth@xxxxxxxxx>   650-644-3420
                                             Lisa.f@xxxxxxxxxxxx <mailto:Lisa.f@xxxxxxxxxxxx>


 


Andres Gonzalez                                  Jim Fruchterman
andgonza@xxxxxxxxx <mailto:andgonza@xxxxxxxxx>   jim@xxxxxxxxxxxx <mailto:jim@xxxxxxxxxxxx>

