[bksvol-discuss] BRF and page numbers - Bookshare.org Friday Update

  • From: "Janice Carter" <Janice.C@xxxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Fri, 2 Sep 2005 17:17:18 -0700

Following extensive discussion on the list about page numbers not
appearing in Grade II BRF files downloaded from Bookshare.org, the
Benetech engineering team has been working on short-term as well as
longer-term solutions.  The following is a fairly detailed explanation
from Dave Offen, Benetech's Director of Engineering.

"...Recently we have been in frequent contact with Duxbury, the folks
who make our Grade II translator, to see if we can introduce into our
books a special "new page" code that the Duxbury translator will output
in Braille along with the original page number.  The folks at Duxbury
have told us which code we should use, and we have been experimenting
with it.  At first it didn't work at all, until we discovered that if
the new page occurs within a paragraph (because the paragraph continues
on the next page) the page number and new-page mark will be ignored by
Duxbury.  Now that we better understand the Duxbury requirements, we
should be able to reformat any open HTML tags in the vicinity of
new-page marks (such as the open paragraph tags) and get the Braille to
properly output page numbers. 
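
[Illustration, not part of Dave's message: a minimal Python sketch of the
reformatting idea described above, namely closing an open paragraph before
a new-page mark and reopening it afterwards so the mark never sits inside
a paragraph.  The `<pagenum n="..."/>` marker and the function name are
hypothetical; the actual code Duxbury recommended is not given here, and
the sketch only handles plain `<p>` paragraphs without attributes.]

```python
import re

# Hypothetical page marker. The real Duxbury "new page" code is not
# stated in the message, so this tag is only a stand-in.
PAGE_MARK = re.compile(r'\s*(<pagenum n="\d+"/>)\s*')
PARAGRAPH = re.compile(r'<p>(.*?)</p>', re.DOTALL)


def lift_page_marks(html: str) -> str:
    """Move page marks out of paragraphs by splitting each paragraph around them."""

    def split_paragraph(match: re.Match) -> str:
        # re.split with a capturing group keeps the marks in the result,
        # alternating text, mark, text, ...
        pieces = PAGE_MARK.split(match.group(1))
        out = []
        for piece in pieces:
            if PAGE_MARK.fullmatch(piece):
                out.append(piece)                      # mark stands between paragraphs
            elif piece.strip():
                out.append(f"<p>{piece.strip()}</p>")  # re-wrap each text fragment
        return "".join(out)

    return PARAGRAPH.sub(split_paragraph, html)


if __name__ == "__main__":
    sample = '<p>The quick <pagenum n="12"/> brown fox.</p>'
    print(lift_page_marks(sample))
```

[One trade-off of this approach: a paragraph that continues across a page
break becomes two paragraphs in the output, which is the price of keeping
the page mark outside any open tag.]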
 
In DAISY 3 book readers, you can always ask "where am I?" and it will
tell you your current page.  With the above-mentioned change to our
Braille generation, people downloading BRF files will have access to the
same page information that people downloading DAISY 3 books now have.
It may take a few months before we've got all the kinks ironed out of
this process, but we understand that lots of people are waiting for this
kind of improvement.
 
For the longer term, we are looking into ways of improving the page
number identification in our books.  This is especially important for
textbook users.  We're investigating whether there are scanning or
proofreading guidelines that can improve our ability to capture page
numbers.  This page number capturing takes place in the header/footer
stripper.  The header/footer stripper is needed to make the books flow
smoothly when listening to them using TTS in a DAISY 3 reader.   If the
page header or footer is located in the first line before or after a
page break in the OCR'd RTF file that gets uploaded to our collection,
the stripper will usually be able to extract the page number information
before it strips away the header/footer, and this information is stored
in our master XML file from which both DAISY 3 and BRF books are
generated.
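
[Illustration, not part of Dave's message: a rough Python sketch of how a
header/footer stripper along these lines might pull a page number from the
first or last line of each page before discarding that line.  It assumes
plain text with form-feed characters as page breaks and a very narrow idea
of what a page-number line looks like; the real stripper works on OCR'd
RTF and its rules are not described here.  All names are hypothetical.]

```python
import re

# Assumption: a page-number line is a bare number, optionally written
# as "Page 12" or "- 12 -". Real headers and footers are more varied.
NUMBER_LINE = re.compile(r'^\s*(?:page\s+)?-?\s*(\d+)\s*-?\s*$', re.IGNORECASE)


def strip_page_numbers(text: str):
    """Split text on form feeds; return (page bodies, page numbers or None)."""
    pages, numbers = [], []
    for raw in text.split("\f"):
        lines = raw.strip("\n").split("\n")
        found = None
        # Only the first line after and the last line before a page
        # break are examined, as the message describes.
        for index in (0, -1):
            if not lines:
                break
            match = NUMBER_LINE.match(lines[index])
            if match and found is None:
                found = int(match.group(1))  # record the number...
                del lines[index]             # ...then strip the line
        pages.append("\n".join(lines))
        numbers.append(found)
    return pages, numbers


if __name__ == "__main__":
    sample = "12\nFirst page text.\fSecond page text.\nPage 13\fNo number here."
    bodies, page_numbers = strip_page_numbers(sample)
    print(page_numbers)
    print(bodies)
```

[A page whose number is not on its first or last line simply comes back
with no number recorded, which mirrors the "will usually be able to"
caveat above.]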
 
As we begin to work with publishers producing NIMAS content under the
new guidelines, these improvements to our BRF processing will carry
forward to our new NIMAS books as well.  We will be able to take NIMAS
files and, using these same processes, feed them through Duxbury to
produce BRF files with the original pages marked in Braille."

As we've mentioned in several other postings, changes to the
Bookshare.org system are no longer small efforts.  We will have 25,000
books very soon and changes and upgrades that will help Bookshare.org
grow are getting fully vetted by Engineering and Operations and
Fundraising and Jim and you.  
(The answer to "when will this happen?" depends on funding timing.)  

Thanks again for keeping us focused on your needs.  
Stay safe this weekend.  

Janice Carter
Director, Literacy Programs
 
Benetech 
480 S. California Ave., Suite 201
Palo Alto, CA 94306-1609 USA
 
(650) 475-5440 x122
(650) 759-5828 cell
(650) 475-1066 fax
 
janice.c@xxxxxxxxxxxx
www.benetech.org
 
The Benetech Initiative - Technology Serving Humanity 
A Nonprofit Organization

