[bksvol-discuss] Re: leaving headers

  • From: "Gerald Hovas" <geraldhovas@xxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Fri, 4 Nov 2005 17:25:21 -0600

Jill,

Bookshare is currently working on preserving the original page numbers in
the BRF files.  They've had to ask Duxbury for some guidance and then do some
experimenting to make sure they understand Duxbury's advice.  They think
they have a better understanding of what they need to do now and are working
to update the tools to fix the problem as one of their engineering projects.
The following message was sent out in September and says that they will
probably need a few months to get the kinks out before we start seeing any
change.
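
For anyone curious what that fix might look like, here is a minimal Python
sketch of the idea Dave describes in the message below: a new-page mark that
falls inside a paragraph gets ignored, so the paragraph is closed before the
mark and reopened after it.  This is only an illustration, not Bookshare's
code.  The `[[page N]]` marker and the function name are made up, since the
actual code Duxbury recommended isn't given in the message.

```python
import re

# Hypothetical stand-in for the new-page code; the real Duxbury code
# is not stated anywhere in this thread.
PAGE_MARK = re.compile(r"\s*\[\[page (\d+)\]\]\s*")


def split_paragraph_at_page_marks(paragraph_text):
    """Return HTML blocks with every page mark moved outside of <p>...</p>.

    Text before and after each mark becomes its own paragraph, so the
    mark never sits inside an open paragraph tag.
    """
    # With one capture group, re.split alternates: text, number, text, ...
    parts = PAGE_MARK.split(paragraph_text)
    blocks = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd positions hold the captured page numbers.
            blocks.append(f"[[page {part}]]")
        elif part.strip():
            # Even positions hold paragraph text; skip empty fragments.
            blocks.append(f"<p>{part.strip()}</p>")
    return blocks
```

A paragraph that runs across a page break comes out as two paragraphs with
the page mark standing between them.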

As for guidance on how to handle running headers and footers, it sounds like
what you're doing is fine.  Marissa, who left Bookshare back at the end of
May, told me that manually stripping the text was fine since that's what the
Stripper will attempt to do anyway.  That makes it easier for the Stripper
to recognize the page numbers.  Just be sure the page number is the first
line of text if it's in the header or the last if it's in the footer or the
Stripper can't recognize it.  Keep in mind that this is the real reason the
Stripper is part of Bookshare's set of tools: not to strip headers and
footers, but to process the page numbers so that books are easier to
navigate with DAISY.
The list has brought up the issue of just how well that's working at the
moment since the page numbers are not showing up in K-1000 as expected.  But
from what Stephen Baum from Kurzweil has said, it sounds like it's just one
of those growing pains that we'll have to endure while Bookshare's tools and
the software packages that support DAISY mature.  My guess is that it's just
a temporary problem which will be worked out in the near future.
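
To make the "first line or last line" rule concrete, here is a small Python
sketch.  It is a guess at the kind of check involved, not the Stripper's
actual logic, which hasn't been documented on the list.  The function name
is made up, and it assumes the page number sits alone on its line as plain
digits (so roman numerals would be missed).

```python
def find_page_number(page_lines):
    """Return (number, position) when a page number can be recognized.

    Only the first and last non-blank lines of the page are checked,
    mirroring the rule that the number must be the first line of a
    header or the last line of a footer.  Returns None otherwise.
    """
    lines = [line.strip() for line in page_lines if line.strip()]
    if not lines:
        return None
    if lines[0].isdecimal():
        return int(lines[0]), "header"
    if lines[-1].isdecimal():
        return int(lines[-1]), "footer"
    # A number buried on any other line is not recognized.
    return None
```

Under these assumptions, a page whose running title comes before the number
returns None, which is the situation to avoid when proofreading.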

There's also the issue of preventing chapter headings from being stripped.
When page numbers appear in running headers, moving the page number from the
bottom to the top of any page that has a chapter heading will prevent this
from happening.  When all of the page numbers are in the footers, placing a
line with the title of the book above the chapter heading should solve the
problem.  This gives the Stripper a header to strip, and it leaves the
chapter heading alone.
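
Here is a toy Python model of why that works.  It assumes the Stripper
simply treats the first non-blank line of each page as the header, which is
a simplification for illustration only; both function names are made up.

```python
def strip_header(page_lines):
    """Toy model: drop the first non-blank line of a page as its header."""
    for index, line in enumerate(page_lines):
        if line.strip():
            return page_lines[:index] + page_lines[index + 1:]
    return list(page_lines)


def protect_chapter_heading(page_lines, book_title):
    """Put a line with the book title above the chapter heading."""
    return [book_title] + list(page_lines)
```

With this model, stripping a page that begins with "Chapter 5" loses the
heading, while stripping the protected page removes only the added title
line and leaves "Chapter 5" in place.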

You'll see Dave mentioning NIMAS in the following message as he explains the
issue of original page numbers in BRF.  When Engineering upgrades the tools to
support NIMAS, they'll attempt to address the issues volunteers are having
with the Stripper.  In the meantime, we'll just have to be patient.  That
doesn't mean we can't ask for some documentation for handling the Stripper
as it exists today, though, so I'll pass your request on to the staff.  I'm
sure you realize that this won't be the first time they will have gotten
this particular request.  That doesn't mean that they wouldn't like to
provide the help, just that they have their hands full with everything they
have to do and squeezing it in isn't easy.

If nothing else, sometime in the next few weeks I'll try to pull together
some of the e-mails that have been sent out over the list concerning how to
handle the Stripper and add a tip to Jake's website.

HTH

Gerald
-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx]On Behalf Of Janice Carter
Sent: Friday, September 02, 2005 7:17 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] BRF and page numbers - Bookshare.org Friday Update

Based on lots of discussion on the list regarding the problems of page
numbers not appearing in Grade II BRF files downloaded from
Bookshare.org, the Benetech engineering team has been working on some
short-term as well as longer-term solutions. The following is a fairly
detailed explanation from Dave Offen, Benetech's Director of
Engineering.

"...Recently we have been in frequent contact with Duxbury, the folks
who make our Grade II translator, to see if we can introduce into our
books a special "new page" code that the Duxbury translator will output
in Braille along with the original page number. The folks at Duxbury
have told us which code we should use, and we have been experimenting
with it. At first it didn't work at all, until we discovered that if
the new page occurs within a paragraph (because the paragraph continues
on the next page) the page number and new-page mark will be ignored by
Duxbury. Now that we better understand the Duxbury requirements, we
should be able to reformat any open HTML tags in the vicinity of
new-page marks (such as the open paragraph tags) and get the Braille to
properly output page numbers.

In DAISY 3 book readers, you can always ask "where am I" and it will
tell you your current page. With the above mentioned change to our
Braille generation, people downloading BRF files will have access to the
same page information that people downloading DAISY 3 books now have.
It may take a few months before we've got all the kinks ironed out of
this process, but we understand that lots of people are waiting for this
kind of improvement.

For the longer term, we are looking into ways of improving the page
number identification in our books. This is especially important for
textbook users. We're investigating if there are scanning or
proofreading guidelines that can improve our ability to capture page
numbers. This page number capturing takes place in the header/footer
stripper. The header/footer stripper is needed to make the books flow
smoothly when listening to them using TTS in a DAISY 3 reader. If the
page header or footer is located in the first line before or after a
page break in the OCR'd RTF file that gets uploaded to our collection,
the stripper will usually be able to extract the page number information
before it strips away the header/footer, and this information is stored
in our master XML file from which both DAISY 3 and BRF books are
generated.

As we begin to work with Publishers producing NIMAS content under the
new guidelines, these improvements to our BRF processing will carry
forward to our new NIMAS books as well. We will be able to take NIMAS
files and using these same processes feed them through Duxbury to
produce BRF files with the original pages marked in Braille."

As we've mentioned in several other postings, changes to the
Bookshare.org system are no longer small efforts. We will have 25,000
books very soon and changes and upgrades that will help Bookshare.org
grow are getting fully vetted by Engineering and Operations and
Fundraising and Jim and you.

(The "when will this happen?" is based on funding timing.)

Thanks again for keeping us focused on your needs.

Stay safe this weekend.

Janice Carter
Director, Literacy Programs

Benetech
480 S. California Ave., Suite 201
Palo Alto, CA 94306-1609 USA

(650) 475-5440 x122
(650) 759-5828 cell
(650) 475-1066 fax

janice.c@xxxxxxxxxxxx
www.benetech.org

The Benetech Initiative - Technology Serving Humanity
A Nonprofit Organization

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx]On Behalf Of Jill O'Connell
Sent: Friday, November 04, 2005 3:23 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] leaving headers


I would like to know what those of you who are submitting books are doing
about headers. I am a braille reader so don't know how they come across in
daisy format other than what I have read here on the list about tags. They
are certainly unpleasant to keep reading in braille, but the stripper seems
so unreliable that I don't feel we can count on it to remove them. Since we
keep begging Bookshare to give us guidance in this matter and they do not, I
am wondering what most of you do and also how you think it affects the
acceptance of your submissions. My present policy is to remove headers,
preserving the print page number, but as others have pointed out, the
numbers don't seem to be preserved in braille regardless.
