[bksvol-discuss] Re: what does the stripper do as part of the bookshare processing software

  • From: Barbara <barbarab65@xxxxxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Thu, 26 Apr 2007 20:42:14 -0700 (PDT)

Does it matter whether the page numbers are at the top or the bottom of pages? 
Which is preferred?
   
  Barbara

Gerald Hovas <GeraldHovas@xxxxxxxxx> wrote: 
  The purpose of the Stripper is not to strip headers and footers, but to
process page numbers in order to provide better navigation in DAISY books.
In order to find the page numbers, the Stripper must identify running
headers and footers, and Engineering decided that as long as they had
identified the headers and footers, they might as well strip the text in
them to make the book cleaner. Engineering made the mistake of naming the
tool the Stripper, though, so the name doesn't refer to the tool's primary
task, which is confusing to volunteers.

The problem with the Stripper is that unless the headers and footers are
identical down to the whitespace, the Stripper can't recognize them, so it's
best to strip the text in the headers and footers manually to help the
Stripper identify the page numbers, as well as ensure a cleaner book.
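
To illustrate why one stray space defeats an exact match, here is a
rough Python sketch. It is not Bookshare's code, which isn't shown anywhere
in this thread; the function names, the min_count threshold, and the sample
pages are all made up for the example.

```python
from collections import Counter

def first_line(page):
    """Return the first line of a page (everything before the first newline)."""
    return page.split("\n", 1)[0]

def recurring_headers(pages, min_count=2, normalize=False):
    """Count first lines that recur on at least min_count pages.

    With normalize=False the comparison is exact, so any difference in
    spacing makes two headers count as different lines. With
    normalize=True, runs of whitespace are collapsed before counting.
    Both behaviours are assumptions made for illustration only.
    """
    counts = Counter()
    for page in pages:
        line = first_line(page)
        if normalize:
            line = " ".join(line.split())
        if line:
            counts[line] += 1
    return {line: n for line, n in counts.items() if n >= min_count}

# Hypothetical scan: the third page's header has two spaces after "of".
pages = [
    "A Tale of Two Cities\n\n1\n\nIt was the best of times,",
    "A Tale of Two Cities\n\n2\n\nit was the worst of times,",
    "A Tale of  Two Cities\n\n3\n\nit was the age of wisdom,",
]

exact = recurring_headers(pages)
loose = recurring_headers(pages, normalize=True)
print(exact)  # {'A Tale of Two Cities': 2}
print(loose)  # {'A Tale of Two Cities': 3}
```

With exact matching, the header on the third page is never recognized as a
running header, so it would be left in the book.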

A more annoying problem with the Stripper is that it has a tendency to strip
chapter headings. This may be due to an attempt to process them as well, to
provide navigation at the chapter level, since Jim once mentioned that
Bookshare had attempted to do that when they wrote the Stripper. If so,
then they made the mistake of stripping the text from the scan, as they do
with the headers and footers. The Stripper may just be confusing the chapter
headings with headers, though, and stripping them.

BTW, the page numbers which the Stripper finds are now being passed on to
the Braille translator to provide page numbering which matches the print
book, so BRF books can also be affected now if the Stripper can't identify
the page numbers.

HTH

Gerald

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Tracy Carcione
Sent: Thursday, April 26, 2007 8:15 AM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: what does the stripper do as part of the
bookshare processing software

What the infamous stripper does is strip recurring text at the top or
bottom of pages. Its purpose is to get rid of recurring titles. It does
not strip the page number.
My understanding is that it looks for a line at the top of the page, with
a blank line after it. Also, the text must recur exactly--if the spacing
differs, for instance, it is not stripped.

I'm sick to death of getting books where the titles are still in the book,
which happens a lot, because they're not exactly the same. So, when I
validate, I strip the titles myself, being careful to leave the page
number at the top, or bottom, separated from the text by a blank line.
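
For anyone who would rather script that cleanup than do it by hand, here is
a minimal Python sketch of the idea. It is only an illustration under
assumptions: the function name is hypothetical, it handles the top of the
page only, it treats a digits-only line as the page number (so roman
numerals are missed), and it is not part of any Bookshare tool.

```python
def tidy_page_top(page, title):
    """Remove a running title from the top of a page, keeping the page number.

    The title is matched ignoring case and spacing differences. If a
    digits-only line is found among the top lines, it is kept as the
    first line, followed by a single blank line and then the body text.
    """
    def norm(text):
        return " ".join(text.split()).lower()

    lines = page.split("\n")
    page_number = None
    i = 0
    # Walk down from the top, skipping blank lines and the running title,
    # and remembering at most one page number.
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped == "" or norm(stripped) == norm(title):
            i += 1
        elif page_number is None and stripped.isdigit():
            page_number = stripped
            i += 1
        else:
            break  # first line of real body text

    body = "\n".join(lines[i:])
    if page_number is None:
        return body
    return page_number + "\n\n" + body

# Hypothetical page whose running title has an extra space in it.
raw = "A Tale of  Two Cities\n\n12\n\nIt was the best of times."
print(tidy_page_top(raw, "A Tale of Two Cities"))
```

The result starts with the page number on its own line, then a blank line,
then the text, which is the layout described above.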

The stripper has also been known to strip other text at the top of a page,
like chapter titles. Thus, many people put the page number before the
chapter title, at the top, with a blank line after it, and/or delete any
blank lines after the chapter title, in hopes of not triggering the
stripper.

How do we hate the stripper? Let me count the ways.
Tracy

> Hello All:
> What does the stripper do as part of the bookshare processing software?
> Lisa.
>
> To unsubscribe from this list send a blank Email to
> bksvol-discuss-request@xxxxxxxxxxxxx
> put the word 'unsubscribe' by itself in the subject line. To get a list
> of available commands, put the word 'help' by itself in the subject line.
>
>




