[bksvol-discuss] Re: Automatic Stripper problem

  • From: "Pratik Patel" <pratikp1@xxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Mon, 30 Aug 2004 03:44:01 -0400

Hello All,

Here is what I suspect has been happening with the header situation.

First, let me assure all of you that the automatic stripper only looks at
the first and the last line on the page.  We were assured of this fact when
this discussion arose the last time.  As a result, nothing that does not
appear on the first or the last line of the page will be removed.  To make
sure that headings that mark the beginning of a new chapter are preserved,
they should be placed on the second line of the page.  We are further
assured that if all headers/footers are consistent, the chapter headings
will not be removed, as they do not fall into the typical header pattern.
But, to save myself from the whims of this type of analysis, I generally
make it a habit to put the chapter headings on the second line.
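
To make the reasoning concrete, here is a minimal sketch of an edge-line
stripper of the kind described above.  It is an illustration only: Bookshare's
actual tool is not public in this thread, and the names shape and strip_edges,
the digit-collapsing rule, and the 50% threshold are all assumptions.

```python
import re
from collections import Counter


def shape(line):
    """Collapse digit runs so 'My Book 12' and 'My Book 13' compare equal."""
    return re.sub(r"\d+", "#", line.strip())


def strip_edges(pages, threshold=0.5):
    """Drop a page's first/last line when its shape recurs on more than
    `threshold` of all pages.  Only the two edge lines are ever examined.
    (Hypothetical heuristic, not Bookshare's real algorithm.)"""
    first = Counter(shape(p[0]) for p in pages if p)
    last = Counter(shape(p[-1]) for p in pages if p)
    cutoff = threshold * len(pages)
    result = []
    for page in pages:
        lines = list(page)
        if lines and first[shape(lines[0])] > cutoff:
            lines = lines[1:]
        if lines and last[shape(lines[-1])] > cutoff:
            lines = lines[:-1]
        result.append(lines)
    return result


# Page number on line one, chapter heading on line two: the heading survives.
safe = [["12", "Chapter One", "text a"],
        ["13", "more text", "text b"],
        ["14", "Chapter Two", "text c"]]
print(strip_edges(safe))

# Chapter heading on line one of every page: it looks like a running header.
risky = [["Chapter 1", "a"], ["Chapter 2", "b"], ["Chapter 3", "c"]]
print(strip_edges(risky))
```

Under these assumptions, a heading on the second line is never a candidate
for removal, which is why I prefer that placement.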

In this case, I actually suspect that the validator may have removed the
headings that were referred to.

To allay Debra's concerns, you must make sure that the page number appears
either on the first or the last line of the page.  We are further assured by
Bookshare that if the page number is placed this way, there is no need
for it to be preceded or followed by a space.  The automatic tools will
recognize it.  If you have no additional header/footer info on a particular
line, the page number is used by Bookshare's automated conversion tools to
assign actual page numbers in the DAISY files.
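
As a rough sketch of how a tool might pick the page number off the first or
last line, consider the following.  The function name page_number and the
regular expression are my own assumptions for illustration; they are not taken
from Bookshare's conversion tools.

```python
import re

# A run of digits bounded by the line's start/end or by whitespace.
PAGE_NUMBER = re.compile(r"(?<!\S)\d+(?!\S)")


def page_number(page):
    """Return the number found on the first or last line of `page`
    (a list of lines), or None.  Hypothetical helper for illustration."""
    if not page:
        return None
    for line in (page[0], page[-1]):
        match = PAGE_NUMBER.search(line)
        if match:
            return int(match.group())
    return None


print(page_number(["12", "Chapter One", "text"]))   # number alone on line one
print(page_number(["Chapter One", "body", "7"]))    # number alone on last line
print(page_number(["Title45", "body"]))             # digits fused to header text
```

A number that stands alone on its line needs no surrounding spaces, because
the start and end of the line already delimit it.  Digits fused to other
header text, as in the third call, are not recognized by this sketch.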

Pratik

Pratik Patel
Managing Director
CUNY Assistive Technology Services
The City University of New York
     ppatel@xxxxxx
 
-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Deborah Kent Stein
Sent: Sunday, August 29, 2004 3:50 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: Automatic Stripper problem



To clarify,
The instructions say that page numbers should have a space on either side.
I was under the impression that they should have a line feed on either side.
Yikes!  Have I been doing it wrong all these years?
Debbie

----- Original Message -----
From: "Jesse Fahnestock" <Jesse.F@xxxxxxxxxxxx>
To: <bksvol-discuss@xxxxxxxxxxxxx>
Sent: Thursday, August 26, 2004 10:24 AM
Subject: [bksvol-discuss] Re: Automatic Stripper problem


> Hi all, as I see there is some new conversation about normalizing headers
> and footers, I thought I would repost the guidelines for doing so. Please
> remember that you are not required to do this. But this is how to do it
> right if you want to take it on!
>
> ---Begin instructions for headers and footers--
>
> Volunteers can assist this tool by "normalizing" headers, footers, and
> page numbers in submitted files where they do not appear consistent.
> Normalizing such headers/footers helps, but it needs to be a
> complete job, as normalizing just a few headers could skew the
> probability of properly recognizing them throughout the book. If you
> wish to undertake this task, please be sure to:
>
> 1) Check line position of text (the first paragraph on a given page
> should be the header, the last should be the footer)
> 2) Check that page numbers have a space on either side,
> separating them from the header/footer text. If the page number is
> the first character in a header it does not need a space before it; or if
> it is the last character in a footer it does not need a space after it.
> 3) Only change text in the header or footer in order to make it look
> like all other headers/footers
> 4) Perform 1-3 on every page.
>
> Remember that the automated tool is designed to be effective on most
> scanned books so that you should undertake this "normalization"
> process only if you are sure that the headers and footers in the book
> you are validating are inconsistent and if you are able to normalize all
> of them throughout the book.
>
> --end instructions--
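
Before taking on a full normalization pass, it can help to see which pages
deviate from the book's usual header.  The sketch below is one possible way
to list them; header_shape and inconsistent_headers are hypothetical helpers,
and treating the most common first-line pattern as "the" header is an
assumption, not part of the instructions above.

```python
import re
from collections import Counter


def header_shape(line):
    """Collapse whitespace runs and replace digit runs with '#'."""
    return re.sub(r"\d+", "#", " ".join(line.split()))


def inconsistent_headers(pages):
    """Return indexes of pages whose first line does not match the most
    common header shape in the book.  Empty pages count as shape ''."""
    shapes = [header_shape(p[0]) if p else "" for p in pages]
    if not shapes:
        return []
    common, _ = Counter(shapes).most_common(1)[0]
    return [i for i, s in enumerate(shapes) if s != common]


pages = [["12 My Book", "text"],
         ["13 My Book", "text"],
         ["14My Book", "text"],     # missing space after the page number
         ["15  My Book", "text"]]   # extra space, collapsed by header_shape
print(inconsistent_headers(pages))
```

Pages that open a chapter will usually be flagged too, since their page
number often sits at the bottom; those need a human decision rather than an
automatic fix.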
>
> jesse.
>
> ________________________
>
> Jesse Fahnestock
> Collection Development Coordinator, Bookshare.org
> www.bookshare.org
>
> A Project of The Benetech Initiative - Technology Serving Humanity
> 480 S. California Ave., Suite 201
> Palo Alto, CA 94306-1609  USA
> (650)475-5440 x133
> (650) 475-1066 FAX
> jesse@xxxxxxxxxxxx
> www.benetech.org
>
> -----Original Message-----
> From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx]On Behalf Of Jake
> Sent: 26 August 2004 15:12
> To: bksvol-discuss@xxxxxxxxxxxxx
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
> My guess is that is part of the issue. Many of the books I scan/plan to scan
> have page numbers at the top of the page except on pages where a new chapter
> begins, in that case the page numbers are located at the bottom of the page.
> So my guess is since the word Chapter is found first and on several pages
> that the program thinks it is a heading and therefore throws it out the
> window.
> I'm sure if the program bookshare is using was written by them to add code
> to skip the word chapter as a heading, but if not then I'd seriously
> recommend finding a new program that does what we want, not what we don't.
>
> Jake
> ----- Original Message -----
> From: "Kyrath. (AKA Rob)" <kyrath@xxxxxxx>
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Sent: Thursday, August 26, 2004 7:48 AM
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
> > Given the aggressive nature of the stripper, what I now intend to do is put
> > in the actual page number 2 lines above the chapter heading, assuming that
> > page numbers are on top.  In theory, this should prevent the stripper from
> > getting her greedy little hands on the chapter headings.  *grin*
> > However, I wonder how the stripper treats headings in books that have page
> > numbers at the bottom of the page?
> > -- Rob
> >
> > ----- Original Message -----
> > From: "Jake" <jabrown@xxxxxxxxx>
> > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > Sent: Wednesday, August 25, 2004 11:02 PM
> > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> >
> >
> > > Yes, I recently discovered that the auto stripper pretty much destroyed my
> > > first accepted submission.
> > > I understand the reason for getting rid of the headers, but when it gets rid
> > > of critical information like Chapter zzz or something, sometimes it is hard
> > > to realize that you are in fact in a new chapter (I have also noticed this
> > > with books I've downloaded).
> > >
> > > While going back and fixing the messed up titles would be a long and
> > > tiresome, not to mention cumbersome process, I believe that we need to get
> > > this problem resolved so that all new submissions are of a better quality.
> > >
> > > So, would it be a good idea for me to strip the headers in books before I
> > > submit them now?
> > >
> > > Thanks,
> > > Jake Brownell
> > > ----- Original Message -----
> > > From: <socly@xxxxxxxxx>
> > > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > > Sent: Wednesday, August 25, 2004 9:23 PM
> > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > >
> > >
> > > > I, too, strip my headers before submitting or uploading -- but what you
> > > > say about the chapter headings worries me. Is this a new problem? I've been
> > > > putting the page number on the first line, then skipping a couple of lines
> > > > before the Chapter heading, be it Chapter and a number or an actual title.
> > > > I hope they haven't been stripped.  And everyone wants page numbers (if you
> > > > can't read the book at one sitting, even when you're reading to children,
> > > > how do you know where you left off? Or what if they want you to go back to
> > > > a particular page?).  I do hope what you found, Dilsia, was an aberration.
> > > > Maybe Jesse can clear it up for us (and publish another list of books being
> > > > worked on or awaiting approval.)
> > > >
> > > > Cindy
> > > >
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: Pam Quinn <quinns@xxxxxxxxxxxxx>
> > > > Date: Wed, 25 Aug 2004 21:11:49 -0500
> > > > To: bksvol-discuss@xxxxxxxxxxxxx
> > > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > > >
> > > > > I agree. I manually strip my own headers now before submitting, and
> > > > > even if everybody didn't do this, I'd rather see the headers left in
> > > > > than to lose information that the automatic stripper takes out. They
> > > > > just don't work the way that they should. Oh boy; here we go, talking
> > > > > about strippers again.
> > > > >
> > > > > Pam
> > > > >
> > > > >
> > > > > On Wed, 25 Aug 2004 19:55:51 -0400, you wrote:
> > > > >
> > > > > >Hi List:
> > > > > >
> > > > > > >One of my books was accepted today. I downloaded the book to find out
> > > > > > >if the chapter headings were stripped. I had skipped a couple of blank
> > > > > > >lines before each chapter number. Sure enough, all chapter headings are
> > > > > > >gone as well as other important headings. Apparently the trick of
> > > > > > >skipping a couple lines before each chapter heading is not working any
> > > > > > >more, if it ever did. Does the automatic stripper always have to be
> > > > > > >applied? Personally I always strip headers of books that I submit or
> > > > > > >validate. In another book that I validated, all the numbers were
> > > > > > >stripped. The page numbers are important for this particular book
> > > > > > >because it's a choose your own adventure which tells you to turn to
> > > > > > >certain pages at different points in the story. I find it very annoying
> > > > > > >that even the chapter headings are stripped. I can understand the titles
> > > > > > >being stripped.  In my humble opinion, I'd rather have the page numbers
> > > > > > >be left in. It gives me an idea how far I am into the book. But at least
> > > > > > >the chapter headings should definitely be preserved. Any suggestions on
> > > > > > >how to preserve the chapter headings?
> > > > > >
> > > > > >*****
> > > > > >Grace
> > > > > >
> > > > > >MSN: gcpires@xxxxxxxxxxx
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > >
> >
> >
> >
>
>
>



