[bksvol-discuss] Re: Automatic Stripper problem

  • From: "Kyrath. (AKA Rob)" <kyrath@xxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Mon, 30 Aug 2004 09:12:19 -0400

Hi Pratik,

In theory, this should be the case.  However, I have seen several books that
I have either submitted or validated that weren't handled the way these
guidelines describe.

I wonder how many times the word "Chapter" plus a number has to appear on
the first line of a page before the automatic tool treats it as a page
header.

Would a book that has only 12 chapters be less likely to get stripped than a
book with 42 chapters?
Or does it depend on a percentage of the total number of pages?
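A rough Python sketch of the kind of frequency rule this question implies.
It is purely hypothetical: the thread only says the stripper looks at the
first and last line of each page, so the digit-to-"#" normalization, the
names normalize, edge_lines, edge_pattern_counts, flagged, strip_page,
prose and make_book, and both thresholds are illustrative assumptions, not
Bookshare's actual code.

```python
import re
from collections import Counter


def normalize(line):
    # Hypothetical: collapse every run of digits to "#", so "Chapter 3" and
    # "Chapter 12" (or "17" and "254") count as the same pattern.
    return re.sub(r"\d+", "#", line.strip())


def edge_lines(page):
    # Only the first and last non-blank lines are examined; blank lines
    # are skipped when deciding what counts as the "first line".
    lines = [line for line in page if line.strip()]
    return [lines[0], lines[-1]] if lines else []


def edge_pattern_counts(pages):
    # For each pattern, count how many pages have it on their first or last line.
    counts = Counter()
    for page in pages:
        counts.update({normalize(line) for line in edge_lines(page)})
    return counts


def flagged(counts, threshold):
    # Patterns seen on at least `threshold` pages are treated as headers/footers.
    return {pattern for pattern, n in counts.items() if n >= threshold}


def strip_page(page, patterns):
    # Drop the first and/or last non-blank line if its pattern was flagged.
    idx = [i for i, line in enumerate(page) if line.strip()]
    if not idx:
        return list(page)
    drop = {i for i in (idx[0], idx[-1]) if normalize(page[i]) in patterns}
    return [line for i, line in enumerate(page) if i not in drop]


def prose(n):
    # Stand-in for ordinary body text: unique per page and digit-free,
    # so normalize() never merges two pages' text into one pattern.
    return "Text " + "".join("abcdefghij"[int(d)] for d in str(n))


def make_book(total_pages, chapters):
    # Hypothetical layout described in this thread: page number on top,
    # except on chapter openers, where it moves to the bottom.
    step = total_pages // chapters
    pages = []
    for n in range(total_pages):
        if n % step == 0 and n // step < chapters:
            pages.append([f"Chapter {n // step + 1}", prose(n), str(n + 1)])
        else:
            pages.append([str(n + 1), prose(n)])
    return pages


if __name__ == "__main__":
    for chapters in (12, 42):
        counts = edge_pattern_counts(make_book(300, chapters))
        print(chapters, "chapters:",
              "10% rule ->", sorted(flagged(counts, 0.10 * 300)),
              "| fixed 10 ->", sorted(flagged(counts, 10)))
```

For a 300-page book under a percentage rule (10% of pages, i.e. 30), the
12-chapter book keeps its "Chapter N" lines and only the bare page number is
flagged, while the 42-chapter book loses them; under a fixed count of 10,
both books lose them. Because edge_lines skips blank lines, a couple of
blank lines above a heading would not protect it in this sketch, whereas a
real first line above the heading (the page number, say) would.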

-- Rob
----- Original Message ----- 
From: "Pratik Patel" <pratikp1@xxxxxxxxx>
To: <bksvol-discuss@xxxxxxxxxxxxx>
Sent: Monday, August 30, 2004 3:44 AM
Subject: [bksvol-discuss] Re: Automatic Stripper problem


> Hello All,
>
> Here is what I suspect has been happening with the header situation.
>
> First, let me assure all of you that the automatic stripper only looks at
> the first and the last line on the page.  We were assured of this fact when
> this discussion arose the last time.  As a result, nothing that does not
> appear on the first or the last line of the page will be removed.  To make
> sure that chapter headings, such as the beginnings of new chapters, are
> preserved, they should be placed on the second line of the page.  We are
> further assured that if all headers/footers are consistent, the chapter
> headings will not be removed, as they do not fall into the typical header
> pattern.  But to save myself from the whims of this type of analysis, I
> generally make it a habit to put the chapter headings on the second line.
>
> In this case, I actually suspect that the validator may have removed the
> headings that were referred to.
>
> To allay Debra's concerns, you must make sure that the page number appears
> either on the first or the last line of the page.  We are further assured by
> Bookshare that if the page number is placed this way, there is no need for
> it to be preceded or followed by a space.  The automatic tools will
> recognize it.  If you have no additional header/footer info on a particular
> line, Bookshare's automated conversion tools use the page number to assign
> actual page numbers in the DAISY files.
>
> Pratik
>
> Pratik Patel
> Managing Director
> CUNY Assistive Technology Services
> The City University of New York
>      ppatel@xxxxxx
>
> -----Original Message-----
> From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Deborah Kent Stein
> Sent: Sunday, August 29, 2004 3:50 PM
> To: bksvol-discuss@xxxxxxxxxxxxx
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
>
> To clarify,
> The instructions say that page numbers should have a space on either side.
> I was under the impression that they should have a line feed on either side.
> Yikes!  Have I been doing it wrong all these years?
> Debbie
>
> ----- Original Message -----
> From: "Jesse Fahnestock" <Jesse.F@xxxxxxxxxxxx>
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Sent: Thursday, August 26, 2004 10:24 AM
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
> > Hi all, as I see there is some new conversation about normalizing headers
> > and footers, I thought I would repost the guidelines for doing so. Please
> > remember that you are not required to do this. But this is how to do it
> > right if you want to take it on!
> >
> > ---Begin instructions for headers and footers--
> >
> > Volunteers can assist this tool by "normalizing" headers, footers, and
> > page numbers in submitted files where they do not appear consistent.
> > Normalizing such headers/footers helps, but it needs to be a
> > complete job, as normalizing just a few headers could skew the
> > probability of properly recognizing them throughout the book. If you
> > wish to undertake this task, please be sure to:
> >
> > 1) Check line position of text (the first paragraph on a given page
> > should be the header, the last should be the footer)
> > 2) Check that page numbers have a space on either side,
> > separating them from the header/footer text. If the page number is
> > the first character in a header, it does not need a space before it;
> > likewise, if it is the last character in a footer, it does not need a
> > space after it.
> > 3) Only change text in the header or footer in order to make it look
> > like all other headers/footers
> > 4) Perform 1-3 on every page.
> >
> > Remember that the automated tool is designed to be effective on most
> > scanned books so that you should undertake this "normalization"
> > process only if you are sure that the headers and footers in the book
> > you are validating are inconsistent and if you are able to normalize all
> > of them throughout the book.
> >
> > --end instructions--
> >
> > jesse.
> >
> > ________________________
> >
> > Jesse Fahnestock
> > Collection Development Coordinator, Bookshare.org
> > www.bookshare.org
> >
> > A Project of The Benetech Initiative - Technology Serving Humanity
> > 480 S. California Ave., Suite 201
> > Palo Alto, CA 94306-1609  USA
> > (650)475-5440 x133
> > (650) 475-1066 FAX
> > jesse@xxxxxxxxxxxx
> > www.benetech.org
> >
> > -----Original Message-----
> > From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> > [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx]On Behalf Of Jake
> > Sent: 26 August 2004 15:12
> > To: bksvol-discuss@xxxxxxxxxxxxx
> > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> >
> >
> > My guess is that this is part of the issue. Many of the books I scan/plan
> > to scan have page numbers at the top of the page, except on pages where a
> > new chapter begins; in that case the page numbers are located at the
> > bottom of the page.
> > So my guess is that since the word "Chapter" is found first on several
> > pages, the program thinks it is a header and therefore throws it out the
> > window.
> > If Bookshare wrote the program itself, I'm sure it could add code to skip
> > the word "Chapter" as a header; but if not, I'd seriously recommend
> > finding a new program that does what we want, not what we don't.
> >
> > Jake
> > ----- Original Message -----
> > From: "Kyrath. (AKA Rob)" <kyrath@xxxxxxx>
> > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > Sent: Thursday, August 26, 2004 7:48 AM
> > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> >
> >
> > > Given the aggressive nature of the stripper, what I now intend to do is
> > > put in the actual page number 2 lines above the chapter heading, assuming
> > > that page numbers are on top.  In theory, this should prevent the stripper
> > > from getting her greedy little hands on the chapter headings.  *grin*
> > > However, I wonder how the stripper treats headings in books that have
> > > page numbers at the bottom of the page?
> > > -- Rob
> > >
> > > ----- Original Message -----
> > > From: "Jake" <jabrown@xxxxxxxxx>
> > > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > > Sent: Wednesday, August 25, 2004 11:02 PM
> > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > >
> > >
> > > > Yes, I recently discovered that the auto stripper pretty much destroyed
> > > > my first accepted submission.
> > > > I understand the reason for getting rid of the headers, but when it gets
> > > > rid of critical information like "Chapter zzz" or something, sometimes
> > > > it is hard to realize that you are in fact in a new chapter (I have also
> > > > noticed this with books I've downloaded).
> > > >
> > > > While going back and fixing the messed-up titles would be a long,
> > > > tiresome, not to mention cumbersome process, I believe that we need to
> > > > get this problem resolved so that all new submissions are of a better
> > > > quality.
> > > >
> > > > So, would it be a good idea for me to strip the headers in books before
> > > > I submit them now?
> > > >
> > > > Thanks,
> > > > Jake Brownell
> > > > ----- Original Message -----
> > > > From: <socly@xxxxxxxxx>
> > > > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > > > Sent: Wednesday, August 25, 2004 9:23 PM
> > > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > > >
> > > >
> > > > > I, too, strip my headers before submitting or uploading -- but what
> > > > > you say about the chapter headings worries me. Is this a new problem?
> > > > > I've been putting the page number on the first line, then skipping a
> > > > > couple of lines before the chapter heading, be it "Chapter" and a
> > > > > number or an actual title.  I hope they haven't been stripped.  And
> > > > > everyone wants page numbers (if you can't read the book at one
> > > > > sitting, even when you're reading to children, how do you know where
> > > > > you left off?  Or what if they want you to go back to a particular
> > > > > page?).  I do hope what you found, Dilsia, was an aberration. Maybe
> > > > > Jesse can clear it up for us (and publish another list of books being
> > > > > worked on or awaiting approval).
> > > > >
> > > > > Cindy
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > From: Pam Quinn <quinns@xxxxxxxxxxxxx>
> > > > > Date: Wed, 25 Aug 2004 21:11:49 -0500
> > > > > To: bksvol-discuss@xxxxxxxxxxxxx
> > > > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > > > >
> > > > > > I agree. I manually strip my own headers now before submitting, and
> > > > > > even if everybody didn't do this, I'd rather see the headers left in
> > > > > > than lose information that the automatic stripper takes out. They
> > > > > > just don't work the way that they should. Oh boy; here we go,
> > > > > > talking about strippers again.
> > > > > >
> > > > > > Pam
> > > > > >
> > > > > >
> > > > > > On Wed, 25 Aug 2004 19:55:51 -0400, you wrote:
> > > > > >
> > > > > > >Hi List:
> > > > > > >
> > > > > > >One of my books was accepted today. I downloaded the book to find
> > > > > > >out if the chapter headings were stripped. I had skipped a couple
> > > > > > >of blank lines before each chapter number. Sure enough, all chapter
> > > > > > >headings are gone, as well as other important headings. Apparently
> > > > > > >the trick of skipping a couple of lines before each chapter heading
> > > > > > >is not working any more, if it ever did. Does the automatic
> > > > > > >stripper always have to be applied? Personally, I always strip
> > > > > > >headers of books that I submit or validate. In another book that I
> > > > > > >validated, all the page numbers were stripped. The page numbers are
> > > > > > >important for this particular book because it's a
> > > > > > >choose-your-own-adventure book, which tells you to turn to certain
> > > > > > >pages at different points in the story. I find it very annoying
> > > > > > >that even the chapter headings are stripped. I can understand the
> > > > > > >titles being stripped. In my humble opinion, I'd rather have the
> > > > > > >page numbers left in; they give me an idea of how far I am into the
> > > > > > >book. But at least the chapter headings should definitely be
> > > > > > >preserved. Any suggestions on how to preserve the chapter headings?
> > > > > > >
> > > > > > >*****
> > > > > > >Grace
> > > > > > >
> > > > > > >MSN: gcpires@xxxxxxxxxxx
> > > > > >
> > > > > >