[bksvol-discuss] Re: Automatic Stripper problem

  • From: "Sarah Van Oosterwijck" <curiousentity@xxxxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Thu, 26 Aug 2004 11:50:26 -0500

Well, that works for getting rid of them, but the point is that it does
nothing for the preservation of things that are not headers, like the
chapter numbers and titles.  I don't understand why removing them is more
important than keeping the book intact, and I completely agree with Jake
that the stripper should be programmed to recognize the word chapter and
leave it and anything after it alone.  It would also be best if it left
roman numerals and all other numbers alone.  Apart from the page number,
numbers rarely occur in headers, so that would not limit the stripping
ability of the program/script.
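
A minimal Python sketch of the rule proposed above, offered only as an
illustration. Nobody on the list has seen the stripper's code, so the
function name should_keep, the regular expressions, and the idea of
calling this check before a candidate header line is removed are all
assumptions, not a description of how the Bookshare tool works.

```python
import re

# Hypothetical patterns; the real stripper's rules are not known.
CHAPTER_RE = re.compile(r"^\s*chapter\b", re.IGNORECASE)
ARABIC_RE = re.compile(r"^\s*\d+\s*$")
ROMAN_RE = re.compile(
    r"^\s*M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\s*$",
    re.IGNORECASE,
)


def should_keep(line: str) -> bool:
    """Return True if a candidate header line should be left in the book."""
    if not line.strip():
        return False  # blank lines carry no chapter information
    if CHAPTER_RE.match(line):
        return True  # the word "Chapter" and anything after it
    if ARABIC_RE.match(line):
        return True  # a line holding only an ordinary number
    if ROMAN_RE.match(line):
        return True  # a line holding only a roman numeral
    return False
```

A running header such as "The Great Gatsby 12" would still be stripped
under this check, while "Chapter 5: The Storm", "XIV", and "42" would be
kept. One known weakness: a short word made only of roman-numeral
letters (for example "mix") would also be kept.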

I have been normalizing my headers instead of removing them, and also
copying headers to the pages with chapter titles or numbers on them.  Of
course I add the correct page number to the header.  I am hoping that they
get eaten instead of the chapter titles, but nothing I have submitted has
been approved since I started this procedure, so I am not sure if it works.
I have to admit some impatience to find out. :-)
This problem is not an occasional one; it happens with almost every book, as
far as I can tell.
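
For anyone normalizing by hand, guideline 2 in the instructions quoted
below (a space on either side of the page number) can be checked
mechanically. The following Python sketch is one hedged way to do that.
The function names and the dictionary of page numbers to header lines
are hypothetical, and the check is slightly more lenient than the
guideline: it accepts a number at either end of the line without asking
whether the line is a header or a footer.

```python
import re


def page_number_is_spaced(line: str, page_number: int) -> bool:
    """Check that the page number is set off by spaces.

    A number at the very start or very end of the line needs no
    space on that side.
    """
    pattern = rf"(?:^| ){page_number}(?: |$)"
    return re.search(pattern, line.strip()) is not None


def pages_needing_attention(headers: dict) -> list:
    """Return the page numbers whose header line fails the check.

    `headers` maps each page number to that page's header line.
    """
    return [
        page
        for page, line in sorted(headers.items())
        if not page_number_is_spaced(line, page)
    ]
```

Because the guidelines say a partial job can make things worse, the
second function reports every failing page so that none is missed.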

Sarah Van Oosterwijck
http://home.earthlink.net/~netentity

----- Original Message -----
From: "Jesse Fahnestock" <Jesse.F@xxxxxxxxxxxx>
To: <bksvol-discuss@xxxxxxxxxxxxx>
Sent: Thursday, August 26, 2004 10:24 AM
Subject: [bksvol-discuss] Re: Automatic Stripper problem


> Hi all, as I see there is some new conversation about normalizing headers
> and footers, I thought I would repost the guidelines for doing so. Please
> remember that you are not required to do this. But this is how to do it
> right if you want to take it on!
>
> ---Begin instructions for headers and footers--
>
> Volunteers can assist this tool by "normalizing" headers, footers, and
> page numbers in submitted files where they do not appear consistent.
> Normalizing such headers/footers helps, but it needs to be a
> complete job, as normalizing just a few headers could skew the
> probability of properly recognizing them throughout the book. If you
> wish to undertake this task, please be sure to:
>
> 1) Check line position of text (the first paragraph on a given page
> should be the header, the last should be the footer)
> 2) Check that page numbers have a space on either side,
> separating them from the header/footer text. If the page number is
> the first character in a header it does not need a space before it; or if
> it is the last character in a footer it does not need a space after it.
> 3) Only change text in the header or footer in order to make it look
> like all other headers/footers
> 4) Perform 1-3 on every page.
>
> Remember that the automated tool is designed to be effective on most
> scanned books so that you should undertake this "normalization"
> process only if you are sure that the headers and footers in the book
> you are validating are inconsistent and if you are able to normalize all
> of them throughout the book.
>
> --end instructions--
>
> jesse.
>
> ________________________
>
> Jesse Fahnestock
> Collection Development Coordinator, Bookshare.org
> www.bookshare.org
>
> A Project of The Benetech Initiative - Technology Serving Humanity
> 480 S. California Ave., Suite 201
> Palo Alto, CA 94306-1609  USA
> (650)475-5440 x133
> (650) 475-1066 FAX
> jesse@xxxxxxxxxxxx
> www.benetech.org
>
> -----Original Message-----
> From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx]On Behalf Of Jake
> Sent: 26 August 2004 15:12
> To: bksvol-discuss@xxxxxxxxxxxxx
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
> My guess is that is part of the issue. Many of the books I scan/plan to scan
> have page numbers at the top of the page except on pages where a new chapter
> begins, in that case the page numbers are located at the bottom of the page.
> So my guess is since the word Chapter is found first and on several pages
> that the program thinks it is a heading and therefore throws it out the
> window.
> I'm sure that if the program Bookshare is using was written by them, they
> can add code to skip the word Chapter as a heading, but if not then I'd
> seriously recommend finding a new program that does what we want, not what
> we don't.
>
> Jake
> ----- Original Message -----
> From: "Kyrath. (AKA Rob)" <kyrath@xxxxxxx>
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Sent: Thursday, August 26, 2004 7:48 AM
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
> > Given the aggressive nature of the stripper, what I now intend to do is put
> > in the actual page number 2 lines above the chapter heading, assuming that
> > page numbers are on top.  In theory, this should prevent the stripper from
> > getting her greedy little hands on the chapter headings.  *grin*
> > However, I wonder how the stripper treats headings in books that have page
> > numbers at the bottom of the page?
> > -- Rob
> >
> > ----- Original Message -----
> > From: "Jake" <jabrown@xxxxxxxxx>
> > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > Sent: Wednesday, August 25, 2004 11:02 PM
> > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> >
> >
> > > Yes, I recently discovered that the auto stripper pretty much destroyed my
> > > first accepted submission.
> > > I understand the reason for getting rid of the headers, but when it gets rid
> > > of critical information like Chapter zzz or something, sometimes it is hard
> > > to realize that you are in fact in a new chapter (I have also noticed this
> > > with books I've downloaded).
> > >
> > > While going back and fixing the messed up titles would be a long and
> > > tiresome, not to mention cumbersome process, I believe that we need to get
> > > this problem resolved so that all new submissions are of a better quality.
> > >
> > > So, would it be a good idea for me to strip the headers in books before I
> > > submit them now?
> > >
> > > Thanks,
> > > Jake Brownell
> > > ----- Original Message -----
> > > From: <socly@xxxxxxxxx>
> > > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > > Sent: Wednesday, August 25, 2004 9:23 PM
> > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > >
> > >
> > > > I, too, strip my headers before submitting or uploading -- but what you
> > > > say about the chapter headings worries me. Is this a new problem? I've been
> > > > putting the page number on the first line, then skipping a couple of lines
> > > > before the Chapter heading, be it Chapter and a number or an actual title.
> > > > I hope they haven't been stripped.  And everyone wants page numbers (if you
> > > > can't read the book at one sitting, even when you're reading to children,
> > > > how do you know where you left off? Or what if they want you to go back to
> > > > a particular page?).  I do hope what you found, Dilsia, was an aberration.
> > > > Maybe Jesse can clear it up for us (and publish another list of books being
> > > > worked on or awaiting approval).
> > > >
> > > > Cindy
> > > >
> > > >
> > > >
> > > >
> > > > ----- Original Message -----
> > > > From: Pam Quinn <quinns@xxxxxxxxxxxxx>
> > > > Date: Wed, 25 Aug 2004 21:11:49 -0500
> > > > To: bksvol-discuss@xxxxxxxxxxxxx
> > > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > > >
> > > > > I agree. I manually strip my own headers now before submitting, and
> > > > > even if everybody didn't do this, I'd rather see the headers left in
> > > > > than to lose information that the automatic stripper takes out. They
> > > > > just don't work the way that they should. Oh boy; here we go, talking
> > > > > about strippers again.
> > > > >
> > > > > Pam
> > > > >
> > > > >
> > > > > On Wed, 25 Aug 2004 19:55:51 -0400, you wrote:
> > > > >
> > > > > >Hi List:
> > > > > >
> > > > > >One of my books was accepted today. I downloaded the book to find out
> > > > > >if the chapter headings were stripped. I had skipped a couple of blank
> > > > > >lines before each chapter number. Sure enough, all chapter headings are
> > > > > >gone, as well as other important headings. Apparently the trick of
> > > > > >skipping a couple of lines before each chapter heading is not working
> > > > > >any more, if it ever did. Does the automatic stripper always have to be
> > > > > >applied? Personally I always strip headers of books that I submit or
> > > > > >validate. In another book that I validated, all the numbers were
> > > > > >stripped. The page numbers are important for this particular book
> > > > > >because it's a choose your own adventure, which tells you to turn to
> > > > > >certain pages at different points in the story. I find it very annoying
> > > > > >that even the chapter headings are stripped. I can understand the titles
> > > > > >being stripped.  In my humble opinion, I'd rather have the page numbers
> > > > > >be left in. It gives me an idea how far I am into the book. But at least
> > > > > >the chapter headings should definitely be preserved. Any suggestions on
> > > > > >how to preserve the chapter headings?
> > > > > >
> > > > > >*****
> > > > > >Grace
> > > > > >
> > > > > >MSN: gcpires@xxxxxxxxxxx
> > > > >
> > > > >

