[bksvol-discuss] Re: Automatic Stripper problem

  • From: "Jesse Fahnestock" <Jesse.F@xxxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Thu, 26 Aug 2004 08:24:11 -0700

Hi all, as I see there is some new conversation about normalizing headers and 
footers, I thought I would repost the guidelines for doing so. Please remember 
that you are not required to do this. But this is how to do it right if you 
want to take it on!

---Begin instructions for headers and footers--

Volunteers can assist this tool by "normalizing" headers, footers, and 
page numbers in submitted files where they do not appear consistent. 
Normalizing such headers/footers helps, but it needs to be a 
complete job, as normalizing just a few headers could skew the 
probability of properly recognizing them throughout the book. If you 
wish to undertake this task, please be sure to:

1) Check the line position of the text (the first paragraph on a given
page should be the header, the last should be the footer).
2) Check that page numbers have a space on either side, separating
them from the header/footer text. If the page number is the first
character in a header, it does not need a space before it; if it is
the last character in a footer, it does not need a space after it.
3) Change text in the header or footer only to make it look like all
the other headers/footers.
4) Perform steps 1-3 on every page.
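
For anyone who prefers to script step 2 rather than do it by hand, here is a minimal Python sketch of the spacing rule. It is only an illustration: the function name space_page_number is made up for this example, it assumes you already know the page number for the line you are fixing, and it says nothing about how the automated tool itself works.

```python
import re


def space_page_number(line: str, page_number: int) -> str:
    """Put a single space between the page number and any adjoining text.

    Hypothetical helper for illustration only; not part of any Bookshare tool.
    """
    num = str(page_number)
    # Match the number only when it is not part of a longer run of digits,
    # so page 12 is not found inside "1984" or "312".
    pattern = re.compile(r"(?<!\d)" + re.escape(num) + r"(?!\d)")
    match = pattern.search(line)
    if match is None:
        return line  # page number not found; leave the line untouched
    before = line[:match.start()].rstrip()
    after = line[match.end():].lstrip()
    # Empty pieces are dropped, so a number that starts or ends the line
    # gets no extra space on that side (as rule 2 allows).
    return " ".join(part for part in (before, num, after) if part)


print(space_page_number("The Hobbit42", 42))   # The Hobbit 42
print(space_page_number("42The Hobbit", 42))   # 42 The Hobbit
```

The sketch only touches the first standalone occurrence of the number, so a header whose title contains the same number as the page would still need a manual check.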

Remember that the automated tool is designed to be effective on most
scanned books, so you should undertake this "normalization" process
only if you are sure that the headers and footers in the book you are
validating are inconsistent, and only if you are able to normalize all
of them throughout the book.
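
One rough way to judge whether the headers in a book are inconsistent is to compare their "shape" from page to page. The Python sketch below is an assumption-laden illustration, not a description of the automated tool: it assumes each page is available as a list of paragraphs (how you split a file into pages is up to you), treats the first paragraph as the header as in step 1, and uses a made-up function name, header_patterns.

```python
import re
from collections import Counter


def header_patterns(pages: list[list[str]]) -> Counter:
    """Count distinct header shapes, with every run of digits masked as '#'.

    `pages` is a hypothetical representation: one list of paragraphs per page.
    """
    shapes = Counter()
    for paragraphs in pages:
        if not paragraphs:
            continue  # skip blank pages
        header = paragraphs[0].strip()
        # Masking digits lets "12 The Hobbit" and "13 The Hobbit" count
        # as the same shape, "# The Hobbit".
        shapes[re.sub(r"\d+", "#", header)] += 1
    return shapes


pages = [
    ["12 The Hobbit", "Body text...", "footer"],
    ["13 The Hobbit", "Body text..."],
    ["The Hobbit14", "Body text..."],
]
for shape, count in header_patterns(pages).most_common():
    print(count, shape)
```

A long tail of shapes that each appear only once or twice suggests inconsistent headers. Two or three common shapes can be perfectly normal, for example when left and right pages carry different headers, so treat the counts as a hint rather than a verdict.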

--end instructions--

jesse.

________________________

Jesse Fahnestock
Collection Development Coordinator, Bookshare.org
www.bookshare.org

A Project of The Benetech Initiative - Technology Serving Humanity
480 S. California Ave., Suite 201
Palo Alto, CA 94306-1609  USA
(650)475-5440 x133
(650) 475-1066 FAX
jesse@xxxxxxxxxxxx
www.benetech.org  

-----Original Message-----
From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx]On Behalf Of Jake
Sent: 26 August 2004 15:12
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: Automatic Stripper problem


My guess is that this is part of the issue. Many of the books I scan or plan
to scan have page numbers at the top of the page, except on pages where a new
chapter begins; in that case the page numbers are located at the bottom of
the page. So my guess is that, since the word Chapter is found first and on
several pages, the program thinks it is a heading and therefore throws it
out the window.
I'm sure that if the program Bookshare is using was written by them, they
could add code to skip the word Chapter as a heading; if not, I'd seriously
recommend finding a new program that does what we want, not what we don't.

Jake
----- Original Message ----- 
From: "Kyrath. (AKA Rob)" <kyrath@xxxxxxx>
To: <bksvol-discuss@xxxxxxxxxxxxx>
Sent: Thursday, August 26, 2004 7:48 AM
Subject: [bksvol-discuss] Re: Automatic Stripper problem


> Given the aggressive nature of the stripper, what I now intend to do is
> put in the actual page number 2 lines above the chapter heading, assuming that
> page numbers are on top.  In theory, this should prevent the stripper from
> getting her greedy little hands on the chapter headings.  *grin*
> However, I wonder how the stripper treats headings in books that have page
> numbers at the bottom of the page?
> -- Rob
>
> ----- Original Message ----- 
> From: "Jake" <jabrown@xxxxxxxxx>
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Sent: Wednesday, August 25, 2004 11:02 PM
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
>
>
> > Yes, I recently discovered that the auto stripper pretty much destroyed
> > my first accepted submission.
> > I understand the reason for getting rid of the headers, but when it gets
> > rid of critical information like Chapter zzz or something, sometimes it is
> > hard to realize that you are in fact in a new chapter (I have also noticed
> > this with books I've downloaded).
> >
> > While going back and fixing the messed up titles would be a long and
> > tiresome, not to mention cumbersome, process, I believe that we need to
> > get this problem resolved so that all new submissions are of a better
> > quality.
> >
> > So, would it be a good idea for me to strip the headers in books before
> > I submit them now?
> >
> > Thanks,
> > Jake Brownell
> > ----- Original Message ----- 
> > From: <socly@xxxxxxxxx>
> > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > Sent: Wednesday, August 25, 2004 9:23 PM
> > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> >
> >
> > > I, too, strip my headers before submitting or uploading -- but what you
> > > say about the chapter headings worries me. Is this a new problem? I've
> > > been putting the page number on the first line, then skipping a couple
> > > of lines before the Chapter heading, be it Chapter and a number or an
> > > actual title. I hope they haven't been stripped. And everyone wants page
> > > numbers (if you can't read the book at one sitting, even when you're
> > > reading to children, how do you know where you left off? Or what if they
> > > want you to go back to a particular page?). I do hope what you found,
> > > Dilsia, was an aberration. Maybe Jesse can clear it up for us (and
> > > publish another list of books being worked on or awaiting approval).
> > >
> > > Cindy
> > >
> > >
> > >
> > >
> > > ----- Original Message -----
> > > From: Pam Quinn <quinns@xxxxxxxxxxxxx>
> > > Date: Wed, 25 Aug 2004 21:11:49 -0500
> > > To: bksvol-discuss@xxxxxxxxxxxxx
> > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > >
> > > > I agree. I manually strip my own headers now before submitting, and
> > > > even if everybody didn't do this, I'd rather see the headers left in
> > > > than to lose information that the automatic stripper takes out. They
> > > > just don't work the way that they should. Oh boy; here we go,
> > > > talking about strippers again.
> > > >
> > > > Pam
> > > >
> > > >
> > > > On Wed, 25 Aug 2004 19:55:51 -0400, you wrote:
> > > >
> > > > >Hi List:
> > > > >
> > > > >One of my books was accepted today. I downloaded the book to find out
> > > > >if the chapter headings were stripped. I had skipped a couple of blank
> > > > >lines before each chapter number. Sure enough, all chapter headings are
> > > > >gone, as well as other important headings. Apparently the trick of
> > > > >skipping a couple of lines before each chapter heading is not working
> > > > >any more, if it ever did. Does the automatic stripper always have to be
> > > > >applied? Personally, I always strip the headers of books that I submit
> > > > >or validate. In another book that I validated, all the numbers were
> > > > >stripped. The page numbers are important for this particular book
> > > > >because it's a choose-your-own-adventure, which tells you to turn to
> > > > >certain pages at different points in the story. I find it very annoying
> > > > >that even the chapter headings are stripped. I can understand the titles
> > > > >being stripped. In my humble opinion, I'd rather have the page numbers
> > > > >left in. It gives me an idea of how far I am into the book. But at least
> > > > >the chapter headings should definitely be preserved. Any suggestions on
> > > > >how to preserve the chapter headings?
> > > > >
> > > > >*****
> > > > >Grace
> > > > >
> > > > >MSN: gcpires@xxxxxxxxxxx
> > > >
> > > >


