[bksvol-discuss] Re: Automatic Stripper problem

  • From: Cindy <popularplace@xxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Thu, 26 Aug 2004 10:18:41 -0700 (PDT)

Jesse,

The impression I get from the recent posts is that the
stripper seems to be stripping chapter titles (be they
numbered, e.g., Chapter 1 or Chapter One, or actual
chapter names, e.g., The Lost Dog), not just
chapter and title headings and page numbers -- even
when the chapter title is a couple of blank lines
down.

Is this a mistaken impression on my part? Or did
something happen to corrupt the stripper program when
Bookshare re-tooled recently -- or perhaps before
that?

Cindy


--- Jesse Fahnestock <Jesse.F@xxxxxxxxxxxx> wrote:

> Hi all, as I see there is some new conversation
> about normalizing headers and footers, I thought I
> would repost the guidelines for doing so. Please
> remember that you are not required to do this. But
> this is how to do it right if you want to take it
> on!
> 
> ---Begin instructions for headers and footers--
> 
> Volunteers can assist this tool by "normalizing" headers, footers, and
> page numbers in submitted files where they do not appear consistent.
> Normalizing such headers/footers helps, but it needs to be a complete
> job, as normalizing just a few headers could skew the probability of
> properly recognizing them throughout the book. If you wish to undertake
> this task, please be sure to:
> 
> 1) Check the line position of the text (the first paragraph on a given
>    page should be the header, the last should be the footer).
> 2) Check that page numbers have a space on either side, separating them
>    from the header/footer text. If the page number is the first
>    character in a header, it does not need a space before it; if it is
>    the last character in a footer, it does not need a space after it.
> 3) Only change text in the header or footer in order to make it look
>    like all other headers/footers.
> 4) Perform 1-3 on every page.
> 
> Remember that the automated tool is designed to be effective on most
> scanned books, so you should undertake this "normalization" process
> only if you are sure that the headers and footers in the book you are
> validating are inconsistent and if you are able to normalize all of
> them throughout the book.
> 
> --end instructions--
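
Rules 1 and 2 in the instructions quoted above could be checked
mechanically before submitting. What follows is only a rough Python
sketch under stated assumptions, not Bookshare's tool and not a
description of how the real stripper works. The function names, the
blank-line paragraph splitting, and the idea of passing in each page's
printed number are all hypothetical. It flags pages where neither the
first paragraph (header) nor the last paragraph (footer) contains the
page number set off by spaces or by the edge of the line.

```python
import re
from typing import Dict, List, Tuple


def header_and_footer(page_text: str) -> Tuple[str, str]:
    """Return the first and last paragraph of a page (rule 1).

    Assumption: paragraphs are separated by one or more blank lines.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text) if p.strip()]
    if not paragraphs:
        return "", ""
    return paragraphs[0], paragraphs[-1]


def number_is_set_off(line: str, page_number: int) -> bool:
    """Rule 2: the page number needs whitespace, or the start/end of the
    text, on each side of it."""
    pattern = rf"(?:^|\s){page_number}(?:\s|$)"
    return re.search(pattern, line.strip()) is not None


def pages_needing_attention(pages: Dict[int, str]) -> List[int]:
    """List printed page numbers whose header and footer both lack a
    properly separated page number.

    Hypothetical input: a dict mapping printed page number -> page text.
    """
    flagged = []
    for number, text in sorted(pages.items()):
        header, footer = header_and_footer(text)
        if not (number_is_set_off(header, number)
                or number_is_set_off(footer, number)):
            flagged.append(number)
    return flagged
```

With this sketch, a header such as "12 The Lost Dog" passes for page
12, while "The Lost Dog13" is flagged for page 13 because the number
runs into the title. A chapter-opening page with the number alone in
the last paragraph also passes.
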
> 
> jesse.
> 
> ________________________
> 
> Jesse Fahnestock
> Collection Development Coordinator, Bookshare.org
> www.bookshare.org
> 
> A Project of The Benetech Initiative - Technology
> Serving Humanity
> 480 S. California Ave., Suite 201
> Palo Alto, CA 94306-1609  USA
> (650)475-5440 x133
> (650) 475-1066 FAX
> jesse@xxxxxxxxxxxx
> www.benetech.org  
> 
> -----Original Message-----
> From: bksvol-discuss-bounce@xxxxxxxxxxxxx
> [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Jake
> Sent: 26 August 2004 15:12
> To: bksvol-discuss@xxxxxxxxxxxxx
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
> 
> 
> My guess is that this is part of the issue. Many of the books I scan or
> plan to scan have page numbers at the top of the page, except on pages
> where a new chapter begins; in that case the page numbers are located
> at the bottom of the page. So my guess is that, since the word
> "Chapter" is found first and on several pages, the program thinks it is
> a heading and therefore throws it out the window.
> I'm not sure whether the program Bookshare is using was written by
> them. If it was, they could add code to skip the word "Chapter" as a
> heading; if not, I'd seriously recommend finding a new program that
> does what we want, not what we don't.
> 
> Jake
> ----- Original Message ----- 
> From: "Kyrath. (AKA Rob)" <kyrath@xxxxxxx>
> To: <bksvol-discuss@xxxxxxxxxxxxx>
> Sent: Thursday, August 26, 2004 7:48 AM
> Subject: [bksvol-discuss] Re: Automatic Stripper problem
> 
> 
> > Given the aggressive nature of the stripper, what I now intend to do
> > is put in the actual page number 2 lines above the chapter heading,
> > assuming that page numbers are on top.  In theory, this should prevent
> > the stripper from getting her greedy little hands on the chapter
> > headings.  *grin*
> > However, I wonder how the stripper treats headings in books that have
> > page numbers at the bottom of the page?
> > -- Rob
> >
> > ----- Original Message ----- 
> > From: "Jake" <jabrown@xxxxxxxxx>
> > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > Sent: Wednesday, August 25, 2004 11:02 PM
> > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> >
> >
> > > Yes, I recently discovered that the auto stripper pretty much
> > > destroyed my first accepted submission.
> > > I understand the reason for getting rid of the headers, but when it
> > > gets rid of critical information like Chapter zzz or something,
> > > sometimes it is hard to realize that you are in fact in a new chapter
> > > (I have also noticed this with books I've downloaded).
> > >
> > > While going back and fixing the messed up titles would be a long and
> > > tiresome, not to mention cumbersome process, I believe that we need
> > > to get this problem resolved so that all new submissions are of a
> > > better quality.
> > >
> > > So, would it be a good idea for me to strip the headers in books
> > > before I submit them now?
> > >
> > > Thanks,
> > > Jake Brownell
> > > ----- Original Message ----- 
> > > From: <socly@xxxxxxxxx>
> > > To: <bksvol-discuss@xxxxxxxxxxxxx>
> > > Sent: Wednesday, August 25, 2004 9:23 PM
> > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > >
> > >
> > > > I, too, strip my headers before submitting or uploading -- but what
> > > > you say about the chapter headings worries me. Is this a new
> > > > problem? I've been putting the page number on the first line, then
> > > > skipping a couple of lines before the chapter heading, be it
> > > > "Chapter" and a number or an actual title.  I hope they haven't been
> > > > stripped.  And everyone wants page numbers (if you can't read the
> > > > book at one sitting, even when you're reading to children, how do
> > > > you know where you left off? Or what if they want you to go back to
> > > > a particular page?).  I do hope what you found, Dilsia, was an
> > > > aberration. Maybe Jesse can clear it up for us (and publish another
> > > > list of books being worked on or awaiting approval).
> > > >
> > > > Cindy
> > > >
> > > > ----- Original Message -----
> > > > From: Pam Quinn <quinns@xxxxxxxxxxxxx>
> > > > Date: Wed, 25 Aug 2004 21:11:49 -0500
> > > > To: bksvol-discuss@xxxxxxxxxxxxx
> > > > Subject: [bksvol-discuss] Re: Automatic Stripper problem
> > > >
> > > > > I agree. I manually strip my own headers now
> before submitting, and
> > > > > even if everybody didn't do this, I'd rather
> see the headers left in
> > > > > than to lose information that the automatic
> stripper takes out. They
> > > > > just don't work the way that they should. Oh
> boy; here we go,
> talking
> > > > > about strippers again.
> > > > >
> > > > > Pam
> > > > >
> > > > >
> > > > > On Wed, 25 Aug 2004 19:55:51 -0400, you wrote:
> > > > >
> > > > > >Hi List:
> > > > > >
> > > > > >One of my books was accepted today. I downloaded the book to find
> > > > > >out if the chapter headings were stripped. I had skipped a couple
> > > > > >of blank lines before each chapter number. Sure enough, all
> > > > > >chapter headings are gone, as well as other important headings.
> > > > > >Apparently the trick of skipping a couple of lines before each
> > > > > >chapter heading is not working any more, if it ever did. Does the
> > > > > >automatic stripper always have to be applied? Personally, I always
> > > > > >strip headers of books that I submit or validate. In another book
> > > > > >that I validated, all the page numbers were stripped. The page
> > > > > >numbers are important for this particular book because it's a
> > > > > >choose your own adventure, which tells you to turn to certain
> > > > > >pages at different points in the story. I find it very annoying
> > > > > >that even the chapter headings are stripped. I can understand the
> > > > > >titles being stripped.  In my humble opinion, I'd rather have the
> > > > > >page numbers left in. It gives me an idea how far I am into the
> > > > > >book. But at least the chapter headings should definitely be
> > > > > >preserved. Any suggestions on how to preserve the chapter
> > > > > >headings?
> > > > > >
> > > > > >*****
> > > > > >Grace
> > > > > >
> > > > > >MSN: gcpires@xxxxxxxxxxx



                