[bksvol-discuss] Re: Automated Strippers and Page Numbers

  • From: Cindy <popularplace@xxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Sat, 28 Aug 2004 10:45:42 -0700 (PDT)

I thank you for your post, Donna. I can't download
books from the collection, so I can't tell, as per the
recent discussion, if the books I've submitted or
validated still have the page numbers and chapter
titles or not.

I eliminate the headers (a replace all [with nothing]
usually works, though it has to be repeated a few
times as the messed-up header changes its form) but
leave the page numbers on the top of the page, skip a
line and then begin the text. When there's a chapter
title, I create one or two line spaces before the
chapter title and try to imitate the size and font
(bearing in mind that people have said some readers
won't take too large a type or certain fonts). I know
that if people can't see the book they can't do that,
but most chapter titles seem to be either caps, a
larger font, or italics, and perhaps making the
chapter title different from the text and placing it
lower down on the page by a couple of line spaces
will preserve it.

Cindy

--- Donna Smith <donnafsmith@xxxxxxxxxxxxx> wrote:

> Hi gang.
> 
> I hear the frustration expressed on this list by dedicated volunteers
> concerned that hard work is being negated by automated tools, and I
> share your frustration.  While I am more than happy to volunteer quite
> a lot of my available personal time to scanning/validating books for
> BookShare, I want to know that I'm getting the best results for my time
> and effort.
> 
> So this morning I decided to do a little comparison of books I have
> either scanned or validated with the final product in the collection.
> I learned that page numbers are preserved whether I leave headers in or
> take them out, and if I leave headers in, they are mostly stripped.
> Consequently, I have determined that I will do the following:  If the
> headers in a book mostly scan well and not a lot is required to
> normalize them, I'll leave them in so the automated stripper will have
> something to strip.  If the headers in the book inconsistently scan so
> that it is a lot of trouble to normalize them, then I'll strip them out
> myself because that is actually less work in some cases than
> normalizing.  Either way, the page numbers seem to be preserved.
> 
> BTW, when browsing through the new books page, I came across "Escape,"
> one of the Choose Your Own Adventure books, so I downloaded it for a
> check as well.  Page numbers are there.
> 
> It is my understanding that the automated stripper takes out only
> consistently repeating phrases such as the author's name when it
> appears at the top of every other page, and the name of the book when
> it appears at the top of alternate pages.  Since the page numbers
> change with every page, even if they are on the same line as the
> header, they are left in.  It is also my understanding that the
> automated stripper doesn't strip out whatever happens to be at the top
> of each page.  So if each page starts with a new line of text (no
> header), then it's not stripped unless every page, or a significant
> number of pages, start out with the same line of text.
> 
> I also have to add that most of the books I have personally downloaded
> and read over the last couple of years have had page numbers.  Headers
> are a little more inconsistent, but it looks to me like junk headers
> remain in those books where the headers are typically scrambled and not
> necessarily scrambled in a consistent manner.
> 
> The bottom line for me is that page numbering is retained and lines of
> text aren't stripped.  While I prefer that the headers not be there,
> and I will continue to submit/validate in a manner designed to help the
> automated stripper get rid of them, I've never chosen to not read a
> book because headers were still present.  On the other hand, I have
> chosen to not read a book because the text quality was too poor.
> 
> Hope this helps.  I didn't want to be discouraged about my favorite
> volunteer job and I didn't want others to be discouraged unnecessarily
> either.  I urge you to check your finished work from the collection
> even if you have to go back a few months to find something.  I'm afraid
> I'm guilty of reading from the RTF version I scan and submit rather
> than waiting for it to clear the process and reading it from the
> collection.  <smile>  Hence, the need to make this special effort this
> morning to actually look and see what my scans look like once they're
> in the collection.
> 
> Peace and Hope,
> 
> Donna
> 
> 
> 
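
For anyone who wants to picture the behavior Donna describes in the quoted message, here is a rough Python sketch. It is only an illustration of her stated understanding (repeating phrases at the top of pages are removed, page numbers stay, ordinary first lines of text are left alone) and is not BookShare's actual tool. The function names, the idea of a page as a list of lines, and the `min_share` threshold are all assumptions made up for this example.

```python
import re
from collections import Counter


def split_header(line):
    """Split a top-of-page line into (normalized phrase, page number or None)."""
    numbers = re.findall(r"\d+", line)
    phrase = re.sub(r"\d+", "", line)
    # Collapse whitespace and ignore case so "JANE  DOE" and "Jane Doe" match.
    phrase = " ".join(phrase.split()).lower()
    return phrase, (numbers[0] if numbers else None)


def strip_repeating_headers(pages, min_share=0.25):
    """Remove top-of-page phrases that repeat across many pages.

    pages: list of pages, each page a list of text lines (assumed format).
    min_share: hypothetical cutoff; a phrase must open at least this share
    of pages (and at least two pages) to be treated as a header.
    """
    counts = Counter()
    for page in pages:
        if page:
            phrase, _ = split_header(page[0])
            if phrase:
                counts[phrase] += 1

    threshold = max(2, min_share * len(pages))
    repeated = {p for p, n in counts.items() if n >= threshold}

    cleaned = []
    for page in pages:
        if page:
            phrase, number = split_header(page[0])
            if phrase in repeated:
                # Drop the repeating phrase but keep the page number.
                cleaned.append(([number] if number else []) + page[1:])
                continue
        cleaned.append(list(page))
    return cleaned


if __name__ == "__main__":
    sample = [
        ["12 JANE DOE", "text a"],
        ["THE BOOK 13", "text b"],
        ["14 JANE DOE", "text c"],
        ["THE BOOK 15", "text d"],
        ["A fresh line of text", "text e"],
    ]
    for page in strip_repeating_headers(sample):
        print(page)
```

In the sample, the author and title lines alternate across pages and are removed, while the page numbers and the one page that starts with ordinary text are kept. A badly scrambled header that never repeats in the same form would slip through, which matches what Donna reports seeing in the collection.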