[bksvol-discuss] Re: Missing chapter headings

  • From: Mike <mlsestak@xxxxxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Wed, 08 Apr 2009 18:26:48 -0700

Hi Bob,

I can tell you from really sad experience about the stripper.

It is supposed to remove text that is repeated at the top (header) or bottom 
(footer) of every page or every other page.  Print books commonly repeat the 
name of the book, the author's name, or the chapter name at the top or bottom 
of each page, or alternate any two of these every other page.  However, in many 
cases these headers or footers are in a different font or size, and the OCR 
program gets confused and messes them up, so they really aren't repeated text.  
In this case it is best for the submitter or proofreader to remove them.

Now, what happened to me. I actually proofread a book where every single header was perfect (for those who remember only a few months ago, this is the nefarious Theodore Sturgeon short story collection incident). For each story, the title of the story was one of the headers (the other was the author's name, I think).
I thought, how wonderful, I don't have to go through that horrible process of 
removing headers.
But the page numbers were at the bottom of the page.  So the first line on 
each page that began a story was exactly the same as the header...

CHOMP

Next thing I know, someone is grumbling on this email list that there was a 
very good short story collection that had just been put onto bookshare, except 
there were no titles on the stories.
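
For anyone who wants to see the mechanism, here is a rough Python sketch of a stripper that behaves the way I described. This is only my guess at the logic, not Bookshare's actual code: the function name, the `min_repeats` threshold, and the sample pages are all made up for illustration, and it only looks at the top line of each page (no footers).

```python
from collections import Counter

def strip_repeated_first_lines(pages, min_repeats=3):
    """Hypothetical stripper: drop a page's first line if that same text
    is the first line of at least `min_repeats` pages in the book."""
    counts = Counter(page[0].strip() for page in pages if page)
    repeated = {text for text, n in counts.items() if text and n >= min_repeats}
    return [page[1:] if page and page[0].strip() in repeated else list(page)
            for page in pages]

# Made-up book: headers alternate between the story title and the author.
book = [
    ["STORY TITLE", "Once upon a time..."],   # opening page: the real title
    ["STORY TITLE", "...second page text"],   # running header
    ["AUTHOR NAME", "...third page text"],
    ["STORY TITLE", "...fourth page text"],
    ["AUTHOR NAME", "...fifth page text"],
    ["STORY TITLE", "...sixth page text"],
    ["AUTHOR NAME", "...seventh page text"],
]

chomped = strip_repeated_first_lines(book)
print(chomped[0])    # ['Once upon a time...']  (the title is gone)

# Same book, but the opening page has its page number on the first line.
protected = [["1", "STORY TITLE", "Once upon a time..."]] + book[1:]
survived = strip_repeated_first_lines(protected)
print(survived[0])   # ['1', 'STORY TITLE', 'Once upon a time...']
```

The sketch cannot tell a real title from a running header, because both are just "the same text on line one of several pages". Putting anything else on line one of the opening page breaks the match.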

So, that is how the stripper removes chapter or story titles. If the headers are removed, you shouldn't have to worry about protecting chapter or story titles (though one of the Bookshare tech guys like Jake or Pratik said that, at least with the old stripper, there was some special case that had to do with the first word on the page being "Chapter", but I forget exactly what happened then). If the page numbers are all on the first line of the page with nothing else on that line, you shouldn't have to do anything else, either. And now, according to Jake, even if the headers are left in, but the chapter titles are in a larger font, you shouldn't need to do anything else.

But as a sighted volunteer, I can tell you from experience that the OCR software can make both regular text and header text all kinds of different fonts and sizes for no apparent reason whatsoever. I try to make all the text at least approximately one size to make it easier for me to proofread (this might also help sighted Bookshare members, but if the font or size or bold or italics change every couple of lines, I'm not going to do a very good job proofreading; I'll go crazy instead).
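
If the page numbers sit at the bottom of the page, as they did in my book, one way to get them onto the first line is a small script like the sketch below. It assumes each page is a list of text lines and that the page number is a bare run of digits on the last non-blank line. The function name is my own invention, and roman-numeral page numbers are not handled.

```python
def move_page_number_to_top(page):
    """Return a copy of `page` (a list of lines) with a trailing
    digits-only page number moved to the first line.
    Pages without such a number are returned unchanged."""
    lines = list(page)
    # Walk backwards to the last non-blank line.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip():
            # Move it only if it is a bare number and not already on top.
            if lines[i].strip().isdigit() and i > 0:
                number = lines.pop(i).strip()
                lines.insert(0, number)
            break
    return lines

opening_page = ["STORY TITLE", "Once upon a time...", "", "17"]
print(move_page_number_to_top(opening_page))
# ['17', 'STORY TITLE', 'Once upon a time...', '']
```

Whether this alone is enough for the real stripper rests on what has been reported in this thread, so it is still worth checking the titles after the book goes through.
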
So, that's my rant of the day,

Misha


From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Bob
Sent: Friday, March 27, 2009 10:20 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: Missing chapter headings

Okay, I'm not trying to drag this thread out, but I really am curious.

Does anyone have a good guess as to why the stripper loves chapter headings
and story titles rather than plain text at the top of the page? It doesn't
strip away the first lines of a page. It doesn't strip away chapter headings
and story titles if they occur anywhere else besides at the top of the page.
But, something about a chapter heading or story title at the top of the page
says to the stripper "ummm, I'm good for eating." Why does it say that? If
you plug in some text above the title (say, a page number or a row of stars)
then the stripper loses its appetite.

Bob (living out his ninth life)



