[bookshare-discuss] Re: What's going on?

  • From: "Vic Llanes" <v.llanes@xxxxxxxxxxx>
  • To: <bookshare-discuss@xxxxxxxxxxxxx>, <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Thu, 17 Jun 2004 16:35:16 -0400

Hi Jesse, I sent you the file. Another angle that has occurred to me, and
that I want to present, is that I am opening the files downloaded from
Bookshare with Kurzweil 1000. So I don't know at which end the problem lies.

Vic.
----- Original Message ----- 
From: "Jesse Fahnestock" <Jesse.F@xxxxxxxxxxxx>
To: <bookshare-discuss@xxxxxxxxxxxxx>
Sent: Thursday, June 17, 2004 11:32 AM
Subject: [bookshare-discuss] Re: What's going on?


Vic:

The mishandling of text you note below is definitely not typical. I
expect that the problem with the quotation marks was due to the
mishandling of a special character (curly quotes or something like
that), and I will have our engineers confirm any problems there for me.
As far as pagination goes, as we've discussed here, it is a tricky thing
to handle automatically. Our testing shows that the automatic header
stripping and replacement of page numbers with DAISY page breaks works
on a majority of scanned pages. However, irregularity in the scans will
inevitably throw some pages off. Below I'm recopying the text about
"normalizing" headers and footers, which will help ensure that the
automated tool handles pagination as well as it can. Let me again
stress, however, that volunteers who want to normalize text around page
breaks must do so for every page; otherwise the algorithms involved
will not work at all. It sounds as if you spent a lot of time editing
the book, so I wanted to make sure you had access to these methods if
you want to pursue them. Please feel free to contact me offline at
volunteer@xxxxxxxxxxxxx if you have questions. I would be happy to look
at the book you prepared for validation and let you know if I see
anything irregular.

Header/footer/page break normalization:

Volunteers can assist this tool by "normalizing" headers, footers, and
page numbers in submitted files where they do not appear consistent.
Normalizing headers and footers helps, but it needs to be a
complete job, as normalizing just a few headers could skew the
probability of properly recognizing them throughout the book. If you
wish to undertake this task, please be sure to:

1) Check the line position of text (the first paragraph on a given page
should be the header, the last should be the footer).
2) Check that page numbers have a space on either side,
separating them from the header/footer text. If the page number is
the first character in a header, it does not need a space before it; if
it is the last character in a footer, it does not need a space after it.
3) Only change text in the header or footer in order to make it look
like all other headers/footers.
4) Perform 1-3 on every page.

Remember that the automated tool is designed to be effective on most
scanned books, so you should undertake this "normalization"
process only if you are sure that the headers and footers in the book
you are validating are inconsistent and if you are able to normalize all
of them throughout the book.
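
For volunteers who would rather script the checks in steps 1-3 than do
them by eye, here is a minimal Python sketch. It is not the Bookshare
tool or anything it uses internally. It assumes the book has already
been split into pages, each a list of paragraph strings with the header
first, and that the whole book shares one header style; the function
names are made up for illustration. Note that it will also split
ordinals such as "2nd", so review its output by hand.

```python
import re
from collections import Counter


def space_page_number(line):
    """Put a space between a run of digits and any letter touching it."""
    line = re.sub(r"([A-Za-z])(\d)", r"\1 \2", line)
    return re.sub(r"(\d)([A-Za-z])", r"\1 \2", line)


def header_template(line):
    """Replace digit runs with '#' so headers can be compared across pages."""
    return re.sub(r"\d+", "#", space_page_number(line)).strip()


def odd_headers(pages):
    """Return indexes of pages whose header differs from the most common one.

    `pages` is a list of non-empty pages, each a list of paragraph strings;
    the first paragraph of each page is treated as its header (step 1).
    """
    templates = [header_template(page[0]) for page in pages]
    common, _ = Counter(templates).most_common(1)[0]
    return [i for i, t in enumerate(templates) if t != common]
```

Running odd_headers over the whole book lists the pages that still need
attention, which fits step 4: the job is only done when that list is
empty. The same check could be applied to footers by looking at the last
paragraph of each page instead of the first.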


jesse.

________________________

Jesse Fahnestock
Collection Development Coordinator, Bookshare.org
www.bookshare.org

A Project of The Benetech Initiative - Technology Serving Humanity
480 S. California Ave., Suite 201
Palo Alto, CA 94306-1609  USA
(650)475-5440 x133
(650) 475-1066 FAX
jesse@xxxxxxxxxxxx
www.benetech.org

-----Original Message-----
From: Vic Llanes [mailto:v.llanes@xxxxxxxxxxx]
Sent: Thursday, June 17, 2004 4:35 AM
To: bksvol-discuss@xxxxxxxxxxxxx; bookshare-discuss@xxxxxxxxxxxxx
Subject: [bookshare-discuss] What's going on?


With all this talk about the automated header stripping of books, I
thought I'd check and compare a copy of a book I've edited with so much
painstaking care to a copy downloaded from the public download site. My
heart just sank when I saw what is happening. The page breaks are all
over the place, spaces between words are missing, and other characters
are inserted. The following two lines are an example of what is
happening to words.

Curtisea8Octavius

Curtis," Octavius



The first line was from the one I downloaded. The comma and quotation
mark were replaced with the number eight and the space disappeared. The
second line was from the copy of the book I still have.

The book in question is the latest I've edited and validated which is
Spider-man2.



Vic



