Hi Jesse,

I sent you the file. Another angle that has occurred to me: I am opening the files downloaded from Bookshare with Kurzweil 1000, so I don't know at which end the problem lies.

Vic

----- Original Message -----
From: "Jesse Fahnestock" <Jesse.F@xxxxxxxxxxxx>
To: <bookshare-discuss@xxxxxxxxxxxxx>
Sent: Thursday, June 17, 2004 11:32 AM
Subject: [bookshare-discuss] Re: What's going on?

Vic:

The mishandling of text you note below is definitely not typical. I expect that the problem with the quotation marks was due to mishandling of a special character (curly quotes or something like that), and I will have our engineers confirm any problems there for me.

As far as pagination goes, as we've discussed here, it is a tricky thing to handle automatically. Our testing shows that the automatic header stripping and replacement of page numbers with DAISY page breaks works on a majority of scanned pages. However, irregularity in the scans will inevitably throw some pages off. Below I'm recopying the text about "normalizing" headers and footers, which will help ensure that the automated tool handles pagination as well as it can. Let me again stress, however, that volunteers who want to normalize text around page breaks must do so for every page; otherwise the algorithms involved will not work at all. It sounds as if you spent a lot of time editing the book, however, so I wanted to make sure you had access to these methods if you want to pursue them.

Please feel free to contact me offline if you have questions at volunteer@xxxxxxxxxxxxx -- I would be happy to look at the book you prepared for validation and let you know if I see anything irregular.

Header/footer/page break normalization:

Volunteers can assist this tool by "normalizing" headers, footers, and page numbers in submitted files where they do not appear consistent. Normalizing such headers/footers helps, but it needs to be a complete job, as normalizing just a few headers could skew the probability of properly recognizing them throughout the book. If you wish to undertake this task, please be sure to:

1) Check line position of text (the first paragraph on a given page should be the header, the last should be the footer)
2) Check that page numbers have a space on either side, separating them from the header/footer text. If the page number is the first character in a header it does not need a space before it; or if it is the last character in a footer it does not need a space after it.
3) Only change text in the header or footer in order to make it look like all other headers/footers
4) Perform 1-3 on every page.

Remember that the automated tool is designed to be effective on most scanned books, so you should undertake this "normalization" process only if you are sure that the headers and footers in the book you are validating are inconsistent and if you are able to normalize all of them throughout the book.

jesse.

________________________
Jesse Fahnestock
Collection Development Coordinator, Bookshare.org
www.bookshare.org
A Project of The Benetech Initiative - Technology Serving Humanity
480 S. California Ave., Suite 201
Palo Alto, CA 94306-1609 USA
(650) 475-5440 x133
(650) 475-1066 FAX
jesse@xxxxxxxxxxxx
www.benetech.org

-----Original Message-----
From: Vic Llanes [mailto:v.llanes@xxxxxxxxxxx]
Sent: Thursday, June 17, 2004 4:35 AM
To: bksvol-discuss@xxxxxxxxxxxxx; bookshare-discuss@xxxxxxxxxxxxx
Subject: [bookshare-discuss] What's going on?

With all this talk about the automated header stripper for books, I thought I'd check and compare a copy of a book I've edited with so much painstaking care to a copy downloaded from the public download site.

My heart just sank when I saw what is happening. The page breaks are all over the place, spaces between words are missing, and other characters are inserted. The following two lines are an example of what is happening to words.

Curtisea8Octavius
Curtis," Octavius

The first line is from the copy I downloaded. The comma and a quotation mark were replaced with the number eight, and the space disappeared. The second line is from the copy of the book I still have. The book in question is the latest I've edited and validated, which is Spider-man2.

Vic
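
For volunteers who would rather check rule 2 of the normalization steps above with a script than by eye, here is a minimal sketch in Python. It is not part of any Bookshare tool; the helper name `page_number_is_separated` is made up for illustration, and it assumes that any run of digits in a header or footer line is the page number. It also simplifies the rule slightly by accepting a number at the very start or very end of either kind of line.

```python
import re

# Hypothetical helper for illustration only; not the Bookshare tool's logic.
# Assumption: every run of digits in a header/footer line is a page number.
def page_number_is_separated(line: str) -> bool:
    """Return True if each digit run has a space on both sides,
    or sits at the very start or very end of the line."""
    text = line.strip()
    for match in re.finditer(r"\d+", text):
        before_ok = match.start() == 0 or text[match.start() - 1] == " "
        after_ok = match.end() == len(text) or text[match.end()] == " "
        if not (before_ok and after_ok):
            return False
    return True

# Example: report header/footer lines that would need normalizing.
sample_lines = ["12 Chapter One", "Chapter One 12", "Chapter One12"]
needs_fixing = [ln for ln in sample_lines if not page_number_is_separated(ln)]
print(needs_fixing)
```

A check like this only flags candidates; per rule 4, any fix still has to be applied to every page of the book.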