I actually have the original scan as a PDF. If anyone is interested, I
can send it to them or convert it to KES, ARK, or another format in which
it can be reprocessed by hand. I'd be happy to get it posted to the site in
a less rushed, more navigationally correct version.
________________________
Peter M. Scialli, Ph.D.
Associate, Technical Projects, Bookshare.org
www.bookshare.org
A Project of The Benetech Initiative - Technology Serving Humanity
peter @benetech.org
www.benetech.org
I think we have as much a philosophical question here as a technical one. No matter what system is put in place, both on Bookshare and within our OCR software, someone will find it not to their liking. On the one hand, having page numbers, sections, chapters, etc. kept in the text is invaluable. On the other hand, when too much of that information is announced, others object. My personal preference is to keep more rather than less, and hence to have a lenient stripper; but I already understand the objections, especially from those who do automated continuous reading, convert to MP3, and all the rest.
"The Broker" should be a case study in just how difficult all this can be, especially when dealing with automated tools and rush scanning without hand validation. Valuable information was lost unintentionally, and this could not have been prevented except through painstaking effort that would have delayed the book's availability. In the short run, having the book immediately available is more important than having technical glitches dealt with. Perhaps the best solution in a case such as this is to make the book available immediately and also place the originally scanned copy on the step 1 validation page, so that someone can do the manual fine-tuning if they choose. Then, once validated, the improved copy would replace the original one. That would be the best of both worlds: quick access, while also addressing the real concerns expressed by Ken that the book isn't optimally labeled internally.