[bksvol-discuss] Re: question for member bookshare readers re tables of contents

  • From: Lynn Zelvin <lynn@xxxxxxxxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Fri, 15 Jan 2010 14:39:57 -0500

OK. I am both relieved and confused by this answer. I guess it means I'll stop removing blank lines. I was trying to make sure I only removed those which were just space between paragraphs and not those which might be intentional breaks, but if they're all going away anyway, I can stop thinking about it. Since the guidelines now really say to read the whole book, it's no big deal to get rid of blank lines; it's just the extra attention this takes. Fixing line breaks within paragraphs, at least those we can be sure about, takes longer, even doing search-replace 27 times.

What confuses me more is what's a change we absolutely have to make for the books to convert properly. Like, this question of ellipses in tables of contents. It sounded like bookshare volunteers were trying to set a standard, rather than asking whether the ellipses were something that was needed for the book to be processed properly. It sounded like volunteers had decided on page numbering at the top with blank lines before and after them rather than it being something needed - will the automation work just as well if page numbers are at the bottom of the page with or without blank lines before them? In the manual the checklist talks about standardizing the font and using 16 point for chapter headings. So, I did this, using 14 point for minor headings within chapters. But it also seems to me that doing this removes font changes the author may have intended. There wasn't anything about this. Is there actually a font size that is used by the automation process? Or is that another volunteer agreement? Is bold text in daisy books just where bolding was used for emphasis in the original? Is it added or removed in headers? Some of the minor headings in this book I'm working with now were in bold and a few were not and the difference didn't seem to be in meaning, so I bolded them all, figuring that occasionally the scanner just didn't pick up the difference. But really, it sounds like nothing there matters because it will just be removed or added. I guess that, if we're not putting in codes, I'd want to know which formatting indicators the automation process actually uses, whether it's font size or bolding or just shorter lines with blank lines before and after, or what, and if nothing we do for a specific type of item matters, I'd like to know that also and I'll not bother. I have a similar question about footnotes, but will put that in a different message since it's sort of a different question.

I wanted to think I could follow instructions without asking a million boring questions on this list, but now I feel really confused. I used to do formatting for braille, both literary and textbook formatting, and it felt much clearer and easier because the rules were detailed and specific. I think the work is faster and easier when the rules are clear. And I'd personally vote for the option to mark sidebars and captions and any other feature that will actually be used, like, any indications that will make final processing of tables work better, rather than having to just leave them unclear. Certainly, if some books had features that others lacked, an end user would still benefit from having it when it's there.


At 03:45 PM 1/14/2010, you wrote:
Hi Lynn. Bookshare does automate a lot of things like page numbering, chapter indication, and removal of extra spaces and blank lines. Some volunteers are choosing to do things like delete blank lines by choice, not because they have to. It does make proofreading more comfortable for our sighted and Braille reading volunteers. Since I don't know how my proofreader will be working when I submit a book, I usually take a minute to do this step. However, I only do it because I can do so in less than 30 seconds. If it took much longer than that, I wouldn't bother since the Bookshare tool can do it during conversion. I definitely wouldn't do it if I had to do it manually, deleting line by line.

Bookshare's processor tool will number pages without page numbers. However, if a submitter submits a book with no page numbers, it can make proofreading and identifying missing pages more difficult. With no page numbers, you don't know for sure which pages to ask someone to scan for you if they're missing from the book. You have to guess and try to piece the text together. That's why Bookshare asks us to submit books with page numbers if they scan well enough. So that's a clarity issue rather than an automation problem.

As for the formatting, Bookshare daisy files do have font size changes, bolded text, as well as page and often chapter navigation. The catch is that not all daisy players work in the same way. Some older players like the Maestro don't handle chapter navigation at all. On the flip side, the now free Freedom Scientific daisy reader for JAWS and Pac Mate uses chapter navigation very well. Since there is such a wide range of functionality among daisy players, Bookshare has chosen to write their code to validate against the standards from the Daisy Consortium instead of writing for specific devices. So the old cliché "results vary" applies here. (smile)

Interestingly, the daisy format can support a caption element. However, when Bookshare staff asked if we'd be willing to use it to mark captions or sidebars, about half of the volunteers said that it would be too burdensome. It became a bit of a controversial issue. The idea was dropped at that point. I was disappointed because I thought it would go a long way toward making things like sidebars and text boxes more distinguishable when they interrupt the flow of text in a book. I'd like to see the staff revisit the issue, making it possible for those of us who want to label captions to do so. If they made it optional, I think people would gradually begin trying it, especially after seeing the improvements in books they read with better navigation. Those that felt uncomfortable with the process could skip it with no pressure or anything.

Monica Willyard
"The best way to predict the future is to create it." -- Peter Drucker

From: bksvol-discuss-bounce@xxxxxxxxxxxxx [mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Lynn Zelvin
Sent: Thursday, January 14, 2010 1:07 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: question for member bookshare readers re tables of contents

There was a point where just being able to read a book at all was considered wonderful enough and better scanning quality was the best we could hope for. I'm appreciating all the thought it seems is now being put into formatting and other improvements. That being said, we're really working with very inadequate tools in trying to do better with this. We're asking the question as to the best compromise solution considering that there are probably a half dozen different formats in which people are reading these books - braille with several different page widths, text-to-speech with some people just sitting back and listening and others actively moving through the text as they read, some with a screen reader on their computer and some with a DAISY player on a computer or stand-alone device. Some use enlarged print, where some enlarge the text, meaning again we have different page widths, and others leave the original text and use software to magnify their screen image. Some are using combinations, both looking at the words and listening. I've occasionally had things I did with both speech and braille, although I don't know if anyone actually reads that way. There are probably more that I'm not thinking about.

It's impossible to find one best way for all of these. The answer is in being able to use formatting and style codes, or at least in being able to standardize, and then for the final formats to make use of those codes. So if you code something as a page number, when converting to braille, it can be, for example, placed in the top right corner regardless of how wide the page is, could be spaced differently for different presentations of enlargement, and could contain a code that lets the daisy player actually know it's a page number. A line of dots in tables of contents could be present in visual and braille presentations, adjusted for page width, and be active links to the actual page.

I'd thought bookshare was doing some of that, but as I do tend to ignore formatting when I read, I can't say I've noticed. I don't use daisy players or anything else fancy as I don't like the speech engines they use. I'm sure they must certainly be doing this with the NIMAC books that we aren't allowed to access. I was going to go poking around in some of the books I already have, but it would be easier for someone who already knows to give an answer. Even though they don't ask volunteers to add in codes, I'd assumed they did some things by automation, like coding as page numbers sequential numbers that appear at the beginning and end of pages. If they're not, considering all the work volunteers are now putting into these books, it seems we should ask them for a few codes we can use. Validators who chose could then properly code tables of contents, chapter headings, page numbers, and footnotes, at the least. The volunteer manual I saw did recommend enough standardization of such things that it does seem bookshare could be making use of such efforts in the conversion process. Maybe they're afraid not enough people would validate books if they were expected to do this, but since some *are* doing it, maybe we could get some guidance from them. Maybe if they are not making full use of our efforts, we could prod them?

Am I correct in my new reckoning that there is a gap between volunteers and paid staff, that people making decisions about what to automate and how to convert books are not interacting with people doing scanning and validating? Is it that the hopes, which I share, are pinned on getting text from publishers in the future and thus not needing to go through all this? Well, even then we'll still need these tools to include older books in the collection. In the past validators were asked to do simpler things like make sure all the pages seem to be there. It's a lot more now and I think that's good, but what a shame for us here making such compromise decisions when we could do something that will really be used properly. Has this been discussed already?

