[bksvol-discuss] Re: Any easy way in Word to convert book submitted as two-column rtf ?

  • From: Melissa Smith <mdsmith25@xxxxxxxx>
  • To: bksvol-discuss@xxxxxxxxxxxxx
  • Date: Sun, 14 Mar 2010 05:30:32 -0500

Actually, with the book I had, Word didn't even find the section breaks. My screen reader reported the section breaks. If I recall, Carrie had returned it to the checkout list for further editing, but the original proofer never came back to work on it. Judy is the one with the current problem, and hers is a book that was scanned 2 pages at a time, but apparently the OCR was set to just recognize 1 page. So, she has columns, and section breaks with no page breaks.



Melissa Smith


On 3/14/2010 12:59 AM, Mike wrote:
Melissa,

I don't know if either of these is what you are encountering (and since you could reject the version with the problems it doesn't matter anyway, but I thought I'd throw in my oar on what I've seen), but I have seen two cases where Word has problems with search and replace on section breaks.

The first case: if there is a section break inside a table, Word will claim to find and replace it, but nothing actually seems to happen. The cases where I've seen this are when the OCR decides to make the table of contents into a Word table, or when the OCR decides to make the page header into a table.

The second case is a little more complicated to explain. First, think of section breaks as being like chapter indicators (that's how Word expects them to be used). There are two kinds of section breaks to do this. Most chapters begin with a new page, so there are section breaks that indicate beginning both a new section and a new page. These are what OCR programs put in most of the time. But there are books with chapters that begin in the middle of a page, so Word also has section breaks that indicate a new section but not a new page.

When you do search and replace of the section break that is also a page break, to convert it to just a page break, all works properly. When you try to convert a non-page-break section break to a page break, Word seems to get confused. If you search and replace these section breaks with page breaks, Word may delete the section break without putting in a page break, but it will make the next section break a non-page-break section break. Then when you delete that one, it is removed without a page break being put in, and so on until all the section breaks are removed from your book, but no page breaks are put in to replace them. Sometimes Word just will not replace these non-page-break section breaks (that is, even though it will find them, it will not replace them).

When Carrie Karnos created a bunch of books that had both kinds of section breaks in them because someone changed the settings in the OCR program at Bookshare, I found out that doing the search and replace from end to beginning instead of beginning to end worked correctly (it made the page-break section breaks into page breaks and removed the non-page-break section breaks). I don't actually know why this works or even why I tried it in the first place.
Misha
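
For anyone who would rather try the end-to-beginning idea outside Word, here is a rough sketch that works on the raw RTF text instead of Word's Find and Replace. It is only an illustration and says nothing about why Word behaves the way it does. It assumes the RTF control words \sect (end of a section), \sbknone (the following section starts without a new page) and \page (page break); the helper name sections_to_page_breaks is made up for this example.

```python
import re

# \sect ends a section in RTF. The lookahead keeps longer control words such
# as \sectd from matching; the optional space is the control word delimiter.
SECT = re.compile(r"\\sect(?![a-zA-Z]) ?")


def sections_to_page_breaks(rtf):
    """Hypothetical helper, not part of Word or of any library.

    New-page section breaks become \\page; continuous ones are dropped.
    """
    breaks = list(SECT.finditer(rtf))
    # Walk from the last break to the first, so the offsets of the breaks
    # not yet handled stay valid while the text is being edited.
    for i in range(len(breaks) - 1, -1, -1):
        start, end = breaks[i].span()
        next_start = breaks[i + 1].start() if i + 1 < len(breaks) else len(rtf)
        following = rtf[end:next_start]
        # Assumption: \sbknone among the properties of the section that
        # follows marks a continuous break; anything else starts a new page.
        replacement = "" if "\\sbknone" in following else "\\page "
        rtf = rtf[:start] + replacement + rtf[end:]
    return rtf
```

The sketch leaves every other control word alone (column settings, the leftover section properties, and so on), so treat the result as something to open and check in Word, not as a finished file. Work on a copy.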

Melissa Smith wrote:
I don't know, but I have seen those section breaks before, the ones that Word doesn't find with ^b. I rejected the book, because there was another copy of the same book on the checkout page that was a better copy. I would like to know what those section breaks are, though.

Melissa Smith


On 3/13/2010 2:57 PM, Judy s. wrote:
I'm proofreading a young adult novel that's really had me frustrated.

Every page is really two pages. They obviously scanned it two pages at a time, and when it was OCRed they didn't convert it correctly. It ended up as every "page" in the RTF really being two pages, coded in Word as two side-by-side columns.

The book has zero page breaks. They are all section breaks, which are usually easy to convert. In this case, when I convert them, it runs the two columns (that are really two separate pages, side by side) together. On top of that, it gives me a book that is one long column and only one letter wide! Then, it still has a kind of section break I've never seen before, occurring on pages that have footnotes. The ^b command does not find those, and I can't get Word to copy them, so I can't figure out an ASCII code for them that way. I can't delete them easily, either. I've had to go through the book looking for them visually, putting a blank line before and after them, highlighting that little section, and then deleting it. I did a Google search, and haven't come up with a code for it either.

Has anyone found a way using Word to easily convert a book like this into text that correctly has the pages one after another instead of side by side? Highlighting the entire book and removing the columns didn't work. I tried that several different ways.

I finally figured out a messy brute-force way to do it, by grabbing all the text and dumping it into a new RTF file as a special paste with no formatting. That gives me the text pretty much correctly (not completely; sometimes the columns are still intermingled), but I have to put in all the page breaks individually now. That isn't too bad, because it was missing half of the page breaks anyway. However, I can only find the missing ones by comparing the original RTF visually with my new RTF, since half of the page numbers are missing. Yuck.

Any thoughts on other ways to do this are welcome! The scan, by the way, is beautiful to look at if you are sighted. It is an exact match to what the book must have looked like in printed form. But it's totally wrong for what we need! It's been checked out and released by several volunteers before me, and I sure know why! smile.

Judy s.
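
On the missing page numbers: if the unformatted text can be saved as a plain text file, a small script could at least list which numbers are absent, so the comparison with the original does not have to be done entirely by eye. This is a minimal sketch under stated assumptions, not a tested tool. It assumes a page number is a line containing nothing but digits, and the function name missing_page_numbers is invented for this example.

```python
import re


def missing_page_numbers(text):
    """Hypothetical helper: report page numbers absent from the text.

    Assumes each surviving page number sits alone on its own line.
    """
    found = {
        int(line)
        for line in text.splitlines()
        if re.fullmatch(r"\s*[0-9]+\s*", line)
    }
    if not found:
        return []
    # Only gaps between the lowest and highest number found can be seen.
    return [n for n in range(min(found), max(found) + 1) if n not in found]
```

It cannot tell where in the text a missing number belongs, only which ones to look for, and any other line made up solely of digits would be mistaken for a page number.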

To unsubscribe from this list send a blank Email to
bksvol-discuss-request@xxxxxxxxxxxxx
put the word 'unsubscribe' by itself in the subject line. To get a list of available commands, put the word 'help' by itself in the subject line.


