[bksvol-discuss] Re: line feeds, carriage returns and page breaks

  • From: "Mayrie ReNae" <mayrierenae@xxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Tue, 26 May 2009 16:14:55 -0700

Hi Rik,
 
    I'm not sure I understand exactly what you are asking.  For instance,
what exactly do you want to know about page breaks?
 
    However, some of what you say, I do understand and will address.  For
instance, unwanted paragraph marks, or as you call them, carriage returns.
For the sake of accuracy and a common lexicon, let's call them paragraph
marks, okay?
 
    It is very common to have OCR insert unwanted paragraph marks in books
when you scan.  It happens more in some books than others.  Sometimes this
happens because folks scanning with Kurzweil have not enabled the feature in
the reading settings menu that ignores line endings.  When this is the
case, the lines keep the length that they had in the book and are
broken by a paragraph mark every time the printed line reached the
margin, instead of paragraph marks appearing only where actual
paragraphs begin.  This will create a book with absolutely scads of unwanted
paragraph marks.  Sometimes they are just inserted because the OCR had to
guess where it thought a paragraph mark should go and was wrong. This does
happen with other OCR programs, as well as Kurzweil, just so that no one
thinks I'm picking on Kurzweil for being the only software to cause this
issue.  It isn't. 
 
    Was there more about paragraph marks that you wanted to know?  If so,
just ask.
 
    I did give incomplete (thanks Misha for pointing that out to me)
instructions for getting rid of unwanted paragraph marks using Microsoft
Word.  In Kurzweil the same logic applies, but the control characters are
different.  In Kurzweil1000 there is also a check box in the find and
replace dialogue to tell the software to only look for lower case letters.
The dialogue asks you to enable or disable case sensitivity.  You want this
enabled, definitely. As Misha pointed out, if you don't enable it, all
paragraph marks followed by a letter, capitals included, will disappear,
which would be a nightmare!  But I digress a
bit.  
 
For each lower case letter of the alphabet, you want to do a find and replace
that finds a paragraph mark followed by that letter and replaces it with a
space followed by the same letter.  You use lower case letters because a
paragraph should practically never start with a lower case letter, so
getting rid of paragraph marks before lower case letters and replacing them
with a space will get rid of the majority of paragraph marks that should not
be present in any given document.
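
For anyone who would rather script this than repeat the find and replace 26
times, here is a minimal Python sketch of the same logic.  It is not a
Kurzweil or Word feature.  The function name join_broken_lines is made up for
illustration, and the sketch assumes the book has been saved as plain text in
which each paragraph mark is a single \n character.

```python
import re


def join_broken_lines(text: str) -> str:
    """Replace a paragraph mark that comes right before a lower case
    letter with a single space (hypothetical helper, for illustration)."""
    # The lookahead (?=[a-z]) checks for a lower case letter without
    # consuming it, so the letter itself is left in place.
    # Matching is case sensitive by default, so paragraph marks before
    # capital letters are left alone.
    return re.sub(r"\n(?=[a-z])", " ", text)


sample = "The cat sat on the\nmat and purred.\nThen it left."
print(join_broken_lines(sample))
```

Like the manual method, this only catches the majority of unwanted paragraph
marks.  A line that was broken just before a capitalized name stays broken.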
 
    Now, I can hear you. "Just tell me how to get rid of them using K1000,
already."  Am I right?  So, here you go.  In the find box of the find and
replace dialogue, accessed by pressing control plus h, type \n followed by a
lower case letter. In the replace box type a space followed by that same
lower case letter.  Hit enter on "replace all", making sure to enable case
sensitivity on your way to the "replace all" button.  Repeat this process
with each lower case letter of the alphabet.  It sounds long and involved.
The process seemed intimidating to me for a while, but it really doesn't
take more than 5 minutes to do all of the letters.  
 
    By the way, Kurzweil1000's equivalent to a paragraph mark in Microsoft
Word, when you're typing it into the find box of the dialogue, is \n.  In
case you have all of your punctuation turned off and read only with
speech, that is the backslash followed by the lower case letter n.  After it
comes the lower case letter that you're trying to find.
 
    Also, on the subject of paragraph marks.  Sometimes the OCR doesn't know
or see that it should separate lines of dialogue into separate paragraphs.
So when two people are talking to each other in a book, their conversation
will sometimes be run into one long paragraph where there should be more
than one.  You can do a find and replace to fix this.  If you search for " "
(that is quotation mark space quotation mark) and replace it with "\n" (that
is quotation mark backslash n quotation mark), you'll disconnect improperly
joined lines of dialogue in one fell swoop.  
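
The same replacement can be sketched in Python for a book saved as plain
text.  The function name split_dialogue is made up for illustration, and the
sketch assumes the text uses straight quotation marks rather than curly ones
and that a paragraph mark is a single \n character.

```python
def split_dialogue(text: str) -> str:
    """Start a new paragraph wherever one quotation ends and the next
    begins after only a space (hypothetical helper, for illustration)."""
    # Find: quotation mark, space, quotation mark.
    # Replace with: quotation mark, paragraph mark, quotation mark.
    return text.replace('" "', '"\n"')


sample = '"Are you coming?" "Not yet."'
print(split_dialogue(sample))
```

Two quoted phrases that really do belong in the same paragraph will be
split as well, so it is worth reading over the result afterwards.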
 
    Was there more that you wanted to know?  If so, just ask.
 
Mayrie
 
 

  _____  

From: bksvol-discuss-bounce@xxxxxxxxxxxxx
[mailto:bksvol-discuss-bounce@xxxxxxxxxxxxx] On Behalf Of Rik James
Sent: Tuesday, May 26, 2009 2:11 PM
To: bksvol-discuss@xxxxxxxxxxxxx
Subject: [bksvol-discuss] Re: line feeds, carriage returns and page breaks


So, please pardon my under-educatedness on this matter.
 
Under-educatedness?  (Now that is one for ranked spelling, I bet!)   Maybe
it was first heard by the hoss they called W.
 
But it has been of confusion to me also for some time.
That is, if I am to understand better, if someone can start from the
beginning. I apologize for asking you to revisit where most probably are
already weary.
 
But why is this all happening, this line feed and carriage return and page
break business?
 
And is it the case in books submitted when using the Kurzweil program to
produce a book?
And if so, how may it be avoided from the onset so as not to create work
after the fact?
 
My aim and attempt is to produce and submit a book that is clean and
suitable with minimal needs for work.
 
I have long noticed that when I copy text from somewhere often it will
appear with these very annoying line breaks in the middle of sentences.
Is this at least part of which you all are speaking?  
 
If so, I have long wished to know a quicker and more painless way to
remove them than line by tedious line.
 
Thank you in advance and please again apologies if I'm a retread here.
 
Best,
Rik
RixMix2009@xxxxxxxxx
 
PS for some reason my own posts do NOT come back to me on this list. But it
appears that they get posted, so I'm not trying to fix that anymore! <g>
