[bksvol-discuss] Re: Fixing Occasional Hard Returns In The Middle Of Paragraphs

  • From: "Paula and James Muysenberg" <outofsightlife@xxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Fri, 14 Apr 2006 08:43:19 -0500

Gerald,

    I know this must have taken a long time to put together. My thanks to you,
Grace, Jake, et al., for the work you do on behalf of the volunteer community.

Paula
  ----- Original Message ----- 
  From: Gerald Hovas 
  To: bksvol-discuss@xxxxxxxxxxxxx 
  Sent: Thursday, April 13, 2006 11:51 AM
  Subject: [bksvol-discuss] Fixing Occasional Hard Returns In The Middle Of Paragraphs


  Here are the tips I sent to Grace for her to review.  Since she went ahead 
and posted them, and since they're related to a couple of posts in the last 
day, I thought I'd send them to the list.

  Gerald

  Removal Of Hard Returns At The End Of Every Line

  Some scans contain Hard Returns (ASCII 13) at the end of every line. While
this causes the text in the scan to appear exactly as it appears in the book, it
results in poorer quality books, because this character specifies a new
paragraph as well as a new line. Each line in the file becomes its own
paragraph, which prevents members from skimming the book by paragraph: skipping
ahead to the beginning of the next paragraph takes you to the beginning of the
next line, even if it isn't the beginning of the next paragraph in the text.
The fix isn't simple and takes a little time, but it's quicker than rescanning
the book, and it makes a difference to members who like to skim through their
favorite books from time to time. The difference is big enough that someone may
decide to rescan the book later just to fix the problem.

  The following is a procedure to fix the problem using Word. Keep in mind that
each scan can have its own set of problems and that the steps may need to be
adapted before the procedure can be used on some scans.

  Note that the procedure expects a blank line after headers and before footers.

  First you need to verify whether this procedure is needed. This can be done
using the Ctrl-Up and Ctrl-Down keys, which stop the cursor at the beginning of
paragraphs. If the cursor stops at the beginning of each line, then the scan has
this problem.

  An alternative method for verifying the problem is to toggle invisible
characters on with Ctrl-* (Ctrl-Shift-8) and look for Paragraph Markers. If a
Paragraph Marker appears at the end of each line, then the scan has this
problem. Pressing Ctrl-* again will toggle invisible characters off.

  If the problem is intermittent instead of at the end of every line, then see 
the tip for fixing occasional Hard Returns in the middle of paragraphs.

  If the problem is indeed at the end of every line and the scan has blank lines
between paragraphs, then the following procedure should work. Notes on adapting
the procedure to scans which do not contain blank lines between paragraphs, and
notes for adapting the procedure to K-1000, are included afterwards.

  1. First make a backup copy of the file before starting.  That way it's easy 
to fall back to a known position and start over if something goes wrong.

  2. If you do not wish to make changes to the book's front matter, move to the
first page of the prologue or the first page of chapter one. Then select the
remaining text with Ctrl-Shift-End. Be careful not to move around in the file
between steps, since this will cause the text to no longer be selected.

  3. Replace ^w^p with ^p. This will remove any whitespace at the end of lines,
making the line endings consistent. Consistent line endings are necessary for
the following global find-and-replace operations to fix every line.

  4. You may also wish to remove whitespace at the beginning of lines if the scan
contains blank lines between paragraphs. This isn't necessary for the remaining
steps to work, and it will cause problems when adapting the procedure to scans
which do not contain blank lines between paragraphs. If you do wish to remove
it, replace ^p^w with ^p.

  5. Replace ^p^p^p with ^p^p. This will remove multiple blank lines from the
document and will simplify the procedure. Note that this find and replace will
need to be repeated until Word makes no replacements in order to remove all of
the multiple blank lines in the book.

  6. Replace ^p with ^l. This will convert all Paragraph Markers to Manual Line
Breaks. Manual Line Breaks are Soft Returns and only specify a new line, not a
new line and paragraph.

  7. Replace ^l^l with ^p^p. This will change the two consecutive Manual Line
Breaks at the end of paragraphs back to Paragraph Markers. This step is the one
which requires blank lines between paragraphs, and it is why blank lines must
follow headers and precede footers.

  8. Replace -^l with -.  This will prevent inserting a space after hyphens in 
the next step.

  9. Replace ^l with a space.  This will remove the Soft Return at the end of 
every line without running two words together.  Now the problem should be fixed.
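  For anyone who wants to see what steps 3 through 9 do to the text, here is a
rough Python simulation of the same replacements on a plain string. It's only a
sketch, not a Word macro, and it rests on a few assumptions: a newline (\n)
stands in for each Paragraph Marker, ASCII 11 (\v) stands in for the Manual Line
Break, and only spaces and tabs count as the whitespace ^w matches. The function
name join_hard_wrapped_lines is made up for illustration, and step 2 corresponds
to passing in only the text after the front matter.

```python
import re

PARA = "\n"  # stands in for Word's ^p (Paragraph Marker)
LINE = "\v"  # ASCII 11, the character behind Word's ^l (Manual Line Break)


def join_hard_wrapped_lines(text: str, strip_leading: bool = False) -> str:
    """Simulate steps 3-9 on text whose lines each end in a paragraph marker
    and whose paragraphs are separated by blank lines (hypothetical helper)."""
    # Step 3: ^w^p -> ^p (this sketch treats only spaces and tabs as ^w)
    text = re.sub(r"[ \t]+\n", PARA, text)
    # Step 4 (optional): ^p^w -> ^p
    if strip_leading:
        text = re.sub(r"\n[ \t]+", PARA, text)
    # Step 5: ^p^p^p -> ^p^p repeated until nothing changes; one regex pass
    # collapsing every run of 3+ markers down to 2 has the same effect
    text = re.sub(r"\n{3,}", PARA * 2, text)
    # Step 6: ^p -> ^l
    text = text.replace(PARA, LINE)
    # Step 7: ^l^l -> ^p^p (a blank line marks a real paragraph break)
    text = text.replace(LINE * 2, PARA * 2)
    # Step 8: -^l -> - (no space is inserted after a line-ending hyphen)
    text = text.replace("-" + LINE, "-")
    # Step 9: ^l -> a space, joining the lines of each paragraph
    return text.replace(LINE, " ")


if __name__ == "__main__":
    sample = ("The quick brown   \nfox jumps over\n\n\n\n"
              "the lazy dog. A well-\nknown line.")
    print(repr(join_hard_wrapped_lines(sample)))
    # 'The quick brown fox jumps over\n\nthe lazy dog. A well-known line.'
```

As in step 8, a hyphen at the end of a line is kept, so "well-" and "known" come
out as "well-known" rather than "well- known".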

  To adapt the procedure to scans without blank lines between paragraphs, 
replace step 7 with the following step, and remember to leave out step 4.

  7. Replace ^l^w with ^p^p or ^p followed by your preferred number of spaces 
for indenting a paragraph.
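  The same kind of sketch can be adapted for scans without blank lines between
paragraphs. Again this is a hypothetical Python simulation, not Word itself: \n
stands in for ^p, \v (ASCII 11) for ^l, and only spaces and tabs count as ^w.
The indent parameter is an assumption standing for "your preferred number of
spaces"; leaving it empty gives the ^p^p variant.

```python
import re

PARA = "\n"  # stands in for ^p
LINE = "\v"  # stands in for ^l (ASCII 11)


def join_indented_paragraphs(text: str, indent: str = "") -> str:
    """Steps 3, 5, 6, alternate 7, 8 and 9 for scans whose paragraphs begin
    with an indent rather than a blank line (hypothetical helper). Step 4 is
    deliberately skipped: the leading whitespace is what marks a paragraph."""
    text = re.sub(r"[ \t]+\n", PARA, text)    # step 3: ^w^p -> ^p
    text = re.sub(r"\n{3,}", PARA * 2, text)   # step 5: ^p^p^p -> ^p^p
    text = text.replace(PARA, LINE)            # step 6: ^p -> ^l
    # Alternate step 7: ^l^w -> ^p^p, or ^p followed by the chosen indent
    new_break = PARA + indent if indent else PARA * 2
    text = re.sub(LINE + r"[ \t]+", lambda _m: new_break, text)
    text = text.replace("-" + LINE, "-")       # step 8: -^l -> -
    return text.replace(LINE, " ")             # step 9: ^l -> space


if __name__ == "__main__":
    sample = "    It was a dark\nand stormy night.\n    The rain fell\nin torrents."
    print(repr(join_indented_paragraphs(sample, indent="    ")))
    # '    It was a dark and stormy night.\n    The rain fell in torrents.'
```

Note that only indents that follow a line break are touched, so the indent on
the very first paragraph of the selection stays as it was.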

  To adapt the procedure to K-1000:

  Use \n in place of ^p.

  Use a Space in place of ^w.

  Use a special symbol like ~ which doesn't appear anywhere in the book or a 
string like [Newline] in place of ^l.

  Note that you will need to keep replacing a Space followed by \n with \n until
no replacements are made, in order for line endings to be consistent. It's also
possible, though not probable, that you may need to remove tabs at the end of
lines as well as spaces. The ^w in Word matches strings containing any
combination of spaces and tabs, so it isn't necessary to take this into
consideration when using Word. Replacing each Tab with a Space before removing
spaces at the end of lines would avoid having to deal with this issue in K-1000
and would simplify the alternate step 7.

  Note that leaving out steps 8 and 9 would keep the line breaks as they appear
in the book while still allowing skimming by paragraph, since ^l (ASCII 11)
doesn't specify a new paragraph, only a new line.

  Fixing Occasional Hard Returns In The Middle Of Paragraphs

  OCR software will occasionally add a Hard Return or Paragraph Marker (ASCII 
13) at the end of a line even though the line is not the last line of the 
paragraph.  This causes the paragraph to be broken into two separate paragraphs 
in the scan.

  To search for this scanning error using Word, use the following search 
strings:

  ^$^p  This will find paragraphs which end in a letter. Be aware that replacing
this string with nothing will not only remove the Paragraph Marker, it will also
remove the letter which the string finds, so you don't want to use this in a
find and replace. Another reason you don't want to use this in a find and
replace is that there are legitimate reasons for ending a paragraph with a
letter, and it's best to make sure what the string finds is a scanning error.

  ,^p  This will find paragraphs which end in a comma.  Again, it's best to not 
use this in a find and replace because there are also legitimate reasons for 
ending a paragraph in a comma.

  These strings are not guaranteed to find every occurrence of the problem, but 
they should find nearly all of them.
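  The two search strings can also be applied to an exported text file. Here is a
small Python sketch, assuming \n marks each paragraph and using isalpha() as an
approximation of Word's ^$ ("any letter"). Like the advice above, it only lists
candidates and changes nothing; the function name is hypothetical.

```python
def find_suspect_breaks(text: str) -> list[tuple[int, str]]:
    """List paragraphs that end in a letter (like Word's ^$^p) or a comma
    (like ,^p). Nothing is changed; every hit still needs a human decision."""
    hits = []
    paragraphs = text.split("\n")
    # Only segments followed by a paragraph marker can match ^$^p or ,^p,
    # so the piece after the final "\n" is skipped.
    for number, para in enumerate(paragraphs[:-1], start=1):
        if para and (para[-1].isalpha() or para[-1] == ","):
            hits.append((number, para))
    return hits


if __name__ == "__main__":
    sample = "Dear Anne,\nI hope this letter finds\nyou well.\nYours truly\n"
    for number, para in find_suspect_breaks(sample):
        print(number, para)
    # 1 Dear Anne,
    # 2 I hope this letter finds
    # 4 Yours truly
```

In this sample only paragraph 2 is a scanning error; paragraphs 1 and 4 are the
opening and closing of a letter, the kind of legitimate hit described below.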

  Be aware that exiting the Find dialog box and using Page-Down and Page-Up to 
find the next or previous occurrence will make it easy to fix a scanning error 
when it's found since it eliminates the steps of opening and closing the dialog 
box.

  One legitimate reason for finding both strings in the book is that few pages 
end in a complete sentence.  Another is that letters or notes often appear in 
books, and these strings will find the opening or closing of the letter or note.

  Searching for this scanning error takes a little while, but it's one that you 
will want to check if you're striving for the perfect scan.

  Note that if you are finding Hard Returns at the end of every line or at the 
end of most lines, then refer to the tip for removing Hard Returns at the end 
of every line.
