[bksvol-discuss] Fixing Occasional Hard Returns In The Middle Of Paragraphs

  • From: "Gerald Hovas" <GeraldHovas@xxxxxxxxxxx>
  • To: <bksvol-discuss@xxxxxxxxxxxxx>
  • Date: Thu, 13 Apr 2006 11:51:26 -0500

Here are the tips I sent to Grace for her to review.  Since she went ahead
and posted them, and since they're related to a couple of posts in the last
day, I thought I'd send them to the list.

 

Gerald

 

 

 

Removal Of Hard Returns At The End Of Every Line

 

 

Some scans contain Hard Returns (ASCII 13) at the end of every line.  While
this causes the text in the scan to appear exactly as it appears in the
book, it results in poorer quality books because this character specifies a
new paragraph as well as a new line.  Each line in the file becomes its own
paragraph, which prevents members from skimming the book by paragraph:
skipping ahead to the beginning of the next paragraph takes you to the
beginning of the next line, even if it isn't the beginning of the next
paragraph in the text.  The fix isn't simple and takes a little time, but
it's quicker than rescanning the book, and it makes a difference to members
who like to skim through their favorite books from time to time.  It is
enough of a difference that someone may decide to rescan the book later to
fix the problem.

 

The following is a procedure to fix the problem using Word.  Keep in mind
that each scan can have its own set of problems and that the steps may need
to be adapted before the procedure can be used on some scans.

 

Note that the procedure expects a blank line after headers and before
footers.

 

First you need to verify whether this procedure is needed.  This can be done
using the Ctrl-Up and Ctrl-Down keys, which move the cursor to the beginning
of the previous or next paragraph.  If the cursor stops at the beginning of
each line, then the scan has this problem.

 

An alternative method for verifying the problem is to toggle invisible
characters on with Ctrl-* (Ctrl-Shift-8) and look for Paragraph Markers.  If
a Paragraph Marker appears at the end of each line, then the scan has this
problem.  Pressing Ctrl-* again will toggle invisible characters off.
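
For anyone who prefers to check a plain text export of the scan outside of
Word, the same question can be asked with a rough heuristic.  The following
is only a sketch in Python, not part of the Word procedure: the function
name and the set of "sentence-ending" characters are my own assumptions.  It
reports what share of non-blank lines stop in the middle of a sentence.  A
share close to 1 suggests a Hard Return at the end of every line.

```python
# Characters that commonly close a sentence or a quoted sentence.
# This set is an assumption for illustration, not an exhaustive list.
TERMINAL = '.?!:;"\')\u201d\u2019'


def share_of_lines_ending_mid_sentence(text: str) -> float:
    """Return the fraction of non-blank lines that do not end a sentence."""
    # Drop trailing whitespace, then ignore lines that are blank.
    lines = [line.rstrip() for line in text.splitlines()]
    lines = [line for line in lines if line]
    if not lines:
        return 0.0
    mid_sentence = sum(1 for line in lines if line[-1] not in TERMINAL)
    return mid_sentence / len(lines)
```

Headings and verse also end without punctuation, so treat the number as a
hint and confirm with Ctrl-Down in Word.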

 

If the problem is intermittent instead of at the end of every line, then see
the tip for fixing occasional Hard Returns in the middle of paragraphs.

 

If the problem is indeed at the end of every line and the scan has blank
lines between paragraphs, then the following procedure should work.  Notes
on adapting the procedure to scans which do not contain blank lines between
paragraphs and notes for adapting the procedure to K-1000 will be included
afterwards.

 


1. First make a backup copy of the file before starting.  That way it's easy
to fall back to a known position and start over if something goes wrong.

 

2. If you do not wish to make changes to the book's front matter, move to
the first page of the prologue or the first page of chapter one, then select
the remaining text with Ctrl-Shift-End.  Be careful not to move around in
the file between steps since this will cause the text to no longer be
selected.

 

3. Replace ^w^p with ^p.  This will remove any whitespace at the end of
lines, making them consistent.  Making the ends of lines consistent is
necessary for the following global find and replaces to fix every line.

 

4. You may also wish to remove whitespace at the beginning of lines if the
scan contains blank lines between paragraphs, but this isn't necessary for
the remaining steps to work and will cause problems when adapting the
procedure to scans which do not contain blank lines between paragraphs.  If
you wish to do so, though, replace ^p^w with ^p.

 

5. Replace ^p^p^p with ^p^p.  This will remove multiple blank lines from the
document and will simplify the procedure.  Note that this find and replace
will need to be repeated until Word reports no replacements in order to
remove all of the multiple blank lines in the book.

 

6. Replace ^p with ^l.  This will convert all Paragraph Markers to Manual
Line Breaks.  Manual Line Breaks are Soft Returns and only specify a new
line, not a new line and paragraph.

 

7. Replace ^l^l with ^p^p.  This will change the two consecutive Manual Line
Breaks at the end of paragraphs back to Paragraph Markers.  This step is the
one which requires blank lines between paragraphs and is why blank lines
must follow headers and precede footers.

 

8. Replace -^l with -.  This will prevent inserting a space after hyphens in
the next step.

 

9. Replace ^l with a space.  This will remove the Soft Return at the end of
every line without running two words together.  Now the problem should be
fixed.

 

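
To make the order of the replacements easier to follow, here is the
procedure sketched in Python for a plain text copy of the scan.  This is an
illustration under assumptions, not a replacement for the Word steps: "\n"
stands in for ^p, "\v" (ASCII 11) stands in for ^l, and the function names
are made up for this example.  Steps 1 and 2 (backup and selection) and the
optional step 4 are left out.

```python
import re

PARA = "\n"  # stands in for Word's ^p (Paragraph Marker)
SOFT = "\v"  # stands in for Word's ^l (Manual Line Break, ASCII 11)


def replace_until_stable(text: str, old: str, new: str) -> str:
    """Repeat a literal replacement until no occurrence is left."""
    while old in text:
        text = text.replace(old, new)
    return text


def join_hard_wrapped_lines(text: str) -> str:
    """Apply steps 3 and 5 through 9 to text with blank lines between paragraphs."""
    # Step 3: remove spaces and tabs in front of each Paragraph Marker.
    text = re.sub(r"[ \t]+\n", PARA, text)
    # Step 5: collapse runs of blank lines down to a single blank line.
    text = replace_until_stable(text, PARA * 3, PARA * 2)
    # Step 6: turn every Paragraph Marker into a Manual Line Break.
    text = text.replace(PARA, SOFT)
    # Step 7: a doubled break marks a real paragraph end, so restore it.
    text = text.replace(SOFT * 2, PARA * 2)
    # Step 8: join a hyphen at the end of a line to the next line, no space.
    text = text.replace("-" + SOFT, "-")
    # Step 9: every remaining break was a line end inside a paragraph.
    return text.replace(SOFT, " ")
```

The order matters for the same reason it does in Word: step 7 only works
once step 5 has reduced every gap to exactly one blank line, and step 8 has
to run before step 9 turns the remaining breaks into spaces.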

 

To adapt the procedure to scans without blank lines between paragraphs,
replace step 7 with the following step, and remember to leave out step 4.

 

7. Replace ^l^w with ^p^p or ^p followed by your preferred number of spaces
for indenting a paragraph.
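
The alternate step 7 can be sketched the same way.  Again this is only an
illustration in Python with assumed names: "\n" stands in for ^p, "\v"
stands in for ^l, and the paragraph_break parameter is hypothetical,
standing for whatever you choose to put between paragraphs.  Step 5 is left
out for brevity.

```python
import re


def join_lines_in_indented_scan(text: str, paragraph_break: str = "\n\n") -> str:
    """Rejoin lines when paragraphs are marked by indentation, not blank lines."""
    # Step 3: remove spaces and tabs at the end of every line.
    text = re.sub(r"[ \t]+\n", "\n", text)
    # Step 6: turn every Paragraph Marker into a Manual Line Break.
    text = text.replace("\n", "\v")
    # Alternate step 7: a break followed by indentation starts a paragraph.
    # A function is used as the replacement so that paragraph_break is
    # inserted literally, whatever characters it contains.
    text = re.sub(r"\v[ \t]+", lambda match: paragraph_break, text)
    # Step 8: join a hyphen at the end of a line to the next line, no space.
    text = text.replace("-\v", "-")
    # Step 9: the remaining breaks are line ends inside a paragraph.
    return text.replace("\v", " ")
```

Passing paragraph_break="\n    " keeps a four-space indent instead of adding
a blank line.  This also shows why step 4 must be skipped here: removing the
leading whitespace would erase the only sign of where a paragraph starts.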

 

To adapt the procedure to K-1000:

 

Use \n in place of ^p.

 

Use a Space in place of ^w.

 

Use a special symbol like ~ which doesn't appear anywhere in the book or a
string like [Newline] in place of ^l.

 

Note that you will need to perform the replacement of space\n until no
replacements are made in order for lines to be consistent.  It's also
possible, though not probable, that you will need to remove tabs at the end
of lines as well as spaces.  The ^w in Word removes strings containing any
combination of spaces and tabs, so it isn't necessary to take this into
consideration when using Word.  Replacing a Tab with a Space prior to
removing a space at the end of lines would prevent having to deal with this
issue in K-1000 and simplify the alternate step 7.
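
The reason for repeating the replacement can be seen in a small simulation.
The sketch below is Python, not K-1000, and it only assumes a find and
replace that matches literal text; the function name is made up for
illustration.  Each pass removes a single space from the end of a line, so a
line ending in three spaces needs three passes.

```python
def strip_line_end_blanks_literally(text: str) -> tuple[str, int]:
    """Remove blanks at line ends using only literal replacements.

    Returns the cleaned text and the number of passes that were needed.
    """
    # Replace tabs with spaces first so only one pattern has to be handled.
    text = text.replace("\t", " ")
    passes = 0
    # Repeat "space + newline -> newline" until nothing is left to replace.
    while " \n" in text:
        text = text.replace(" \n", "\n")
        passes += 1
    return text, passes
```

Note that replacing every tab also changes tabs in the middle of lines,
which is the trade-off the tip above accepts for the sake of simplicity.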

 

Note that leaving out steps 8 and 9 would leave the text as it appears in
the book without preventing skimming by paragraph since ^l (ASCII 11)
doesn't specify a new paragraph, only a new line.

 

 

 

Fixing Occasional Hard Returns In The Middle Of Paragraphs

 

 

OCR software will occasionally add a Hard Return or Paragraph Marker (ASCII
13) at the end of a line even though the line is not the last line of the
paragraph.  This causes the paragraph to be broken into two separate
paragraphs in the scan.

 

To search for this scanning error using Word, use the following search
strings:

 

^$^p  This will find paragraphs which end in a letter.  Be aware that
replacing this string with nothing will not only remove the Paragraph
Marker, it will also remove the letter which the string finds, so you don't
want to use this in a find and replace.  Another reason you don't want to
use this in a find and replace is that there are legitimate reasons for
ending a paragraph with a letter, and it's best to make sure what the string
finds is a scanning error.

 

,^p  This will find paragraphs which end in a comma.  Again, it's best to
not use this in a find and replace because there are also legitimate reasons
for ending a paragraph in a comma.

 

These strings are not guaranteed to find every occurrence of the problem,
but they should find nearly all of them.
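
For a plain text copy of the scan, the same two searches can be sketched in
Python.  This is an illustration only, with an assumed function name, and
"\n" stands in for the Paragraph Marker.  In keeping with the advice above,
it only lists candidates for review and changes nothing.

```python
def find_suspect_paragraph_breaks(text: str) -> list[tuple[int, str]]:
    """List paragraphs that end in a letter or a comma, for manual review.

    Returns (paragraph number, paragraph text) pairs, numbered from 1.
    """
    # Every piece except the last is followed by a Paragraph Marker, which
    # mirrors the ^p at the end of both search strings.
    paragraphs = text.split("\n")[:-1]
    suspects = []
    for number, paragraph in enumerate(paragraphs, start=1):
        if not paragraph:
            continue  # a blank line has no final character to test
        last = paragraph[-1]
        # isalpha() plays the role of ^$ (any letter); "," covers ,^p.
        if last.isalpha() or last == ",":
            suspects.append((number, paragraph))
    return suspects
```

Like the Word searches, this will also report legitimate endings such as
the greeting of a letter, so each result still needs to be checked by eye.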

 

Be aware that exiting the Find dialog box and using Ctrl-Page-Down and
Ctrl-Page-Up to find the next or previous occurrence will make it easy to
fix a scanning error when it's found since it eliminates the steps of
opening and closing the dialog box.

 

One legitimate reason for finding both strings in the book is that few pages
end in a complete sentence.  Another is that letters or notes often appear
in books, and these strings will find the opening or closing of the letter
or note.

 

Searching for this scanning error takes a little while, but it's one that
you will want to check if you're striving for the perfect scan.

 

Note that if you are finding Hard Returns at the end of every line or at the
end of most lines, then refer to the tip for removing Hard Returns at the
end of every line.
