atw: Word's find & replace dialog wildcards <long wide load>

In trying to explain how Word's Find and Replace (FnR) wildcard mechanism
works, I'll also present a practical solution to the multitude of problems
encountered by the seemingly innocuous ^p^p to ^p, whose usual objective is
to remove unnecessary blank lines. In doing so, we shall traverse the width
of Word's pitfalls that never fail to trip up a traveller.

First up, the Word Help System has some excellent help on wildcards. It is a
complete PITA to access, but you can find something. In Word 2k:

F1 - help > Answer Wizard | Index > Search on: wildcard
The second topic down is the master list of all FnR stuff. Select it.
Pick the Wildcard Characters topic down that list.
Now select the _type a wildcard_ hyperlink.
Hooray. Print the damn thing. Use it as a guide from now on :-) You have
just found the first excellent Quick Reference in the help system.


The very last two paragraphs of that topic are the key to what I am
attempting here.

To replace a double para with a single para, we would think that finding
^p^p and replacing it with ^p would do the job, right?

Well, not really. If you do it via VBA you find yourself stalling forever if
your document is terminated by a blank paragraph as you have to perform it
iteratively until you get a Not .Found condition. Why does it fail to
replace the last paragraph mark? Well, you can't delete the last paragraph
mark - ever. When you start a brand new virgin document and turn on View
Formatting, that paragraph mark you see is the End Of Document paragraph
mark. As the document exists and has a finite end point, that magic pilcrow
(backwards P) has to appear. It is also the marker point in memory to place
the nasty little objects we infest our nice clean ASCII text with. Style
definitions, table formatting, list templates, graphical objects and the
list goes on. See Alt + F11 > F2 > Enter for more information.

So, to get around the VBA problem, we simply pre-process the final
paragraph. If it is blank, just a para mark, then kill the second last
character - which must be the penultimate paragraph mark. Manually, press
Ctrl+End and use the backspace key as often as required.
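
The stall and the pre-processing fix can be sketched outside Word. This is
only a toy model in Python, not Word's engine: "\r" stands in for a
paragraph mark, the function names are made up for illustration, and the
final mark is held back from every replacement on the assumption (taken
from the description above) that Word will never delete it.

```python
PARA = "\r"  # stand-in for a Word paragraph mark (character code 13)


def trim_trailing_blank_paras(doc: str) -> str:
    """Pre-process: while the document ends in a blank paragraph,
    delete the second-last character (the penultimate paragraph mark)."""
    while doc.endswith(PARA + PARA):
        doc = doc[:-2] + PARA
    return doc


def collapse_pairs(doc: str, max_passes: int = 50):
    """Repeat the ^p^p -> ^p replacement until nothing is found.

    Assumes a non-empty document that ends in a paragraph mark.
    Returns (text, passes); raises if the loop would never finish."""
    for passes in range(max_passes):
        if PARA + PARA not in doc:  # the "Not .Found" condition
            return doc, passes
        # The final mark is kept out of the replacement, modelling
        # Word's refusal to delete the End Of Document mark.
        doc = doc[:-1].replace(PARA + PARA, PARA) + doc[-1]
    raise RuntimeError("still finding ^p^p: the loop would never finish")


raw = "Intro\r\r\rBody\r\r"
cleaned, passes = collapse_pairs(trim_trailing_blank_paras(raw))
print(repr(cleaned), passes)
```

Calling collapse_pairs(raw) without the trim step hits the RuntimeError in
this model, which is the stall described above.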

The main problem with the simple FnR replace postulation is similar. If you
just delete a para mark, you lose the style for that paragraph. So, we can
get around this by ensuring it is always the trailing paragraph that gets
deleted. It won't do the final blank paragraph in a document, but this is
solved above.

First, we need to understand how the brackets work, and the help topic does
that nicely. So let us put the guide to good use. (^p)^p means that we
have marked the first para mark as our first 'text chunk'. If we use \1 in
the replace string, it means to leave the first text chunk, the para mark
with the holy styling applied, in place. Unfortunately for us, we still
haven't got there yet.

We get an error: we can't use ^p if we are using wildcards. Bastards. So we
have to use ^013 instead. Herein lies our next problem - paragraph marks
that aren't! Oh yes kiddies, just because you see a pilcrow does not mean
you are looking at a paragraph mark. Oh no. Not with Paste Special and even
weirder applications handing in clipboard data streams without thought. Word
dutifully displays a pilcrow when it encounters an ASCII 013, but the
background machinery may not have resolved it into a paragraph object to be
kept dynamically updated.

How do I know it is ASCII 013? Well, I cheat. I select the paragraph mark,
or whatever character I need to know, and use VBA. Alt + F11 (VB Editor or
the VBE). Ctrl+G (Immediate Window). Enter: ? ASCW(Selection)

I use ASCW() rather than ASC() because I want the full Unicode value. For
ASCII characters the Unicode value is the same. Go ahead, work out the
wildcards' ASCII numbers and write them on yer guide.
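
If you'd rather not click through the VBE for every character, the same
lookup can be sketched in Python, where ord() plays the role of ASCW() for
these characters. The helper name word_code is made up for illustration,
and it simply mirrors the ^0nn form used in this post.

```python
def word_code(ch: str) -> str:
    """Format one character as a ^0nn search code (the form used in this post)."""
    return f"^0{ord(ch)}"


# The paragraph mark plus a few characters that are special in wildcard mode.
for ch in ["\r", "(", ")", ":", "@", "{", "}"]:
    print(repr(ch), ord(ch), word_code(ch))
```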

So, if we are going to use replace (^013)^013 with ^013 we have to make sure
every ASCII 13 is a damn paragraph mark. Without wildcards on, find ^013 and
replace it with ^p. Honest paragraphs will see no change, fake paragraphs
get converted to your will on the spot.

Now you can get serious and stick yer wildcard search on. Replace (^013)^013
with \1 and we're in the clear. Done.
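
To see what the grouping does, here is a rough Python regex equivalent.
It is a sketch only: Python's re module is not Word's wildcard engine,
"\r" stands in for ^013, the function name is invented, and a plain string
cannot show the paragraph styling that \1 is there to protect.

```python
import re


def drop_second_mark(doc: str) -> str:
    """Model of: find (^013)^013, replace with \\1.

    The first mark is captured as group 1 and written back; the second
    mark, outside the brackets, is the one that gets dropped."""
    return re.sub(r"(\r)\r", r"\1", doc)


print(repr(drop_second_mark("One\r\rTwo\r")))
print(repr(drop_second_mark("One\r\r\r\rTwo\r")))
```

In this model a single pass only halves a longer run of marks (four become
two), so it has to be repeated until nothing more is found.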

In a similar fashion, the much simpler exercise of removing a colon that
occurs after a ket - a ) char - without destroying the ket itself, would be
to use wildcards, and replace (^041)^058 with \1.
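
The same idea as a Python regex, purely as an illustration (the function
name is made up, and re is standing in for Word's wildcard matching):

```python
import re


def drop_colon_after_ket(text: str) -> str:
    """Model of: find (^041)^058, replace with \\1.

    The ket is captured and kept; only a colon directly after it goes."""
    return re.sub(r"(\)):", r"\1", text)


print(drop_colon_after_ket("see (a): and b: c"))
```

Colons that do not follow a ket are left alone.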

However, if we were searching for a bra, a ( character, we run into another
peculiar little Word problem with managing RTF strings. If you insert a
symbol from the Wingdings range, or many other non-Unicode graphical fonts,
Word actually stores a marker there instead, and then stores the actual font
character off beyond the end of section mark. That marker is ASCII 40, our
unfortunate bra. So an ^040^058 sequence could very well be any damn symbol
followed by a colon.

If we were using two blank paragraphs before every heading and no space
before to ensure our new pages always start at the very top no matter the
method used to page break, and we wanted to get rid of scads of three or
more blank paras in excess of a single hit (are we listening VBA people?) we
could do something evil and wicked like this: find (^013{2,2})(^013)@ and
replace it with \1. This leaves us with a maximum of two following blank
paragraphs anywhere in the document, even at the end - in one single find
operation. 

Interestingly enough, for those still able to follow, (^013{2,2})^013{1,}
fails with an invalid pattern. I forced it with the brackets for the above
solution.
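
A Python regex sketch of the two-mark cap follows. It assumes that Word's @
swallows the whole run of extra marks, as the single-pass result described
above implies; Python's + certainly does. The function name is invented,
and "\r" again stands in for ^013.

```python
import re


def cap_runs_at_two(doc: str) -> str:
    """Model of: find (^013{2,2})(^013)@, replace with \\1.

    Group 1 holds exactly two marks; the second group soaks up every
    further mark in the run, and only group 1 is written back."""
    return re.sub(r"(\r{2})(\r)+", r"\1", doc)


sample = "Intro\r\r\r\r\rHeading\r\rBody\r"
print(repr(cap_runs_at_two(sample)))
```

Runs of one or two marks never match, so they pass through untouched. Note
that Python accepts the unbracketed form (\r{2})\r{1,} without complaint;
the invalid pattern error is a Word quirk that this model does not
reproduce.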

Which then brings us to the final solution for technical writers seeking to
mass destroy all blank lines. It has taken a while, but boy haven't we
learnt a lot of useless stuff about Word on the way. Find (^013)(^013)@ and
replace with \1 to kill all blank paras in the document in a single pass,
with the exception of the first paragraph (there is no start of document
paragraph mark to give us a two-in-a-row target) and the last paragraph mark
(which is forbidden from the find range).
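
As a last sketch, here is the final pattern modelled in Python, including
the two exceptions. The assumptions are loud ones: "\r" stands in for ^013,
the function name is made up, and the final mark is simply sliced off
before the search to imitate it being outside the find range.

```python
import re


def kill_blank_paras(doc: str) -> str:
    """Model of: find (^013)(^013)@, replace with \\1.

    The last character (the End Of Document mark) is kept out of the
    search, so a trailing blank paragraph survives."""
    body, last = doc[:-1], doc[-1:]
    return re.sub(r"(\r)(\r)+", r"\1", body) + last


sample = "\rTitle\r\r\rBody\r\r"
print(repr(kill_blank_paras(sample)))
```

In the output the blank first paragraph and the blank last paragraph are
both still there, while the blank paragraphs in the middle are gone.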


Steve Hudson

Word Heretic, Sydney, Australia 
Tricky stuff with Word or words for you.
www.wordheretic.com
ABN: 86 453 419 554   
"Qualified Good Tech Writer Dude"
Free Association of Words
Without prejudice
