RE: Regular Expression Question: How To Search For Section Titles

  • From: "Homme, James" <james.homme@xxxxxxxxxxxx>
  • To: "programmingblind@xxxxxxxxxxxxx" <programmingblind@xxxxxxxxxxxxx>
  • Date: Mon, 9 Aug 2010 07:15:22 -0400

Hi,
I've narrowed down what I want to match a little better now. I wish I didn't
have a mental block about this stuff. Sorry, people. Anyway, here goes.

The first thing I want to match is two new lines.
Next is a line of text that starts with a capital letter.
The line can have anything else on it.
The line should end with a letter or a number.

Then I want to replace it with what I have found, but I want to put the EdSharp 
section break string before it and an extra new line after it.

A section break in EdSharp looks like this: \n----------\n\f\n
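
For what it's worth, the steps above can be sketched in Python. This is only an illustration under assumptions: Python's re module stands in for whatever engine EdSharp uses, the names SECTION_BREAK, TITLE and insert_section_breaks are made up here, the text is assumed to use plain \n line endings, and the two leading newlines are kept as part of "what I have found".

```python
import re

# EdSharp section break, exactly as quoted in the message above.
SECTION_BREAK = "\n----------\n\f\n"

# Two newlines, then one line that starts with a capital letter and ends
# with a letter or digit. [^\n]* keeps the match on a single line, and the
# lookahead requires that the line really ends there.
TITLE = re.compile(r"\n\n[A-Z](?:[^\n]*[A-Za-z0-9])?(?=\n|\Z)")


def insert_section_breaks(text):
    """Put the section break before each candidate title and a newline after it."""
    # A function replacement avoids any escaping surprises with \f.
    return TITLE.sub(lambda m: SECTION_BREAK + m.group(0) + "\n", text)


sample = "Intro text.\n\nChapter 1\nBody here.\n\nnot a title\n\nEnds with period.\n"
result = insert_section_breaks(sample)
print(repr(result))
```

Only "Chapter 1" is treated as a title in this sample; "not a title" starts with a lowercase letter and "Ends with period." ends with punctuation, so both are left alone.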

Thanks.

Jim



Jim Homme,
Usability Services,
Phone: 412-544-1810. Skype: jim.homme
Internal recipients,  Read my accessibility blog. Discuss accessibility here. 
Accessibility Wiki: Breaking news and accessibility advice


-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx 
[mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Dave
Sent: Saturday, August 07, 2010 3:17 PM
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Regular Expression Question: How To Search For Section Titles

Btw, depending on which engine you're using, your reg ex could just be
^[A-Z].*$

(the period in your reg ex matches any character besides a new line in
most engines, and the asterisk after it allows any number of them).
Thus, this reg ex just matches a line that starts with an upper case
character followed by any number of characters.
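
A quick way to check how the period behaves with and without a quantifier is a small Python test. This is a sketch only: Python's re module is an assumption standing in for your engine, and the sample strings and variable names are made up.

```python
import re

samples = ["Ab", "Abc", "Title here", "lower case"]

one_more = re.compile(r"^[A-Z].$")    # capital, then exactly one character
any_more = re.compile(r"^[A-Z].*$")   # capital, then zero or more characters

exactly_two = [s for s in samples if one_more.search(s)]
whole_line = [s for s in samples if any_more.search(s)]

# By default the dot does not match a newline, so this finds nothing.
dot_vs_newline = re.search(r"A.B", "A\nB")

print(exactly_two, whole_line, dot_vs_newline)
```

Without the asterisk the period stands for exactly one character, so only the two-character line matches; with it, any line that starts with a capital does.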

If you instead wanted only alphanumeric characters, then you would
probably try the pattern:
^[A-Z][A-Za-z0-9]*$

(the asterisk follows the character class and is called a quantifier.  The
asterisk simply says that the pattern should have 0 or more characters from
this set).  The ".*" you had in your pattern means "any character besides a
new line, 0 or more times".

It would then match:
A1234

Ca1a

or

Z
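
The same three examples can be checked with a few lines of Python. This is a sketch assuming Python's re module; the name ALNUM_TITLE and the two extra non-matching samples are added here for contrast.

```python
import re

# Capital letter followed by zero or more letters or digits, nothing else.
ALNUM_TITLE = re.compile(r"^[A-Z][A-Za-z0-9]*$")

candidates = ["A1234", "Ca1a", "Z", "Chapter One", "a1"]
matched = [s for s in candidates if ALNUM_TITLE.search(s)]

print(matched)
```

"Chapter One" is rejected because of the space, which is why this pattern is probably too strict for real section titles.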

Hth.

On 8/7/10, Kerneels Roos <kerneels@xxxxxxxxx> wrote:
> Going out on a limb here regarding exact command line options, but you could
> use the `sed' command, the stream editor, to do this at the command prompt:
>
> $ sed -E 's/(^[A-Z].*[A-Za-z0-9]$)/\f\1/g' in.txt > out.txt
>
> or:
>
> $ sed 's/\(^[A-Z].*[A-Za-z0-9]$\)/\f\1/g' in.txt > out.txt
>
> if `(' needs to be escaped, which it does unless sed is told to use extended
> regular expressions with `-E'. Not sure if `\f' would insert page breaks
> either -- might have to access the direct ASCII value, but anyways.
>
> The `s' in the sed regular expression pattern instructs sed that you want to
> do a substitution.
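
If sed is not handy, roughly the same substitution can be sketched in Python. This assumes Python's re module and a made-up helper name, mark_titles; it is an equivalent of the idea rather than of sed's exact behavior.

```python
import re

# With re.MULTILINE, ^ and $ anchor at the start and end of every line,
# which mirrors sed processing its input one line at a time.
TITLE_LINE = re.compile(r"^([A-Z].*[A-Za-z0-9])$", re.MULTILINE)


def mark_titles(text):
    """Prefix each candidate title line with a form feed (page break)."""
    return TITLE_LINE.sub(lambda m: "\f" + m.group(1), text)


marked = mark_titles("Chapter 1\nsome text.\nAppendix B\n")
print(repr(marked))
```

Here the form feed is written as the Python string "\f", which sidesteps the question of whether the regex tool itself understands \f in a replacement.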
>
> On Fri, Aug 6, 2010 at 7:48 PM, Homme, James
> <james.homme@xxxxxxxxxxxx> wrote:
>
>> Hi,
>> Maybe EdSharp uses .NET regular expressions, and maybe they are different
>> from Perl regular expressions. I was trying to use $1 to capture and
>> replace, but it was literally inserting $1. I was trying to put
>> \f before $1 in the replacement expression. I'm attempting to find what it
>> thinks might be titles and put a page break before them so that I can
>> simply look through the document and spot check to see if the lines are
>> really titles rather than read the whole thousand pages and find them all
>> by hand.
>>
>> Thanks.
>>
>> Jim
>>
>>
>>
>> -----Original Message-----
>> From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:
>> programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jim Bauer
>> Sent: Friday, August 06, 2010 1:25 PM
>> To: programmingblind@xxxxxxxxxxxxx
>> Subject: Re: Regular Expression Question: How To Search For Section Titles
>>
>> It does, just not inside a character class. If you wanted to match
>> something from one of several character classes using `|', you would do
>> something like:
>> ----------
>> [a-z]|[A-Z]|[...]
>> ----------
>> But you can just spell out everything you want to match in a single
>> character class, so I don't see that as particularly useful.
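
To illustrate that point, alternation between two classes and one combined class accept the same single characters. This is a small sketch assuming Python's re module; the variable names and test characters are made up.

```python
import re

alternation = re.compile(r"^(?:[a-z]|[A-Z])$")   # `|' used outside the classes
single_class = re.compile(r"^[a-zA-Z]$")         # everything in one class

chars = ["a", "Q", "7", "|", " "]
agree = all(
    bool(alternation.match(c)) == bool(single_class.match(c)) for c in chars
)
accepted = [c for c in chars if single_class.match(c)]

print(agree, accepted)
```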
>>
>> On Fri, 6 Aug 2010 12:48:12 -0400, Homme, James wrote:
>> > Hi,
>> > I'm misusing the vertical bar. I thought it created an or condition.
>> >
>> > Jim
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: programmingblind-bounce@xxxxxxxxxxxxx
>> > [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jim Bauer
>> > Sent: Friday, August 06, 2010 10:36 AM
>> > To: programmingblind@xxxxxxxxxxxxx
>> > Subject: Re: Regular Expression Question: How To Search For Section Titles
>> >
>> > You're including `|' as a literal character in your last character
>> > class, not saying "uppercase letters or lowercase letters or digits".
>> > This means something like `This is a test|' will match, which, of
>> > course, is fine if that's what you're intending. :)
>> >
>> > ----------
>> > ^[A-Z].+[A-Za-z0-9]$
>> > ----------
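
The difference between the two patterns shows up on a handful of sample lines. This is a sketch assuming Python's re module; the sample lines and variable names are made up for illustration.

```python
import re

original = re.compile(r"^[A-Z].*[A-Z|a-z|1-9]$")   # `|' is literal, 0 is missing
corrected = re.compile(r"^[A-Z].+[A-Za-z0-9]$")

samples = ["This is a test|", "Chapter 10", "Introduction", "Ends with period."]

original_hits = [s for s in samples if original.search(s)]
corrected_hits = [s for s in samples if corrected.search(s)]

print(original_hits)
print(corrected_hits)
```

Besides accepting a trailing `|', the original class uses 1-9, so a line ending in 0 such as "Chapter 10" is missed; 0-9 in the corrected class fixes that.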
>> >
>> > On Fri, 6 Aug 2010 09:42:55 -0400, Homme, James wrote:
>> > > Hi,
>> > > How would you construct a regular expression that looks for the first
>> > > letter of any line in upper case followed by the rest of the line as long
>> > > as it ends with a letter or number?  Would it be something like this?
>> > > ^[A-Z].*[A-Z|a-z|1-9]$
>> > >
>> > > Thanks.
>> > >
>> > > Jim
>> > >
>> > __________
>> > View the list's information and change your settings at
>> > //www.freelists.org/list/programmingblind
>>
>
>
> --
> Kerneels Roos
> Cell/SMS: +27 (0)82 309 1998
> Skype: cornelis.roos
>
> The early bird may get the worm, but the second mouse gets the cheese!
>