RE: Regular Expression Question: How To Search For Section Titles

  • From: Jamal Mazrui <empower@xxxxxxxxx>
  • To: programmingblind@xxxxxxxxxxxxx
  • Date: Mon, 9 Aug 2010 13:45:32 -0400 (EDT)

Try this for the search:

\n\n+([A-Z].*?[a-zA-Z0-9])\n+

and this for the replacement:

\n----------\n\f\n$1\n\n
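To try Jamal's search and replacement outside EdSharp, here is a minimal sketch using Python's `re` module. It assumes Python treats this pattern the way EdSharp does (in both, `.` does not cross a newline by default). Python's replacement strings use `\1` rather than `$1`, so the sketch builds the replacement with a small function instead. The names `SECTION_BREAK`, `TITLE_AFTER_BLANK` and `add_section_breaks` are illustrative only.

```python
import re

# EdSharp section break string, as given in the thread.
SECTION_BREAK = "\n----------\n\f\n"

# Jamal's search pattern: two or more newlines, then a line that starts with a
# capital letter and ends with a letter or digit (group 1), then newlines.
TITLE_AFTER_BLANK = re.compile(r"\n\n+([A-Z].*?[a-zA-Z0-9])\n+")


def add_section_breaks(text: str) -> str:
    """Put a section break before each likely title and a blank line after it."""
    # Same effect as the replacement \n----------\n\f\n$1\n\n
    return TITLE_AFTER_BLANK.sub(
        lambda m: SECTION_BREAK + m.group(1) + "\n\n", text
    )


sample = "Preface text.\n\nChapter One\nIt was a dark night.\n"
print(repr(add_section_breaks(sample)))
# 'Preface text.\n----------\n\x0c\nChapter One\n\nIt was a dark night.\n'
```

A line such as "Ends with colon:" is left alone, because its last character is not a letter or digit and `.` cannot reach past the newline to find one.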

Jamal

-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Homme, James
Sent: Monday, August 09, 2010 7:15 AM
To: programmingblind@xxxxxxxxxxxxx
Subject: RE: Regular Expression Question: How To Search For Section Titles

Hi,
I've narrowed down what I want to match a little better now. I wish I didn't have such a mental block about this stuff. Sorry, people. Anyway, here goes.

The first thing I want to match is two newlines.
Next is a line of text that starts with a capital letter.
The line can have anything else on it.
The line should end with a letter or a number.

Then I want to replace the match with what I found, but with the EdSharp section break string before it and an extra newline after it.

A section break in EdSharp looks like this: \n----------\n\f\n

Thanks.

Jim



Jim Homme,
Usability Services,
Phone: 412-544-1810. Skype: jim.homme
Internal recipients, Read my accessibility blog. Discuss accessibility here. Accessibility Wiki: Breaking news and accessibility advice


-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Dave
Sent: Saturday, August 07, 2010 3:17 PM
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Regular Expression Question: How To Search For Section Titles

Btw, depending on which engine you're using, the period in your regex matches any character except a newline. So ^[A-Z].*$ just matches a line that starts with an uppercase character followed by any number of characters.
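A quick way to see what the period and `.*` do is to run two patterns side by side in Python's `re` module (a stand-in here; other engines may differ in details). With `re.MULTILINE`, `^` and `$` match at the start and end of each line, so `findall` returns every matching line. The sample text is made up.

```python
import re

text = "Chapter 1\nNotes:\nlowercase line\n"

# "." matches any character except a newline, so ".*" runs to the end of
# the current line and never spills into the next one.
any_ending = re.findall(r"^[A-Z].*$", text, re.MULTILINE)

# A final character class additionally requires the last character of the
# line to be a letter or a digit.
alnum_ending = re.findall(r"^[A-Z].*[A-Za-z0-9]$", text, re.MULTILINE)

print(any_ending)    # ['Chapter 1', 'Notes:']
print(alnum_ending)  # ['Chapter 1']
```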

If you instead wanted only alphanumeric characters, then you would probably try the pattern:
^[A-Z][A-Za-z0-9]*$

(The asterisk follows the character class and is called a quantifier; it says the pattern should have zero or more characters from that set.) The ".*" in your pattern works the same way: zero or more of any character except a newline.

It would then match:
A1234

Ca1a

or

Z
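Dave's alphanumeric-only pattern can be checked against his examples the same way, again assuming Python's `re` module as the engine. The last two lines in the list are made up to show what the pattern rejects: a space (as in "Chapter 1") or a lowercase first letter.

```python
import re

lines = ["A1234", "Ca1a", "Z", "Chapter 1", "a1234"]

# Uppercase first character, then zero or more letters or digits, nothing else.
ALNUM_ONLY = re.compile(r"^[A-Z][A-Za-z0-9]*$")

matches = [line for line in lines if ALNUM_ONLY.match(line)]
print(matches)  # ['A1234', 'Ca1a', 'Z']
```

Because it rejects spaces, this pattern is narrower than a title matcher: most multi-word titles would not match.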

Hth.

On 8/7/10, Kerneels Roos <kerneels@xxxxxxxxx> wrote:
Going out on a limb here regarding exact command line options, but you
could use `sed', the stream editor, to do this at the command prompt:

$ sed -E 's/^([A-Z].*[A-Za-z0-9])$/\f\1/' in.txt > out.txt

or, without -E, where `(' needs to be escaped:

$ sed 's/^\([A-Z].*[A-Za-z0-9]\)$/\f\1/' in.txt > out.txt

The single quotes keep the shell from interpreting `(', `$' and the
backslashes. GNU sed turns `\f' in the replacement into a form feed
(page break); with other versions you might have to insert the ASCII
value directly.

The `s' at the start of the sed script is the substitute command: it
tells sed that you want to do a substitution.
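For anyone without sed handy, the same line-by-line substitution can be sketched in Python (this is an assumption, not part of the sed suggestion). With `re.MULTILINE` the anchors apply to each line, which mirrors how sed works through the file one line at a time. The function name `add_page_breaks` is hypothetical.

```python
import re

# Same pattern as the sed command: a whole line that starts with a capital
# letter and ends with a letter or digit, captured as group 1.
TITLE_LINE = re.compile(r"^([A-Z].*[A-Za-z0-9])$", re.MULTILINE)


def add_page_breaks(text: str) -> str:
    """Insert a form feed (page break) before every line that looks like a title."""
    # re.sub processes r"\f\1": \f becomes a form feed, \1 the captured line.
    return TITLE_LINE.sub(r"\f\1", text)


print(repr(add_page_breaks("Intro\nsome body text.\nPart 2\n")))
# '\x0cIntro\nsome body text.\n\x0cPart 2\n'
```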

On Fri, Aug 6, 2010 at 7:48 PM, Homme, James <james.homme@xxxxxxxxxxxx> wrote:

Hi,
Maybe EdSharp uses .NET regular expressions, and maybe they are
different from Perl regular expressions. I was trying to use $1 in the
replacement to insert the captured text, but it was literally inserting
$1. I was trying to put \f before $1 in the replacement expression. I'm
attempting to find lines that might be titles and put a page break
before them, so that I can simply look through the document and spot
check whether the lines are really titles, rather than read the whole
thousand pages and find them all by hand.
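Replacement syntax differs between engines: .NET writes a reference to group 1 as `$1`, while Python's `re` module uses `\1` or `\g<1>` and copies a `$1` through unchanged. The Python sketch below reproduces the literal-`$1` symptom described above; it does not show why EdSharp behaved that way, which the thread doesn't settle.

```python
import re

PATTERN = r"^([A-Z].*[A-Za-z0-9])$"
line = "Chapter One"

# "$1" means nothing special to Python's re.sub, so it is inserted as-is.
literal = re.sub(PATTERN, "\f$1", line)

# "\g<1>" (or "\1") is Python's way to insert capture group 1.
with_group = re.sub(PATTERN, "\f\\g<1>", line)

print(repr(literal))     # '\x0c$1'
print(repr(with_group))  # '\x0cChapter One'
```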

Thanks.

Jim



-----Original Message-----
From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jim Bauer
Sent: Friday, August 06, 2010 1:25 PM
To: programmingblind@xxxxxxxxxxxxx
Subject: Re: Regular Expression Question: How To Search For Section Titles

It does, just not inside a character class. If you wanted to match
something from one of several character classes using `|', you would
do something like:
----------
[a-z]|[A-Z]|[...]
----------
But you can just spell out everything you want to match in a single
character class, so I don't see that as particularly useful.
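The difference can be checked in Python's `re` module (assumed here as a stand-in for the engine in question). Inside square brackets `|` is an ordinary character, so Jim's original class accepts a trailing `|`; outside brackets it is alternation, and the alternation form matches exactly the characters one combined class matches. As a side note, Jim's class used `1-9`, which leaves out the digit 0. The "suggested" pattern is the one Jim Bauer gave in his earlier reply quoted in this thread.

```python
import re
import string

# Jim's original pattern: "|" inside [...] is a literal character, and
# "1-9" omits 0.
original = re.compile(r"^[A-Z].*[A-Z|a-z|1-9]$")
# Jim Bauer's suggestion: one class with the ranges side by side.
suggested = re.compile(r"^[A-Z].+[A-Za-z0-9]$")

print(bool(original.match("This is a test|")))   # True
print(bool(suggested.match("This is a test|")))  # False
print(bool(original.match("Chapter 10")))        # False
print(bool(suggested.match("Chapter 10")))       # True

# "|" between classes is alternation and matches the same single
# characters as the combined class.
alternation = re.compile(r"[a-z]|[A-Z]|[0-9]")
single_class = re.compile(r"[a-zA-Z0-9]")
same = all(
    bool(alternation.fullmatch(c)) == bool(single_class.fullmatch(c))
    for c in string.printable
)
print(same)  # True
```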

On Fri, 6 Aug 2010 12:48:12 -0400, Homme, James wrote:
> Hi,
> I'm misusing the vertical bar. I thought it created an or condition.
>
> Jim
>
>
> -----Original Message-----
> From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jim Bauer
> Sent: Friday, August 06, 2010 10:36 AM
> To: programmingblind@xxxxxxxxxxxxx
> Subject: Re: Regular Expression Question: How To Search For Section Titles
>
> You're including a literal `|' in your last character class; inside
> brackets it doesn't mean uppercase letters or lowercase letters or
> digits. This means something like `This is a test|' will match, which,
> of course, is fine if that's what you're intending. :)
>
> ----------
> ^[A-Z].+[A-Za-z0-9]$
> ----------
>
> On Fri, 6 Aug 2010 09:42:55 -0400, Homme, James wrote:
> > Hi,
> > How would you construct a regular expression that looks for the
> > first letter of any line in upper case followed by the rest of the
> > line as long as it ends with a letter or number?  Would it be
> > something like this?
> > ^[A-Z].*[A-Z|a-z|1-9]$
> >
> > Thanks.
> >
> > Jim
> >
> > ________________________________
> > This e-mail and any attachments to it are confidential and are
> > intended solely for use of the individual or entity to whom they are
> > addressed. If you have received this e-mail in error, please notify
> > the sender immediately and then delete it. If you are not the
> > intended recipient, you must not keep, use, disclose, copy or
> > distribute this e-mail without the author's prior permission. The
> > views expressed in this e-mail message do not necessarily represent
> > the views of Highmark Inc., its subsidiaries, or affiliates.
>
> __________
> View the list's information and change your settings at
> //www.freelists.org/list/programmingblind

--
Kerneels Roos
Cell/SMS: +27 (0)82 309 1998
Skype: cornelis.roos

The early bird may get the worm, but the second mouse gets the cheese!
