Re: Regular Expression Question: How To Search For Section Titles

  • From: Kerneels Roos <kerneels@xxxxxxxxx>
  • To: programmingblind@xxxxxxxxxxxxx
  • Date: Sat, 7 Aug 2010 11:39:09 +0200

Going out on a limb here regarding exact command line options, but you could
use the `sed' command, the stream editor, to do this at the command prompt:

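$ sed -E 's/^([A-Z].*[A-Za-z0-9])$/\f\1/g' in.txt > out.txt

or:

$ sed 's/^\([A-Z].*[A-Za-z0-9]\)$/\f\1/g' in.txt > out.txt

if `(' needs to be escaped, which it does in sed's default basic regular
expressions; the `-E' option switches to extended ones, where it does not.
The single quotes keep the shell from interpreting the parentheses. Not sure
if `\f' would insert page breaks everywhere either -- GNU sed accepts it, but
with other versions you might have to access the direct ASCII value.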
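
In case `\f' is not understood by a given sed, here is a minimal sketch of the
"direct ASCII value" route. It assumes a POSIX shell and uses printf to build
the form feed (ASCII 12) before handing it to sed. The variable names `ff' and
`marked' and the three sample lines are made up for illustration.

```shell
# Build a literal form feed (ASCII 12) with printf instead of relying on sed.
ff=$(printf '\f')

# Hypothetical sample: two title-like lines around one ordinary line.
# `&' in the replacement stands for the whole matched line.
marked=$(printf 'Chapter One\nsome body text.\nAppendix 2\n' |
  sed "s/^[A-Z].*[A-Za-z0-9]\$/${ff}&/")

# od -c makes the otherwise invisible form feed visible.
printf '%s\n' "$marked" | od -c
```

Using `&' here avoids the question of escaping parentheses altogether, since
no capture group is needed when the whole line is put back.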

The `s' at the start of the sed expression instructs sed that you want to
do a substitution, in the form s/pattern/replacement/.
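
To try the substitution out before running it on the real document, something
like the following might do. It is only a sketch: it assumes GNU sed (for `\f'
in the replacement), and the scratch directory, the sample text and the
`titles' variable are invented for the test.

```shell
# Work in a scratch directory so nothing real gets overwritten.
workdir=$(mktemp -d)

# Hypothetical input: two title-like lines and one ordinary line.
printf 'Introduction\nthis line starts in lower case.\nSection 2\n' > "$workdir/in.txt"

# s/PATTERN/REPLACEMENT/ : \(...\) captures the line, \1 puts it back
# after the form feed. \f in the replacement is a GNU sed extension.
sed 's/^\([A-Z].*[A-Za-z0-9]\)$/\f\1/' "$workdir/in.txt" > "$workdir/out.txt"

# Count the lines that now contain a form feed, as a quick sanity check.
titles=$(grep -c "$(printf '\f')" "$workdir/out.txt")
echo "$titles"
```

If the count looks far too high or too low for the document, the pattern
probably needs tightening before the page breaks are of any use.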

On Fri, Aug 6, 2010 at 7:48 PM, Homme, James <james.homme@xxxxxxxxxxxx> wrote:

> Hi,
> Maybe EdSharp uses .Net regular expressions, and maybe they are different
> from Perl regular expressions. I was trying to use $1 to capture and
> replace, but it was literally inserting $1. I was trying to put
> \f before $1 in the replacement expression. I'm attempting to find what it
> thinks might be titles and put a page break before them so that I can simply
> look through the document and spot check to see if the lines are really
> titles rather than read the whole thousand pages and find them all by hand.
>
> Thanks.
>
> Jim
>
> Jim Homme,
> Usability Services,
> Phone: 412-544-1810. Skype: jim.homme
> Internal recipients,  Read my accessibility blog. Discuss accessibility
> here. Accessibility Wiki: Breaking news and accessibility advice
>
>
> -----Original Message-----
> From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:
> programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jim Bauer
> Sent: Friday, August 06, 2010 1:25 PM
> To: programmingblind@xxxxxxxxxxxxx
> Subject: Re: Regular Expression Question: How To Search For Section Titles
>
> It does, just not inside a character class. If you wanted to match
> something from one of several character classes using `|', you would do
> something like:
> ----------
> [a-z]|[A-Z]|[...]
> ----------
> But you can just spell out everything you want to match in a single
> character class, so I don't see that as particularly useful.
>
> On Fri, 6 Aug 2010 12:48:12 -0400, Homme, James wrote:
> > Hi,
> > I'm misusing the vertical bar. I thought it created an or condition.
> >
> > Jim
> >
> > -----Original Message-----
> > From: programmingblind-bounce@xxxxxxxxxxxxx [mailto:
> > programmingblind-bounce@xxxxxxxxxxxxx] On Behalf Of Jim Bauer
> > Sent: Friday, August 06, 2010 10:36 AM
> > To: programmingblind@xxxxxxxxxxxxx
> > Subject: Re: Regular Expression Question: How To Search For Section Titles
> >
> > You're including `|' in your last character class, not matching uppercase
> > letters or lowercase letters or digits. This means something like `This is a
> > test|' will match, which, of course, is fine if that's what you're
> > intending. :)
> >
> > ----------
> > ^[A-Z].+[A-Za-z0-9]$
> > ----------
> >
> > On Fri, 6 Aug 2010 09:42:55 -0400, Homme, James wrote:
> > > Hi,
> > > How would you construct a regular expression that looks for the first
> > > letter of any line in upper case followed by the rest of the line as long as
> > > it ends with a letter or number?  Would it be something like this?
> > > ^[A-Z].*[A-Z|a-z|1-9]$
> > >
> > > Thanks.
> > >
> > > Jim
> > >
> > > Jim Homme,
> > > Usability Services,
> > > Phone: 412-544-1810. Skype: jim.homme
> > > Internal recipients, Read my accessibility blog
> > > <http://mysites.highmark.com/personal/lidikki/Blog/default.aspx>.
> > > Discuss accessibility here
> > > <http://collaborate.highmark.com/COP/technical/accessibility/default.aspx>.
> > > Accessibility Wiki: Breaking news and accessibility advice
> > > <http://collaborate.highmark.com/COP/technical/accessibility/Accessibility%20Wiki/Forms/AllPages.aspx>
> > >
> > __________
> > View the list's information and change your settings at
> > //www.freelists.org/list/programmingblind


-- 
Kerneels Roos
Cell/SMS: +27 (0)82 309 1998
Skype: cornelis.roos

The early bird may get the worm, but the second mouse gets the cheese!
