[liblouis-liblouisxml] Re: Regex question

  • From: Ken Perry <kperry@xxxxxxx>
  • To: "liblouis-liblouisxml@xxxxxxxxxxxxx" <liblouis-liblouisxml@xxxxxxxxxxxxx>
  • Date: Sun, 27 Oct 2013 01:06:48 +0000

Remember, I have not been talking about emphasis when I talk about regex; there 
are other rules that I am thinking about.  For example, knowing that if a word 
ends in 's, the word itself is the part of the match before it, or things like 
that.  You will still need an action to work on the text you find with the 
regex.  What the regex gives you, though, is matching groups that can be 
manipulated in the function or action.
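
As a rough illustration of the "regex plus action" idea, here is a minimal 
Python sketch.  The pattern, the function names and the bracket output are all 
made up for the example; none of this is liblouis syntax or API.  The regex 
finds a word ending in 's, and the action receives the captured group.

```python
import re

# Hypothetical rule: a word ending in 's. Group 1 captures the base word,
# i.e. the part of the match before the 's.
POSSESSIVE = re.compile(r"\b([A-Za-z]+)'s\b")


def mark_possessive(match):
    """The 'action': rewrite the match using the captured group."""
    base = match.group(1)
    # Brackets are placeholders for whatever the real action would emit.
    return f"[{base}]['s]"


def apply_rule(text):
    """Run the action on every place the regex matches."""
    return POSSESSIVE.sub(mark_possessive, text)


if __name__ == "__main__":
    print(apply_rule("Ken's dog chased the cat's tail."))
```

The regex only locates the text and splits it into groups; everything that 
changes the text lives in the action function.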

I think with your emphasis problem I would rather write a grammar to handle 
that and let a parser generator handle it.  If I had to write it by hand, I 
would probably use some kind of stack-based recursion to move through the text 
and, depending on whether I am in a block of emphasized text, do one set of 
substitutions, and if not, do the other.  The same goes for the capital 
letters: I would push down each character after I hit the first capital and 
keep a count.  When I hit a terminator, I could return the whole stack with the 
correct tags and continue on.
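
A minimal sketch of the capital-letter part, assuming a plain loop with an 
explicit stack instead of recursion, and treating any non-capital character as 
the terminator.  The tag strings and the function name are placeholders 
invented for this example, and the termination sign is left out to keep it 
short.

```python
def tag_capitals(text, single="<cap>", double="<capcap>"):
    """Prefix each run of capitals with a single or double capital tag."""
    out = []
    stack = []  # capitals pushed since the first capital of the current run

    def flush():
        # Emit the pushed run with the tag chosen from the count.
        if not stack:
            return
        indicator = single if len(stack) == 1 else double
        out.append(indicator + "".join(stack))
        stack.clear()

    for ch in text:
        if ch.isupper():
            stack.append(ch)   # keep pushing while we are inside a run
        else:
            flush()            # a non-capital terminates the run
            out.append(ch)
    flush()                    # the text may end inside a run
    return "".join(out)


if __name__ == "__main__":
    print(tag_capitals("Hello NASA"))
```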



-----Original Message-----
From: liblouis-liblouisxml-bounce@xxxxxxxxxxxxx 
[mailto:liblouis-liblouisxml-bounce@xxxxxxxxxxxxx] On Behalf Of Susan Jolly
Sent: Saturday, October 26, 2013 4:55 PM
To: liblouis-liblouisxml@xxxxxxxxxxxxx
Subject: [liblouis-liblouisxml] Regex question

I'm very familiar with regex's and have made significant use of the powerful 
regex facility in Java.  I understand how a regex can be used to identify the 
print item or items to which a braille translation action applies.  But I don't 
off the top of my head see how to use a regex to represent the corresponding 
action.

Perhaps someone can give me an explicit example that would handle the EBAE 
rules that apply to the use of contractions and indicators in what EBAE refers 
to as partially emphasized words. These are words that are either mixed case 
(other than titlecase) or words where special typeforms apply to only some of 
the letters.

Use of contractions. Contractions may not be used in any partially emphasized 
word.

Use of capitalization indicators. (Note that more than one of these rules can 
apply to a single word.)
(1)A sequence of two or more trailing capital letters is to be preceded by the 
double capital sign.
(2)Each sequence of two or more non-trailing capital letters is to be preceded 
by the double capital sign and followed by the termination sign.
(3)Each individual capital letter is to be preceded by the single capital sign.
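
One possible way to express the three capitalization rules above as a regex 
plus an action is sketched here in Python.  The regex only finds each run of 
capitals; the action decides which rule applies from the run length and from 
whether the run reaches the end of the word.  The indicator strings are 
symbolic placeholders rather than real braille, and the sketch assumes the 
caller has already decided that the word is partially emphasized and contains 
only ASCII letters.

```python
import re

# Placeholder indicators (assumptions for this example, not braille cells).
CAP, CAPCAP, TERM = "<cap>", "<capcap>", "<term>"

CAP_RUN = re.compile(r"[A-Z]+")  # one match per run of capital letters


def indicate_capitals(word):
    """Insert capitalization indicators into a mixed-case word."""

    def action(match):
        run = match.group(0)
        if len(run) == 1:
            return CAP + run           # rule 3: individual capital
        if match.end() == len(word):
            return CAPCAP + run        # rule 1: trailing run of two or more
        return CAPCAP + run + TERM     # rule 2: non-trailing run

    return CAP_RUN.sub(action, word)


if __name__ == "__main__":
    for w in ("McDONALD", "ABCs", "WiFi"):
        print(w, "->", indicate_capitals(w))
```

Because the action is called once per run, more than one rule can apply to the 
same word, as in "McDONALD".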

Use of typeform indicators.
(1)One or more trailing emphasized letters is to be preceded by the appropriate 
typeform indicator.
(2)One or more non-trailing emphasized letters is to be preceded by the 
appropriate typeform indicator and followed by the termination sign.
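
The two typeform rules could be handled the same way, provided the emphasis is 
available per letter.  The sketch below assumes it arrives as a mask string 
with one character per letter ("e" for emphasized, "." for plain); that 
representation, the function name and the indicator strings are all invented 
for the example.

```python
import re

# Placeholder indicators (assumptions for this example, not braille cells).
ITALIC, TERM = "<ital>", "<term>"

EMPH_RUN = re.compile(r"e+")  # one match per run of emphasized letters


def indicate_typeform(word, mask, indicator=ITALIC):
    """Insert typeform indicators; mask marks emphasized letters with 'e'."""
    if len(word) != len(mask):
        raise ValueError("mask must be the same length as word")
    out = []
    pos = 0
    for m in EMPH_RUN.finditer(mask):
        out.append(word[pos:m.start()])                  # plain letters before the run
        out.append(indicator + word[m.start():m.end()])  # both rules: indicator first
        if m.end() != len(word):
            out.append(TERM)                             # rule 2: non-trailing run
        pos = m.end()
    out.append(word[pos:])                               # plain letters after the last run
    return "".join(out)


if __name__ == "__main__":
    print(indicate_typeform("reread", "ee...."))
    print(indicate_typeform("reread", "..eeee"))
```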

(Note that EBAE uses the same typeform indicators for a single symbol or a word 
whereas UEB and some other codes have different indicators for a single symbol 
and for a sequence.)

Susan 

For a description of the software, to download it and links to project pages go 
to http://www.abilitiessoft.com
