[yunqa.de] Re: Access violation calling DIPerlRegEx.Match

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 28 May 2010 17:21:22 +0200

At 22:09 27.05.2010, Delphi Inspiration wrote:

>At 02:22 27.05.2010, Jim Bretti wrote:
>
>>I'm getting an access violation from DIPerlRegEx.Match when using the 
>>following subject / match pattern:
>
>I could reduce this to the following:
>
>Subject: für
>Pattern: \p{Zs}*\R
>
>The problem shows for the umlaut letter 'ü' (but not 'ä') and requires both 
>'\p{Zs}*' and '\R'. At a quick glance this looks like a PCRE problem. I will 
>investigate further tomorrow.

Code analysis revealed that this is indeed a problem in PCRE. By mistake, it 
always tests Unicode Character Properties (the '\p{Zs}' part in your pattern) 
against UTF-8, even if coUtf8 is not set. The 'ü' umlaut is therefore 
interpreted as the beginning of a UTF-8 sequence and results in an invalid 
character, which finally leads to the access violation.
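
To illustrate the misinterpretation, here is a small Python sketch. It is not 
PCRE's code; the helper utf8_lead_length is a hypothetical name, and I assume 
the subject was passed as a single-byte Latin-1 / Windows-1252 string. It only 
shows how many bytes a UTF-8 decoder would expect after seeing each byte as a 
lead byte, using the original UTF-8 scheme with sequences of up to 6 bytes.

```python
def utf8_lead_length(byte: int) -> int:
    """Sequence length implied by a UTF-8 lead byte (original scheme with
    sequences of up to 6 bytes). Returns 0 if the byte is not a lead byte."""
    if byte < 0x80:
        return 1  # plain ASCII, single byte
    if byte < 0xC0:
        return 0  # continuation byte, cannot start a sequence
    if byte < 0xE0:
        return 2
    if byte < 0xF0:
        return 3
    if byte < 0xF8:
        return 4
    if byte < 0xFC:
        return 5
    if byte < 0xFE:
        return 6
    return 0  # 0xFE and 0xFF never occur in UTF-8


subject = "für"

# Assumption: without coUtf8 the subject arrives as single-byte Latin-1.
latin1 = subject.encode("latin-1")  # 3 bytes, 'ü' is the single byte 0xFC
utf8 = subject.encode("utf-8")      # 4 bytes, 'ü' is the pair 0xC3 0xBC

for name, data in (("latin-1", latin1), ("utf-8", utf8)):
    for offset, byte in enumerate(data):
        expected = utf8_lead_length(byte)
        available = len(data) - offset
        print(f"{name}: byte 0x{byte:02X} at offset {offset} implies "
              f"{expected} byte(s), {available} available")
```

Read as UTF-8, the Latin-1 byte for 'ü' announces a 6-byte sequence although 
only two bytes remain in the subject, whereas the properly encoded UTF-8 
subject contains a complete 2-byte sequence. The Latin-1 byte for 'ä' (0xE4) 
would announce a 3-byte sequence, which may be related to why 'ä' did not show 
the problem, but I have not verified that.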

The workaround I suggested yesterday is safe and the recommended way to handle 
Umlaut characters with DIRegEx: Set the coUtf8 compile time option and encode 
both pattern and subject as UTF-8 before passing them to TDIRegEx methods.

I have forwarded the problem to the PCRE mailing list and will update DIRegEx 
when a fix becomes available.

Ralf 

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


