[yunqa.de] Re: Access violation calling DIPerlRegEx.Match

  • From: Jim Bretti <jim@xxxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 28 May 2010 12:24:09 -0400

Hi Ralf, thanks for the update.

On the UTF-8 recommendation, are you saying I need to use coUtf8 if my subject/match string contains ordinal values > 127? I'm using the Unicode / UTF-8 options only when necessary, since I seem to get better performance when I don't go through the UTF-8 encoding and character counting.

Jim


Delphi Inspiration wrote:
At 22:09 27.05.2010, Delphi Inspiration wrote:

  
At 02:22 27.05.2010, Jim Bretti wrote:

    
I'm getting an access violation from DIPerlRegEx.Match when using the following subject / match pattern:
      
I could reduce this to the following:

Subject: für
Pattern: \p{Zs}*\R

The problem shows for the umlaut letter 'ü' (but not 'ä') and requires both '\p{Zs}*' and '\R'. At a quick glance this looks like a PCRE problem. I will investigate further tomorrow.
    
Code analysis revealed that this is indeed a problem in PCRE. By mistake, it always tests Unicode Character Properties (the '\p{Zs}' part in your pattern) against UTF-8, even if coUtf8 is not set. The umlaut 'ü' is therefore interpreted as the beginning of a UTF-8 sequence, which yields an invalid character and finally leads to the access violation.
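
To illustrate the misinterpretation, here is a minimal Python sketch (not DIRegEx or PCRE code). It assumes the subject is held in a single-byte encoding such as Latin-1, which the thread does not state, and the helper name utf8_sequence_length is made up for this example. It shows which sequence length the byte for 'ü' or 'ä' would announce if it were read as a UTF-8 lead byte under the original 1 to 6 byte scheme of RFC 2279.

```python
def utf8_sequence_length(lead: int) -> int:
    """Sequence length announced by a lead byte under the original
    1-6 byte UTF-8 scheme (RFC 2279); 0 if the byte cannot start a sequence.
    Hypothetical helper for illustration only."""
    if lead < 0x80:
        return 1  # plain ASCII
    if lead < 0xC0:
        return 0  # continuation byte, not a valid lead byte
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    if lead < 0xF8:
        return 4
    if lead < 0xFC:
        return 5
    if lead < 0xFE:
        return 6
    return 0      # 0xFE and 0xFF never occur in UTF-8


# Assumption: the subject is stored as Latin-1, one byte per character.
subject = "für".encode("latin-1")           # 3 bytes: 0x66 0xFC 0x72
u_umlaut = "ü".encode("latin-1")[0]         # 0xFC
a_umlaut = "ä".encode("latin-1")[0]         # 0xE4

print(len(subject))                         # 3
print(utf8_sequence_length(u_umlaut))       # 6
print(utf8_sequence_length(a_umlaut))       # 3

# A strict decoder rejects the Latin-1 bytes outright.
try:
    subject.decode("utf-8")
    is_valid_utf8 = True
except UnicodeDecodeError:
    is_valid_utf8 = False
print(is_valid_utf8)                        # False
```

Read as UTF-8, 0xFC announces six bytes while only two bytes of the subject remain, whereas 0xE4 announces three. Whether that difference is the reason 'ä' does not trigger the crash is a guess that the thread does not confirm.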

The workaround I suggested yesterday is safe and is the recommended way to handle umlaut characters with DIRegEx: set the coUtf8 compile-time option and encode both pattern and subject as UTF-8 before passing them to TDIRegEx methods.
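
As a rough illustration of the encoding step only, here is a Python sketch of what "encode both pattern and subject as UTF-8" does to the bytes. It does not call DIRegEx; the variable names are made up for the example, and setting coUtf8 itself is not shown.

```python
# Subject and pattern from the reduced test case in this thread.
subject = "für"
pattern = r"\p{Zs}*\R"

# Encode both as UTF-8 before handing them to the regex engine.
subject_utf8 = subject.encode("utf-8")
pattern_utf8 = pattern.encode("utf-8")

# 'ü' (U+00FC) becomes the two-byte sequence 0xC3 0xBC.
print(subject_utf8)                              # b'f\xc3\xbcr'

# The pattern is pure ASCII, so UTF-8 encoding leaves its bytes unchanged.
print(pattern_utf8 == pattern.encode("ascii"))   # True

# The encoded subject round-trips, so it is well-formed UTF-8.
print(subject_utf8.decode("utf-8") == subject)   # True
```

Strings that contain only ordinal values up to 127 have identical bytes before and after UTF-8 encoding, so the encoding step changes nothing for them.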

I have forwarded the problem to the PCRE mailing list and will update DIRegEx when a fix becomes available.

Ralf 

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa





  