[yunqa.de] DIRegEx - translating utf8 match info to utf16

  • From: "Jim Bretti" <jim@xxxxxxxxxx>
  • To: <yunqa@xxxxxxxxxxxxx>
  • Date: Wed, 14 Nov 2007 15:58:33 -0500

Hi,

I have a question on how to handle matched character position / matched
length with Unicode strings.

My subject string and match pattern start off as UTF-16, and I convert them
to UTF-8 with StrEncodeUTF8.  For example, something like this to search for
a single Unicode character:

  RE := TDIPerlRegEx.Create(nil);
  RE.CompileOptions := RE.CompileOptions + [coUtf8];
  RE.SetSubjectStr( StrEncodeUtf8(SourceStr) );
  RE.MatchPattern := ( StrEncodeUtf8('\x{4E94}') );
  If RE.Match(0) > 0 then
     ....

I can display the matched string with StrDecodeUtf8(RE.MatchedStr), but I'm
having trouble with RE.MatchedStrLength and RE.MatchStrFirstCharPos.  These
values are relative to the UTF-8 encoded source.  Is there any way to
translate them so that they are relative to the original UTF-16 source?
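
To make the question concrete, here is a minimal sketch of the kind of
translation meant, not a DIRegEx feature.  Utf8BytesToUtf16Units is a
hypothetical helper written for illustration.  It assumes the subject is valid
UTF-8 and that the reported position is a zero-based byte offset into it; if
the position is 1-based, subtract 1 first.  Each UTF-8 lead byte starts one
code point, and a 4-byte sequence becomes a surrogate pair in UTF-16, so it
counts as two units.

```pascal
// Hypothetical helper (not part of DIRegEx): returns the number of UTF-16
// code units that correspond to the first ByteCount bytes of S.
// Assumes S holds valid UTF-8 and ByteCount falls on a character boundary.
function Utf8BytesToUtf16Units(const S: AnsiString; ByteCount: Integer): Integer;
var
  I: Integer;
  B: Byte;
begin
  Result := 0;
  if ByteCount > Length(S) then
    ByteCount := Length(S);
  for I := 1 to ByteCount do
  begin
    B := Ord(S[I]);
    // Continuation bytes have the bit pattern 10xxxxxx; skip them.
    if (B and $C0) <> $80 then
    begin
      Inc(Result);          // ASCII or lead byte: one code point starts here
      if B >= $F0 then
        Inc(Result);        // 4-byte sequence: surrogate pair in UTF-16
    end;
  end;
end;
```

With Utf8Subject holding the result of StrEncodeUtf8(SourceStr), the UTF-16
offset would be Utf8BytesToUtf16Units(Utf8Subject, RE.MatchStrFirstCharPos),
and the UTF-16 length would be the difference between the helper's result for
RE.MatchStrFirstCharPos + RE.MatchedStrLength and its result for
RE.MatchStrFirstCharPos.  The scan is linear in the offset, so for many matches
in a long subject it may be worth counting incrementally from the previous
match.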

Thanks
Jim
