At 18:24 28.05.2010, Jim Bretti wrote:

>On the utf8 recommendation, are you saying I need to use coUTF8 if my
>subject/match string contains ordinal values > 127?

Yes, UTF-8 is the recommended way for DIRegEx to handle characters outside
the US-ASCII range, that is, any character with a code point greater than
127. In non-UTF-8 mode, DIRegEx interprets characters according to the
ISO-8859-1 (Latin-1) codepage, which covers the first 256 Unicode code
points. You can use the set_locale() function to switch to a single-byte
locale supported by the Windows target operating system. This rebuilds the
internal character comparison and character type tables, but it does not
affect the Unicode Character Properties (UCP) which you are using in your
pattern.

>I'm using the unicode / utf8 options only when necessary since I seem to be
>getting better performance when I don't go through the utf8 encoding and
>character counting.

Binary (non-UTF-8) matching is obviously faster since it works on a much
smaller character range. But then, simple string comparisons are usually
faster than regular expressions, too. This is a little like comparing
apples and pears.

For text data, I would not go the extra mile of supporting separate UTF-8
and non-UTF-8 code paths. It adds extra complexity to the code for just a
minor performance benefit, given today's processor speeds. Most texts are
Unicode these days already, and those which are not are likely to be in
the future.

For binary data, non-UTF-8 matching is obviously the best choice. But then
one would not expect to apply Unicode Character Properties to binary data,
after all. ;-)

Ralf

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa
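[Editor's note: the Latin-1 versus UTF-8 distinction discussed above can be
sketched in Python. This illustrates the general Unicode behavior, not
DIRegEx's own API: the same bytes above 127 are one character each under
Latin-1, but combine into multi-byte sequences under UTF-8, so a pattern
sees a different number of characters depending on the mode.]

```python
import re

# The same two bytes, interpreted under the two character models
# discussed above. In Latin-1, every byte is one character; in
# UTF-8, bytes above 127 form multi-byte sequences.
data = bytes([0xC3, 0xA4])

latin1_text = data.decode("latin-1")   # two characters: 'Ã' and '¤'
utf8_text = data.decode("utf-8")       # one character: 'ä'

print(len(latin1_text))  # 2
print(len(utf8_text))    # 1

# A word-character pattern matches different "characters" in each view:
print(re.findall(r"\w", latin1_text))  # ['Ã']  ('¤' is not a word char)
print(re.findall(r"\w", utf8_text))    # ['ä']
```

This is why the character-counting overhead Jim mentions exists at all:
in UTF-8 mode the engine must decode byte sequences into code points
before it can apply character classes or Unicode Character Properties.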