[yunqa.de] Re: DIRegEx ansi_mbtowc, Big5

  • From: Alexey Torgashin <atorg@xxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Thu, 18 Oct 2007 16:57:38 +0400

> >But you pass there the SizeOf(c) which is **2**.
> You are right, it should be **1** in both ansi_mbtowc and oem_mbtowc. Please 
> correct your version of DIRegEx_SearchStream.pas.

OK.
Thanks. I hope the closed-source part of DIRegEx doesn't have this bug too.


> It would certainly be faster to apply MultiByteToWideChar to the whole string. 
> However, TDIRegExSearchStream_Enc is all about not loading entire 
> strings (huge files) into memory at once but in small blocks only. Since we 
> do not know where block boundaries fall relative to character boundaries, the 
> decoding function reads exactly one character at a time.

OK, understood.

My next question on this:
what if I use a double-byte code page, such as Chinese Big5?
A single byte passed to MultiByteToWideChar cannot represent a two-byte 
character, so such an xxx_mbtowc will return a wrong conversion result. 
Is this right?
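
To illustrate what I mean, here is a small sketch in Python rather than Delphi, 
because its built-in Big5 codec makes the effect easy to reproduce. It is not 
DIRegEx code; decode_blocks is a hypothetical helper I made up for the example. 
It shows that a lone lead byte cannot be decoded, and that a decoder which 
remembers the pending byte across block boundaries gives the right text. On 
Windows I assume the equivalent would be to check the byte with something like 
IsDBCSLeadByteEx and pass two bytes to MultiByteToWideChar when it is a lead byte.

```python
import codecs

# "中文" in Big5 takes four bytes: two bytes per character.
data = "中文".encode("big5")

# Decoding only the first byte fails, because it is just the lead byte
# of a two-byte character.
try:
    data[:1].decode("big5")
    lone_byte_ok = True
except UnicodeDecodeError:
    lone_byte_ok = False


def decode_blocks(blocks, encoding="big5"):
    """Decode an iterable of byte blocks, keeping incomplete characters
    buffered until the rest of their bytes arrive in a later block."""
    decoder = codecs.getincrementaldecoder(encoding)()
    for block in blocks:
        text = decoder.decode(block)
        if text:
            yield text
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Split the data so that both block boundaries cut a character in half.
blocks = [data[:1], data[1:3], data[3:]]
result = "".join(decode_blocks(blocks))
```

Here the blocks are deliberately cut in the middle of each character, which is 
the situation a block-wise stream reader can run into with any double-byte 
code page.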


Components for Delphi:
http://atorg.net.ru
_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


