[yunqa.de] Re: DIRegEx ansi_mbtowc, Big5

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Thu, 18 Oct 2007 17:07:58 +0200

>Please correct your version of DIRegEx_SearchStream.pas.
>
>Thanks. I hope the closed-source DIRegEx doesn't have this bug too.

No, it is only in DIRegEx_SearchStream.pas.

>> It would certainly be faster to apply MultiByteToWideChar to the whole 
>> string. However, TDIRegExSearchStream_Enc is all about not loading 
>> entire strings (huge files) into memory at once, but only in small blocks. 
>> Since we do not know whether a block boundary splits a multi-byte 
>> character, the decoding function reads exactly one character at a time.
>
>OK, understood.
>
>The next question: what if I use a double-byte codepage, such as Chinese 
>Big5? The single byte passed to MultiByteToWideChar will then give a 
>wrong result.

No. Well, on the first iteration of the repeat loop, MultiByteToWideChar might 
fail with just one byte of input. But the function will then call it again with 
more input. At first sight this might seem more complicated than necessary, but 
it is unfortunately the only way to determine the number of bytes consumed, 
because MultiByteToWideChar reports only the number of wide characters it 
wrote, not the number of input bytes it used.
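A minimal sketch of this probing idea in C, under stated assumptions: fake_mb_to_wide below is a hypothetical stand-in, not the Win32 MultiByteToWideChar. It only mimics the relevant behaviour (it returns the number of characters written and fails on an incomplete double-byte character). Bytes 0x81..0xFE are treated as Big5-like lead bytes, and the decoded value is a placeholder (lead << 8 | trail), not a real Unicode mapping. decode_one and max_char_len are illustrative names as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MultiByteToWideChar (NOT the Win32 API):
   decodes src[0..len) into out[0..cap) and returns the number of
   characters written, or 0 if the input ends in the middle of a
   double-byte character or out is too small. Like the real function,
   it reports characters written, not input bytes consumed.
   Bytes 0x81..0xFE are treated as Big5-like lead bytes; the output
   value (lead << 8 | trail) is a placeholder, not a Unicode mapping. */
static int fake_mb_to_wide(const unsigned char *src, size_t len,
                           unsigned int *out, size_t cap)
{
    size_t i = 0, n = 0;
    while (i < len) {
        size_t width = (src[i] >= 0x81 && src[i] <= 0xFE) ? 2 : 1;
        if (i + width > len || n == cap)
            return 0;  /* incomplete character or no room: fail */
        out[n++] = (width == 2)
            ? (((unsigned int)src[i] << 8) | src[i + 1])
            : src[i];
        i += width;
    }
    return (int)n;
}

/* Decode exactly one character by probing with 1, 2, ... input bytes.
   The first probe length that yields exactly one character is the
   number of bytes that character occupies. Returns that byte count,
   or 0 if no complete character is available in src[0..avail). */
static size_t decode_one(const unsigned char *src, size_t avail,
                         size_t max_char_len, unsigned int *ch)
{
    assert(ch != NULL);
    for (size_t len = 1; len <= avail && len <= max_char_len; len++) {
        if (fake_mb_to_wide(src, len, ch, 1) == 1)
            return len;
    }
    return 0;
}
```

Fed the bytes 0x41 0xA4 0x40 0x42 one character at a time, decode_one reports characters of 1, 2 and 1 bytes. A lone lead byte 0xA4 yields 0 until its trail byte is available, which is exactly the "call it again with more input" case.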

> You'll get a wrongly converted result with such an xxx_mbtowc. Is this right?

No. If the buffer does not yet hold a complete character, xxx_mbtowc returns 
RET_TOOSMALL, and TDIRegExSearchStream_Enc will try to read more input from the 
stream and call xxx_mbtowc again.
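The retry protocol can be sketched in C as follows, assuming a libiconv-style converter signature. Everything here is hypothetical and for illustration only: toy_mbtowc uses a Big5-like byte layout with placeholder output values, the numeric values of RET_ILSEQ and RET_TOOSMALL are assumed, and chunk_stream simulates a stream that hands out only a few bytes per read so that a double-byte character can straddle two reads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return codes modelled on the libiconv-style convention named in the
   thread; the numeric values here are assumptions for this sketch. */
#define RET_ILSEQ    (-1)  /* invalid byte sequence */
#define RET_TOOSMALL (-2)  /* s[0..n) holds only part of a character */

/* Hypothetical Big5-like xxx_mbtowc: returns bytes consumed (> 0),
   RET_TOOSMALL or RET_ILSEQ. *pwc gets a placeholder value
   (lead << 8 | trail), not a real Big5-to-Unicode mapping. */
static int toy_mbtowc(unsigned int *pwc, const unsigned char *s, size_t n)
{
    if (n < 1)
        return RET_TOOSMALL;
    unsigned char c = s[0];
    if (c < 0x80) {
        *pwc = c;
        return 1;
    }
    if (c >= 0x81 && c <= 0xFE) {
        if (n < 2)
            return RET_TOOSMALL;  /* lead byte without trail byte yet */
        unsigned char t = s[1];
        if (!((t >= 0x40 && t <= 0x7E) || (t >= 0xA1 && t <= 0xFE)))
            return RET_ILSEQ;
        *pwc = ((unsigned int)c << 8) | t;
        return 2;
    }
    return RET_ILSEQ;
}

/* Hypothetical stream handing out at most `chunk` bytes per read. */
struct chunk_stream {
    const unsigned char *data;
    size_t len, pos, chunk;
};

static size_t stream_read(struct chunk_stream *st, unsigned char *dst, size_t max)
{
    size_t k = st->len - st->pos;
    if (k > st->chunk) k = st->chunk;
    if (k > max) k = max;
    memcpy(dst, st->data + st->pos, k);
    st->pos += k;
    return k;
}

/* Small buffer of bytes read from the stream but not yet decoded. */
struct decoder {
    struct chunk_stream *st;
    unsigned char buf[8];
    size_t have;
};

/* Decode one character; on RET_TOOSMALL, read more input and retry.
   Returns 1 on success, 0 at clean end of input, -1 on invalid input
   or a character truncated by end of stream. */
static int next_char(struct decoder *d, unsigned int *pwc)
{
    for (;;) {
        int r = toy_mbtowc(pwc, d->buf, d->have);
        if (r > 0) {
            assert((size_t)r <= d->have);
            memmove(d->buf, d->buf + r, d->have - (size_t)r);
            d->have -= (size_t)r;
            return 1;
        }
        if (r != RET_TOOSMALL)
            return -1;
        size_t got = stream_read(d->st, d->buf + d->have,
                                 sizeof d->buf - d->have);
        if (got == 0)
            return d->have == 0 ? 0 : -1;
        d->have += got;
    }
}
```

Only the bytes of the character currently being decoded are held in the small buffer, so memory use stays constant however large the stream is. With a chunk size of 1, the lead byte 0xA4 first produces RET_TOOSMALL, and the following read supplies its trail byte.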

Ralf 

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


