[yunqa.de] Re: TDIHtmlParser.StartPos returns wrong value if a copyright symbol (©) is included

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 27 Apr 2012 12:09:18 +0200

I suspect this is a character conversion issue, so I need your complete
parsing project and the exact HTML document.

I have received your HTML document inline in your message. Because
inlining may have changed the character encoding, please re-send both the
project and the HTML as attachments (preferably zipped).
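
To illustrate why the encoding matters here: the copyright sign is U+00A9,
which takes one byte in Windows-1252, two bytes in UTF-8, and one 2-byte
code unit in UTF-16 LE, so any position derived from a converted copy of
the document can drift. The following is only a minimal sketch using the
RTL's TEncoding class (Delphi 2009 and later); the program name and the
Sample string are made up for illustration and have nothing to do with
DIHtmlParser itself.

```pascal
program CopyrightByteCount;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Hypothetical sample text. #$00A9 is the copyright sign; the
  // four-digit form makes the compiler treat it as a WideChar.
  Sample: string = 'a' + #$00A9 + 'b';

var
  Ansi: TEncoding;
begin
  // Number of UTF-16 code units in the Unicode string.
  Writeln('Length in characters: ', Length(Sample));

  // TEncoding.Unicode is UTF-16 LE; TEncoding.UTF8 is UTF-8.
  Writeln('UTF-16 LE bytes:      ', TEncoding.Unicode.GetByteCount(Sample));
  Writeln('UTF-8 bytes:          ', TEncoding.UTF8.GetByteCount(Sample));

  // GetEncoding returns a new instance that the caller must free.
  Ansi := TEncoding.GetEncoding(1252);
  try
    Writeln('Windows-1252 bytes:   ', Ansi.GetByteCount(Sample));
  finally
    Ansi.Free;
  end;
end.
```

Running the same counts on the real document, once as sent inline and once
as an attachment, would show whether the two copies still contain the same
bytes.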

Please also mention in the project where in the parsing you access the


On 27.04.2012 09:34, Edwin Yip wrote:

> I'd like to report a bug: TDIHtmlParser.StartPos returns a wrong value if
> a copyright symbol (©) is included. See the comments in the test file
> included inline (I assume this list doesn't accept email attachments).
> My TDIHtmlParser object setup:
>   FParser.ReadMethods := Read_UTF_16_LE;
>   FParser.SetSourceBufferAsStr(aSrcCode); // aSrcCode is a Unicode string in D2010
>   FWriter.Writer.WriteMethods := Write_UTF_16_LE;
>   FParser.SetFullParser;
>   FWriter.SetAllFilters(fiHide);
> The following is a test html file:
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> "http://www.w3.org/TR/html4/loose.dtd">
> <html>
> <head>
> </head>
> <!-- This is a test file demonstrating an issue.
> © Once a copyright symbol is included anywhere in the HTML file,
> TDIHtmlParser.StartPos (and maybe other similar properties I haven't
> tested) will return a wrong value. If the correct value is 100 and you
> have one ©, the returned value will be 98; if you have two, it'll be 96,
> and so on. -->
> <body>
> <p>this is a line</P>
> </body>
> </html>
Delphi Inspiration mailing list
