[yunqa.de] TDIHtmlParser.StartPos returns wrong value if a copyright symbol (©) is included

  • From: Edwin Yip <edwin.yip@xxxxxxxxxxxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 27 Apr 2012 15:34:48 +0800

Hi Ralf,

I'd like to report a bug: TDIHtmlParser.StartPos returns a wrong value if a
copyright symbol (©) is included. See the comments in the test file, which
I've included inline (I assume this list doesn't accept email attachments).

My TDIHtmlParser object setup:
  FParser.ReadMethods := Read_UTF_16_LE;
  FParser.SetSourceBufferAsStr(aSrcCode); // aSrcCode is a Unicode string in D2010
  FWriter.Writer.WriteMethods := Write_UTF_16_LE;
  FParser.SetFullParser;
  FWriter.SetAllFilters(fiHide);

The following is a test HTML file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
</head>
<!-- This is a test file demonstrating an issue.

© Once a copyright symbol is included anywhere in the HTML file,
TDIHtmlParser.StartPos (and maybe other similar properties I haven't
tested) returns a wrong value. If the correct value is 100 and you have
one ©, the returned value will be 98; if you have two, it'll be 96, and
so on. -->

<body>
<p>this is a line</P>
 </body>
</html>
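The offset pattern described in the test file's comment can be written as reported = correct - 2 * n, where n is the number of © symbols. Below is a minimal Python sketch of that arithmetic only, not of the parser. `reported_start_pos` is a hypothetical helper, not part of DIHtmlParser, and it assumes that only © symbols located before the position shift the value; the report does not say whether symbols after it count too.

```python
COPYRIGHT = "\u00a9"  # the copyright symbol (U+00A9)


def reported_start_pos(source: str, true_pos: int) -> int:
    """Model the StartPos value described in the bug report.

    Hypothetical helper, not part of DIHtmlParser. Assumes each
    copyright symbol before `true_pos` lowers the value by 2.
    """
    n = source[:true_pos].count(COPYRIGHT)
    return true_pos - 2 * n


if __name__ == "__main__":
    # One copyright symbol before a tag whose correct position is 100.
    src = "a" * 10 + COPYRIGHT + "b" * 89 + "<p>"
    print(reported_start_pos(src, src.index("<p>")))  # 98
```

Comparing this modelled value with what StartPos actually returns for a given file is one way to check whether the shift really is 2 per symbol.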


-- 
Best Regards,
Edwin Yip

Mind Mapping is as Effortless as Typing
http://www.InnovationGear.com
