[yunqa.de] Re: TDIHtmlParser.StartPos returns wrong value if a copyright symbol ((c)) is included

  • From: Edwin Yip <edwin.yip@xxxxxxxxxxxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 27 Apr 2012 22:57:42 +0800

My project is too complex to send. I'll make a sample project when I find
some time; in the meantime, please see whether you can reproduce the problem
with the information I have provided (more below).

I have also found that StartPos is correct initially, but once the parser
meets any non-ASCII character (a Chinese character, for example), StartPos
reports a wrong value. It looks as if the StartPos property doesn't count
non-ASCII characters at all.
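
A minimal sketch for checking this pattern in a test project. It assumes the
shift is 2 positions per non-ASCII character, which is what the original
report quoted below describes for ©; whether the same holds for Chinese
characters is not confirmed. CountNonAscii, PredictedStartPos and
ShiftPerChar are hypothetical helper names, not part of DIHtmlParser, and
the sketch uses 1-based string positions, which may not match how StartPos
counts.

```pascal
program StartPosShiftCheck;

{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // Assumption: 2 positions are lost per non-ASCII character,
  // as reported for the copyright sign. Adjust if your results differ.
  ShiftPerChar = 2;

// Counts the characters above U+007F among the first UpTo characters of S.
function CountNonAscii(const S: string; UpTo: Integer): Integer;
var
  I: Integer;
begin
  Result := 0;
  if UpTo > Length(S) then
    UpTo := Length(S);
  for I := 1 to UpTo do
    if Ord(S[I]) > $7F then
      Inc(Result);
end;

// Position that would be reported if the observed pattern holds.
function PredictedStartPos(const S: string; TruePos: Integer): Integer;
begin
  Result := TruePos - ShiftPerChar * CountNonAscii(S, TruePos);
end;

var
  Src: string;
  TruePos: Integer;
begin
  // #$00A9 is the copyright sign, written as a code point so that the
  // encoding of this source file does not matter.
  Src := '<!-- ' + #$00A9 + ' and ' + #$00A9 + ' --><body>';
  TruePos := Pos('<body>', Src);
  Writeln('True position:      ', TruePos);
  Writeln('Non-ASCII before:   ', CountNonAscii(Src, TruePos));
  Writeln('Predicted StartPos: ', PredictedStartPos(Src, TruePos));
end.
```

If the value the parser reports matches the predicted one for several
inputs, that supports the "non-ASCII characters are not counted" reading;
if not, the shift has some other cause.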

So I think you could simply try parsing a UnicodeString that contains the ©
copyright symbol. I'm quite sure the content I pass to SetSourceBufferAsStr
is correctly encoded as a UnicodeString (i.e. string in D2010).
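
To rule out any conversion by the mail system, the test input can be built
in code instead of being pasted inline. The following is only a sketch using
the standard RTL of Delphi 2009 or later; the file name and the shortened
HTML are placeholders, and the call into the parser is left out.

```pascal
program BuildTestHtml;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  // U+00A9 written as a 4-digit code point, so it is a single
  // UTF-16 character regardless of the source file encoding.
  CopyrightSign = #$00A9;

var
  Lines: TStringList;
begin
  Lines := TStringList.Create;
  try
    Lines.Add('<html>');
    Lines.Add('<!-- ' + CopyrightSign + ' test comment -->');
    Lines.Add('<body>');
    Lines.Add('<p>this is a line</p>');
    Lines.Add('</body>');
    Lines.Add('</html>');

    // Lines.Text is a UnicodeString here and could be passed to the
    // parser as the source string.
    Writeln('Characters in source: ', Length(Lines.Text));

    // TEncoding.Unicode is UTF-16 little endian; the saved file can be
    // zipped and attached. The file name is a placeholder.
    Lines.SaveToFile('test_utf16le.html', TEncoding.Unicode);
  finally
    Lines.Free;
  end;
end.
```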

And since the string returned by TDIHtmlWriterPlugin.Writer.DataToStr is
correctly encoded, I don't think my code is the cause.


On Fri, Apr 27, 2012 at 6:09 PM, Delphi Inspiration <delphi@xxxxxxxx> wrote:

> I suspect this is a character conversion issue so I need your full
> parsing project and exact HTML document.
>
> I have received your HTML document inlined to your message. Because this
> may convert the character encoding, please re-send both project and HTML
> as attachments (preferably zipped).
>
> Please also mention in the project where in the parsing you access the
> TDIHtmlParser.StartPos.
>
> Ralf
>
> On 27.04.2012 09:34, Edwin Yip wrote:
>
> > I'd like to report a bug: TDIHtmlParser.StartPos returns a wrong value if
> > a copyright symbol (©) is included. See the comments in the test file
> > included inline (I assume this list doesn't accept email attachments).
> >
> > My TDIHtmlParser object setup:
> >   FParser.ReadMethods := Read_UTF_16_LE;
> >   FParser.SetSourceBufferAsStr(aSrcCode); // aSrcCode is a UnicodeString in D2010
> >   FWriter.Writer.WriteMethods := Write_UTF_16_LE;
> >   FParser.SetFullParser;
> >   FWriter.SetAllFilters(fiHide);
> >
> > The following is a test html file:
> >
> > <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
> > "http://www.w3.org/TR/html4/loose.dtd">
> > <html>
> > <head>
> > </head>
> > <!-- this is a test file demonstrating an issue.
> >
> > © once a copyright symbol is included anywhere in the html file, the
> > TDIHtmlParser.StartPos (and
> > maybe other similar properties I haven't tested) will return a wrong
> > value. If the correct value is 100 and you have one ©,
> > the returned value will be 98; if you have two, it'll be 96, and so
> > on. -->
> >
> > <body>
> > <p>this is a line</P>
> > </body>
> > </html>


-- 
Best Regards,
Edwin Yip

Mind Mapping is as Effortless as Typing
http://www.InnovationGear.com
