[yunqa.de] Re: DITidy to process www.163.com

  • From: Bear Xu <bear.xy@xxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Thu, 20 Aug 2009 13:18:35 +0800

Dear Ralf,

1. I simply copied the page source from IE, pasted it into a Delphi
TMemo, and passed that text to the Tidy functions
(using the code for processing Unicode HTML that you provided last time).

2. I ran your source code and got the same result:

all of the end tags are wrong:


<\/table>
<\/center><\/div>
<\/div>

That is, </div> became <\/div>.
I do not know why.

3. When I pass Unicode HTML source to Tidy, does it check the meta
charset setting in the head section?
I think it should not check that unless it is processing an HTML file.

4. I continued testing.
   Removing the source code piece by piece, I finally found this:

If there is no head, the entire output is empty,

and this is caused by "</tbody></form></table>".

For example:
==================================
<div> Test Content
</tbody></form></table>
</div>
==================================
Tidy it, and the result is empty.
If I remove </form>, it outputs "<div>Test Content</div>".

Please check the Tidy source code for this.

Thanks,

Bear



On Mon, Aug 17, 2009 at 7:30 PM, Delphi Inspiration <delphi@xxxxxxxx> wrote:

> At 12:34 15.08.2009, Bear Xu wrote:
>
> >I have sent the zip file to delphi@xxxxxxxxx
>
> Thank you, I have received the file.
>
> The HTML file is encoded in GB2312. This is specified in <meta
> http-equiv="Content-Type" content="text/html; charset=gb2312" />.
>
> The code snippet you posted does not reveal how you load the file. It only
> tells that you are passing it as UnicodeString. In doing so, you must have
> converted it from GB2312 to UTF-16LE. Unfortunately, your code snippet does
> not reveal how you do so. But however you do it, the GB2312 charset
> specification will no longer match your HTML, which is now in UTF-16LE. This
> will most likely lead to severe problems and errors.
>
> I suggest you use the tidyParseFile() function instead of tidyParseBuffer()
> to avoid string conversion problems. Please see the attached project for an
> example. It seems to work fine after a quick inspection, but the document is
> too lengthy for me to come up with detailed analysis.
>
> Ralf
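
The charset mismatch described in the quoted reply can be illustrated with a
minimal sketch. It is written in Python rather than Delphi and does not call
Tidy at all; it only shows what happens to the bytes. The sample markup and
the helper name declared_charset are hypothetical, chosen for illustration.

```python
import re

# Hypothetical sample page; the real 163.com document is much longer.
html = ('<meta http-equiv="Content-Type" '
        'content="text/html; charset=gb2312" /><div>网易</div>')

gb_bytes = html.encode("gb2312")        # how the file is stored on disk
text = gb_bytes.decode("gb2312")        # converted to a Unicode string
utf16_bytes = text.encode("utf-16-le")  # roughly what a UTF-16LE buffer holds


def declared_charset(markup):
    """Return the charset named in the markup's meta tag, or None."""
    match = re.search(r'charset\s*=\s*["\']?([\w-]+)', markup, re.IGNORECASE)
    return match.group(1).lower() if match else None


declared = declared_charset(text)       # still "gb2312" after the conversion

# A consumer that trusts the declaration reads the UTF-16LE bytes as GB2312
# and gets text with NUL characters interleaved instead of the original.
misread = utf16_bytes.decode(declared, errors="replace")

print(declared)
print(misread == text)
```

The declaration survives the conversion unchanged, so it no longer describes
the buffer. Either the declaration has to be updated after converting, or the
original GB2312 file has to be handed over untouched, which is what the
tidyParseFile() suggestion amounts to.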
