[yunqa.de] Re: How to TIDY HTML with unicode via DITidy?

  • From: Bear Xu <bear.xy@xxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Sun, 8 Feb 2009 21:28:33 +0800

Dear Ralf,

Thank you for your reply.
It works.

I want to buy DITidy without source first (Euro 60). Can I then upgrade to
DITidy with source later by paying you Euro 120?
Thanks,

Bear



On Sun, Feb 8, 2009 at 7:56 PM, Delphi Inspiration <delphi@xxxxxxxx> wrote:

> Bear Xu wrote:
>
> >1.
> >I do not know the encoding of the HTML file (it may be Windows-1251,
> >gb2312, utf8, or others), e.g. when it is used in a crawler.
> >How can I use DITidy to parse an HTML file with Unicode? Is it possible?
>
> DITidy readily supports these encodings:
>
>  raw
>  ascii
>  latin0
>  latin1
>  utf8
>  iso2022
>  mac
>  win1252
>  ibm858
>  utf16le
>  utf16be
>  utf16
>  big5
>  shiftjis
>
> DITidy also detects encoding markers present in HTML documents. If such a
> marker is missing, you can set the encoding manually before parsing:
>
>  tidySetCharEncoding(Doc, PAnsiChar(cboEncoding.Text));
>
> See DITidy_Analyse.dpr for an example implementation.
>
> Web servers often return the character encoding along with their response,
> which you can pass to tidySetCharEncoding().
>
> >2.
> >How can I parse WideString source code and return the cleaned and repaired
> >HTML source code?
> >
> >function TidyHtml(HTML_Source: WideString): WideString;
> >begin
> >  ???
> >end;
>
> Please see attached demo project for such a function.
>
> Ralf
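
A note on passing the web server's character encoding to tidySetCharEncoding(): servers usually report labels such as "UTF-8" or "ISO-8859-1", while the list quoted above uses names such as utf8 and latin1. The unit below is a minimal sketch of one way to bridge the two. The unit name and both function names (ExtractCharset, CharsetToTidyEncoding) are hypothetical and not part of DITidy, and the mapping covers only a few common labels. It returns an empty string for anything not in the quoted list, which includes Windows-1251 and gb2312. Such input would presumably have to be converted to a supported encoding before parsing.

```pascal
unit TidyEncodingName; // hypothetical helper unit, not part of DITidy

interface

// Extracts the charset parameter from a Content-Type header value such as
// 'text/html; charset=UTF-8'. Returns '' if no charset is given.
function ExtractCharset(const ContentType: string): string;

// Maps a charset label to one of the encoding names from the list quoted
// above. Returns '' if the label has no counterpart in this small table.
function CharsetToTidyEncoding(const Charset: string): string;

implementation

uses
  SysUtils;

function ExtractCharset(const ContentType: string): string;
const
  Key = 'charset=';
var
  P: Integer;
begin
  Result := '';
  // LowerCase changes ASCII letters only, so positions stay valid.
  P := Pos(Key, LowerCase(ContentType));
  if P = 0 then
    Exit;
  // Keep everything after 'charset='.
  Result := Copy(ContentType, P + Length(Key), MaxInt);
  // Cut off any further header parameters.
  P := Pos(';', Result);
  if P > 0 then
    Result := Copy(Result, 1, P - 1);
  // Strip optional quotes and surrounding whitespace.
  Result := Trim(StringReplace(Result, '"', '', [rfReplaceAll]));
end;

function CharsetToTidyEncoding(const Charset: string): string;
var
  S: string;
begin
  S := LowerCase(Trim(Charset));
  if (S = 'utf-8') or (S = 'utf8') then
    Result := 'utf8'
  else if (S = 'iso-8859-1') or (S = 'latin1') then
    Result := 'latin1'
  else if S = 'iso-8859-15' then
    Result := 'latin0'
  else if S = 'windows-1252' then
    Result := 'win1252'
  else if S = 'us-ascii' then
    Result := 'ascii'
  else if S = 'utf-16' then
    Result := 'utf16'
  else if S = 'utf-16le' then
    Result := 'utf16le'
  else if S = 'utf-16be' then
    Result := 'utf16be'
  else if S = 'big5' then
    Result := 'big5'
  else if (S = 'shift_jis') or (S = 'shift-jis') then
    Result := 'shiftjis'
  else
    Result := ''; // unknown or unsupported label; caller decides what to do
end;

end.
```

A caller could then do something like Enc := CharsetToTidyEncoding(ExtractCharset(Header)) and, only if Enc is not empty, call tidySetCharEncoding(Doc, PAnsiChar(AnsiString(Enc))) before parsing. The AnsiString cast is there so that the same line compiles whether string is an Ansi or a Unicode type.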
