Dear Ralf,

Thank you for your reply. It works.

I want to buy DITidy without source first (Euro 60). Could I then upgrade to DITidy with source by paying you Euro 120?

Thanks,
Bear

On Sun, Feb 8, 2009 at 7:56 PM, Delphi Inspiration <delphi@xxxxxxxx> wrote:
> Bear Xu wrote:
>
> >1.
> >I do not know the encoding of the HTML file (it may be Windows-1251,
> >gb2312, utf8, or something else), e.g. when used in a crawler.
> >How can I use DITidy to parse the HTML file with Unicode? Is it possible?
>
> DITidy readily supports these encodings:
>
> raw
> ascii
> latin0
> latin1
> utf8
> iso2022
> mac
> win1252
> ibm858
> utf16le
> utf16be
> utf16
> big5
> shiftjis
>
> DITidy also detects encoding markers present in HTML documents. If such a
> marker is missing, you can set the encoding manually before parsing:
>
> tidySetCharEncoding(Doc, PAnsiChar(cboEncoding.Text));
>
> See DITidy_Analyse.dpr for an example implementation.
>
> Web servers often return the character encoding along with their response,
> which you can pass to tidySetCharEncoding().
>
> >2.
> >How can I parse WideString source code and return the clean, repaired
> >HTML source code?
> >
> >function TidyHtml(HTML_Source: WideString): WideString;
> >begin
> >  ???
> >end;
>
> Please see the attached demo project for such a function.
>
> Ralf