[yunqa.de] Re: How to convert HTML to XML?

  • From: Bear Xu <bear.xy@xxxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 13 Mar 2009 08:40:21 +0800

Thank you very much!
Bear

On Fri, Mar 13, 2009 at 1:01 AM, Delphi Inspiration <delphi@xxxxxxxx> wrote:

> Bear Xu wrote:
>
> >How do I convert HTML to XML?
> >Are there any examples?
> >Do I need both DIHTML and DITidy?
> >Can DITidy export to XML?
>
> You can use DITidy or DIXml to convert HTML to XML.
>
> * DITidy (http://www.yunqa.de/delphi/doku.php/products/tidy/index):
>
> DITidy is a pure HTML parser and beautifier, but there are options to tweak
> its output towards XHTML and XML:
>
> - TidyXhtmlOut
>
> This option specifies whether Tidy should generate pretty-printed output,
> writing it as extensible HTML. This option causes Tidy to set the DOCTYPE
> and default namespace as appropriate to XHTML. If a DOCTYPE or namespace is
> given, they will be checked for consistency with the content of the document. In
> the case of an inconsistency, the corrected values will appear in the
> output. For XHTML, entities can be written as named or numeric entities
> according to the setting of the "numeric-entities" option. The original case
> of tags and attributes will be preserved, regardless of other options.
>
> - TidyXmlOut
>
> This option specifies whether Tidy should pretty-print output, writing it as
> well-formed XML. Any entities not defined in XML 1.0 will be written as
> numeric entities to allow them to be parsed by an XML parser. The original
> case of tags and attributes will be preserved, regardless of other options.
>
> See the DITidy_Hello_World.dpr project for how to set these options (it
> uses TidyXhtmlOut by default).
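The TidyXmlOut entity rule quoted above ("entities not defined in XML 1.0 will be written as numeric entities") can be illustrated with a short stand-alone sketch. This is not DITidy or HTML Tidy code: it uses Python's standard library (`html.entities.name2codepoint`, which only covers the HTML 4 entity names), and the function name `numeric_entities_for_xml` is hypothetical. The five references XML 1.0 predefines (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) are kept; other known HTML entities become decimal numeric references.

```python
import re
from html.entities import name2codepoint

# The five entities XML 1.0 predefines; these may stay as named references.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &copy; or &nbsp;
_ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def numeric_entities_for_xml(markup: str) -> str:
    """Rewrite HTML named entities that XML 1.0 does not define as numeric references.

    Hypothetical helper illustrating the TidyXmlOut rule; not Tidy's implementation.
    """
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED:
            return match.group(0)           # already legal in XML
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            return match.group(0)           # unknown name: leave it untouched
        return f"&#{codepoint};"            # e.g. &copy; -> &#169;

    return _ENTITY_REF.sub(replace, markup)


print(numeric_entities_for_xml("&copy; 2009 Bear &amp; Ralf&nbsp;&lt;x&gt;"))
# prints: &#169; 2009 Bear &amp; Ralf&#160;&lt;x&gt;
```

Unknown names are passed through unchanged, and the sketch makes no attempt to repair malformed references.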
>
> * DIXml (http://www.yunqa.de/delphi/doku.php/products/xml/index):
>
> DIXml contains both an XML parser and an HTML parser. Both read documents
> into a DOM structure. Regardless of which parser is used, the resulting DOM
> tree can then be written as XML using xmlSaveFile().
>
> I added a "Save as XML ..." button to the DIXml_Node_Tree.dpr demo project
> (attached).
>
> Ralf
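The DIXml route described in the reply (parse HTML into a DOM, then write that tree out as XML) can be sketched outside Delphi. The example below is not DIXml and does not call xmlSaveFile(); it swaps in Python's standard `html.parser.HTMLParser` and `xml.etree.ElementTree`, and the names `TreeBuilder`, `html_to_xml` and the wrapper root `html-fragment` are hypothetical. Unlike a full HTML parser, it does not apply HTML's implied end-tag rules (an unclosed `<li>` followed by another `<li>` ends up nested), so treat it only as an illustration of the two-step idea.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take an end tag, so they are never pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Builds an ElementTree DOM from HTML (hypothetical helper, not DIXml)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp;, &#169;, ... in text
        self.root = ET.Element("html-fragment")  # hypothetical wrapper root
        self.stack = [self.root]                 # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes such as <input disabled> get their own name as value.
        attrib = {name: (value if value is not None else name) for name, value in attrs}
        elem = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID_ELEMENTS:
            self.stack.append(elem)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return                               # e.g. the implicit end of <br/>
        # Close the nearest open element with this name and anything left open inside it.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # A stray end tag with no matching open element is ignored.

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):                          # text after a child goes in its tail
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def html_to_xml(html: str) -> str:
    """Parse HTML into a DOM, then serialize the tree as XML."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    root = builder.root
    children = list(root)
    # Drop the wrapper when the input has one top-level element and no loose text.
    if len(children) == 1 and not (root.text or "").strip() and not (children[0].tail or "").strip():
        root = children[0]
        root.tail = None
    return ET.tostring(root, encoding="unicode")


print(html_to_xml('<p class=intro>Hello<br>World &amp; <b>you</p>'))
# prints: <p class="intro">Hello<br />World &amp; <b>you</b></p>
```

The output is well-formed XML, so `ET.fromstring` can parse it back; input with several top-level nodes or loose text stays inside the `html-fragment` wrapper.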
