[yunqa.de] Re: Pre-sale questions about DIHtmlParser

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 28 Oct 2011 10:51:42 +0200

On 27.10.2011 15:48, Edwin Yip wrote:

> One more question: now I see that DIXml is more suitable for my problem,
> but this led me to another question - what can DIHtmlParser do that DIXml
> can't?

It is not a question of what DIHtmlParser can do that DIXml cannot, or
the other way around. The two approach parsing quite differently.

DIHtmlParser parses HTML and XML in a linear fashion. It does not build
DOM structures and does not need memory for that. It is therefore very
fast, especially for large files. Document encodings are converted to
Unicode on the fly, and interface functions use UnicodeString.
DIHtmlParser is customizable via Filters and Plugins. I am using
DIHtmlParser for HTML data extraction and modification, both simple and
advanced (think state machine).
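
To illustrate what linear parsing without a DOM looks like, here is a
minimal sketch. It does not use DIHtmlParser or its API; it is an analogy
built on Python's standard html.parser module, and the names
LinkExtractor and extract_links are made up for this example. The parser
keeps only a small amount of state (the link it is currently inside) and
never holds the whole document in memory.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects (href, text) pairs in one forward pass, without a DOM.

    Hypothetical helper for illustration only; not part of DIHtmlParser.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []     # finished (href, text) pairs
        self._href = None   # state: href of the <a> we are inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Enter the "inside a link" state (only if an href is present).
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Leave the "inside a link" state and record the result.
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(chunks):
    """Feed the document piece by piece, as it might arrive from a file."""
    parser = LinkExtractor()
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
    return parser.links
```

Because the input is fed in chunks, memory use depends on the chunk size
and the collected results rather than on the size of the document.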

DIXml is more "intelligent" than DIHtmlParser in that it knows about XML
and HTML document structures. This makes DIXml more standards-conformant,
but non-standard documents can be problematic and HTML 5 specifics
are not built in. DIXml converts character encodings to UTF-8 internally
and passes UTF-8 to applications. Because of its DOM facilities, data
extraction and document modification work differently than with
DIHtmlParser, but XSLT transformations are available.
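
As a rough illustration of the DOM style of working, here is a minimal
sketch. It does not use DIXml or its API; it is an analogy built on
Python's standard xml.dom.minidom module, and the names mark_items and
is_well_formed are made up for this example. The whole document is
parsed into a tree first, the tree is modified, and the result is
written out as UTF-8. A document that is not well-formed is rejected
outright.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Sample input declared and encoded as ISO-8859-1 (illustrative data only).
SOURCE = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?>'
    b"<list><item>caf\xe9</item><item>tea</item></list>"
)


def mark_items(xml_bytes):
    """Build a DOM, number every <item>, and serialize the tree as UTF-8."""
    doc = minidom.parseString(xml_bytes)  # whole tree is held in memory
    for index, item in enumerate(doc.getElementsByTagName("item"), start=1):
        item.setAttribute("n", str(index))
    return doc.toxml(encoding="utf-8")  # returns bytes


def is_well_formed(xml_bytes):
    """A strict XML parser refuses input that breaks the rules."""
    try:
        minidom.parseString(xml_bytes)
        return True
    except ExpatError:
        return False
```

The trade-off is the one described above: structure-aware access and
modification are convenient, but the tree costs memory, and input such as
`<p>unclosed<br></p>` fails to parse at all.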

So it really depends on your requirements and/or preference whether
DIHtmlParser or DIXml suits you better.

Ralf
_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


