[yunqa.de] Re: DOM-like plugin or demo for DiHTMLParser?

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Wed, 26 Jan 2011 14:59:50 +0100

On 25.01.2011 11:36, Laurent Breysse wrote:

> DiHTMLParser is an event-driven (SAX-like) parser, but I wondered if a
> tree structure of HTML pieces could easily be built, based on triggered
> events, with a full-scope parsing?

DIHtmlParser cannot build an in-memory DOM tree structure for an HTML
document.

> Is there any information about the parent piece provided to the event
> handler during parsing, or a 'CurrentPiece.Parent' property I could have
> missed?

DIHtmlParser does not track parent or child elements. Its strength is to
deliver individual HTML tokens quickly. Advanced parsing tasks should be
built on top of this.
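
As a rough illustration of what "built on top" can look like, here is a
minimal sketch of parent tracking with a stack of open elements. It uses
Python's standard html.parser as a stand-in for an event-driven
tokenizer, not DIHtmlParser itself, whose API is not shown in this
thread. The names ParentTracker, VOID and parents are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not be pushed.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ParentTracker(HTMLParser):
    """Records (tag, parent_tag) pairs using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open elements, innermost last
        self.seen = []   # (tag, parent or None) in document order

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        self.seen.append((tag, parent))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def parents(html):
    """Hypothetical helper: feed a document, return the recorded pairs."""
    tracker = ParentTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.seen
```

For well-formed input this reports the expected parent of each element.
It does not handle implicitly closed elements: in "<ul><li>a<li>b</ul>"
the second li is reported as a child of the first, because nothing
closes the first li when the second one opens.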

Creating HTML DOM trees and tracking HTML parent and child elements are
two of those tasks. Neither is trivial because of implicitly closed
elements and overlapping tags. In addition, I found that most dedicated
parsing tasks run faster with a state machine than by building an
entire DOM structure.
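
To make the state machine idea concrete, here is a minimal sketch that
collects link targets and link texts directly from parser events,
without building any tree. It again uses Python's standard html.parser
in place of DIHtmlParser. LinkExtractor, its two states and
extract_links are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Two-state machine: outside a link, or inside <a href=...>."""

    OUTSIDE, IN_LINK = range(2)

    def __init__(self):
        super().__init__()
        self.state = self.OUTSIDE
        self.href = None
        self.text = []   # text fragments of the link being read
        self.links = []  # finished (href, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a" and self.state == self.OUTSIDE:
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors without href
                self.state = self.IN_LINK
                self.href = href
                self.text = []

    def handle_data(self, data):
        if self.state == self.IN_LINK:
            self.text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.state == self.IN_LINK:
            self.links.append((self.href, "".join(self.text).strip()))
            self.state = self.OUTSIDE


def extract_links(html):
    """Hypothetical helper: return (href, text) for each link."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

The only state kept between events is one flag, the current href and
the text collected so far, so memory use does not grow with the size of
the document the way a full DOM does.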

If a DOM structure is required, DIXml provides HTML DOM parsing and XSLT
transformation:

  http://yunqa.de/delphi/doku.php/products/xml/index

Please run the DIXml \Demos\DIXml_Node_Tree\ project to see how it
parses HTML into a DOM tree structure.

> Or maybe there is already a plugin that builds a tree on the fly,
> during a full-scope parsing?

A DOM plugin is not part of DIHtmlParser, and I am not aware of any such
plugin written by another user.

Ralf
_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


