[yunqa.de] Re: DiHTMLParser and template syntax used by some server-side scripting languages?

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 16 Dec 2011 12:33:21 +0100

On 16.12.2011 07:33, Edwin Yip wrote:

> What will happen if DiHTMLParser reads the following template syntax 
> used by some server-side scripting languages? For example, code 
> enclosed by {% and %} which is very common.
> 
> {% some_strings_here %}

Since this syntax does not match any of the supported HTML pieces, 
TDIHtmlParser parses this as text, just as browsers do.
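
The same fallback can be observed with a different, unrelated parser. The 
sketch below uses Python's standard html.parser module, not DIHtmlParser, and 
the PieceCollector class and collect_pieces function are hypothetical names 
made up for this illustration. It only shows that a tolerant HTML parser 
reports the {% ... %} block as ordinary text between the surrounding tags.

```python
from html.parser import HTMLParser


class PieceCollector(HTMLParser):
    """Records every piece the parser reports as a (kind, value) tuple."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        self.pieces.append(("start_tag", tag))

    def handle_endtag(self, tag):
        self.pieces.append(("end_tag", tag))

    def handle_data(self, data):
        # Anything that is not recognized markup arrives here as text.
        self.pieces.append(("text", data))


def collect_pieces(html):
    """Parse an HTML string and return the list of reported pieces."""
    parser = PieceCollector()
    parser.feed(html)
    parser.close()
    return parser.pieces


if __name__ == "__main__":
    for kind, value in collect_pieces("<p>{% some_strings_here %}</p>"):
        print(kind, repr(value))
```

An application that needs the template code can therefore look for it in the 
text pieces, since it never arrives as a tag or comment.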

> Another question: your manual says "Undefined piece type. Applications
> should never see this." What if the parser meets invalid code?

Prior to HTML 5, HTML did not really have "invalid" code. Characters not 
matching a tag, comment, or other piece of HTML were usually interpreted as 
text. The specification left the exact interpretation largely to browser 
developers, who chose whatever they thought handled the HTML found on the web 
best.

The approach for DIHtmlParser is to return as much meaningful content as 
possible. It is not meant to check HTML syntax. Libraries to detect HTML syntax 
errors include, for example, DITidy:

  http://yunqa.de/delphi/doku.php/products/tidy/index

> How does the application know?

Applications can check whether TDIHtmlParser.PieceType returns ptUndefined. If 
it does, this indicates a bug in DIHtmlParser, because the parser should always 
fall back to ptText as the most basic piece type. I am not aware of any HTML 
that tricks TDIHtmlParser into reporting an undefined piece, so applications 
can usually ignore ptUndefined.
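
The design idea, text as the fallback and "undefined" as a value that is never 
handed out, can be sketched with a toy tokenizer. This is Python, not Delphi, 
and it does not reproduce the DIHtmlParser API or its internals: PieceType, 
pieces, and checked_pieces are hypothetical names, and the two regular 
expressions are a deliberately crude stand-in for real HTML tokenizing.

```python
import re
from enum import Enum, auto


class PieceType(Enum):
    UNDEFINED = auto()  # placeholder only; the tokenizer below never yields it
    TEXT = auto()
    TAG = auto()
    COMMENT = auto()


# Hypothetical, simplified patterns for illustration.
_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def pieces(source):
    """Yield (PieceType, str) pairs; unrecognized input falls back to TEXT."""
    pos = 0
    while pos < len(source):
        for kind, pattern in ((PieceType.COMMENT, _COMMENT),
                              (PieceType.TAG, _TAG)):
            match = pattern.match(source, pos)
            if match:
                yield kind, match.group()
                pos = match.end()
                break
        else:
            # Nothing matched: take everything up to the next '<' as text.
            # Searching from pos + 1 keeps a stray '<' inside the text piece.
            end = source.find("<", pos + 1)
            end = len(source) if end == -1 else end
            yield PieceType.TEXT, source[pos:end]
            pos = end


def checked_pieces(source):
    """Collect all pieces, treating UNDEFINED as a tokenizer bug."""
    result = []
    for kind, value in pieces(source):
        if kind is PieceType.UNDEFINED:
            raise AssertionError("tokenizer bug: undefined piece")
        result.append((kind, value))
    return result
```

Because every branch ends in either a recognized piece or text, the check in 
checked_pieces never fires for any input, which is the sense in which an 
application should never see an undefined piece.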

Ralf


