[yunqa.de] Re: TDIHtmlCasePlugin problem

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Thu, 10 Jun 2021 15:08:59 +0200

Your input HTML contains ambiguous ampersands, according to the HTML standard: https://html.spec.whatwg.org/#syntax-ambiguous-ampersand

The standard demands that "Normal elements [...] must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand." https://html.spec.whatwg.org/#elements-2

TDIHtmlWriterPlugin ensures that these requirements are met.

TDIHtmlWriterPlugin.PredefinedEntities lets you suppress the encoding of "&" as "&amp;" in normal text. But there is no setting (yet) to output a plain "&" in attribute values. The reason is that such a setting can produce ambiguous results, as the name "ambiguous ampersand" suggests, which can lead to invalid links.
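
To illustrate the kind of misreading a plain "&" can cause, here is a small sketch in Python. Python is used only for illustration and has nothing to do with DIHtmlParser, and the example.com URL is made up. Python's html.unescape accepts some legacy entity names without the trailing semicolon, so a query parameter that happens to be named "copy" gets decoded when the "&" is left unescaped:

```python
import html

# Hypothetical URL whose second query parameter is named "copy".
raw = "https://example.com/?id=1&copy=2"

# Written with a plain "&", a lenient decoder reads "&copy" as an entity.
decoded_raw = html.unescape(raw)

# Written with "&amp;", the decoder restores exactly the intended URL.
escaped = "https://example.com/?id=1&amp;copy=2"
decoded_escaped = html.unescape(escaped)

print(decoded_raw)      # the "&copy" part has turned into a copyright sign
print(decoded_escaped)  # identical to raw
```

Other decoders may treat this case differently, which is exactly why the escaped form is the safer one to emit.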

My recommendation is to keep "&amp;" in your result HTML.
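
Keeping "&amp;" does not change where the link points: an HTML consumer decodes the attribute value before using it. A minimal sketch with Python's standard html.parser, used here only as a stand-in for a browser (the HrefCollector class is a hypothetical helper, not part of any library):

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collects the decoded href value of every <a> start tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives with character references already decoded.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.hrefs.append(value)


collector = HrefCollector()
collector.feed(
    '<a href="https://domain.com/?o=5109&amp;w=434288&amp;s=1&amp;l=1">link</a>'
)
collector.close()

print(collector.hrefs[0])  # https://domain.com/?o=5109&w=434288&s=1&l=1
```

The decoded value is the original URL with plain "&" separators, so the link target is the same as before the conversion.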

If you really *must* avoid "&amp;", let me know and I will look into adding an option to DIHtmlParser. In that case, I'd also be interested in why plain "&" is so important to you, even though it's against the standard.


On 10.06.2021 12:28, Max Terentiev wrote:

I use TDIHtmlParser + TDIHtmlCasePlugin + TDIHtmlWriter to convert
uppercase html tags <A>, <DIV>, etc to lower case <a>, <div>, etc.

And I have a problem with links:

If my html contains links like:

<a href="https://domain.com/?o=5109&w=434288&s=1&l=1">

they become

<a href="https://domain.com/?o=5109&amp;w=434288&amp;s=1&amp;l=1">

How can I configure DIHtmlParser to NOT insert &amp; into link href attributes?

