Your input HTML contains ambiguous ampersands, according to the HTML Standard.
The Standard demands that "Normal elements [...] must not contain the character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand." https://html.spec.whatwg.org/#elements-2
TDIHtmlWriterPlugin ensures that these requirements are met.
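To make the term concrete: the HTML Standard defines an ambiguous ampersand as "&" followed by one or more ASCII alphanumerics and then ";", where that name is not a known named character reference. The following is a minimal Python sketch of that definition. It is not part of DIHtmlParser. It uses Python's html.entities.html5 table as the list of known names, and the function name ambiguous_ampersands is hypothetical.

```python
import re
from html.entities import html5

# Candidate: "&", one or more ASCII alphanumerics, then ";".
_CANDIDATE = re.compile(r"&([0-9A-Za-z]+);")


def ambiguous_ampersands(text: str) -> list:
    """Return every ambiguous ampersand sequence found in text.

    A candidate is ambiguous when "<name>;" is not a named character
    reference (html5 keys include the trailing semicolon, e.g. "amp;").
    """
    return [m.group(0) for m in _CANDIDATE.finditer(text)
            if m.group(1) + ";" not in html5]


if __name__ == "__main__":
    print(ambiguous_ampersands("a &foo; b"))  # ['&foo;']
    print(ambiguous_ampersands("a &amp; b"))  # []
```

Under this definition, "&foo;" is ambiguous, while a properly escaped "&amp;" is not.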
TDIHtmlWriterPlugin.PredefinedEntities lets you leave "&" unencoded in normal text instead of writing it as "&amp;". However, there is no setting (yet) to output a plain "&" in attribute values. The reason is that such a setting can produce ambiguous output, as the name "ambiguous ampersand" suggests, which can lead to invalid links.
My recommendation is to keep "&amp;" in your resulting HTML.
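Keeping "&amp;" does not break the link, because a conforming parser decodes the attribute value back to a plain "&" before the URL is used. Here is a small sketch of that round trip. It uses Python's standard html module as a stand-in for a conforming writer and parser, not DIHtmlParser's API, and the URL is a made-up example.

```python
import html

# Hypothetical link target with a raw "&" in its query string.
url = "https://example.com/page?id=1&lang=en"

# Escaping for an attribute value turns "&" into "&amp;".
# quote=True also escapes quote characters (none occur here).
escaped = html.escape(url, quote=True)
tag = f'<a href="{escaped}">link</a>'
print(tag)  # <a href="https://example.com/page?id=1&amp;lang=en">link</a>

# Decoding the attribute value restores the original URL. A browser does
# the equivalent before following the link.
assert html.unescape(escaped) == url
```

The "&amp;" exists only in the HTML source. The URL the browser actually requests still contains a plain "&".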
If you really *must* avoid "&amp;", let me know and I will look into adding an option to DIHtmlParser. In that case, I would also like to know why a plain "&" is so important to you, even though it goes against the standard.
On 10.06.2021 12:28, Max Terentiev wrote:
I use TDIHtmlParser + TDIHtmlCasePlugin + TDIHtmlWriter to convert
uppercase HTML tags (<A>, <DIV>, etc.) to lowercase (<a>, <div>, etc.).
But I have a problem with links.
If my HTML contains links like:
How can I configure DIHtmlParser so that it does NOT insert &amp; into link hrefs?