Hi, Ralf!
I fixed the problem by changing line 380 of DIUnicodeHtmlWriter.pas
from this
  UCP_AMPERSAND:
    if (not (avoNoAmpEscapeBeforeCurly in Options)
      or (i >= l)
      or (U16_NEXT_OR_FFFD(p, i, l) <> UCP_LEFT_CURLY_BRACKET)) then
      WriteBufW('&amp;', 5)
to this
  UCP_AMPERSAND:
    if (peAmp in FPredefinedEntities)
      and (not (avoNoAmpEscapeBeforeCurly in Options)
        or (i >= l)
        or (U16_NEXT_OR_FFFD(p, i, l) <> UCP_LEFT_CURLY_BRACKET)) then
      WriteBufW('&amp;', 5)
It works for me now.
Thanks for the help anyway!
---
With best regards, Max Terentiev.
Business Software Products.
AMS Development Team.
support@xxxxxxxxxx
-----Original Message-----
From: yunqa-bounce@xxxxxxxxxxxxx [mailto:yunqa-bounce@xxxxxxxxxxxxx] On Behalf Of Delphi Inspiration
Sent: Thursday, June 10, 2021 4:09 PM
To: yunqa@xxxxxxxxxxxxx
Subject: [yunqa.de] Re: TDIHtmlCasePlugin problem
Your input HTML contains ambiguous ampersands, according to the HTML
standard: https://html.spec.whatwg.org/#syntax-ambiguous-ampersand
The standard demands that "Normal elements [...] must not contain the
character U+003C LESS-THAN SIGN (<) or an ambiguous ampersand."
https://html.spec.whatwg.org/#elements-2
TDIHtmlWriterPlugin takes care that these requirements are met.
TDIHtmlWriterPlugin.PredefinedEntities lets you skip encoding "&" as
"&amp;" in normal text. But there is no setting (yet) to output plain
"&" in attribute values. The reason is that such a setting can generate
ambiguous results, as the name "ambiguous ampersand" suggests, leading
to potentially invalid links.
My recommendation is to keep "&amp;" in your result HTML.
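
To illustrate why keeping "&amp;" does not break the link, here is a minimal Python sketch. It is not DIHtmlParser code, and the names HrefCollector and parsed_hrefs are made up for this example. It relies on Python's standard html.parser, which decodes character references in attribute values, and assumes a browser treats the href the same way.

```python
import html
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: collects decoded href values from start tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs holds (name, value) pairs; character references in the
        # values have already been decoded by the parser.
        for name, value in attrs:
            if name == "href":
                self.hrefs.append(value)


def parsed_hrefs(markup):
    """Return the href values as an HTML parser sees them."""
    collector = HrefCollector()
    collector.feed(markup)
    collector.close()
    return collector.hrefs


# The URL from this thread, and the escaped form written by the HTML writer.
original_url = "https://domain.com/?o=5109&w=434288&s=1&l=1"
escaped_markup = '<a href="%s">' % html.escape(original_url)

print(escaped_markup)
print(parsed_hrefs(escaped_markup) == [original_url])
```

The escaped markup contains "&amp;" between the query parameters, yet the parsed href equals the original URL, so the link target is unchanged.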
If you really *must* avoid "&amp;", let me know and I will look into
adding an option to DIHtmlParser. If so, I'd also be interested in why plain
"&" is so important to you, even though it's against the standard.
Ralf
On 10.06.2021 12:28, Max Terentiev wrote:
I use TDIHtmlParser + TDIHtmlCasePlugin + TDIHtmlWriter to convert
uppercase html tags <A>, <DIV>, etc to lower case <a>, <div>, etc.
And I have a problem with links:
If my html contains links like:
<a href="https://domain.com/?o=5109&w=434288&s=1&l=1">
they become
<a href="https://domain.com/?o=5109&amp;w=434288&amp;s=1&amp;l=1">
How can I tune DIHtmlParser to NOT insert &amp; into link hrefs?