[yunqa.de] Re: DIHtmlParser and Entities

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Fri, 10 Jul 2009 16:00:14 +0200

At 16:39 10.07.2009, Mike Dixon wrote:

>I have it partially working now.
>
>Part of my problem is that I'm using an IE DHTML Editing control for WYSIWYG
>editing, and a code control. When I switch to "code mode", I take the source
>from the WYSIWYG control and run it through my routine that converts the
>HTML tags and attributes back to lowercase (using a TDIHtmlCasePlugin).
>
>The real problem now is that the WYSIWYG control converts &copy; to a single
>copyright character, so when I pass in the following to the parser:
>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>C 
>And I get out:
>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>? 
>The character on the third-to-last line is a question mark instead of
>&copy;.

This points to a character conversion error somewhere, either in your code or 
within the DHTML editing control. I cannot tell which.
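One plausible mechanism for this symptom, sketched in Python (the thread is about Delphi, so this only illustrates the idea and is not DIHtmlParser or DHTML code): if the copyright character is ever written through an encoding that cannot represent it, with replacement enabled, it silently becomes "?". Writing it as a character reference instead keeps it intact. The helper `to_named_entities` is hypothetical and relies on Python's `html.entities.codepoint2name` table.

```python
from html.entities import codepoint2name

COPYRIGHT = "\u00a9"  # the single character the editor produces for &copy;

# Encoding to a charset that lacks the character, with replacement enabled,
# silently turns it into "?", matching the reported symptom.
lossy = COPYRIGHT.encode("ascii", errors="replace")

# Asking the codec for numeric character references preserves it instead.
numeric = COPYRIGHT.encode("ascii", errors="xmlcharrefreplace")


def to_named_entities(text: str) -> str:
    """Hypothetical helper: replace non-ASCII characters with HTML named
    entities where one exists, else with a numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # e.g. 169 -> &copy;
        else:
            out.append(f"&#{cp};")  # no named entity: numeric reference
    return "".join(out)


print(lossy)    # b'?'
print(numeric)  # b'&#169;'
print(to_named_entities("Copyright \u00a9 2009"))  # Copyright &copy; 2009
```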

>If I DON'T include a content-type, it comes out of the parser/writer just
>fine. I'm guessing this is probably "correct"; however, there are a lot of
>HTML authors who want their code preserved exactly as they typed it.
>
>Can I force the parser to ignore the charset-UTF-8 line?

You need to set TDIHtmlParser.ReadMethods to the character encoding which the 
DHTML control uses to generate its output. However, I do not know which 
encoding that is.
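To illustrate why the reading encoding has to match the producing one, here is a small Python sketch. Python codec names stand in for whatever ReadMethods setting would apply, and the assumption that the control emits UTF-8 is made only for illustration, since its actual output encoding is unknown.

```python
# Assumption for illustration only: the editing control hands over UTF-8.
produced = "\u00a9".encode("utf-8")  # -> b'\xc2\xa9' (two bytes)

# Reader configured for the same encoding: the round trip is exact.
right = produced.decode("utf-8")  # -> '©'

# Reader configured for a single-byte encoding: each byte becomes its own
# character, giving mojibake.
wrong = produced.decode("cp1252")  # -> 'Â©'

# Reverse mismatch: the lone cp1252 byte 0xA9 is not valid UTF-8, so a
# lenient decoder substitutes U+FFFD ...
replaced = "\u00a9".encode("cp1252").decode("utf-8", errors="replace")

# ... which then turns into "?" when written to a charset lacking it.
shown = replaced.encode("ascii", errors="replace")  # -> b'?'
```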

TDIHtmlParser itself does not keep track of any charset instructions within 
the HTML; you must attach a TDIHtmlCharSetPlugin for this. So simply do not use 
this plugin if you want to ignore what you call the "charset-UTF-8 line".
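Conceptually, tracking charset instructions means spotting a declaration such as `<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">` and switching decoding accordingly. The Python sketch below shows only that detection step, using the standard `html.parser` module; `CharsetSniffer` and `declared_charset` are hypothetical names, and this is not how TDIHtmlCharSetPlugin is implemented.

```python
from html.parser import HTMLParser
from typing import Optional


class CharsetSniffer(HTMLParser):
    """Hypothetical sniffer: records the first charset declared via
    <meta charset=...> or <meta http-equiv="Content-Type" content="...">."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        # Self-closing <meta ... /> also arrives here via the default
        # handle_startendtag.
        if tag != "meta" or self.charset is not None:
            return
        a = {name: (value or "") for name, value in attrs}
        if a.get("charset"):
            self.charset = a["charset"].strip().lower()
        elif a.get("http-equiv", "").lower() == "content-type":
            # Parse "text/html; charset=UTF-8" into its parameters.
            for part in a.get("content", "").split(";"):
                key, _, val = part.strip().partition("=")
                if key.lower() == "charset" and val:
                    self.charset = val.strip().strip("\"'").lower()
                    break


def declared_charset(markup: str) -> Optional[str]:
    """Return the declared charset (lowercased), or None if absent."""
    sniffer = CharsetSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.charset
```

Ignoring the declaration then simply means never consulting this value and decoding with one fixed encoding, which corresponds to leaving the plugin out.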

Ralf 

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa
