At 16:39 10.07.2009, Mike Dixon wrote:

>I have it partially working now.
>
>Part of my problem is that I'm using an IE DHTML Editing control for WYSIWYG
>editing, and a code control. When I switch to "code mode", I take the source
>from the WYSIWYG control and run it through my routine that converts the
>HTML tags and attributes back to lowercase (using a TDIHtmlCasePlugin).
>
>The real problem now is that the WYSIWYG control converts &copy; to a single
>copyright character, so when I pass in the following to the parser:
>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>C
>
>And I get out:
>
><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
>"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>?
>
>The character on the third to the last line is a question mark, instead of
>©

This points to a character conversion error, either somewhere in your code or within the DHTML editing control. I cannot tell which.

>If I DON'T include a content-type, it comes out of the parser/writer just
>fine. I'm guessing this is probably "correct"; however, there are a lot of
>HTML authors who want their code preserved just like they typed it.
>
>Can I force the parser to ignore the charset-UTF-8 line?

You need to set TDIHtmlParser.ReadMethods to the character encoding which the DHTML control uses to generate its output. However, I do not know which encoding that is.

TDIHtmlParser itself does not keep track of any charset instructions within HTML; you must attach a TDIHtmlCharSetPlugin for this. So just do not use this plugin if you want to ignore what you call the "charset-UTF-8 line".

Ralf

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa
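For readers of this thread, here is a small, hypothetical illustration of the "character conversion error" Ralf mentions. It is written in Python and does not use the DIHtmlParser API; all variable names are made up for this sketch. It shows two common ways a copyright sign ends up as "?": bytes decoded with the wrong encoding, and a character written to a charset that cannot represent it. The last step shows one possible way to keep the character intact in any charset by writing it as a numeric character reference. Whether the DIHtmlParser writer offers an equivalent option is an assumption to check against its own documentation.

```python
# Hypothetical illustration in Python, not DIHtmlParser code: how a single
# copyright sign can be damaged when bytes are written in one encoding and
# read back assuming another.

COPYRIGHT = "\u00a9"  # the character the editing control produced from &copy;

# The same character has different byte representations per encoding.
utf8_bytes = COPYRIGHT.encode("utf-8")      # b'\xc2\xa9' (two bytes)
latin1_bytes = COPYRIGHT.encode("latin-1")  # b'\xa9' (one byte)

# Reading Latin-1 bytes as UTF-8: a lone 0xA9 is invalid UTF-8, so it is
# replaced by U+FFFD, which many viewers display as "?".
misread = latin1_bytes.decode("utf-8", errors="replace")

# Writing to a charset that lacks the character: it degrades to a literal "?".
lossy = COPYRIGHT.encode("ascii", errors="replace")

# One way to preserve it regardless of charset: emit a numeric character
# reference, which any HTML consumer turns back into the same character.
safe = COPYRIGHT.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(utf8_bytes, latin1_bytes, lossy, safe)
```

The point of the sketch is that the parser only sees bytes: if the encoding it is told to use (in DIHtmlParser's case, via ReadMethods) does not match the encoding the editing control actually wrote, the copyright sign is lost before any HTML processing happens.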