Re: [icon-users] Character encodings on html output

In message <9c6f7f834f.mike@xxxxxxxxxxxxxxxxxx>
          mike.hobbs@xxxxxxxxxx wrote:

> I created a document containing a 'dagger' character (156 in the
> ASCII character set as displayed in !XChars). [...]

This is a very complex issue and you will find a lot of incorrect or
misleading information on the web, some of which you have quoted. ;-)
Most of the information you find probably does not mention whether it
refers to HTML 3.2 or 4.0, and there is a world of difference
between these two.

The short story regarding the dagger is that EW/TW outputs HTML 3.2, 
which is based on ISO Latin-1 encoding, which does not offer a dagger.
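
To illustrate the point outside of EW/TW, here is a small Python
sketch that checks whether a character exists in ISO Latin-1 at all.
The function name fits_latin1 is made up for this example; U+2020 is
the Unicode code point of the dagger.

```python
def fits_latin1(ch: str) -> bool:
    """Return True if the character can be encoded in ISO Latin-1."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


DAGGER = "\u2020"  # Unicode code point of the dagger

print(fits_latin1("\u00e9"))  # e with acute accent
print(fits_latin1(DAGGER))    # the dagger
```

The accented letter passes and the dagger does not, so an exporter
limited to Latin-1 has no byte value to write for it.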

> To cut this confusing issue short, when TW outputs html should it
> not do a sensible transcoding from the RISCOS character encoding to
> something that most web browsers will recognize (e.g. UTF-8)?

Yes, of course, but there are a hundred other things HTML export
should do as well. Basically, the HTML export in EW/TW is archaic and
does not attempt to address any of these issues. It does not even
set a document encoding. You are likely to be safe with ISO Latin-1
characters (in XChars, everything except the row in the middle,
which contains e.g. the smart quotes, the oe ligature and the dagger).
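
For what it is worth, a sensible transcoding could look roughly like
the Python sketch below. It is only an illustration, not what EW/TW
does. It assumes the bytes outside the middle row match ISO Latin-1,
and it writes everything non-ASCII as a numeric character reference,
which for code points above 255 needs HTML 4.0 rather than 3.2. Byte
156 as the dagger is taken from your message; the other table entries
and the name riscos_to_html are my assumptions and should be checked
against !XChars.

```python
import html

# Middle-row bytes mapped to Unicode. 0x9C (156) = dagger comes from the
# original question; the remaining entries are assumptions to verify.
RISCOS_EXTRAS = {
    0x90: "\u2018",  # left single quote (assumed)
    0x91: "\u2019",  # right single quote (assumed)
    0x94: "\u201c",  # left double quote (assumed)
    0x95: "\u201d",  # right double quote (assumed)
    0x9B: "\u0153",  # oe ligature (assumed)
    0x9C: "\u2020",  # dagger
}


def riscos_to_html(data: bytes) -> str:
    """Convert RISC OS encoded bytes to ASCII-only HTML text."""
    out = []
    for byte in data:
        if 0x80 <= byte <= 0x9F:
            # Middle row: look it up, or use U+FFFD if we have no mapping.
            ch = RISCOS_EXTRAS.get(byte, "\ufffd")
        else:
            # Assumed to coincide with ISO Latin-1 outside the middle row.
            ch = chr(byte)
        if ord(ch) < 128:
            out.append(html.escape(ch, quote=False))  # escapes &, <, >
        else:
            out.append(f"&#{ord(ch)};")  # numeric character reference
    return "".join(out)


print(riscos_to_html(b"See note\x9c"))
```

Because the result is pure ASCII, it displays the same whatever
encoding the browser guesses, which matters when the export sets no
document encoding.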

Martin
-- 
---------------------------------------------------------------------
Martin Wuerthner           MW Software          lists@xxxxxxxxxxxxxxx
---------------------------------------------------------------------

