I think we are overlooking one thing here, which I realized when I thought more about the task of translating, say, HTML to PDF or to a Word-like format.

Just as there are two main kinds of graphics format (bitmap and vector), there are also two kinds of text format. There are presentation formats, such as PDF, PS, and Word, and there are content formats, such as XML, HTML, and plain text. I agree with Brian's criticism of my earlier idea for structured text, which is that plain text should be represented using the same method as structured text (i.e. XML or HTML, in this case).

In practice many people mistake HTML for a presentation format and even use it as one, although this is an abuse of the format. HTML fundamentally can't be directly converted into PDF, any more than a vector graphics object can be directly converted into a bitmap. The best you can do is render the vector object onto a bitmap. Analogously, the best you can do in the HTML case is apply a particular layout algorithm to resolve the ambiguities and device dependencies inherent in HTML.

It may still be possible to get something reasonable out of the process (going HTML->PDF, for example). However, if you try to go PDF->HTML, you are going to find that you can't get enough resolution without resorting to extensions to the spec. And just try going from PDF->plain text. :-)

I think it is not unreasonable for this fundamental dichotomy to be reflected in the translation system: a "presentation text" format and a "content text" format, or something like that.

Andrew
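
P.S. A rough sketch of how the content/presentation split might show up in a translation system, in Python. Everything named here (TextKind, FORMAT_KINDS, describe_conversion, and the labels it returns) is hypothetical and only meant to illustrate the idea, not to describe any existing code. The classification of each format follows the lists given above.

```python
from enum import Enum


class TextKind(Enum):
    """The two families of text format discussed above."""
    CONTENT = "content"            # structure only: XML, HTML, plain text
    PRESENTATION = "presentation"  # fixed layout: PDF, PS, Word


# Hypothetical registry; the format names are illustrative keys, not MIME types.
FORMAT_KINDS = {
    "xml": TextKind.CONTENT,
    "html": TextKind.CONTENT,
    "plain": TextKind.CONTENT,
    "pdf": TextKind.PRESENTATION,
    "ps": TextKind.PRESENTATION,
    "word": TextKind.PRESENTATION,
}


def describe_conversion(src: str, dst: str) -> str:
    """Classify a conversion by which side of the dichotomy each end is on."""
    src_kind = FORMAT_KINDS[src]
    dst_kind = FORMAT_KINDS[dst]
    if src_kind is dst_kind:
        # Both ends are in the same family.
        return "same-kind"
    if src_kind is TextKind.CONTENT:
        # Content -> presentation: a layout algorithm has to be chosen,
        # much like rendering a vector object onto a bitmap.
        return "render"
    # Presentation -> content: structure has to be guessed back from layout.
    return "extract"
```

With this, describe_conversion("html", "pdf") is classified as "render" and describe_conversion("pdf", "html") as "extract", so a translator could refuse, warn, or pick a layout step depending on the answer.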