[openbeostranslationkit] Re: Structured Text Translation

I think we are overlooking one thing here, which I realized when
I thought more about the task of translating, say, HTML to PDF
or a Word-like format.

Just as there are two main kinds of graphics formats (bitmap and
vector), there are also two kinds of text formats.  There are
presentation formats, such as PDF, PS, and Word.

And then there are content formats, such as XML, HTML, and
plain text.  I agree with Brian's criticism of my earlier idea for
structured text here: plain text should be represented using the
same method as the structured text (i.e. XML or HTML, in this
case).

In practice many people mistake HTML for a presentation format
and even use it as one, although this is an abuse of the format.
HTML fundamentally can't be directly converted into a PDF any
more than a vector graphics object can be directly converted into
a bitmap.  The best you can do is render the vector object onto a
bitmap.  Analogously, the best you can do in the HTML case is
apply a particular layout algorithm to resolve the inherent
ambiguities/device dependencies in HTML.
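
To make the "layout algorithm" point concrete, here is a rough
sketch in Python (not Translation Kit code).  It assumes the
content is just a list of paragraphs and that the only device
parameter is a column width; render() is a made-up name, and
textwrap stands in for a real layout engine.

```python
import textwrap

def render(paragraphs, width):
    """Lay out content paragraphs as fixed lines for a device
    that is `width` columns wide (hypothetical helper)."""
    lines = []
    for paragraph in paragraphs:
        # The line breaks are chosen here, by the layout step;
        # they are not part of the content itself.
        lines.extend(textwrap.wrap(paragraph, width=width))
        lines.append("")  # blank line between paragraphs
    return lines[:-1] if lines else lines

content = ["The quick brown fox jumps over the lazy dog."]

narrow = render(content, 20)  # one possible presentation
wide = render(content, 60)    # another, equally valid one

# Same content, different presentations: the result depends on a
# device parameter that the content format never specified.
print(len(narrow), len(wide))
```

The same content produces a different set of lines for each width,
which is the sense in which the conversion is a rendering rather
than a direct translation.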

It may still be possible to get something reasonable out of the
process, though (going html->pdf, for example).  However, if you
try to go pdf->html, you are going to find that you can't get enough
resolution without resorting to extensions to the spec.  And just
try going from pdf->plain text. :-)

I think that it is not unreasonable to find this fundamental
dichotomy reflected in the translation system: a "presentation
text" format and a "content text" format, or something like that.
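
As a rough illustration only, the split might look something like
the Python sketch below.  Every name in it (TextKind, FORMAT_KIND,
conversion_kind) is hypothetical and not part of the Translation
Kit, and treating two formats of the same kind as directly
translatable is my own simplifying assumption.

```python
from enum import Enum

class TextKind(Enum):
    CONTENT = "content"            # XML, HTML, plain text
    PRESENTATION = "presentation"  # PDF, PS, Word

# Hypothetical table following the grouping described above.
FORMAT_KIND = {
    "plain": TextKind.CONTENT,
    "html": TextKind.CONTENT,
    "xml": TextKind.CONTENT,
    "pdf": TextKind.PRESENTATION,
    "ps": TextKind.PRESENTATION,
    "word": TextKind.PRESENTATION,
}

def conversion_kind(src, dst):
    """Say what sort of step a src -> dst conversion would need."""
    s, d = FORMAT_KIND[src], FORMAT_KIND[dst]
    if s is d:
        return "translate"  # same kind (assumed to be direct)
    if s is TextKind.CONTENT:
        return "render"     # content -> presentation needs layout
    return "extract"        # presentation -> content, lossy

print(conversion_kind("html", "pdf"))
print(conversion_kind("pdf", "html"))
```

The point is only that the system would know which conversions
cross the boundary, and in which direction, before trying them.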

Andrew



