[openbeostranslationkit] Re: Structured Text Translation

I had a nice long email written up to make this point, but my webmail 
ate it, so I'll be rather terse here and go into more detail if 
required.

There are two fundamentally different kinds of text documents.  PDF, 
PostScript, Word, etc. are presentation formats: they are intended to 
be "pixel-perfect".  HTML, XML, and CSV, on the other hand, describe 
content rather than formatting: they are intended to be rendered in an 
application- or device-dependent way.

HTML has gotten a lot of abuse in this respect because people use it as 
a presentation format, but it isn't one.

This distinction is similar to the one in the image space between 
vector-based images and bitmap-based images.  You can turn one into 
the other, but that is really a rendering or analysis job, not a 
translation.

So, in summary, I think it's reasonable to have two different types of 
text, "presentation text" and "content text".

Andrew


