[openbeostranslationkit] Text Translation
- From: Brian Matzon <brian@xxxxxxxxx>
- To: openbeostranslationkit@xxxxxxxxxxxxx
- Date: Sun, 04 Aug 2002 19:49:48 +0200
This is just some stuff I've been tinkering with for the future
Translation Kit. I could really use some input...
I should have posted this some time ago, but for reasons unknown I forgot :/
----
Currently, translation of images (and many other data types) works by
converting from a specific format to a generic format, and from that
generic format to another specific format. Translating from PNG to JPEG
thus involves two steps:
1 - convert PNG to generic format
2 - convert generic format to JPEG
The reason for this two-step process is that a translator converting
directly from PNG to JPEG would have to understand both the PNG format
(to read it) and the JPEG format (to write it). With an intermediate
format, each translator only needs to know its own format and the
generic one, which greatly simplifies the translation process.
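A minimal C++ sketch of the two-step process, under stated assumptions: GenericImage, Translator, translate() and makeToyTranslator() are hypothetical names, not Translation Kit API, and the toy "PNG"/"JPEG" streams only carry an image size as text. What it shows is that each translator knows just its own format plus the generic one, so supporting N formats takes N translators rather than one for every ordered pair of formats.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical generic (intermediate) image format: size plus RGBA pixels.
struct GenericImage {
    int width = 0;
    int height = 0;
    std::vector<unsigned char> rgba;  // 4 bytes per pixel
};

// A hypothetical translator knows only its own format and the generic one.
struct Translator {
    std::function<GenericImage(const std::string&)> toGeneric;
    std::function<std::string(const GenericImage&)> fromGeneric;
};

// Two-step translation: source format -> generic -> destination format.
// std::map::at throws std::out_of_range if either format is unknown.
std::string translate(const std::map<std::string, Translator>& registry,
                      const std::string& from, const std::string& to,
                      const std::string& data) {
    GenericImage generic = registry.at(from).toGeneric(data);  // step 1
    return registry.at(to).fromGeneric(generic);               // step 2
}

// Toy stand-in for a real codec: the "file" is just text such as "PNG 2x3".
Translator makeToyTranslator(const std::string& name) {
    Translator t;
    t.toGeneric = [name](const std::string& data) {
        std::istringstream in(data);
        std::string tag;
        char sep = 0;
        GenericImage g;
        in >> tag >> g.width >> sep >> g.height;
        if (!in || tag != name || sep != 'x' || g.width < 0 || g.height < 0)
            throw std::runtime_error("not a toy " + name + " stream");
        g.rgba.assign(static_cast<std::size_t>(g.width) * g.height * 4, 0);
        return g;
    };
    t.fromGeneric = [name](const GenericImage& g) {
        return name + " " + std::to_string(g.width) + "x" +
               std::to_string(g.height);
    };
    return t;
}
```

With both toy translators registered, translate(reg, "PNG", "JPEG", "PNG 2x3") returns "JPEG 2x3". Adding a third format means writing one more translator, not one per existing format.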
The problem with Text Translation
Using a temporary format simplifies the whole translation process.
Defining such a format for images, and even for sound, isn't a big
problem*, since the data can be described rather easily. Images are
described like this:
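struct TranslatorBitmap {
    int32       magic;      // B_TRANSLATOR_BITMAP
    BRect       bounds;     // inclusive pixel bounds of the image
    uint32      rowBytes;   // bytes per row, including any padding
    color_space colors;     // pixel layout, e.g. B_RGB32
    uint32      dataSize;   // bytes of pixel data following the header
};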
and sound:
struct TranslatorSound {
    int32  magic;       // B_TRANSLATOR_SOUND
    uint32 channels;    // number of channels
    float  sampleFreq;  // sample rate in Hz
    uint32 numFrames;   // number of sample frames
};
The data structures above describe image and sound data only, not any
metadata. For text translation, this means we need to define a data
format for text too. Currently the 'B_TRANSLATOR_TEXT' format is defined
simply as plain old ASCII text, which fits nicely with the sound and
image formats.
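To make "data only" concrete, here is a sketch that mirrors the two headers with standard C++ types. Rect, BitmapHeader and SoundHeader are stand-ins (assumptions) so it compiles without the BeOS headers; Rect follows BRect's inclusive coordinates. Everything needed to interpret the payload follows from the header fields, and there is simply no field for a title, author or comment.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for BRect (assumption): inclusive coordinates, so a rect from
// (0,0) to (99,49) covers a 100x50 pixel image.
struct Rect {
    float left, top, right, bottom;
    float Width() const { return right - left; }
    float Height() const { return bottom - top; }
};

struct BitmapHeader {       // mirrors TranslatorBitmap
    int32_t magic;
    Rect bounds;
    uint32_t rowBytes;
    uint32_t colors;        // stand-in for color_space
    uint32_t dataSize;
};

struct SoundHeader {        // mirrors TranslatorSound
    int32_t magic;
    uint32_t channels;
    float sampleFreq;
    uint32_t numFrames;
};

// Expected size of the pixel data: rowBytes for every scanline.
// Bounds are inclusive, hence the +1 on the row count.
uint32_t bitmapDataSize(const BitmapHeader& h) {
    uint32_t rows = static_cast<uint32_t>(h.bounds.Height()) + 1;
    return h.rowBytes * rows;
}

// Playing time follows from the frame count and sample rate alone.
double soundSeconds(const SoundHeader& h) {
    return h.numFrames / static_cast<double>(h.sampleFreq);
}
```

For example, bounds (0, 0, 99, 49) with rowBytes 400 (100 pixels at 4 bytes each, as in B_RGB32) gives 20000 bytes of pixel data, and 88200 frames at 44100 Hz is 2 seconds of sound.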
However, unlike sound and images, text loses a great deal of
information when it loses its metadata layer. Once the metadata layer is
removed, only the text is left: all formatting is lost, as are images
and any other embedded data.
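A minimal sketch of that loss, using a hypothetical StyledText model (characters plus bold/italic runs; not an existing BeOS type): converting to a text-only generic format and back keeps every character but none of the style runs, and there is nowhere to put an embedded image at all.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical styled-text model: the characters plus style runs over them.
struct StyleRun {
    std::size_t offset;  // index of the first character in the run
    std::size_t length;  // number of characters in the run
    bool bold;
    bool italic;
};

struct StyledText {
    std::string text;
    std::vector<StyleRun> runs;
};

// Writing to a text-only generic format: only the characters survive.
std::string toPlainText(const StyledText& doc) { return doc.text; }

// Reading back from the generic format: there are no styles to restore.
StyledText fromPlainText(const std::string& plain) {
    return StyledText{plain, {}};
}
```

A document "Hello world" with "Hello" in bold comes back from fromPlainText(toPlainText(doc)) with the same text and an empty run list.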
We therefore need to establish a generic format that all translators
can understand. The current ASCII solution just isn't useful.
Possible solutions:
- Binary format
- OpenOffice.org document format
- XHTML (strict!)
- Our own XML format
There is no single correct format, but I am leaning towards XML formats,
since they would allow us to write both native (binary) translators and
an XSLT-based translator.
I am a bit hesitant about using the OpenOffice.org format, since it is
rather complex.
I actually prefer XHTML, since it is easy to understand and quite
widespread.
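As a rough illustration of the XHTML option, this hypothetical helper (toXhtmlParagraph, escapeXml and BoldRun are made-up names) writes one paragraph of text with bold runs as an XHTML fragment, escaping &, < and >. It only produces the body fragment; a complete XHTML 1.0 Strict document would also need the DOCTYPE, the html/head/title elements and the XHTML namespace.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical bold run over a paragraph's characters.
struct BoldRun {
    std::size_t offset;  // index of the first bold character
    std::size_t length;  // number of bold characters
};

// Escapes the characters that are special in XML/XHTML text content.
std::string escapeXml(const std::string& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;"; break;
            case '>': out += "&gt;"; break;
            default:  out += c;
        }
    }
    return out;
}

// Writes one paragraph, wrapping each bold run in <strong>.
// Assumes runs are sorted by offset, do not overlap, and lie inside text.
std::string toXhtmlParagraph(const std::string& text,
                             const std::vector<BoldRun>& runs) {
    std::string out = "<p>";
    std::size_t pos = 0;
    for (const BoldRun& r : runs) {
        assert(r.offset >= pos && r.offset + r.length <= text.size());
        out += escapeXml(text.substr(pos, r.offset - pos));
        out += "<strong>" + escapeXml(text.substr(r.offset, r.length)) +
               "</strong>";
        pos = r.offset + r.length;
    }
    out += escapeXml(text.substr(pos));
    out += "</p>";
    return out;
}
```

For instance, toXhtmlParagraph("Hello world", {{6, 5}}) yields <p>Hello <strong>world</strong></p>. Because the output is ordinary XML, the same fragment could also be fed to an XSLT stylesheet, which is part of the appeal of an XML-based generic format.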
I need some more pros and cons...
* By converting to a temporary data-only format, all metadata is lost
(this is not a problem for most images and most sound).