[openbeostranslationkit] Re: Structured Text Translation
- From: "shatty" <shatty@xxxxxxxxxxxxx>
- To: openbeostranslationkit@xxxxxxxxxxxxx
- Date: Sun, 04 Aug 2002 21:25:32 -0700
Wow. This is really unbelievable. I was just thinking about this in the
shower this morning. As part of my preferences-based work I have been
thinking about XML on BeOS. It seemed natural to have an XMLTranslator. I
couldn't remember whether or not the translator kit had a primitive
B_TRANSLATOR_STRUCTURED_TEXT type, though.
Here's something that we could use perhaps. It's just a rough thing I came up
with so I'm not attached to the fields or the values for the various constants.
You get the point, I'm sure.
Andrew Bachmann
============================ TranslatorFormats.h
enum {
...
B_TRANSLATOR_TEXT = 'TEXT', /* B_ASCII_TYPE */
B_TRANSLATOR_STRUCTURED_TEXT = 'XTXT', /* Structured text */
...
};
struct TranslatorStructuredText {
int32 magic; // B_TRANSLATOR_STRUCTURED_TEXT
int32 charset; // strongly recommend B_UTF8
char escapeChar; // recommend B_UTF8_ESCAPE
uint32 dataSize;
};
enum {
/* structured text file formats B_TRANSLATOR_STRUCTURED_TEXT */
B_HTML_FORMAT = 'HTML',
B_XML_FORMAT = 'XML ',
...
};
// this byte value never occurs as the leading byte of a UTF-8
// character (it has the 10xxxxxx continuation-byte pattern).
// see http://www.talisman.org/utf8.html for example
#define B_UTF8_ESCAPE 0xAA /* binary 10101010 */
// these bytes denote the structure
#define B_STRUCTURED_TEXT_PROPERTY_NAME '$'
#define B_STRUCTURED_TEXT_PROPERTY_VALUE '='
#define B_STRUCTURED_TEXT_CHILD_BEGIN '<'
#define B_STRUCTURED_TEXT_CHILD_END '>'
#define B_STRUCTURED_TEXT_CONTENT '!'
============================
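To illustrate why that escape value is safe, here is a minimal sketch in C of
the bit test involved. The helper names utf8_is_continuation and utf8_is_lead
and the macro ESCAPE_BYTE are made up for this example and are not part of the
proposed header; ESCAPE_BYTE just mirrors the value of B_UTF8_ESCAPE.

```c
#include <stdbool.h>

/* Hypothetical stand-in for B_UTF8_ESCAPE (binary 10101010). */
#define ESCAPE_BYTE 0xAAu

/* UTF-8 continuation bytes always match the bit pattern 10xxxxxx. */
static bool utf8_is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

/* Bytes that may start a well-formed UTF-8 character:
 * ASCII (0x00-0x7F) or a multi-byte lead byte (0xC2-0xF4). */
static bool utf8_is_lead(unsigned char byte)
{
    return byte < 0x80 || (byte >= 0xC2 && byte <= 0xF4);
}
```

With these helpers, utf8_is_continuation(ESCAPE_BYTE) is true and
utf8_is_lead(ESCAPE_BYTE) is false, so a reader that sees this byte where a
character should start can treat it as the escape.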
The escape byte escapes only the immediately following byte. As usual, two
escape bytes in a row are not an escape sequence; they should be interpreted
as one literal byte with the escape byte's value in the expected encoding. In
the case of UTF-8 this should never happen, because the escape byte would be
illegal if it occurred literally at that location, that is, at the start of a
character. (Its value was chosen for this property.)
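The escape rule can be sketched as a small scanner. This is only a rough
illustration under my own assumptions, not a proposed API: ScanResult,
scan_stream, utf8_len and ESCAPE_BYTE are hypothetical names, the input is
assumed to be well-formed UTF-8, and a lone escape byte at the very end of the
stream is simply counted as a literal byte.

```c
#include <stddef.h>

/* Hypothetical stand-in for B_UTF8_ESCAPE. */
#define ESCAPE_BYTE 0xAAu

typedef struct {
    size_t markers;       /* escape + structure-byte pairs seen */
    size_t literal_bytes; /* bytes that belong to ordinary text */
} ScanResult;

/* Length of a UTF-8 character judged from its lead byte.
 * Unexpected bytes are treated as length 1 so the scan always advances. */
static size_t utf8_len(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Walk the stream one character at a time. The escape byte is only
 * recognized at a character boundary, so a 0xAA continuation byte inside
 * a multi-byte character is never mistaken for an escape. */
static ScanResult scan_stream(const unsigned char *data, size_t size)
{
    ScanResult result = {0, 0};
    size_t i = 0;

    while (i < size) {
        if (data[i] == ESCAPE_BYTE && i + 1 < size) {
            if (data[i + 1] == ESCAPE_BYTE)
                result.literal_bytes += 1; /* doubled escape: one literal byte */
            else
                result.markers += 1;       /* escape + structure byte */
            i += 2;
        } else {
            size_t n = utf8_len(data[i]);
            if (n > size - i)
                n = size - i;              /* truncated final character */
            result.literal_bytes += n;
            i += n;
        }
    }
    return result;
}
```

Scanning the bytes of _<_$tag_=b_> (with _ standing for the escape byte)
gives four markers and four literal bytes, while the two-byte UTF-8 character
0xC2 0xAA is counted as plain text with no markers.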
The structure bytes were picked to try to enhance readability of the raw
stream. For example the file:
---------------------------- OBOSTranslatorKitRules.html
<html>
<head><title>OpenBeOS Translator Kit</title></head>
<body onLoad=doJavaScript();>
<h1>It Rules.</h1>
</body>
</html>
----------------------------
Original size= 124 bytes
Could be translated into the following struct and data by an HTMLTranslator:
struct TranslatorStructuredText t;
t.magic = B_HTML_FORMAT;
t.charset = B_UTF8;
t.escapeChar = B_UTF8_ESCAPE;
t.dataSize = 145; // includes the escape bytes and structure bytes
data: (please mentally translate _ into the B_UTF8_ESCAPE byte)
---------------------------- <internal format>
_<_$tag_=html_>
_<_$tag_=head_<_$tag_=title_!OpenBeOS Translator Kit_>_>
_<_$tag_=body_$onLoad_=doJavaScript();_<
_<_$tag_=h1_!It Rules._>
_>
_>
----------------------------
In the case of UTF-8, an unstructured file is encoded directly, as one may
expect. (no additional byte cost)
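For what it's worth, here is a rough sketch in C of how a translator might
emit this stream. Buf, put_marker, put_text, put_element and ESCAPE_BYTE are
names invented for this example (nothing here is existing Translation Kit
API), and the text passed in is assumed to be valid UTF-8, so it needs no
escaping.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for B_UTF8_ESCAPE. */
#define ESCAPE_BYTE 0xAAu

/* Small fixed-size output buffer, just for the illustration. */
typedef struct {
    unsigned char bytes[256];
    size_t len;
} Buf;

/* Append the escape byte followed by one structure byte ('<', '$', ...). */
static int put_marker(Buf *b, char marker)
{
    if (b->len + 2 > sizeof b->bytes)
        return -1;
    b->bytes[b->len++] = (unsigned char)ESCAPE_BYTE;
    b->bytes[b->len++] = (unsigned char)marker;
    return 0;
}

/* Append UTF-8 text unchanged: plain text costs no extra bytes. */
static int put_text(Buf *b, const char *utf8)
{
    size_t n = strlen(utf8);
    if (n > sizeof b->bytes - b->len)
        return -1;
    memcpy(b->bytes + b->len, utf8, n);
    b->len += n;
    return 0;
}

/* Emit a childless element: _<_$tag_=NAME_!CONTENT_> */
static int put_element(Buf *b, const char *tag, const char *content)
{
    if (put_marker(b, '<') || put_marker(b, '$') || put_text(b, "tag") ||
        put_marker(b, '=') || put_text(b, tag) ||
        put_marker(b, '!') || put_text(b, content) ||
        put_marker(b, '>'))
        return -1;
    return 0;
}
```

Calling put_element(&b, "h1", "It Rules.") on an empty buffer writes 24
bytes: 9 bytes of content, 5 bytes for "tag" and "h1", and 10 bytes for the
five escaped structure markers. Calling put_text alone with "It Rules."
writes exactly the 9 input bytes.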
- Follow-Ups:
- [openbeostranslationkit] Re: Structured Text Translation
- From: Brian Matzon