[openbeos] StyledEdit news of various interest
- From: "Andrew Bachmann" <shatty@xxxxxxxxxxxxx>
- To: "openbeos" <openbeos@xxxxxxxxxxxxx>
- Date: Tue, 15 Jul 2003 23:01:54 -0700 PDT
Hello all,

Since I saw the bugs go out again for StyledEdit, I thought I would look at
some of them. In particular, I figured it was time to check on the status of
encodings support. I'm pleased to announce that the OBOS StyledEdit now
supports loading and saving in any of the encodings supported by
libtextencodings.so. (That is a number more encodings than R5 StyledEdit
supports.) Some may remember the discussion a while back over how to do
this; I chose to implement it via convert_to_utf8/convert_from_utf8, which
seemed to be the general consensus.

As part of my effort to move towards better support for encodings in
general, I tried to create an abstraction to manage encodings. Two tasks in
particular that I wanted this abstraction to perform were enumerating the
supported encodings and supplying human-readable names for them. In
addition, I wanted to be able to get the "font id" (suitable for
BFont::SetEncoding) and the "conversion id" (suitable for
convert_to_utf8/convert_from_utf8) for an encoding. I also wanted to step a
bit beyond the arbitrary nature of the encodings collection in R5. To do
this I consulted the IANA, which maintains a list of encodings and various
properties of them. (See http://www.iana.org/assignments/character-sets )

The end result is in two files, CharacterSet.h and CharacterSet.cpp, which
currently live in the StyledEdit directory. Two classes are defined there.
One is CharacterSetRoster, which supports enumerating the supported
character sets and finding a character set by various search criteria.
(Some "find" methods are constant time, others are linear.) The other
class, CharacterSet, represents an individual character set. The fields in
the CharacterSet class are taken from the IANA document listed above. This
added some nice things to my initial list of requirements: the MIME name
for a character set (if it exists), the canonical IANA name, and a set of
known aliases. It also added something called a MIB enum that I am honestly
not too sure about, but which perhaps would be useful for hard-core
publishing-type apps or somesuch. (I put it in for completeness.)
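
To make the shape of such an abstraction concrete, here is a minimal,
self-contained sketch. It is not the code in CharacterSet.h: the names
CharsetInfo, CharsetTable, SameName, and SampleTable are hypothetical, the
font and conversion ids are placeholder numbers, and the choice of which
lookup is constant time and which is linear is an assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one character set (not the real CharacterSet).
struct CharsetInfo {
    uint32_t fontId;        // id for BFont::SetEncoding (placeholder here)
    uint32_t conversionId;  // id for convert_to_utf8 etc. (placeholder here)
    int mibEnum;            // IANA MIBenum
    std::string printName;  // human-readable name for menus
    std::string ianaName;   // canonical IANA name
    std::string mimeName;   // preferred MIME name, empty if none
    std::vector<std::string> aliases;
};

// ASCII case-insensitive comparison of two charset names.
static bool SameName(const std::string& a, const std::string& b)
{
    if (a.size() != b.size())
        return false;
    for (size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i]))
                != std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

// Hypothetical roster: enumeration by index plus lookup by name.
class CharsetTable {
public:
    explicit CharsetTable(std::vector<CharsetInfo> sets)
        : fSets(std::move(sets)) {}

    size_t Count() const { return fSets.size(); }

    // Constant-time access by position, used for enumeration.
    const CharsetInfo* At(size_t index) const
    {
        return index < fSets.size() ? &fSets[index] : nullptr;
    }

    // Linear search over the IANA name, MIME name, and aliases.
    const CharsetInfo* FindByName(const std::string& name) const
    {
        for (const CharsetInfo& set : fSets) {
            if (SameName(set.ianaName, name) || SameName(set.mimeName, name))
                return &set;
            for (const std::string& alias : set.aliases) {
                if (SameName(alias, name))
                    return &set;
            }
        }
        return nullptr;
    }

private:
    std::vector<CharsetInfo> fSets;
};

// Two sample entries; the first two numbers in each are placeholders.
CharsetTable SampleTable()
{
    std::vector<CharsetInfo> sets = {
        {0, 0, 4, "ISO Latin 1", "ISO_8859-1:1987", "ISO-8859-1",
            {"latin1", "l1", "IBM819"}},
        {1, 1, 106, "Unicode (UTF-8)", "UTF-8", "UTF-8", {}},
    };
    return CharsetTable(std::move(sets));
}
```

With this sketch, an encodings menu would be built by looping from 0 to
Count() and using printName, while a name such as "latin1" read from a
file or a mail header would be resolved with FindByName.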

OBOS StyledEdit now stands as an example of how to use this functionality.
I'm hoping that something like this could be made a standard through
beunited. However, I don't really hope that this particular
interface/implementation makes it. :-) Although it works, it may be
preferable to support a different enumeration interface than the one I
chose at the time. Also, the set of character sets is hard-coded into
CharacterSet.cpp; I think it would be better to have it read from a file.
Unfortunately the IANA document is not easily parsable, but it could be
manually doctored into a format that can be read programmatically without
difficulty. Finally, some of the character set assignments for particular
conversion ids are guesses. I assume, for example, that the
B_MACINTOSH_ROMAN font encoding is the same as the one denoted by the IANA
name "macintosh".

Once I had set up the above abstraction for managing encodings to my
satisfaction, I used it in StyledEdit to provide the encodings menus and
read/write capability.

StyledEditView::GetStyledText uses BTranslationUtils::GetStyledText to read
a stream into the view. As part of this, it checks for a few attributes:
"alignment", "wrap", and "be:encoding". (These are used by R5 StyledEdit.)
Annoyingly, the R5 BTranslationUtils::GetStyledText doesn't handle these
itself. This is somewhat understandable, since
BTranslationUtils::GetStyledText takes a BPositionIO, which doesn't provide
the ReadAttr interface. Also annoying is that R5 StyledEdit uses the number
65535 to mean UTF8 in "be:encoding". Because
BTranslationUtils::GetStyledText doesn't handle encodings, I ended up
reading the file first with BTranslationUtils::GetStyledText and then
re-reading the file into a temporary buffer on which I perform the
convert_to_utf8. This is lamentable, because it means the file is read
twice. It did save me from having to parse the "runs", which would have
basically meant copying the BTranslationUtils::GetStyledText code directly
into StyledEdit, which I felt (at least at that moment) would be even more
depressing.

Writing is similarly annoying. R5's BTranslationUtils::WriteStyledEditFile
doesn't write the "alignment", "wrap", or "be:encoding" attributes, for no
apparent reason at all. The OBOS implementation does write them, and in my
opinion that makes it the superior version. However, the OBOS StyledEdit
currently also writes these attributes itself, since that is required when
linking against the R5 libtranslation.so. There is one catch with
"be:encoding": the "encoding" of a BTextView is not an existing field.
There is a BTextView::SetAlignment, but no BTextView::SetEncoding, for
example. So the OBOS BTranslationUtils::WriteStyledEditFile simply writes
65535 for "be:encoding". And, as with reading, I perform a lamentable
second write, which zeroes the file contents and refills them with the
converted text. The depressing alternative, again, is virtually equivalent
to copying out the BTranslationUtils::WriteStyledEditFile implementation
and modifying it.

I should point out that the implementation of the conversion is a little
bit tricky. If someone decides to relocate this functionality, I would
highly recommend studying and moving the code that I have written, which
performs the conversion through a 256-character buffer. It seems to convert
characters properly even when they are broken across the 256-character
boundary, which took some thinking and testing. Reading in particular was
fairly annoying.
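
The boundary problem can be illustrated in isolation. The sketch below is
not the StyledEdit code and does not call convert_to_utf8 or
convert_from_utf8; it only shows, for UTF-8 input (the saving direction),
one way to cut a byte stream into fixed-size pieces without splitting a
multi-byte character. The function names CompleteUtf8Prefix and
SplitOnCharBoundaries are hypothetical, and the chunk size is counted in
bytes here, which is an assumption about the buffer described above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns how many leading bytes of buf[0..len) form whole UTF-8
// characters. Bytes past that point are the start of a character whose
// remaining bytes have not arrived yet. Malformed input is passed through.
size_t CompleteUtf8Prefix(const unsigned char* buf, size_t len)
{
    // Walk back over at most three continuation bytes (10xxxxxx).
    size_t i = len;
    size_t back = 0;
    while (i > 0 && back < 3 && (buf[i - 1] & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0)
        return len;  // no lead byte in the buffer: nothing to hold back

    unsigned char lead = buf[i - 1];
    size_t need;
    if ((lead & 0x80) == 0x00)
        need = 1;
    else if ((lead & 0xE0) == 0xC0)
        need = 2;
    else if ((lead & 0xF0) == 0xE0)
        need = 3;
    else if ((lead & 0xF8) == 0xF0)
        need = 4;
    else
        return len;  // invalid lead byte: leave it to the converter

    size_t have = back + 1;
    return have < need ? i - 1 : len;
}

// Splits text into pieces of at most chunkSize bytes, ending each piece on
// a character boundary whenever more input follows.
std::vector<std::string> SplitOnCharBoundaries(const std::string& text,
    size_t chunkSize)
{
    assert(chunkSize > 0);
    const unsigned char* bytes
        = reinterpret_cast<const unsigned char*>(text.data());
    std::vector<std::string> pieces;
    size_t pos = 0;
    while (pos < text.size()) {
        size_t len = std::min(chunkSize, text.size() - pos);
        size_t whole = len;
        if (pos + len < text.size())
            whole = CompleteUtf8Prefix(bytes + pos, len);
        if (whole == 0)
            whole = len;  // chunk shorter than one character: do not stall
        pieces.push_back(text.substr(pos, whole));
        pos += whole;
    }
    return pieces;
}
```

Calling SplitOnCharBoundaries(text, 256) yields pieces that could each be
handed to a converter on their own; the bytes held back from one piece
simply become the start of the next. Reading from a non-UTF-8 encoding
needs the same carry-over idea, but the boundary test depends on the source
encoding and is not covered by this sketch.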
Warning: potential R2 issue + opinion ahead. :-)
In my opinion this functionality should probably be relocated, either into
existing BTranslationUtils functions or into new functions, possibly
accepting a BFile instead of a BPositionIO. BTextView should be expanded to
support SetEncoding/Encoding.

One last disappointing observation on R5 StyledEdit vs. OBOS StyledEdit: R5
StyledEdit is able to open a UTF8 file that I created in it without issue.
However, the OBOS StyledEdit cannot open this file. The failure is not in
the StyledEdit code. It occurs when BTranslationUtils::GetStyledText is
called on the file, which returns B_TRANSLATION_ERROR_BASE. This happens
when calling the R5 version of GetStyledText. My conclusion is that the R5
StyledEdit implementation does not even use
BTranslationUtils::GetStyledText to populate the view. IMHO this is quite
unfortunate. Hopefully the OBOS version of GetStyledText will be able to do
the right thing. My guess is that the R5 version of GetStyledText fails
because it uses the STXTTranslator, which decides that the input file is
not a text file. (The input file is text; it is Chinese.) R5 StyledEdit
doesn't use GetStyledText at all and so never tries to make that
determination.

And a mystery for those interested: the outstanding bug on viewing files
with DOS newlines (they get doubled) has left me baffled. R5 StyledEdit
handles not only DOS newlines, but apparently any combination of them with
Unix newlines. The newline bytes have not been removed or replaced, as they
can be cut and pasted, and even saved. It's almost as if the display
routine understands these different newlines. I thought of overriding
CanEndLine in BTextView, but it seems that is only used when wrapping is
on, yet these lines work properly in R5 StyledEdit. I'm at a loss.

In the meantime, please try the OBOS StyledEdit encoding features and let
me know if you have any problems.

Andrew
- Follow-Ups:
- [openbeos] Re: StyledEdit news of various interest
- From: Ingo Weinhold
- [openbeos] Re: StyledEdit news of various interest
- From: Alexander G. M. Smith