[openbeos] StyledEdit news of various interest

  • From: "Andrew Bachmann" <shatty@xxxxxxxxxxxxx>
  • To: "openbeos" <openbeos@xxxxxxxxxxxxx>
  • Date: Tue, 15 Jul 2003 23:01:54 -0700 PDT

Hello all,

Since I saw the bugs go out again for StyledEdit, I thought I would look
at some of them. In particular, I figured it was time to check on the
status of encodings support. I'm pleased to announce that the OBOS
StyledEdit now supports loading and saving in any of the encodings
supported by libtextencodings.so. (This is several more encodings than
R5 StyledEdit supports.) Some may remember the discussion a while back
over how to do this; I chose to implement it via convert_to/from_utf8,
which seemed to be the general consensus.

As part of my effort to move towards better support for encodings in
general, I tried to create an abstraction to manage encodings. Two tasks
in particular that I wanted this abstraction to perform were enumerating
the supported encodings and supplying human-readable names for them. In
addition, I wanted to be able to get the "font id" (suitable for
BFont::SetEncoding) and the "conversion id" (suitable for
convert_to/from_utf8) for an encoding. I also wanted to step a bit beyond
the arbitrary nature of the encodings collection in R5. To do this I
consulted the IANA, which manages a list of encodings and various
properties of them. (See http://www.iana.org/assignments/character-sets )

The end result is in two files, CharacterSet.h and CharacterSet.cpp,
which are currently located in the stylededit directory. They define two
classes. One is CharacterSetRoster, which supports enumerating the
supported character sets and finding a character set by various search
criteria. (Some "find" methods are constant time, others are linear.)
The other class, CharacterSet, represents an individual character set.
The fields in the CharacterSet class are taken from the IANA document
listed above. Some nice things this added to my initial list of
requirements: the MIME name for a character set, if it exists; the
canonical IANA name for a character set; and a set of known aliases for
a character set. It also added something called a MIB enum that I am
honestly not too sure about, but perhaps it would be useful for hard-core
publishing-type apps or some such? (I put it in for completeness.)
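
For illustration, here is a minimal standard C++ sketch of a roster along
these lines. It is not the code in CharacterSet.h/CharacterSet.cpp: the
names CharacterSetInfo, CharacterSetTable, FindByConversionId and
FindByName, and the conversion ids 0 to 3, are hypothetical. The IANA
names, MIME names, MIBenum values and (a subset of the) aliases come from
the IANA registry. It shows why one lookup can be constant time (a dense
id used as an index) while name and alias lookups are a linear scan.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record holding the IANA-derived fields described above.
struct CharacterSetInfo {
    int conversionId;                  // hypothetical dense id, doubles as index
    std::string ianaName;              // canonical IANA name
    std::string mimeName;              // preferred MIME name, empty if none
    int mibEnum;                       // IANA MIBenum
    std::vector<std::string> aliases;  // known IANA aliases
};

// IANA character set names are compared case-insensitively.
static bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i])))
            return false;
    }
    return true;
}

class CharacterSetTable {
public:
    std::size_t Count() const { return sets_.size(); }

    // Enumeration: valid indices are 0 .. Count() - 1.
    const CharacterSetInfo* At(std::size_t index) const {
        return index < sets_.size() ? &sets_[index] : nullptr;
    }

    // Constant time, because the table is ordered by conversion id.
    const CharacterSetInfo* FindByConversionId(int id) const {
        if (id < 0) return nullptr;
        return At(static_cast<std::size_t>(id));
    }

    // Linear time: scan every canonical name, MIME name and alias.
    const CharacterSetInfo* FindByName(const std::string& name) const {
        if (name.empty()) return nullptr;
        for (const CharacterSetInfo& cs : sets_) {
            if (EqualsIgnoreCase(cs.ianaName, name) ||
                EqualsIgnoreCase(cs.mimeName, name))
                return &cs;
            for (const std::string& alias : cs.aliases)
                if (EqualsIgnoreCase(alias, name)) return &cs;
        }
        return nullptr;
    }

private:
    // Hard-coded table; entry i has conversionId == i.
    std::vector<CharacterSetInfo> sets_ = {
        {0, "UTF-8", "UTF-8", 106, {}},
        {1, "ISO_8859-1:1987", "ISO-8859-1", 4,
         {"iso-ir-100", "ISO_8859-1", "latin1", "l1", "CP819"}},
        {2, "Shift_JIS", "Shift_JIS", 17, {"MS_Kanji", "csShiftJIS"}},
        {3, "macintosh", "", 2027, {"mac", "csMacintosh"}},
    };
};
```

With a table of a few dozen entries the linear scan is cheap; a hash map
keyed on lower-cased names would make name lookup constant time as well,
at the cost of building the map once.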

OBOS StyledEdit now stands as an example of how to use this
functionality. I'm hoping that something like this could be made a
standard through beunited. However, I don't really hope that this
particular interface/implementation makes it. :-) Although it works, it
may be preferable to support a different enumeration interface than the
method I chose at the time. Also, the set of character sets is
hard-coded into CharacterSet.cpp. I think it would be superior to have it
read from a file. Unfortunately the IANA document is not easily
parsable; however, it could be manually doctored into a more readable
format that could be read programmatically without difficulty. Also,
some of the character set assignments for particular conversion ids are
guesses. I assume, for example, that the B_MACINTOSH_ROMAN font encoding
is the same as the one denoted by the IANA name "macintosh".
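
If the table were moved into a file as suggested, the reader could be as
simple as the sketch below. The line format (conversion id, IANA name,
MIME name, MIBenum and comma-separated aliases, separated by '|', with
'#' comment lines) is an assumption made up for this example, not an
existing OBOS or IANA format; the struct and function names are
hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of one character set entry.
struct CharacterSetEntry {
    int conversionId = -1;
    std::string ianaName;
    std::string mimeName;   // may be empty
    int mibEnum = 0;
    std::vector<std::string> aliases;
};

// Splits s on delim, keeping empty fields (including a trailing one).
static std::vector<std::string> Split(const std::string& s, char delim) {
    std::vector<std::string> out;
    std::string field;
    std::istringstream in(s);
    while (std::getline(in, field, delim)) out.push_back(field);
    if (!s.empty() && s.back() == delim) out.push_back("");
    return out;
}

// Parses lines of the form
//   conversionId|ianaName|mimeName|mibEnum|alias1,alias2,...
// Blank lines, '#' comments and malformed lines are skipped.
std::vector<CharacterSetEntry> ParseCharacterSets(std::istream& in) {
    std::vector<CharacterSetEntry> result;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#') continue;
        const std::vector<std::string> f = Split(line, '|');
        if (f.size() != 5) continue;
        CharacterSetEntry e;
        try {
            e.conversionId = std::stoi(f[0]);
            e.mibEnum = std::stoi(f[3]);
        } catch (const std::exception&) {
            continue;  // non-numeric id or MIBenum
        }
        e.ianaName = f[1];
        e.mimeName = f[2];
        e.aliases = Split(f[4], ',');
        result.push_back(e);
    }
    return result;
}
```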

Once I had set up the above abstraction for managing encodings to my
satisfaction, I used it in StyledEdit to provide the encodings menus and
read/write capability. StyledEditView::GetStyledText uses
BTranslationUtils::GetStyledText to read a stream into the view. As part
of this, it checks for a few attributes: "alignment", "wrap", and
"be:encoding". [These are used by R5 StyledEdit.] Annoyingly, the R5
BTranslationUtils::GetStyledText doesn't handle these itself. This is
somewhat understandable, since BTranslationUtils::GetStyledText takes a
BPositionIO, which doesn't provide the ReadAttr interface. Also annoying
is that R5 StyledEdit uses the value 65535 in "be:encoding" to mean UTF8.
Because BTranslationUtils::GetStyledText doesn't handle encodings, I
ended up reading the file first with BTranslationUtils::GetStyledText and
then re-reading it into a temporary buffer, on which I perform the
convert_to_utf8. This is lamentable, because it means the file is read
twice. It did save me from having to parse the "runs", which would have
basically meant copying BTranslationUtils::GetStyledText code directly
into StyledEdit, which I felt (at least at that moment) would be even
more depressing.
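
A minimal sketch of that second read pass, in standard C++ with the BeOS
calls replaced by stand-ins. Only the 65535-means-UTF8 convention comes
from the post; treating any other "be:encoding" value as a conversion id,
the constant kLatin1ConversionId, and the Latin1ToUtf8 stand-in for
convert_to_utf8 are assumptions for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Per the post, R5 StyledEdit stores 65535 in "be:encoding" to mean UTF-8.
constexpr int32_t kUtf8EncodingAttr = 65535;
// Hypothetical conversion id for ISO-8859-1, used only by this sketch.
constexpr int32_t kLatin1ConversionId = 0;

// Stand-in for convert_to_utf8 that only understands ISO-8859-1.
std::string Latin1ToUtf8(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F)); // continuation byte
        }
    }
    return out;
}

// Second pass: the raw file bytes are re-read and converted according to
// the "be:encoding" attribute. UTF-8 files need no conversion.
std::string DecodeForView(const std::string& rawBytes, int32_t encodingAttr) {
    if (encodingAttr == kUtf8EncodingAttr) return rawBytes;
    if (encodingAttr == kLatin1ConversionId) return Latin1ToUtf8(rawBytes);
    throw std::invalid_argument("conversion not supported by this sketch");
}
```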

Writing is similarly annoying. R5's BTranslationUtils::WriteStyledEditFile
doesn't write the "alignment", "wrap", or "be:encoding" attributes, for
no apparent reason at all. The OBOS implementation does write these, and
it is my opinion that the OBOS version is superior. However, OBOS
StyledEdit currently also writes these attributes itself, since that is
required when linking against R5 libtranslation.so. The OBOS
implementation supports not only "alignment" and "wrap" but also
"be:encoding". However, the "encoding" of a BTextView is not an existing
field: there is a BTextView::SetAlignment, but no BTextView::SetEncoding,
for example. And so OBOS BTranslationUtils::WriteStyledEditFile simply
writes 65535 for "be:encoding". Similar to reading, I have performed a
lamentable second write, which zeroes the file contents and refills them
with converted text. The depressing alternative, again, is virtually
equivalent to copying out the BTranslationUtils::WriteStyledEditFile
implementation and modifying it.
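
The second write can be sketched the same way: the file is reopened with
truncation, which zeroes its contents, and refilled with converted bytes.
Utf8ToLatin1 is a hypothetical stand-in for convert_from_utf8 that only
targets ISO-8859-1 and emits a substitute character for anything it
cannot represent; RewriteConverted is likewise hypothetical, and the BFile
and attribute APIs are left out.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Stand-in for convert_from_utf8 with ISO-8859-1 as the target. Code points
// above U+00FF, and truncated or stray bytes, become `substitute`. Input is
// not fully validated.
std::string Utf8ToLatin1(const std::string& in, char substitute = '?') {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80) {                 // ASCII passes through unchanged
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        // Length implied by the lead byte; stray continuation bytes count as 1.
        const std::size_t len =
            lead >= 0xF0 ? 4 : lead >= 0xE0 ? 3 : lead >= 0xC0 ? 2 : 1;
        if (len == 2 && i + 1 < in.size()) {
            const unsigned cp = ((lead & 0x1Fu) << 6) |
                                (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            out += cp <= 0xFF ? static_cast<char>(cp) : substitute;
        } else {
            out += substitute;  // 3/4-byte sequences, truncated or stray bytes
        }
        i += len;
    }
    return out;
}

// The "second write": reopening with std::ios::trunc zeroes the file, and
// the converted bytes are written in place of the UTF-8 text.
bool RewriteConverted(const std::string& path, const std::string& utf8Text) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) return false;
    const std::string bytes = Utf8ToLatin1(utf8Text);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```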

I should point out that the implementation for performing the conversion
is a little bit tricky, and if someone decides to relocate this
functionality, I would highly recommend studying and moving the code I
have written for performing this conversion in a 256-character buffer.
It seems to properly convert characters even when they are broken across
the 256-character boundary, which took some thinking and testing. Reading
in particular was fairly annoying.
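
The boundary problem can be shown with a self-contained sketch: input is
fed through a 256-byte buffer, only complete sequences are converted, and
the leading bytes of a sequence split by the boundary are moved to the
front of the buffer before the next refill. This is not the OBOS
StyledEdit code; it decodes UTF-8 to UTF-32 in place of a
convert_to/from_utf8 call because UTF-8 sequence lengths are easy to read
from the lead byte, and it does not validate malformed input. The same
carry-over idea applies when reading multibyte encodings such as
Shift_JIS.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

constexpr std::size_t kChunkSize = 256;  // buffer size mentioned in the post

// Length of a UTF-8 sequence given its lead byte (1 for ASCII or stray bytes).
static std::size_t SequenceLength(unsigned char lead) {
    if (lead >= 0xF0) return 4;
    if (lead >= 0xE0) return 3;
    if (lead >= 0xC0) return 2;
    return 1;
}

// Decodes the complete sequences at the start of buf[0..n) into out and
// returns how many bytes were consumed. An incomplete trailing sequence is
// left unconsumed so the caller can carry it into the next chunk.
static std::size_t DecodeCompleteSequences(const char* buf, std::size_t n,
                                           std::u32string& out) {
    std::size_t i = 0;
    while (i < n) {
        const unsigned char lead = static_cast<unsigned char>(buf[i]);
        const std::size_t len = SequenceLength(lead);
        if (i + len > n) break;  // sequence straddles the chunk boundary
        char32_t cp = (len == 1) ? lead
                    : (len == 2) ? (lead & 0x1F)
                    : (len == 3) ? (lead & 0x0F) : (lead & 0x07);
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(buf[i + k]) & 0x3F);
        out.push_back(cp);
        i += len;
    }
    return i;
}

// Feeds `input` (standing in for the file) through a fixed-size buffer,
// carrying leftover bytes of a split sequence into the next chunk.
std::u32string DecodeInChunks(const std::string& input) {
    std::u32string out;
    char buf[kChunkSize];
    std::size_t carried = 0;  // bytes left over from the previous chunk
    std::size_t pos = 0;      // read position in input
    while (pos < input.size()) {
        const std::size_t got = std::min(kChunkSize - carried, input.size() - pos);
        input.copy(buf + carried, got, pos);
        pos += got;
        const std::size_t filled = carried + got;
        const std::size_t used = DecodeCompleteSequences(buf, filled, out);
        carried = filled - used;
        std::memmove(buf, buf + used, carried);
    }
    // Bytes still carried here are a truncated final sequence; they are dropped.
    return out;
}
```

The carried remainder is at most three bytes, so every refill still has
room for new data and the loop always makes progress.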

Warning: potential R2 issue + opinion ahead. :-)

In my opinion this functionality probably should be relocated, and moved
into existing BTranslationUtils functions or new functions, possibly
accepting BFile instead of BPositionIO. BTextView should be expanded to
support SetEncoding/Encoding.

One last disappointing observation on R5 StyledEdit vs. OBOS StyledEdit:
R5 StyledEdit is able to open a UTF8 file that I created in it without
issue. However, OBOS StyledEdit cannot open this file. The failure is not
in the StyledEdit code. The failure occurs when
BTranslationUtils::GetStyledText is called on the file: it returns
B_TRANSLATION_ERROR_BASE. This occurs when calling the R5 version of
GetStyledText. My conclusion from this is that the R5 StyledEdit
implementation does not even use BTranslationUtils::GetStyledText to
populate the view. IMHO this is quite unfortunate. Hopefully the OBOS
version of GetStyledText will be able to do the right thing. My guess is
that the R5 version of GetStyledText fails because it uses the
STXTTranslator, which decides that the input file is not a text file.
(The input file is text; it is Chinese.) R5 StyledEdit doesn't use
GetStyledText at all and doesn't try to make such a determination.
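
To make the guess about STXTTranslator concrete: the check that
translator actually performs is not described here, so the sketch below
only contrasts a hypothetical heuristic that treats any byte at or above
0x80 as binary, which would reject Chinese text in any encoding, with a
check that accepts well-formed UTF-8. Both functions are illustrative and
not part of any BeOS or OBOS API.

```cpp
#include <cstddef>
#include <string>

// True for the control characters a text file commonly contains.
static bool IsTextWhitespace(unsigned char c) {
    return c == '\n' || c == '\r' || c == '\t';
}

// Hypothetical naive heuristic: only 7-bit ASCII counts as text.
bool LooksLikeAsciiText(const std::string& data) {
    for (unsigned char c : data) {
        if (c >= 0x80) return false;
        if (c < 0x20 && !IsTextWhitespace(c)) return false;
    }
    return true;
}

// Accepts well-formed UTF-8 without control characters other than
// common whitespace.
bool LooksLikeUtf8Text(const std::string& data) {
    std::size_t i = 0;
    const std::size_t n = data.size();
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(data[i]);
        if (c < 0x80) {
            if (c < 0x20 && !IsTextWhitespace(c)) return false;
            ++i;
            continue;
        }
        std::size_t len;
        char32_t cp;
        if (c >= 0xC2 && c <= 0xDF)      { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + len > n) return false;  // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(data[i + k]);
            if ((cc & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values past U+10FFFF.
        if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += len;
    }
    return true;
}
```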

And a mystery for those interested: the outstanding bug on viewing files
with DOS newlines (they get doubled) has left me baffled. R5 StyledEdit
handles not only DOS newlines but apparently any combination of them with
Unix newlines. The newline bytes have not been removed or replaced, as
they can be cut and pasted, and even saved. It's almost as if the display
routine understands these different newlines. I thought of overriding
CanEndLine in BTextView, but that seems to be used only when wrapping is
on, and these lines work properly in R5 StyledEdit without it. I'm at a
loss.
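
One way a display routine could behave as described is to locate line
starts while treating "\r\n", a lone "\r" and a lone "\n" each as a
single break, without modifying the bytes. Whether R5's BTextView does
anything like this is unknown; LineStarts below is a hypothetical sketch
in standard C++.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns the byte offset at which each display line starts, treating
// "\r\n", a lone '\r' and a lone '\n' each as one line break. The text is
// not modified, so the original newline bytes survive copy, paste and save.
std::vector<std::size_t> LineStarts(const std::string& text) {
    std::vector<std::size_t> starts{0};
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r') {
            if (i + 1 < text.size() && text[i + 1] == '\n') ++i;  // CRLF counts once
            starts.push_back(i + 1);
        } else if (text[i] == '\n') {
            starts.push_back(i + 1);
        }
    }
    return starts;
}
```

For "one\r\ntwo\nthree\rfour" this finds four lines; counting '\r' and
'\n' as separate breaks would give five, which matches the doubling seen
with DOS newlines.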

In the meantime, please try the OBOS StyledEdit encoding features and let
me know if you have any problems.

Andrew

