[openbeos] StyledEdit and character set encodings

Hello all,

First, some good news.  StyledEdit received some updates today, mostly 
related to handling files passed on the command line (thanks to BGA 
for bringing up the issue).  I think it's fair to say that OBOS 
StyledEdit is now better than the R5 one in almost every respect.

I've still been working on/thinking about the issue of character set 
encodings.  After some further reading I discovered that BTextView has 
a caveat in its implementation.  StyledEdit uses BTextView to provide 
the majority of the text editing functionality.  The caveat is that 
BTextView is set up to only handle UTF-8 text.
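Because BTextView expects UTF-8, bytes read from a file in any other encoding have to be checked or converted before they reach the view. Here is a minimal sketch of such a check, assuming well-formedness as defined in RFC 3629 (no overlong forms, no surrogates, nothing above U+10FFFF). `IsValidUtf8` is a hypothetical helper, not part of the BeOS API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if `bytes` is well-formed UTF-8.
// Rejects stray continuation bytes, truncated sequences, overlong
// encodings, UTF-16 surrogates (U+D800..U+DFFF) and values > U+10FFFF.
bool IsValidUtf8(const std::string& bytes)
{
    const size_t n = bytes.size();
    size_t i = 0;
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(bytes[i]);
        size_t extra;       // number of continuation bytes expected
        unsigned long cp;   // decoded code point
        if (c < 0x80) { ++i; continue; }                   // plain ASCII
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;  // continuation byte or invalid lead byte

        if (i + extra >= n)
            return false;   // sequence runs past the end of the buffer
        for (size_t k = 1; k <= extra; ++k) {
            unsigned char cc = static_cast<unsigned char>(bytes[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;  // expected a 10xxxxxx continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        // Smallest code point that legitimately needs 1, 2 or 3 extra bytes.
        static const unsigned long kMin[4] = {0, 0x80, 0x800, 0x10000};
        if (cp < kMin[extra] || cp > 0x10FFFF
            || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += extra + 1;
    }
    return true;
}
```

A Latin-1 "é" stored as the single byte 0xE9 fails this check, which is exactly the kind of text that would need converting before being handed to the view.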

So, if I implement the BFont::SetEncoding solution that I had pretty 
much decided on earlier, we would have no guarantee that it would work 
properly with BTextView.  I have also realized that the original 
StyledEdit used the convert_to_utf8/convert_from_utf8 solution, so if 
we move to that we still provide the same level of functionality.
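To make the convert-on-load, convert-on-save approach concrete, here is a sketch of the same flow for one encoding, ISO-8859-1 (Latin-1). It does not call convert_to_utf8()/convert_from_utf8() and does not reproduce their signatures; `Latin1ToUtf8` and `Utf8ToLatin1` are hypothetical stand-ins written by hand so the example compiles anywhere. On load the file bytes become UTF-8 for BTextView; on save they are encoded back, with unrepresentable characters replaced by a substitute.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical load step: decode ISO-8859-1 bytes into UTF-8.
// Each Latin-1 byte is the code point of the same value, so this never fails.
std::string Latin1ToUtf8(const std::string& in)
{
    std::string out;
    for (unsigned char c : in) {
        if (c < 0x80) {
            out += static_cast<char>(c);                 // ASCII is unchanged
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));   // 110000xx
            out += static_cast<char>(0x80 | (c & 0x3F)); // 10xxxxxx
        }
    }
    return out;
}

// Hypothetical save step: encode UTF-8 (assumed well-formed) to ISO-8859-1.
// Characters above U+00FF have no Latin-1 byte and become `substitute`.
std::string Utf8ToLatin1(const std::string& in, char substitute = '?')
{
    std::string out;
    size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        size_t len = 1;  // length of the UTF-8 sequence starting at i
        if ((c & 0xE0) == 0xC0) len = 2;
        else if ((c & 0xF0) == 0xE0) len = 3;
        else if ((c & 0xF8) == 0xF0) len = 4;

        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (len == 2 && i + 1 < in.size()) {
            unsigned cp = ((c & 0x1Fu) << 6)
                | (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            out += cp <= 0xFF ? static_cast<char>(cp) : substitute;
        } else {
            out += substitute;  // 3- or 4-byte sequences are all > U+00FF
        }
        i += len;
    }
    return out;
}
```

For Latin-1 every byte maps to exactly one code point, so an unedited file round-trips exactly; the substitute only comes into play when the text contains characters outside U+0000..U+00FF.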

Although this approach still has the drawback that a file which is 
loaded and re-saved without changes may not be binary identical to the 
original, I think this is acceptable.  We still provide a way to keep 
files binary identical (use UTF-8), and I suspect the cases where a 
file will not be binary identical are rare.  Also, it may be the case 
that after the first load-save cycle, binary identity will be preserved 
on every subsequent load-save cycle.
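The hope that later cycles preserve binary identity can be illustrated with a deliberately simple model: a file declared as US-ASCII that actually contains 8-bit bytes. The functions below are hypothetical and assume that loading replaces invalid bytes with a substitute and that saving replaces each non-ASCII character with the same substitute. The first cycle alters the file, but its output contains only representable characters, so every later cycle reproduces it exactly. Real converters, especially stateful ones, are not guaranteed to behave this way.

```cpp
#include <cassert>
#include <string>

// Hypothetical load step for a file declared as US-ASCII: bytes above
// 0x7F are not valid ASCII and are replaced with `substitute`.
// The result is pure ASCII, and therefore also valid UTF-8.
std::string LoadAscii(const std::string& fileBytes, char substitute = '?')
{
    std::string text;
    for (unsigned char c : fileBytes)
        text += c < 0x80 ? static_cast<char>(c) : substitute;
    return text;
}

// Hypothetical save step: encode UTF-8 text back to US-ASCII, emitting one
// `substitute` per non-ASCII character (its lead byte) and skipping the
// continuation bytes (10xxxxxx) that follow it.
std::string SaveAscii(const std::string& utf8Text, char substitute = '?')
{
    std::string out;
    for (unsigned char c : utf8Text) {
        if (c < 0x80)
            out += static_cast<char>(c);
        else if ((c & 0xC0) != 0x80)
            out += substitute;
    }
    return out;
}

// One load-save cycle with no edits in between.
std::string LoadSaveCycle(const std::string& fileBytes)
{
    return SaveAscii(LoadAscii(fileBytes));
}
```

In this model the first cycle turns the Latin-1 bytes of "résumé" into "r?sum?", and from then on the file is a fixed point of the cycle; a file that was pure ASCII to begin with is never changed.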

In the end, this could encourage people to use UTF-8 for their files, 
which I think would generally be a good thing.  It will also simplify 
inter-application communication, since it is unreasonable to expect 
every application on BeOS to handle every character set encoding 
(although they still can if they like).

If someone has a specialized situation that requires working with 
non-UTF-8 files, they can still do so and will probably never run into 
a problem.  If a situation does come up where this is a problem, we can 
either address it ourselves or it will be an opportunity for a third 
party to address it with commercial software, if it is worth their 
while.

As part of my work on the character set issue I have been developing a 
registry for character sets that can provide IDs for selecting a 
character set with BFont::SetEncoding or with 
convert_to_utf8/convert_from_utf8.  This registry will also provide 
human-readable names and other standard names that I have taken from 
a list at the IANA. [ http://www.iana.org/assignments/character-sets ]  
It will also allow retrieving a character set by any of its standard 
names or aliases, or enumerating all known character sets.
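A registry along these lines might look like the sketch below. It is hypothetical and is not the contents of StyledEdit's CharacterSet.h; the field names, the registry IDs, and the use of a single std::map for all names are assumptions. The MIBenum values and aliases in the usage example come from the IANA character-sets list, whose names are case-insensitive, so lookups fold case. The BeOS-specific font and conversion constants are left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one character set.
struct CharacterSet {
    int id;                            // registry-assigned ID (hypothetical)
    int mibEnum;                       // IANA MIBenum value
    std::string printName;             // human-readable name
    std::string preferredName;         // IANA preferred (MIME) name
    std::vector<std::string> aliases;  // other IANA names and aliases
};

class CharacterSetRegistry {
public:
    // Register a set under its preferred name and every alias.
    void Add(const CharacterSet& cs)
    {
        fSets.push_back(cs);
        const size_t index = fSets.size() - 1;
        fByName[Fold(cs.preferredName)] = index;
        for (const std::string& alias : cs.aliases)
            fByName[Fold(alias)] = index;
    }

    // Look up by any standard name or alias, ignoring case.
    // The pointer stays valid only until the next Add() (vector may grow).
    const CharacterSet* Find(const std::string& name) const
    {
        auto it = fByName.find(Fold(name));
        return it == fByName.end() ? nullptr : &fSets[it->second];
    }

    // Enumerate all known character sets in registration order.
    const std::vector<CharacterSet>& All() const { return fSets; }

private:
    static std::string Fold(const std::string& s)
    {
        std::string out;
        for (unsigned char c : s)
            out += static_cast<char>(std::tolower(c));
        return out;
    }

    std::vector<CharacterSet> fSets;
    std::map<std::string, size_t> fByName;  // folded name -> index in fSets
};
```

Keeping every name in one map means a lookup by `ISO-8859-1`, `latin1`, or `csISOLatin1` costs the same, and `All()` serves enumeration directly.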

Initially this registry will simply be part of StyledEdit's 
implementation for handling character sets, but hopefully it will be 
useful for applications in general, and if so it can be moved somewhere 
more appropriate.  I have checked in a header file called CharacterSet.h 
in the stylededit folder that anyone can look at if interested.

Andrew
