[openbeos] Re: StyledEdit and character set encodings

  • From: Guy <mul_m7m@xxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Thu, 21 Nov 2002 19:40:48 +0200

hey all,

I'm sorry if I'm interrupting the discussion, as I'm not very good with encoding issues, but I wonder if this is a good time to mention BiDi text, and Hebrew to be more specific.
BeOS can cope with Hebrew up to a point, yet it always displays Hebrew reversed (unless it's a BeOS-written doc, meaning a StyledEdit doc written with the Hebrew keymap extension from BeBits).
All filenames are displayed, but always reversed; the same goes for Word .doc files (with Gobe 1, 2) and regular text in StyledEdit and such.


I was talking to BGA a while ago about trying to detect Hebrew filenames just in order to reverse their names in OpenTracker, but I wonder if there is something related to the whole StyledEdit encoding issue that could explain this better and perhaps help me get Hebrew to work. (Hebrew is a *major* obstacle for every Israeli attempting to move to anything other than Wind**s or MacOS.)
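
A rough sketch of the filename idea, in Python rather than OpenTracker code. The names `contains_hebrew` and `reverse_hebrew_runs` are made up for this example. Reversing runs of Hebrew letters is only a naive stand-in for the real Unicode bidirectional algorithm: it ignores digits, punctuation inside Hebrew text, and combining marks. It also assumes the name is stored in logical order and drawn strictly left to right.

```python
import re

# Runs of Hebrew-block characters, optionally joined by whitespace.
# Assumption: only the Unicode Hebrew block (U+0590..U+05FF) matters here.
_HEBREW_RUN = re.compile(r"[\u0590-\u05FF]+(?:\s+[\u0590-\u05FF]+)*")


def contains_hebrew(name: str) -> bool:
    """Return True if the name has at least one Hebrew-block character."""
    return _HEBREW_RUN.search(name) is not None


def reverse_hebrew_runs(name: str) -> str:
    """Reverse each Hebrew run so a left-to-right renderer shows it readable.

    Naive: this is not the Unicode bidi algorithm. Latin text such as a
    file extension is left untouched.
    """
    return _HEBREW_RUN.sub(lambda m: m.group(0)[::-1], name)
```

A caller could check `contains_hebrew(name)` first and only then pass the name through `reverse_hebrew_runs` before drawing it.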

I'll be happy to try and assist in coding such stuff, but I do not understand all of it at the moment...

anyone? :)

Guy.

------------------------------------------------------------------------------------------

On Wed, 20 Nov 2002 17:38:42 PST, shatty <shatty@xxxxxxxxxxxxx> wrote:

Hello all,

First, some good news. StyledEdit received some updates today, mostly relating to handling files provided at the command line. (thanks to BGA for bringing up the issue) I think it's fair to say that OBOS StyledEdit is now better than the R5 one in almost every respect.

I've still been working on/thinking about the issue of character set encodings. After some further reading I discovered that BTextView has a caveat in its implementation. StyledEdit uses BTextView to provide the majority of the text editing functionality. The caveat is that BTextView is set up to only handle UTF-8 text.

So, if I implement the BFont::SetEncoding solution that I had pretty much decided on earlier, we would have no guarantee that it would work properly with BTextView. Also, I have come to the realization that the original StyledEdit used the convert_to_utf8/convert_from_utf8 solution. So, if we move to that, we are still providing the same level of functionality.
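
To make the convert-on-load, convert-on-save idea concrete, here is a minimal sketch in Python. It is not the BeOS API: the helpers `to_utf8` and `from_utf8` are hypothetical and only loosely mirror the roles of convert_to_utf8/convert_from_utf8, and ISO-8859-8 is picked purely as an example encoding.

```python
def to_utf8(data: bytes, charset: str) -> bytes:
    """Convert bytes in `charset` to UTF-8, as would happen on load."""
    return data.decode(charset).encode("utf-8")


def from_utf8(data: bytes, charset: str) -> bytes:
    """Convert UTF-8 bytes back to `charset`, as would happen on save."""
    return data.decode("utf-8").encode(charset)


# A Hebrew word as it might sit on disk in ISO-8859-8.
on_disk = "\u05e9\u05dc\u05d5\u05dd".encode("iso8859_8")
in_editor = to_utf8(on_disk, "iso8859_8")   # what a UTF-8-only text view holds
saved = from_utf8(in_editor, "iso8859_8")   # what gets written back
```

In this case every character has a mapping in both directions, so `saved` comes out identical to `on_disk`; the text view only ever sees UTF-8.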

Although this approach still has the drawback that a file that is loaded and re-saved without changes may not be binary identical, I think this is acceptable. It is acceptable because we still provide a solution that preserves identity (use UTF-8), and I suspect that cases where the result will not be binary identical are rare. Also, it may be the case that after the first load-save cycle, binary identity will be preserved on each subsequent load-save cycle.
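
One way a load-save cycle can change a file once and then settle is sketched below in Python. The function `load_save_cycle` is hypothetical, and the substitution behaviour shown is Python's (`errors="replace"`), which is only assumed to resemble what a real converter might do with bytes it cannot map.

```python
def load_save_cycle(data: bytes, charset: str) -> bytes:
    """One load (to UTF-8) plus one save (back to `charset`).

    Anything that cannot be converted is replaced by a placeholder.
    """
    utf8 = data.decode(charset, errors="replace").encode("utf-8")
    return utf8.decode("utf-8").encode(charset, errors="replace")


# 0x81 has no mapping in Python's cp1252 table, so it cannot survive.
original = b"caf\x81"
first = load_save_cycle(original, "cp1252")
second = load_save_cycle(first, "cp1252")
```

Here `first` differs from `original` because the unmappable byte became a placeholder, but `second` equals `first`: once the lossy substitution has happened, later cycles have nothing left to change.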


In the end, this could encourage people to use UTF-8 for their file format, which I think would generally be a good thing. It will also simplify the development of inter-application communication, as it is unreasonable to expect that all applications on BeOS will be able to handle any character set encoding. (although they still can if they like)

If someone has a specialized situation where they require working in non-UTF-8 files, they can still do so and will probably never experience a problem with it. If a situation comes up where this is a problem, we can either address it or it will be an opportunity for a third party to address it through commercial software, if it is worth it for them to do so.

As part of my work on the character set issue I have been developing a registry for character sets that is capable of providing IDs that can be used to select a character set for BFont::SetEncoding or for convert_to_utf8/convert_from_utf8. This registry will also provide human readable names and other standard names that I have acquired from a list at the IANA. [ http://www.iana.org/assignments/character-sets ] It will also allow retrieving a character set by using any of the standard names or aliases for the character set, or enumerating all known character sets.
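
A registry along these lines might look roughly like the following Python sketch. It is not the contents of CharacterSet.h: the class names, method names, and numeric IDs are all invented for illustration, and only the IANA names and aliases shown are taken from the IANA list.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class CharacterSet:
    charset_id: int               # hypothetical ID, not a real BeOS constant
    print_name: str               # human-readable name
    iana_name: str                # standard name from the IANA list
    aliases: Tuple[str, ...] = ()


class CharacterSetRegistry:
    def __init__(self) -> None:
        self._by_id: Dict[int, CharacterSet] = {}
        self._by_name: Dict[str, CharacterSet] = {}

    def register(self, cs: CharacterSet) -> None:
        self._by_id[cs.charset_id] = cs
        # IANA charset names are case-insensitive, so index them lowercased.
        for name in (cs.iana_name, cs.print_name, *cs.aliases):
            self._by_name[name.lower()] = cs

    def find_by_id(self, charset_id: int) -> Optional[CharacterSet]:
        return self._by_id.get(charset_id)

    def find_by_name(self, name: str) -> Optional[CharacterSet]:
        return self._by_name.get(name.lower())

    def __iter__(self) -> Iterator[CharacterSet]:
        """Enumerate all known character sets in ID order."""
        return iter(sorted(self._by_id.values(), key=lambda c: c.charset_id))


registry = CharacterSetRegistry()
registry.register(CharacterSet(0, "Unicode (UTF-8)", "UTF-8", ("csUTF8",)))
registry.register(
    CharacterSet(1, "Hebrew (ISO-8859-8)", "ISO-8859-8", ("hebrew", "iso-ir-138"))
)
```

With this in place, `registry.find_by_name("hebrew")` and `registry.find_by_id(1)` return the same entry, and iterating over `registry` lists every registered character set.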

Initially this registry will simply be part of StyledEdit's implementation for handling character sets, but it will hopefully be useful for applications in general, and if so it can be moved to someplace more appropriate. I have checked in a header file in the stylededit folder called CharacterSet.h that anyone can look at if interested.

Andrew




