Hello all,

Since I saw the bugs go out again for StyledEdit, I thought I would look at some of them. In particular, I figured it was time to check on the status of encodings support. I'm pleased to announce that the OBOS StyledEdit now supports loading and saving in any of the encodings supported by libtextencodings.so. (That is a number more encodings than R5 StyledEdit supports.) Some may remember the discussion a while back over how to do this; I chose to implement it via convert_to/from_utf8, which seemed to be the general consensus.

As part of my effort to move towards better support for encodings in general, I tried to create an abstraction to manage encodings. The two tasks I particularly wanted this abstraction to perform were enumerating the supported encodings and supplying human-readable names for them. In addition, I wanted to be able to get the "font id" (suitable for BFont::SetEncoding) and the "conversion id" (suitable for convert_to/from_utf8) for an encoding. I also wanted to step a bit beyond the arbitrary nature of the encodings collection in R5. To do this I consulted the IANA, which manages a list of encodings and various properties of them. (See http://www.iana.org/assignments/character-sets )

The end result is in two files, CharacterSet.h and CharacterSet.cpp, which are currently located in the stylededit directory. Two classes are defined there. One is CharacterSetRoster, which supports enumerating the supported character sets and finding a character set by various search criteria. (Some "find" methods are constant time, others are linear.) The other class, CharacterSet, represents an individual character set. The fields in the CharacterSet class are taken from the IANA document listed above.
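
For readers who have not looked at the files, here is a rough, self-contained sketch of the general shape of such a roster. It is not the actual CharacterSet.h interface: the names CharsetInfo, FindById and FindByName are invented for illustration, the "id" is just a table index standing in for a conversion id, and the IANA names and MIBenum values are quoted from memory of the IANA list and should be checked against it.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical record; field names are illustrative, not those of the
// real CharacterSet class.
struct CharsetInfo {
    int         id;        // made-up table index standing in for a conversion id
    int         mibEnum;   // IANA MIBenum
    const char* ianaName;  // canonical IANA name
    const char* mimeName;  // preferred MIME name, or nullptr if there is none
};

// A tiny hard-coded table (values believed to match the IANA list).
static const CharsetInfo kCharsets[] = {
    { 0,    4, "ISO_8859-1:1987", "ISO-8859-1" },
    { 1,    5, "ISO_8859-2:1987", "ISO-8859-2" },
    { 2,   17, "Shift_JIS",       "Shift_JIS"  },
    { 3, 2027, "macintosh",       nullptr      },
};
static const std::size_t kCharsetCount = sizeof(kCharsets) / sizeof(kCharsets[0]);

// Constant-time lookup: the id is the index into the table.
const CharsetInfo* FindById(int id)
{
    if (id < 0 || static_cast<std::size_t>(id) >= kCharsetCount)
        return nullptr;
    return &kCharsets[id];
}

// Case-insensitive comparison of two C strings.
static bool EqualsIgnoreCase(const char* a, const char* b)
{
    while (*a && *b) {
        if (std::tolower(static_cast<unsigned char>(*a))
                != std::tolower(static_cast<unsigned char>(*b)))
            return false;
        ++a;
        ++b;
    }
    return *a == *b;  // true only if both strings ended together
}

// Linear lookup: compares the IANA name and the MIME name of every entry.
const CharsetInfo* FindByName(const char* name)
{
    for (std::size_t i = 0; i < kCharsetCount; ++i) {
        const CharsetInfo& cs = kCharsets[i];
        if (EqualsIgnoreCase(name, cs.ianaName))
            return &cs;
        if (cs.mimeName && EqualsIgnoreCase(name, cs.mimeName))
            return &cs;
    }
    return nullptr;
}
```

Enumerating is then just a loop from 0 to kCharsetCount, which is the kind of thing an encodings menu needs. The lookup by id is constant time only because the id doubles as the array index; a lookup by name has to walk the table.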
Working from the IANA list added some nice things to my initial list of requirements: the MIME name for a character set (if it exists), the canonical IANA name for a character set, and a set of known aliases for a character set. It also added something called a MIB enum that I am honestly not too sure about, but perhaps it would be useful for hard-core publishing-type apps or somesuch? (I put it in for completeness.)

OBOS StyledEdit now stands as an example of how to use this functionality. I'm hoping that something like this could be made a standard through beunited. However, I don't really hope that this particular interface/implementation makes it. :-) Although it works, it may be preferable to support a different enumeration interface than the one I chose at the time. Also, the set of character sets is hard-coded into CharacterSet.cpp. I think it would be superior to have it read from a file. Unfortunately the IANA document is not easily parsable, but it could be manually doctored into a format that can be read programmatically without difficulty. Also, some of the character set assignments for particular conversion ids are guesses. I assume, for example, that the B_MACINTOSH_ROMAN font encoding is the same as the one denoted by the IANA name "macintosh".

Once I had set up the above abstraction for managing encodings to my satisfaction, I used it in StyledEdit to provide the encodings menus and read/write capability. StyledEditView::GetStyledText uses BTranslationUtils::GetStyledText to read a stream into the view. As part of this, StyledEditView::GetStyledText checks for a few attributes: "alignment", "wrap", and "be:encoding". [These are used by R5 StyledEdit.] Annoyingly, the R5 BTranslationUtils::GetStyledText doesn't handle these itself. This is somewhat understandable, since BTranslationUtils::GetStyledText takes a BPositionIO, which doesn't provide the ReadAttr interface. Also annoying is that R5 StyledEdit uses the number 65535 to mean UTF8 in "be:encoding".
Because BTranslationUtils::GetStyledText doesn't handle encodings, I ended up reading the file first with BTranslationUtils::GetStyledText and then re-reading it into a temporary buffer on which I perform the convert_to_utf8. This is lamentable, because it means the file is read twice. It did save me from having to parse the "runs", which would basically have meant copying the BTranslationUtils::GetStyledText code directly into StyledEdit, which I felt (at least at that moment) would be even more depressing.

Writing is similarly annoying. R5's BTranslationUtils::WriteStyledEditFile doesn't write the "alignment", "wrap", or "be:encoding" attributes, for no apparent reason at all. The OBOS implementation does write all three, and in my opinion that makes the OBOS version superior. However, the OBOS StyledEdit currently also writes these attributes itself, since that is required when linking against R5 libtranslation.so. There is one catch with "be:encoding": a BTextView has no "encoding" field. There is a BTextView::SetAlignment, but no BTextView::SetEncoding, for example. And so OBOS BTranslationUtils::WriteStyledEditFile simply writes 65535 for "be:encoding". And similar to reading, I have performed a lamentable second write, which zeroes the file contents and refills them with converted text. The depressing alternative again is virtually equivalent to copying out the BTranslationUtils::WriteStyledEditFile implementation and modifying it.

I should point out that the conversion code is a little bit tricky. If someone decides to relocate this functionality, I would highly recommend studying and moving the code I have written, which performs the conversion through a 256-character buffer.
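
To give an idea of why a fixed-size buffer makes this tricky, here is a minimal sketch of the general carry-over technique. It is not the StyledEdit code. Utf16BeToUtf8 is a hypothetical stand-in for convert_to_utf8 that only mimics its habit of reporting how many source bytes were consumed and how many destination bytes were written; it handles UTF-16BE without surrogates, chosen only because its two-byte units can straddle a buffer boundary. ConvertInChunks and the std::string "file" are likewise invented for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for convert_to_utf8 (UTF-16BE, BMP only, to UTF-8).
// On return *srcLen holds the bytes consumed and *dstLen the bytes written.
static void Utf16BeToUtf8(const char* src, int* srcLen, char* dst, int* dstLen)
{
    int in = 0, out = 0;
    while (in + 2 <= *srcLen) {
        unsigned c = (static_cast<unsigned char>(src[in]) << 8)
            | static_cast<unsigned char>(src[in + 1]);
        int need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);
        if (out + need > *dstLen)
            break;  // destination full: leave this unit for the next call
        if (need == 1) {
            dst[out++] = static_cast<char>(c);
        } else if (need == 2) {
            dst[out++] = static_cast<char>(0xC0 | (c >> 6));
            dst[out++] = static_cast<char>(0x80 | (c & 0x3F));
        } else {
            dst[out++] = static_cast<char>(0xE0 | (c >> 12));
            dst[out++] = static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = static_cast<char>(0x80 | (c & 0x3F));
        }
        in += 2;
    }
    *srcLen = in;
    *dstLen = out;
}

// Converts `input` through fixed-size buffers of `chunk` bytes, carrying any
// unconsumed tail of one pass over to the front of the next.
static std::string ConvertInChunks(const std::string& input, std::size_t chunk)
{
    chunk = std::max<std::size_t>(chunk, 3);  // room for one 3-byte UTF-8 sequence
    std::vector<char> in(chunk), out(chunk);
    std::string result;
    std::size_t pos = 0;   // next unread input byte (stands in for the file offset)
    std::size_t held = 0;  // bytes currently sitting in the input buffer

    while (true) {
        // Top up the buffer behind whatever was carried over.
        std::size_t n = std::min(chunk - held, input.size() - pos);
        std::memcpy(in.data() + held, input.data() + pos, n);
        pos += n;
        held += n;
        if (held == 0)
            break;  // nothing left to convert

        int srcLen = static_cast<int>(held);
        int dstLen = static_cast<int>(chunk);
        Utf16BeToUtf8(in.data(), &srcLen, out.data(), &dstLen);
        result.append(out.data(), static_cast<std::size_t>(dstLen));

        if (srcLen == 0 && n == 0)
            break;  // no progress possible: truncated character at end of input

        // Move the unconsumed tail to the front for the next pass.
        held -= static_cast<std::size_t>(srcLen);
        std::memmove(in.data(), in.data() + srcLen, held);
    }
    return result;
}
```

The important step is the memmove at the end of the loop: whatever the converter did not consume is kept and completed by the next read rather than dropped or converted as garbage. The real StyledEdit code does the equivalent job with convert_to_utf8 and a 256-character buffer.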
My code seems to convert characters properly even when they are split across the 256-character boundary, which took some thinking and testing to get right. Reading in particular was fairly annoying.

Warning: potential R2 issue + opinion ahead. :-) In my opinion this functionality should probably be relocated into existing BTranslationUtils functions or into new functions, possibly accepting a BFile instead of a BPositionIO. BTextView should be expanded to support SetEncoding/Encoding.

One last disappointing observation on R5 StyledEdit vs. OBOS StyledEdit: the R5 StyledEdit is able to open a UTF8 file that I created in it without issue. However, the OBOS StyledEdit cannot open this file. The failure is not in the StyledEdit code. The failure occurs when BTranslationUtils::GetStyledText is called on the file. It returns B_TRANSLATION_ERROR_BASE. This occurs when calling the R5 version of GetStyledText. My conclusion from this is that the R5 StyledEdit implementation does not even use GetStyledText to populate the view. IMHO this is quite unfortunate. Hopefully the OBOS version of GetStyledText will be able to do the right thing. My guess is that the R5 version of GetStyledText fails because it uses the STXTTranslator, which decides that the input file is not a text file. (The input file is text; it is Chinese.) R5 StyledEdit, not using GetStyledText at all, never tries to make that determination.

And a mystery for those interested: the outstanding bug on viewing files with DOS newlines (they get doubled) has left me baffled. The R5 StyledEdit handles not only DOS newlines, but apparently any combination of them with Unix newlines. The newline bytes have not been removed or replaced, as they can be cut and pasted, and even saved. It's almost as if the display routine understands these different newlines. I thought of overriding CanEndLine in BTextView, but it seems that is only used when wrapping is on, yet these lines work properly in R5 StyledEdit.
I'm at a loss.

In the meantime, please try the OBOS StyledEdit encoding features and let me know if you have any problems.

Andrew