[haiku-development] BString and UTF-8

I'm in the process of enhancing the BString class and am having trouble understanding how to change the Jamfiles to support the enhancement. I have read all of the Jam documentation I can find, but I still can't figure out how to make the changes I need. Is there someone who can help me?

I have most of these changes working in a copy of BString, and I'm trying to build Haiku with that copy so I can verify that the changes are backward compatible.

Some of the things that I need the Jamfiles to do:
- Add additional include directories to the BString compile
- Add additional dependent libraries to the libbe.so link step

I am looking at several possible approaches to providing the functionality in the BString class:
- Have the BString class use ICU directly
- Have the BString class use ICU through the LocaleBackend class
- Have the BString class use a combination of the above

The changes to the BString class fall into several categories:
- Making sure the BString class always holds valid UTF-8 strings (Allowing invalid UTF-8 strings is a security risk, and it also makes operations on existing strings difficult or impossible.)
- Making locale-sensitive methods (such as case conversion) respect the locale
- Making the "Chars" methods work with all normalization forms of UTF-8 strings (Currently, the "Chars" methods operate on "code points". A Unicode "character" can be one OR MORE code points.)
- Adding both "Chars" and "CodePoint" methods, as appropriate, so the BString class has full functionality when used with the Locale Kit classes
- Adding Unicode-character-aware regular expression support for the "Find", "Replace", and "Remove" functions (This would allow things such as: "find a word that starts with a case-insensitive 'c', has as its third character either a lowercase o-umlaut or an uppercase 'M', and is between 5 and 7 characters long".)
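
To illustrate the first and third points, here is a minimal sketch of strict UTF-8 validation combined with code point counting. It uses std::string in place of BString so that it compiles on its own, and CountCodePoints is a hypothetical name, not part of the BString API. It is only meant to show the kind of check involved, not how the final implementation will look.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a BString method): returns the number of code
// points in `s`, or -1 if `s` is not valid UTF-8. Rejects truncated
// sequences, stray continuation bytes, overlong encodings, surrogates
// (U+D800..U+DFFF), and values above U+10FFFF.
long
CountCodePoints(const std::string& s)
{
	long count = 0;
	std::size_t i = 0;
	while (i < s.size()) {
		unsigned char lead = static_cast<unsigned char>(s[i]);
		std::size_t length;
		unsigned long value;
		unsigned long minimum;	// smallest value this length may encode

		if (lead < 0x80) {
			length = 1; value = lead; minimum = 0;
		} else if ((lead & 0xE0) == 0xC0) {
			length = 2; value = lead & 0x1F; minimum = 0x80;
		} else if ((lead & 0xF0) == 0xE0) {
			length = 3; value = lead & 0x0F; minimum = 0x800;
		} else if ((lead & 0xF8) == 0xF0) {
			length = 4; value = lead & 0x07; minimum = 0x10000;
		} else {
			return -1;	// stray continuation byte or invalid lead byte
		}

		if (length > s.size() - i)
			return -1;	// sequence is cut off at the end of the string

		for (std::size_t k = 1; k < length; k++) {
			unsigned char next = static_cast<unsigned char>(s[i + k]);
			if ((next & 0xC0) != 0x80)
				return -1;	// expected a continuation byte
			value = (value << 6) | (next & 0x3F);
		}

		if (value < minimum)
			return -1;	// overlong encoding
		if (value >= 0xD800 && value <= 0xDFFF)
			return -1;	// UTF-16 surrogate, not allowed in UTF-8
		if (value > 0x10FFFF)
			return -1;	// beyond the Unicode range

		i += length;
		count++;
	}
	return count;
}

// Small self-check showing the code point vs. character distinction.
void
CheckCodePointExamples()
{
	// Precomposed e-acute (U+00E9): one code point, one character.
	assert(CountCodePoints(std::string("\xC3\xA9")) == 1);
	// 'e' followed by combining acute (U+0301): two code points, but
	// still one character as far as the user is concerned.
	assert(CountCodePoints(std::string("e\xCC\x81")) == 2);
	// Overlong encoding of '/' is rejected.
	assert(CountCodePoints(std::string("\xC0\xAF")) == -1);
}
```

The two e-acute cases are why counting code points is not enough for the "Chars" methods: grouping code points into user-visible characters needs Unicode segmentation rules, which is where a library such as ICU comes in.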

I want to make it very clear that these changes will be backward compatible. Of course, by definition, some BString behavior will change. But for existing operations that leave the string as valid UTF-8, the behavior will not change. The behavior will differ only when an operation would have created an invalid UTF-8 string. In addition, I will add enhanced operations to the class.
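
As an example of an operation that can create an invalid string, consider a byte-based truncation that cuts a multi-byte sequence in half. The sketch below shows one possible policy, backing up to the previous code point boundary. This is only an assumption for illustration: the function name is hypothetical, std::string stands in for BString, and the actual behavior of the changed class may differ (for instance, it could refuse the operation instead).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a BString method): truncates `s` to at most
// `maxBytes` bytes without splitting a multi-byte UTF-8 sequence.
// Assumes `s` already holds valid UTF-8.
std::string
TruncateAtCodePointBoundary(const std::string& s, std::size_t maxBytes)
{
	if (maxBytes >= s.size())
		return s;

	std::size_t end = maxBytes;
	// If the first dropped byte is a continuation byte (10xxxxxx), the cut
	// falls inside a sequence, so move back to that sequence's lead byte.
	while (end > 0 && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80)
		end--;

	return s.substr(0, end);
}
```

With the input "a" followed by e-acute (bytes 0x61 0xC3 0xA9), a limit of 2 bytes yields just "a" rather than "a" plus a dangling 0xC3. For any cut that already falls on a boundary, the result is the same as a plain byte-based truncation, which is the sense in which existing valid uses are unaffected.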

In addition to the changes to the BString class, I'm updating the HaikuBook documentation. This will include both the API description for the BString class and a document (to be included in the "overview" section of the HaikuBook) describing UTF-8 in detail, with examples showing how to write code using the BString in a locale-free manner.

Thanks for any help on this.

Michael Bridgers
