[haiku-development] Re: BString method SymbolAt() proposal

  • From: Ingo Weinhold <ingo_weinhold@xxxxxx>
  • To: haiku-development@xxxxxxxxxxxxx
  • Date: Sat, 19 Jun 2010 18:24:55 +0200

On 2010-06-19 at 14:18:21 [+0200], Reinhard Scharnagl <ReScharn@xxxxxxx> 
wrote:
> Maybe I have been misunderstood. The routine should not cut the to be found
> i-th symbol, but the symbol at index i.
> 
> A routine part to invert a UTF8 string could look e.g. like:
> 
>     BString original = "Bärenstraße 1 - 200000 €";
>     BString inverted, symbol;
>     int len = original.Length();
>     for (int i = 0; i < len; i += symbol.Length()) {
>         symbol = original.SymbolAt(i);
>         inverted.Prepend(symbol);
>     }

That would work, but IMO the interface is not particularly nice. The 
complexity is O(n), but because a BString is returned you get a memory 
allocation and deallocation for each character, which is not really 
efficient. Furthermore, passing a byte index into the string but getting a 
UTF-8 character back is not exactly beautiful.

I would rather see a dedicated iteration interface for UTF-8 strings. E.g. 
a very simple one in BUnicodeChar:

        static const char* NextUTF8Char(const char*& string,
                size_t& _byteCount);

Could be used like:

        BString reversed;
        const char* cookie = original;
        size_t byteCount;
        while (const char* utf8Char = BUnicodeChar::NextUTF8Char(cookie,
                        byteCount)) {
                reversed.Prepend(utf8Char, byteCount);
                        // BTW: O(n)!
        }
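
For illustration, here is a minimal, self-contained sketch of how such a
function might behave. It is not existing Haiku code: NextUTF8Char() is
written as a free function standing in for the proposed
BUnicodeChar::NextUTF8Char(), std::string stands in for BString, the helper
ReverseUTF8() is hypothetical, and the input is assumed to be valid,
null-terminated UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for the proposed BUnicodeChar::NextUTF8Char().
// Returns a pointer to the character at `string` and advances `string`
// past it; returns NULL at the terminating null byte.
// Assumption: the input is valid UTF-8.
static const char*
NextUTF8Char(const char*& string, size_t& _byteCount)
{
	if (string == NULL || *string == '\0')
		return NULL;

	const char* start = string;

	// Skip the lead byte, then every continuation byte (10xxxxxx).
	string++;
	while ((static_cast<unsigned char>(*string) & 0xc0) == 0x80)
		string++;

	_byteCount = string - start;
	return start;
}

// Hypothetical helper: reverses a UTF-8 string character by character,
// using std::string in place of BString.
static std::string
ReverseUTF8(const std::string& original)
{
	std::string reversed;
	const char* cookie = original.c_str();
	size_t byteCount;
	while (const char* utf8Char = NextUTF8Char(cookie, byteCount)) {
		// Prepending copies the existing contents, so each step is O(n).
		reversed.insert(0, utf8Char, byteCount);
	}
	return reversed;
}
```

No temporary string object is created per character; the caller only gets a
pointer into the original buffer plus a byte count.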

A stateful iterator class, which would be even more convenient to use, 
could be added as well.
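
One possible shape for such a stateful iterator, again only as a sketch: the
class name UTF8Iterator, its methods, and the helper CountUTF8Chars() are all
hypothetical and not part of any existing API. Valid, null-terminated UTF-8
input is assumed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stateful iterator over the characters of a UTF-8 string.
// Assumption: the input is valid, null-terminated UTF-8.
class UTF8Iterator {
public:
	explicit UTF8Iterator(const char* string)
		:
		fNext(string),
		fCurrent(NULL),
		fByteCount(0)
	{
	}

	// Advances to the next character. Returns false once the end of the
	// string has been reached.
	bool Next()
	{
		if (fNext == NULL || *fNext == '\0')
			return false;

		fCurrent = fNext++;
		// Skip the continuation bytes (10xxxxxx) of the current character.
		while ((static_cast<unsigned char>(*fNext) & 0xc0) == 0x80)
			fNext++;

		fByteCount = fNext - fCurrent;
		return true;
	}

	// Pointer to the first byte of the current character.
	const char* Current() const { return fCurrent; }

	// Number of bytes the current character occupies.
	size_t ByteCount() const { return fByteCount; }

private:
	const char*	fNext;
	const char*	fCurrent;
	size_t		fByteCount;
};

// Hypothetical usage: counts characters rather than bytes.
static size_t
CountUTF8Chars(const std::string& string)
{
	size_t count = 0;
	UTF8Iterator iterator(string.c_str());
	while (iterator.Next())
		count++;
	return count;
}
```

The caller no longer has to carry the cookie and the byte count around as
separate variables.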


On 2010-06-19 at 15:38:05 [+0200], Reinhard Scharnagl <ReScharn@xxxxxxx> 
wrote:
[...]
> And I wonder how the current compiler will handle:
> 
> wchar_t euro = '€';
> 
> and simultaneously still support BString and multibyte message ids.

Wide char/string literals have the prefix "L" (i.e. L'a' or
L"Hello world!").
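
A small sketch of the difference, assuming a compiler with a 16 or 32 bit
wchar_t. The constant names are made up for illustration, and the euro sign
is written with escapes so that the result does not depend on the encoding
of the source file:

```cpp
#include <cstddef>

// Wide character literal: a single wchar_t holding the code point U+20AC.
static const wchar_t kEuroWide = L'\u20AC';

// Narrow string: the same character as its three UTF-8 bytes, which is
// the form a UTF-8 based string class would store.
static const char kEuroUTF8[] = "\xe2\x82\xac";

// Wide string literal: 12 wide characters plus the terminating null.
static const wchar_t kGreeting[] = L"Hello world!";
```

Without the prefix, '€' in a UTF-8 encoded source file spans several bytes,
so it is a multi-character literal of type int with an
implementation-defined value, not a wide character.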

CU, Ingo
