[interfacekit] Re: MAJOR UTF8 bugs ...

I think you're right. I just read in the BeBook that, officially, BString
ONLY supports flat 7-bit ASCII strings.  So I guess I'll implement it exactly
that way (because it's documented this way).
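
For reference, the byte-count behaviour discussed in this thread can be reproduced outside BeOS. The following is only a minimal sketch: it uses std::string as a stand-in for BString, and AppendBytes is a hypothetical helper name, not part of any Be API. It assumes the length argument is interpreted as a number of bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a byte-counting append: copies the first
// `byteCount` BYTES of `source` onto `target`, with no knowledge of UTF-8.
// std::string::append clamps the length if `source` is shorter.
std::string AppendBytes(std::string target, const std::string& source,
                        std::size_t byteCount)
{
    target.append(source, 0, byteCount);
    return target;
}
```

With "Vallée" encoded as UTF-8 (the "é" is the two bytes 0xC3 0xA9), a count of 6 keeps "Vallé" and drops the final "e", while a count of 5 cuts the "é" in half and leaves an invalid sequence.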

In the end, I think this class sucks for that same reason. I remember the
frustrations when I developed localized apps:  on one hand BeOS is 100%
UTF-8  (which is a really good thing), but on the other hand, there's
nothing (really great) in the API to help us manage this format.  A
pretty bad design.
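
As an illustration of the kind of help that is missing, here is a rough sketch of character-aware helpers. Everything in it is an assumption made for the example: the names CountUtf8Chars, ByteLengthOfChars and AppendChars are hypothetical, std::string stands in for BString, and the input is assumed to be well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if `byte` is a UTF-8 continuation byte (bit pattern 10xxxxxx).
static bool IsContinuationByte(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Hypothetical helper: number of UTF-8 characters in `text`,
// counted as the number of bytes that are not continuation bytes.
std::size_t CountUtf8Chars(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text) {
        if (!IsContinuationByte(byte))
            ++count;
    }
    return count;
}

// Hypothetical helper: number of bytes occupied by the first `charCount`
// characters of `text` (the whole string if it has fewer characters).
std::size_t ByteLengthOfChars(const std::string& text, std::size_t charCount)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (!IsContinuationByte(static_cast<unsigned char>(text[i]))) {
            if (seen == charCount)
                return i;   // start of the first character we must not copy
            ++seen;
        }
    }
    return text.size();
}

// Hypothetical character-counting append built on the helper above.
std::string AppendChars(std::string target, const std::string& source,
                        std::size_t charCount)
{
    target.append(source, 0, ByteLengthOfChars(source, charCount));
    return target;
}
```

The idea is that a caller converts a character count into a byte count first and then uses the ordinary byte-based call, so a multibyte character is never split.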

OK for now.  The R1 BString will be as dumb as the original. But look out in R2
for a "BStringX" or something ...  :-)

- Steve



----- Original Message -----
From: "Erik Jakowatz" <erik@xxxxxxxxxxxxxx>
To: <interfacekit@xxxxxxxxxxxxx>
Sent: Saturday, January 26, 2002 10:19 PM
Subject: [interfacekit] Re: MAJOR UTF8 bugs ...


>
> >Ok, I don't want to annoy everybody with the same question but ...
>
> That's what this list is for. =)
>
> >I'm currently working on the BString class.  And on R5 this class is filled
> >with MAJOR bugs about UTF-8 encoded strings. Nothing's wrong when using
>
> [snip]
>
> >Here's a simple example of what I mean :
> >
> >BString  string1 = "Steve ";
> >BString  string2 = "Vallée";   // Note the "é" character
> >string1.Append( string2, 6 );
> >printf("%s", string1.String() );
> >
> >will produce ...  "Steve Vallé"    (without the final "e", because the "é"
> >requires 2 bytes)
>
> I'm curious about something.  If you assume that all your counts are in
> bytes, rather than characters (which may be more than one byte), do
> these "bugs" go away?  The reason I ask this is because whether these
> are bugs may largely be a matter of perspective.  Sure, if you expect
>
> string1.Append( string2, 6 );
>
> to be counting six *characters* the behaviour is buggy.  However, if you
> expect that six to mean six *bytes* the behaviour is correct.  Does this
> make sense?  BString's underlying assumption is that all counts are
> bytes, and if one approaches it expecting it to count characters, a lot
> of functionality doesn't work as expected.
>
> >I know I can use a simple #ifdef and produce two versions of each function.
> >But exactly because of the tremendous size of the BString class, I'm not very
> >fond of doing the job twice for every single method implementation.
>
> I can certainly understand your feeling here.
>
> >My opinion is: because of those many bugs, I'm 100% sure nobody ever used
> >this class in the context of a localized program. It just makes no sense at
> >all.
>
> I'm of two minds here.  On the one hand, what you are proposing results
> in changing the underlying assumption of how BString works, which is a
> risky thing to do.  On the other hand, who could possibly be using
> BString to *split* multibyte characters?  I mean, what would be the
> utility in that?  It does occur to me, though, that a number of programs
> may be explicitly taking the current behaviour into account when
> dealing with multibyte text, and changing how BString interprets counts
> may really mess those apps up.
>
> Are there other UTF-8 related bugs that don't involve counts?  If there
> are, let us know what they are so we can make an intelligent decision.
> If not, my feeling is that the class should be implemented as-is, with
> more emphasis in the docs that everything but CountChars() deals in
> bytes, not UTF-8 characters, and to be careful lest those multibyte
> characters get chopped. =)
>
> Thoughts, anyone?
>
> e
>
> Data is not information, and information is not knowledge: knowledge is
> not understanding, and understanding is not wisdom.
> - Philip Adams
>

