[yunqa.de] Re: Reading UTF-8

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Wed, 14 May 2008 16:47:25 +0200

Simon Beesley wrote:

>How do I get UTF-8 data as UTF-8?
>
>Previously (with Zeos and a SQLite3 database):
>
>   Ansistringvariable := Fields[1].AsString;
>
>returned a UTF-8 string.
>
>With DISQLite3 and the same SQLite3 database:
>
> Ansistringvariable :=  FStatement.Column_Str(1);

These methods all return UTF-8 encoded text:

  TDISQLite3Statement.Column_Str
  TDISQLite3Statement.Column_Text

whereas these return WideString text:

  TDISQLite3Statement.Column_Str16
  TDISQLite3Statement.Column_Text16

So the simple rule is: if a function name ends in ..16, it takes or returns
WideString text; if it does not, the text should be UTF-8 encoded.
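To make the difference concrete, here is a minimal sketch that uses only the RTL functions UTF8Encode and UTF8Decode (no DISQLite3 calls). The helper name Utf8Hex and the program name are made up for this illustration. It shows the same character once as a WideString and once as the UTF-8 bytes that a non-..16 function would work with:

```pascal
program Utf8VsWide;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: returns the UTF-8 bytes of W as hex, e.g. 'C5 9F'.
function Utf8Hex(const W: WideString): string;
var
  U: UTF8String;
  I: Integer;
begin
  U := UTF8Encode(W);
  Result := '';
  for I := 1 to Length(U) do
  begin
    if I > 1 then
      Result := Result + ' ';
    Result := Result + IntToHex(Ord(U[I]), 2);
  end;
end;

var
  W: WideString;
begin
  W := WideChar($015F); // Turkish 's' with cedilla, U+015F
  WriteLn('WideString length: ', Length(W));             // 1 WideChar
  WriteLn('UTF-8 length:      ', Length(UTF8Encode(W))); // 2 bytes
  WriteLn('UTF-8 bytes:       ', Utf8Hex(W));            // C5 9F
end.
```

The two bytes C5 9F are what ends up in an AnsiString variable when the text is UTF-8 encoded; they only turn back into the original character after an explicit UTF8Decode.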

>returns an Ansi string with the UTF-8 encoded foreign characters stripped out 
>or converted (e.g. the Turkish character 's' with a cedilla now becomes plain 
>'s').

I am not sure how you stored your text using Zeos, but I know that SQLite does
*not* validate the text encoding; it just assumes the caller passes correctly
encoded text.

This means that you can store text in *any* encoding using Bind_Str! If you do
so, Column_Str will return text in just the same encoding as stored. Is it
possible that your application mixed up encodings with Zeos, and now you are
seeing awkward results because DISQLite3 does not mix them back?

If you are now tempted to store non-UTF-8 encodings via Bind_Str, you should be
aware of some pitfalls:

* With custom encodings, your database will not be compatible with other SQLite 
applications and tools.

* With custom encodings, SQL text functions like Replace() or NOCASE sorting
will no longer work properly unless you override the relevant functions to
support your custom encoding.

* With custom encodings, the Unicode capabilities of your application might be 
limited.

I strongly suggest you use Unicode, simply because it is the future. You can
happily mix both UTF-8 and WideString calls (DISQLite3 sorts them out
internally), but using just one consistently can avoid confusion:

* Use UTF-8 encoded text only with Bind_Str and Column_Str. This is often
faster for European languages, but requires careful coding because UTF-8 is not
a distinct string type in Delphi.

* If you do not want to worry about encodings, use WideStrings and the ..16 
functions throughout.
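If you go the UTF-8 route, one way to keep the "careful coding" manageable is to convert in exactly two places, right before and right after the database calls. The sketch below assumes nothing about DISQLite3 beyond what is said above; ToUtf8 and FromUtf8 are hypothetical helper names, and only UTF8Encode and UTF8Decode are real RTL functions:

```pascal
program Utf8Boundary;
{$APPTYPE CONSOLE}

// Hypothetical helper: call before handing text to a UTF-8 function
// such as Bind_Str.
function ToUtf8(const Text: WideString): UTF8String;
begin
  Result := UTF8Encode(Text);
end;

// Hypothetical helper: call on text received from a UTF-8 function
// such as Column_Str.
function FromUtf8(const Utf8: UTF8String): WideString;
begin
  Result := UTF8Decode(Utf8);
end;

var
  Original, Restored: WideString;
  Stored: UTF8String;
begin
  // 'ku' followed by the Turkish 's' with cedilla (U+015F)
  Original := 'ku';
  Original := Original + WideChar($015F);

  Stored := ToUtf8(Original);    // what would go into the database
  Restored := FromUtf8(Stored);  // what the application works with

  WriteLn('Characters: ', Length(Original)); // 3
  WriteLn('UTF-8 bytes: ', Length(Stored));  // 4, Length counts bytes here
  if Restored = Original then
    WriteLn('Round trip OK');
end.
```

Note that Length, Copy and string indexing on the UTF-8 value operate on bytes, not characters, which is one reason to decode before doing any text processing.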

If you still experience problems with your database, you can send it to my 
e-mail directly (compress, please!) and I will take a look.

Ralf 
