[yunqa.de] Re: DISQLite3: String spoiled during extraction.

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Tue, 24 Aug 2010 15:35:37 +0200

You are using a few things out of context. Let me explain inline below.

On 23.08.2010 18:04, Alex Vulpe wrote:

> I'm saving into database the following string (hex codes): 41 00 41 00 41 00 
> 19 04 26 04 23 04 0C 01 60 01 7D 01 20 00 FC FE
> It's an unicode string, saved with TDISQLite3Statement.Bind_Str() function.

You did not say which Delphi version you are using, so I assume your string is 
an AnsiString. It is quite unusual to use an AnsiString for UnicodeString 
storage, but since Delphi allows it, let's assume that this is OK for now.

Next you use TDISQLite3Statement.Bind_Str() to pass this Unicode-As-AnsiString 
to the database engine. This is not OK, because Bind_Str() wants a Utf8String. 
This is a clear type mismatch, and D2010 should warn you about it.

But how does DISQLite3 handle this, since your string is neither a true 
AnsiString nor UnicodeString? It trusts the application and assumes that the 
string contains valid UTF-8 sequences and saves it to the database verbatim.

> The data is saved OK into file (I checked the binary contents of the file).

Correct. You can also check this using SQL:

  select hex(ss) from test;

> In other application I'm trying to read the string with 
> TDISQLite3Statement.Column_Str() function, and the last 2 bytes (FC FE) of 
> string are spoiled.

The above also applies to TDISQLite3Statement.Column_Str(): It returns an 
Utf8String. DISQLite3 now trusts the database that it contains valid UTF-8 
sequences only and returns the string unchanged. (Databases can also be in 
PRAGMA encoding = "UTF-16"; 
mode, but this is another story.)

> I'm getting the following data: 41 00 41 00 41 00 19 04 26 04 23 04 0C 01 60 
> 01 7D 01 20 00 EF BF BD EF BF BD.

This happens because you call 

  FStmt.Column_Cell16(0, FCell16);

Column_Cell16() asks to retrieve the string just stored as UTF-8 and convert it 
to UTF-16. This conversion takes place in internal memory. However, since your 
byte sequence is NOT valid UTF-8, the conversion produces a wrong UTF-16 string 
and stores it to internal memory.

Now when you next ask for the UTF-8 string, the "wrong" UTF-16 string is 
converted to UTF-8 and Column_Str() returns a valid (in terms of UTF-8) but 
wrong (in terms of input-output characters) result.

> Some additional details: if you comment line 70:
>
> FStmt.Column_Cell16(0, FCell16);
>
> it works fine.

TDISQLite3Statement obviously hides some of the SQLite API internals. So to 
understand the real cause of the problem, you need to know that 

  TDISQLite3Statement.Column_Cell16()  calls  sqlite3_column_text16()

and

  TDISQLite3Statement.Column_Str()  calls  sqlite3_column_text()

Keeping this in mind, we can understand the true meaning of this section of the 
DISQLite3 sqlite3_column_blob() documentation (linked from :

"Do not mix calls to sqlite3_column_text() or sqlite3_column_blob() with calls 
to sqlite3_column_bytes16(), and do not mix calls to sqlite3_column_text16() 
with calls to sqlite3_column_bytes()."

Solutions:

1. Recommended:

Compose your strings as real UTF-8 and use Bind_Str() or as true UTF-16 / 
UnicodeString / WideString and use Bind_Str16().

2. Discouraged:

Do not mix Column_Cell16() or Column_Str16() with Column_Str() as this will 
corrupt your invalid UTF-8 strings. This might still work for your application, 
but be warned that if you save invalid UTF-8 strings to your database, some of 
the build-in SQL functions like length(), substr(), etc. may return wrong 
results.

Ralf
_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa



Other related posts: