[yunqa.de] Re: DISQLite3: String spoiled during extraction.

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Wed, 25 Aug 2010 09:35:56 +0200

On 24.08.2010 17:14, Alex Vulpe wrote:

> There is a reason why I was saving Unicode data as AnsiStrings with 
> Bind_Str().
> I have a 2 tier application.
> On server side, when I use Bind_Str16(), sqlite converts from Unicode
> to UTF8, i.e. string is saved as UTF8.
> On the client side, when I use Column_Str16() - it converts back from
> UTF8 to Unicode.
> On big amount of data it takes time. I have tried to avoid this
> Unicode -> UTF8 -> Unicode conversions by saving data as pseudo
> AnsiStrings.

Are you worried that the conversion slows down performance? In my
experience, this is mostly negligible. Hard drive speed is the
bottleneck most DB operations, so since UTF-8 is generally more compact
for European languages, it creates smaller database files and might
actually run faster.

Also, be sure to wrap large inserts in a transaction for speed. See
these demos for examples:

  \Demos\DISQLite3_20_Million\
  \Demos\DISQLite3_Log_Inserts\
  \Demos\DISQLite3_Create_Table

> If there is another option to avoid this unnecessary conversion - I'm
> glad to hear it :)

Yes, there are several options to avoid the conversion:


1. UTF-16 database.

Delphi's UnicodeString / WideString is UTF-16LE, so conversion will not
occur if you write to or read from a UTF-16LE database with the ..._16()
API. You set the database encoding by issuing this SQL as the first
command right after you created the database file (DISQLite3 Pro only):

  PRAGMA encoding = "UTF-16le";

This will result in a database file where all text is stored as
UTF-16LE. Both the UTF-8 and UTF-16 APIs still work equally well, but
now conversion happens for the UTF-8 API like Column_Text() and no
longer for Column_Text16().

Please note that UTF-16 database files will likely be larger than UTF-8
ones for most European languages. More information is available at the
DISQLite3 Help -> SQL Syntax -> PRAGMA section.


2. Store strings as BLOBs

You could store strings as raw byte sequences using the Bind_Blob() and
Column_Blob() interfaces. While this can work for a particular user
scenario, it has certain compatibility limitations (some built-in SQL
functions will not work properly, for example) and should be used by
experienced developers only.


It is hard to tell which approach works best for you. You might want to
run a few measurements with different encoding / conversion settings on
your particular data and hardware. I'd be interested to read your results.

Ralf
_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa



Other related posts: