[yunqa.de] Re: DiSqlite - How to get all snippets if more than one occurrence?

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Tue, 16 Mar 2010 13:28:06 +0100

At 12:59 16.03.2010, Edwin Yip wrote:

>Thank you for the comments. Two derived questions then, if I could: 
>
>* It seems that the offsets function returns byte offsets only, not 
>character offsets? How should Unicode be handled, then? For example, the 
>description of the third integer in the return value of the offsets function 
>reads: "The byte offset of the matching term within the column."

The DISQLite3_Full_Text_Search demo needs character offsets as well and has a 
converter function in DISQLite3_Full_Text_Search_Form_Main.pas:

{ Converts an Offset string to an array of TOffsetInfo. Takes care to convert
  UTF-8 byte positions to WideString / UnicodeString character indexes.
  Therefore, the corresponding Content string must be passed as well. }
function DecodeOffsets(
  const Offsets: Utf8String;
  const Content: UnicodeString): TOffsetInfoArray;
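A minimal sketch of how such a converter could work. It assumes the FTS3 offsets() format of space-separated integers in groups of four (column, term, byte offset, byte length). This is not the demo's code: the TOffsetInfo fields and the ByteToCharIndex helper are hypothetical. Because Content is already UTF-16, each byte offset can be mapped by adding up the UTF-8 length of every character that precedes it, with no need to re-encode the text. The code targets a Delphi 2009+ unit or Free Pascal in Delphi mode:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  { Hypothetical layout; the demo's actual TOffsetInfo fields may differ. }
  TOffsetInfo = record
    Column: Integer;    // column number (1st integer of each group)
    Term: Integer;      // query term number (2nd integer)
    StartChar: Integer; // 1-based UTF-16 index into Content
    CharCount: Integer; // match length in UTF-16 code units
  end;
  TOffsetInfoArray = array of TOffsetInfo;

{ Hypothetical helper: maps a 0-based UTF-8 byte offset to a 1-based UTF-16
  index into Content by summing the UTF-8 length of each preceding char. }
function ByteToCharIndex(const Content: UnicodeString; ByteOffset: Integer): Integer;
var
  Bytes: Integer;
  C: Word;
begin
  Result := 1;
  Bytes := 0;
  while (Result <= Length(Content)) and (Bytes < ByteOffset) do
  begin
    C := Ord(Content[Result]);
    if C < $80 then
    begin Inc(Bytes, 1); Inc(Result, 1); end
    else if C < $800 then
    begin Inc(Bytes, 2); Inc(Result, 1); end
    else if (C >= $D800) and (C <= $DBFF) then
    begin Inc(Bytes, 4); Inc(Result, 2); end // surrogate pair = 4 UTF-8 bytes
    else
    begin Inc(Bytes, 3); Inc(Result, 1); end;
  end;
  // Guard against a truncated surrogate pair at the very end.
  if Result > Length(Content) + 1 then
    Result := Length(Content) + 1;
end;

function DecodeOffsets(
  const Offsets: Utf8String;
  const Content: UnicodeString): TOffsetInfoArray;
var
  Nums: array of Integer;
  I, N, Count, ByteOff, ByteLen: Integer;
begin
  // Collect all integers; there can be at most (Length + 1) div 2 of them.
  SetLength(Nums, Length(Offsets) div 2 + 1);
  Count := 0;
  I := 1;
  while I <= Length(Offsets) do
    if Offsets[I] in ['0'..'9'] then
    begin
      N := 0;
      while (I <= Length(Offsets)) and (Offsets[I] in ['0'..'9']) do
      begin
        N := N * 10 + (Ord(Offsets[I]) - Ord('0'));
        Inc(I);
      end;
      Nums[Count] := N;
      Inc(Count);
    end
    else
      Inc(I); // skip separators
  // Every complete group of four integers describes one match.
  SetLength(Result, Count div 4);
  for I := 0 to High(Result) do
  begin
    Result[I].Column := Nums[4 * I];
    Result[I].Term := Nums[4 * I + 1];
    ByteOff := Nums[4 * I + 2];
    ByteLen := Nums[4 * I + 3];
    Result[I].StartChar := ByteToCharIndex(Content, ByteOff);
    Result[I].CharCount :=
      ByteToCharIndex(Content, ByteOff + ByteLen) - Result[I].StartChar;
  end;
end;
```

Each ByteToCharIndex call rescans Content from the start. If the matches are first sorted by byte offset, a single forward pass over Content would be the natural optimization once this works.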

>* When manually extracting the snippets from a large string (thousands of 
>characters), what's the most efficient way, especially since I also need to 
>insert <b></b> around the matching keywords? Do you have a library that 
>makes this kind of thing easier? You know, you have a bunch of libraries ;) 

No, I do not have a library for everything ;-) For my custom snippets() 
function, I would start off with simple string concatenation and then optimize 
later.
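A first, unoptimized version along those lines might look like this sketch. MakeSnippet and TMatch are hypothetical names. The matches are assumed to be character positions (for example, from a converter like DecodeOffsets) that are sorted and non-overlapping. The function cuts a window of Radius characters around one match and wraps every match lying fully inside that window in <b></b>:

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  { Hypothetical match record: 1-based character index and length. }
  TMatch = record
    StartChar: Integer;
    CharCount: Integer;
  end;
  TMatchArray = array of TMatch;

{ Returns the text around Matches[Focus], Radius characters to each side,
  with every match fully inside that window wrapped in <b></b>.
  Matches must be sorted by StartChar and must not overlap. }
function MakeSnippet(const Content: UnicodeString; const Matches: TMatchArray;
  Focus, Radius: Integer): UnicodeString;
var
  WinStart, WinEnd, Cur, I: Integer;
begin
  Result := '';
  if (Focus < 0) or (Focus > High(Matches)) then
    Exit;
  // Clamp the window to the bounds of Content.
  WinStart := Matches[Focus].StartChar - Radius;
  if WinStart < 1 then
    WinStart := 1;
  WinEnd := Matches[Focus].StartChar + Matches[Focus].CharCount - 1 + Radius;
  if WinEnd > Length(Content) then
    WinEnd := Length(Content);

  if WinStart > 1 then
    Result := '...';
  Cur := WinStart; // next character of Content not yet copied
  for I := 0 to High(Matches) do
  begin
    // Skip matches that start before the window or end after it.
    if (Matches[I].StartChar < Cur) or
       (Matches[I].StartChar + Matches[I].CharCount - 1 > WinEnd) then
      Continue;
    Result := Result + Copy(Content, Cur, Matches[I].StartChar - Cur) +
      '<b>' + Copy(Content, Matches[I].StartChar, Matches[I].CharCount) + '</b>';
    Cur := Matches[I].StartChar + Matches[I].CharCount;
  end;
  Result := Result + Copy(Content, Cur, WinEnd - Cur + 1);
  if WinEnd < Length(Content) then
    Result := Result + '...';
end;
```

With Content 'The quick brown fox jumps over the lazy dog' and matches for "quick", "fox", and "dog", MakeSnippet(Content, Matches, 1, 8) returns '...k brown <b>fox</b> jumps o...'. That result shows how crude a fixed character window is: it cuts words in half (and could even split a surrogate pair), it does not HTML-escape Content, and it does not choose the most relevant fragment. Once it works, the repeated concatenation could be replaced by TStringBuilder (Delphi 2009 and later).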

Be aware that a good snippet extractor is not as simple as it might seem: FTS3 
goes to considerable lengths to extract snippets that are both relevant and 
"good looking".

Ralf  

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa


