[yunqa.de] Re: full text search and stop words

  • From: Delphi Inspiration <delphi@xxxxxxxx>
  • To: yunqa@xxxxxxxxxxxxx
  • Date: Tue, 17 Feb 2009 10:55:00 +0100

Edwin Yip wrote:

>Would anybody give me some hints/tips on removing stop words before indexing 
>with FTS3 in DISQLITE? That will significantly reduce the size of the 
>database. 

You can implement stop words via your own FTS3 tokenizer. The 
DISQLite3_Full_Text_Search demo includes a "pascal" tokenizer which shows 
the principles (DISQLite3PascalTokenizer.pas).

The word filtering would go into the Tsqlite3_tokenizer_module.xNext function 
(see pascal_tokenizer_Next). If a text token is in the list of stop words, 
xNext would skip it, keep scanning for further tokens, and return to the 
caller only once it finds a token that is not a stop word.
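
To illustrate the loop described above, here is a minimal standalone sketch 
in C. It does not use the actual FTS3 tokenizer interface: StopCursor, 
kStopWords, is_stop_word and stop_cursor_next are hypothetical names made up 
for this example, and the stop word list is only a placeholder. The real 
xNext also reports the token's length, offsets and position, which the 
sketch leaves out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stop word list; a real one would be longer and would
   probably be sorted or hashed for faster lookup. */
static const char *const kStopWords[] = { "a", "an", "and", "of", "the", "to" };

/* Hypothetical cursor state: the input text, the current scan offset,
   and a lowercased copy of the most recent token. */
typedef struct {
    const char *input;
    size_t      length;
    size_t      offset;
    char        token[64];
} StopCursor;

/* Returns 1 if word is in the stop word list, 0 otherwise. */
static int is_stop_word(const char *word)
{
    size_t i;
    for (i = 0; i < sizeof kStopWords / sizeof kStopWords[0]; i++) {
        if (strcmp(word, kStopWords[i]) == 0)
            return 1;
    }
    return 0;
}

/* Plays the role of xNext: returns 1 and fills c->token with the next
   token that is not a stop word, or returns 0 at the end of the input. */
static int stop_cursor_next(StopCursor *c)
{
    for (;;) {
        size_t n = 0;

        /* Skip delimiters (anything that is not a letter or digit). */
        while (c->offset < c->length &&
               !isalnum((unsigned char)c->input[c->offset]))
            c->offset++;
        if (c->offset >= c->length)
            return 0;

        /* Collect the token, lowercased; overlong tokens are truncated. */
        while (c->offset < c->length &&
               isalnum((unsigned char)c->input[c->offset])) {
            if (n < sizeof c->token - 1)
                c->token[n++] = (char)tolower((unsigned char)c->input[c->offset]);
            c->offset++;
        }
        c->token[n] = '\0';

        /* Return only if this is not a stop word; otherwise loop again. */
        if (!is_stop_word(c->token))
            return 1;
    }
}
```

The caller never sees the stop words, so they never reach the index. The 
same loop structure should carry over to pascal_tokenizer_Next, with the 
stop word test placed just before the function returns a token.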

Unfortunately I do not have example code ready, but you are welcome to post 
yours here for discussion.

Ralf 

_______________________________________________
Delphi Inspiration mailing list
yunqa@xxxxxxxxxxxxx
//www.freelists.org/list/yunqa