[haiku-gsoc] Re: Full Text Search and Indexing -- Looking for opinions/comments

  • From: "Axel Dörfler" <axeld@xxxxxxxxxxxxxxxx>
  • To: haiku-gsoc@xxxxxxxxxxxxx
  • Date: Sat, 23 May 2009 20:52:03 +0200 CEST

Hi Ankur,

Ankur Sethi <get.me.ankur@xxxxxxxxx> wrote:
> 1. Indexing and Querying Library: Will perform analysis on the files
> and take care of building and querying the search database.
> 
> I have been looking around for already available information retrieval
> libraries. The two major projects I found are CLucene and Xapian.
> CLucene is the more popular of the two, but I think building it will
> require GCC4. Xapian is under the GPL, so I don't know if that will be
> acceptable. The last option is, of course, writing one from scratch,
> which may not be a good idea given the project timeline.

The GPL should not be used in libraries, even if it's only used [...].
Have you tried building CLucene, or what makes you think it needs GCC4?
Even if it does need GCC4, that should not be a showstopper, as only
the indexing server will link against this library, right?
And the communication with the server is GCC-agnostic in any case.
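To illustrate why the server's compiler doesn't matter to clients, here is a minimal sketch assuming a hypothetical flat wire format (QueryRequest, EncodeQuery and DecodeQuery are made-up names, not an existing Haiku API). Only plain bytes in a fixed byte order cross the process boundary, so a GCC2-built application and a GCC4-built server never have to agree on C++ class layout. On Haiku the real thing would more likely be a BMessage sent to the server, which serves the same purpose.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical request a client sends to the indexing server.
struct QueryRequest {
    std::string volume;       // volume to search (illustrative field)
    std::string terms;        // free-text query
    uint32_t maxResults = 0;  // upper bound on returned hits
};

// Appends a 32-bit value in little-endian order, whatever the compiler.
static void AppendUint32(std::vector<uint8_t>& out, uint32_t value)
{
    for (int i = 0; i < 4; i++)
        out.push_back(static_cast<uint8_t>(value >> (8 * i)));
}

// Writes a string as a 32-bit length followed by its bytes.
static void AppendString(std::vector<uint8_t>& out, const std::string& s)
{
    assert(s.size() <= UINT32_MAX);
    AppendUint32(out, static_cast<uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
}

// Reads a little-endian 32-bit value; fails if fewer than 4 bytes remain.
static bool ReadUint32(const std::vector<uint8_t>& in, size_t& pos,
    uint32_t& value)
{
    if (in.size() - pos < 4)
        return false;
    value = 0;
    for (int i = 0; i < 4; i++)
        value |= static_cast<uint32_t>(in[pos + i]) << (8 * i);
    pos += 4;
    return true;
}

// Reads a length-prefixed string; fails on truncated input.
static bool ReadString(const std::vector<uint8_t>& in, size_t& pos,
    std::string& s)
{
    uint32_t length;
    if (!ReadUint32(in, pos, length) || in.size() - pos < length)
        return false;
    s.assign(in.begin() + pos, in.begin() + pos + length);
    pos += length;
    return true;
}

std::vector<uint8_t> EncodeQuery(const QueryRequest& request)
{
    std::vector<uint8_t> out;
    AppendString(out, request.volume);
    AppendString(out, request.terms);
    AppendUint32(out, request.maxResults);
    return out;
}

// Returns std::nullopt if the message is truncated or has trailing bytes.
std::optional<QueryRequest> DecodeQuery(const std::vector<uint8_t>& in)
{
    QueryRequest request;
    size_t pos = 0;
    if (!ReadString(in, pos, request.volume)
        || !ReadString(in, pos, request.terms)
        || !ReadUint32(in, pos, request.maxResults)
        || pos != in.size())
        return std::nullopt;
    return request;
}
```

The server would simply call DecodeQuery() on whatever bytes arrive and reject anything that doesn't parse, so neither side depends on how the other was compiled.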

> 2. The Indexing Daemon: Will keep the database in sync as files change
> on disk. It's starting to dawn on me that this might be an area that
> would require a lot of thought.
> 
> The indexing daemon will have a set of plugins that will convert data
> from different file formats (PDF, ODF, DOC etc.) to a format
> compatible with the indexing library.

That sounds good to me.
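A minimal sketch of what such a plugin interface could look like, assuming the daemon picks a plugin by the file's MIME type. All class names are hypothetical, and the HTML plugin is a toy that just drops tags. In the actual daemon the plugins would presumably be add-ons loaded at runtime rather than registered in code.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plugin interface: each plugin turns one family of file
// formats into plain text that the indexing library can consume.
class TextExtractor {
public:
    virtual ~TextExtractor() = default;
    // Returns true if this plugin handles files of the given MIME type.
    virtual bool Supports(const std::string& mimeType) const = 0;
    // Converts raw file contents into plain text for indexing.
    virtual std::string Extract(const std::string& raw) const = 0;
};

// Trivial plugin: plain text passes through unchanged.
class PlainTextExtractor : public TextExtractor {
public:
    bool Supports(const std::string& mimeType) const override
    { return mimeType == "text/plain"; }
    std::string Extract(const std::string& raw) const override
    { return raw; }
};

// Toy HTML plugin: drops everything between '<' and '>'. A real plugin
// would use a proper parser; this only illustrates the interface.
class HtmlExtractor : public TextExtractor {
public:
    bool Supports(const std::string& mimeType) const override
    { return mimeType == "text/html"; }
    std::string Extract(const std::string& raw) const override
    {
        std::string text;
        bool inTag = false;
        for (char c : raw) {
            if (c == '<')
                inTag = true;
            else if (c == '>')
                inTag = false;
            else if (!inTag)
                text += c;
        }
        return text;
    }
};

// Holds the loaded plugins and picks one per file by MIME type.
class ExtractorRegistry {
public:
    void Add(std::unique_ptr<TextExtractor> plugin)
    {
        assert(plugin != nullptr);
        fPlugins.push_back(std::move(plugin));
    }

    // Returns nullptr when no plugin understands the type; the daemon
    // could then skip the file's contents.
    const TextExtractor* Find(const std::string& mimeType) const
    {
        for (const auto& plugin : fPlugins) {
            if (plugin->Supports(mimeType))
                return plugin.get();
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<TextExtractor>> fPlugins;
};
```

Keeping the plugin output as plain text means the indexing library never needs to know about PDF, ODF or DOC at all.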

[...]
> Thoughts? Ideas? Opinions? Comments? I'm particularly looking for
> insights concerning 1 and 2.

Any further specs on the insights you need? :-)

Bye,
   Axel.

