[haiku-gsoc] Full Text Search and Indexing -- Looking for opinions/comments

  • From: Ankur Sethi <get.me.ankur@xxxxxxxxx>
  • To: haiku-gsoc@xxxxxxxxxxxxx
  • Date: Sat, 23 May 2009 19:03:44 +0530

Hi,

My finals end on May 28, so it's about time I made a few final
decisions about my HCD2009 project (a full text indexing and search
tool). This email started out as part of a conversation between Rene
and me.

These are going to be the major parts of my project. Let me know if
I'm missing something or if something should be done differently.

1. Indexing and Querying Library: Will perform analysis on the files
and take care of building and querying the search database.

I have been looking around for already available information retrieval
libraries. The two major projects I found are CLucene and Xapian.
CLucene is the more popular of the two, but I think building it will
require GCC4. Xapian is under the GPL, so I don't know if that will be
acceptable. The last option is, of course, writing one from scratch,
which may not be a good idea given the project timeline.
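
To make "analysis, building and querying" concrete, here is a rough
sketch of the core data structure such a library maintains: an
inverted index mapping terms to the documents that contain them. This
is not the CLucene or Xapian API. TinyIndex and its methods are
made-up names, the tokenizer only handles ASCII, and the code assumes
a C++11 compiler.

```cpp
#include <cctype>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy index: maps each lowercased term to the set of
// document IDs that contain it. Not the CLucene or Xapian API.
class TinyIndex {
public:
    // Analysis step: split text into lowercase alphanumeric terms.
    static std::vector<std::string> Tokenize(const std::string& text)
    {
        std::vector<std::string> terms;
        std::string current;
        for (unsigned char c : text) {
            if (std::isalnum(c)) {
                current += static_cast<char>(std::tolower(c));
            } else if (!current.empty()) {
                terms.push_back(current);
                current.clear();
            }
        }
        if (!current.empty())
            terms.push_back(current);
        return terms;
    }

    // Building step: record every term of the document.
    void AddDocument(int docId, const std::string& text)
    {
        for (const std::string& term : Tokenize(text))
            fPostings[term].insert(docId);
    }

    // Querying step: return documents containing all query terms (AND).
    std::set<int> Search(const std::string& query) const
    {
        std::set<int> result;
        bool first = true;
        for (const std::string& term : Tokenize(query)) {
            auto it = fPostings.find(term);
            if (it == fPostings.end())
                return {};  // one missing term means no match
            if (first) {
                result = it->second;
                first = false;
            } else {
                // Keep only documents that also contain this term.
                std::set<int> narrowed;
                for (int id : result) {
                    if (it->second.count(id) != 0)
                        narrowed.insert(id);
                }
                result = narrowed;
            }
        }
        return result;
    }

private:
    std::map<std::string, std::set<int>> fPostings;
};
```

A real library adds a lot on top of this (on-disk storage, ranking,
stemming, phrase queries), which is a large part of why writing one
from scratch looks risky for the timeline.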

2. The Indexing Daemon: Will keep the database in sync as files change
on disk. It's starting to dawn on me that this might be an area that
would require a lot of thought.
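
One simple way to think about "keeping the database in sync" is as a
diff between what the database last recorded and what is on disk now.
The sketch below is only an illustration under that assumption:
Snapshot, SyncPlan and PlanSync are hypothetical names, modification
times are plain integers, and a real daemon would probably react to
file system change notifications rather than rescanning. It assumes a
C++11 compiler.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical snapshot type: file path -> last modification time.
typedef std::map<std::string, long> Snapshot;

struct SyncPlan {
    std::vector<std::string> toIndex;   // new or modified files
    std::vector<std::string> toRemove;  // files that no longer exist
};

// Compare what the database recorded against what is on disk now.
SyncPlan PlanSync(const Snapshot& indexed, const Snapshot& onDisk)
{
    SyncPlan plan;
    for (const auto& entry : onDisk) {
        auto it = indexed.find(entry.first);
        // Unknown path, or known path with a different timestamp.
        if (it == indexed.end() || it->second != entry.second)
            plan.toIndex.push_back(entry.first);
    }
    for (const auto& entry : indexed) {
        if (onDisk.find(entry.first) == onDisk.end())
            plan.toRemove.push_back(entry.first);
    }
    return plan;
}
```

Even this toy version shows where the thought is needed: renames
appear as one removal plus one addition, and a timestamp comparison
misses edits that preserve the modification time.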

The indexing daemon will have a set of plugins that convert data from
different file formats (PDF, ODF, DOC, etc.) into a format the
indexing library can consume.
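
As a sketch of what the plugin mechanism could look like, assuming
plain text is the common format handed to the indexing library: each
plugin declares which file types it handles and turns raw file
contents into text. TextExtractor, ExtractorRegistry and the two
example plugins are hypothetical names, lookup by file extension is a
simplification, and the tag stripper is a stand-in for a real format
parser. It assumes a C++11 compiler.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical plugin interface: one implementation per file format.
class TextExtractor {
public:
    virtual ~TextExtractor() {}
    // True if this plugin handles the given file extension.
    virtual bool Supports(const std::string& extension) const = 0;
    // Convert raw file contents to plain text for the indexer.
    virtual std::string Extract(const std::string& rawData) const = 0;
};

// Trivial plugin: plain text needs no conversion.
class PlainTextExtractor : public TextExtractor {
public:
    bool Supports(const std::string& extension) const override
    {
        return extension == "txt";
    }
    std::string Extract(const std::string& rawData) const override
    {
        return rawData;
    }
};

// Illustrative plugin: drops everything between '<' and '>'.
class MarkupExtractor : public TextExtractor {
public:
    bool Supports(const std::string& extension) const override
    {
        return extension == "html";
    }
    std::string Extract(const std::string& rawData) const override
    {
        std::string text;
        bool inTag = false;
        for (char c : rawData) {
            if (c == '<')
                inTag = true;
            else if (c == '>')
                inTag = false;
            else if (!inTag)
                text += c;
        }
        return text;
    }
};

// Holds the loaded plugins and picks one for a given file type.
class ExtractorRegistry {
public:
    void Register(std::unique_ptr<TextExtractor> plugin)
    {
        fPlugins.push_back(std::move(plugin));
    }
    // Returns the first plugin that claims the extension, or nullptr.
    const TextExtractor* Find(const std::string& extension) const
    {
        for (const auto& plugin : fPlugins) {
            if (plugin->Supports(extension))
                return plugin.get();
        }
        return nullptr;
    }

private:
    std::vector<std::unique_ptr<TextExtractor>> fPlugins;
};
```

With this shape the daemon never needs to know about PDF or ODF
itself. It asks the registry for a plugin, and files with no matching
plugin are simply skipped.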

3. A Set of Command-Line Tools: Refreshing the database, querying the
database from the command line, forcing updates, etc.

4. The GUI Front End: Full text indexing functionality will be
integrated into the already existing search UI. I was also thinking of
a very minimal UI that could be quickly brought up/dismissed with a
simple keystroke.

Thoughts? Ideas? Opinions? Comments? I'm particularly looking for
insights concerning 1 and 2.

-- 
Ankur Sethi
http://uncool.in
