Hi Ankur,

Ankur Sethi <get.me.ankur@xxxxxxxxx> wrote:
> 1. Indexing and Querying Library: Will perform analysis on the files
> and take care of building and querying the search database.
>
> I have been looking around for already available information retrieval
> libraries. The two major projects I found are CLucene and Xapian.
> CLucene is the more popular of the two, but I think building it will
> require GCC4. Xapian is under the GPL, so I don't know if that will be
> acceptable. The last option is, of course, writing one from scratch,
> which may not be a good idea given the project timeline.

The GPL should not be used in libraries, even if it's only used .
Have you tried building CLucene, or why do you think it needs GCC4?
Even if it needs GCC4, it should not be a show stopper, as only the
indexing server will link against this library, right? And the
communication with the server is GCC agnostic in any case.

> 2. The Indexing Daemon: Will keep the database in sync as files change
> on disk. It's starting to dawn on me that this might be an area that
> would require a lot of thought.
>
> The indexing daemon will have a set of plugins that will convert data
> from different file formats (PDF, ODF, DOC etc.) to a format
> compatible with the indexing library.

That sounds good to me.

[...]
> Thoughts? Ideas? Opinions? Comments? I'm particularly looking for
> insights concerning 1 and 2.

Any further specs on the insights you need? :-)

Bye,
   Axel.