Hi,

I have spent the past week experimenting with the Haiku and CLucene APIs, and I'm starting work on the indexing daemon. I had an email discussion with Rene, and he suggested I raise the following issues on the ML.

1. To what extent can timestamps on files be trusted? What happens when the user tinkers with the system time?

2. Writing data translators to extract text from PDF, ODF etc. seems like a nice idea. That way, other apps may also benefit from the code. Would it be a good permanent solution, or should the indexing daemon implement its own plugin API?

3. To store the indices, the daemon will create a folder called .index on every volume it indexes. This way, old indices are not lost when the user reinstalls Haiku, and multiple Haiku installations on a single computer can use the same indices. I hope this is acceptable?

4. I feel it's best if we do not index removable media by default. If the user does want to index removable devices, the indices for those go in /boot/home/config/index/. So, no polluting USB devices with junk.

5. Rene thinks storing all indices in /boot/home/config/index/ should be fine, regardless of whether the volume is removable or not. Would this be a better option?

6. Indexing 100 KB of data from any file should be more than enough. 250 KB tops. Thoughts?

I indexed about 650 MB of Project Gutenberg texts using CLucene, indexing the entire files in the first test run and only the first 100 KB of each file in the second. In both cases, the only fields I added to the index for each file were the file contents and path. In the first case, the index was more or less the same size as the indexed content (as expected), but once I add a few extra fields, the index will grow much larger than the content it indexes. In the second case, the index was just 85 MB. The quality of search results in both cases was more or less the same.
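
To make the 100 KB cap concrete, here is a minimal sketch of how the daemon might read only the start of a file before handing the text to the indexer. It uses plain standard C++ rather than the Haiku or CLucene APIs, and the function name ReadIndexablePrefix and its default limit are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of Haiku or CLucene): returns at most
// max_bytes from the beginning of the file at `path`. Returns an empty
// string if the file cannot be opened.
std::string ReadIndexablePrefix(const std::string& path,
                                std::size_t max_bytes = 100 * 1024)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in)
        return std::string();

    // Reserve room for the cap, then shrink to what was actually read.
    std::string buffer(max_bytes, '\0');
    if (max_bytes > 0)
        in.read(&buffer[0], static_cast<std::streamsize>(max_bytes));
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

Raising the cap to 250 KB would just mean passing 250 * 1024 as the second argument. Note that cutting at a byte offset can split a word or a multi-byte character at the boundary, so a real implementation would probably want to trim back to the last whitespace.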

The point is that indexing entire files will needlessly fill up the HDD, and that indexing even 100 KB of text is good enough in practice.

(BTW, the project is called Beacon :) )

-- 
Ankur Sethi (GeneralMaximus)