Humdinger <humdingerb@xxxxxxxxxxxxxx> wrote:
> > A correct read/write implementation would have to update the
> > indexes regardless.
> IC. Then let's hope for correct implementations. :)

Would it be technically feasible (with modifications to the file system driver if necessary) to perform a query for all files which *lack* a particular attribute? That could possibly work around problems like this -- allowing identification of files which don't yet have the index data attributes at all. (Present but outdated index data could be caught using a version number attribute.)

It would also be useful in the case of adding checksum attributes to each file for data integrity checking -- currently I'm doing this by grinding through all ~1 million files on the disk and checking for the presence of the attribute. Needless to say, the procedure is not particularly quick.

As for the indexer itself, it sounds like a great idea! I think the index data would need to be stored as BFS attributes as others have suggested -- a separate database would be a kludge IMO, and likely to lead to sync problems.

The only problem that comes to mind is the 256-byte limitation for indexed attributes on BFS -- does this also exist in OpenBFS? If not (or if it's practical to remove), maybe the entire raw text of non-text documents could be stored as a separate BFS-indexed "raw_text" attribute? Then a query for something like

    ((mime_type == text/plain) && (file_data [contains] search_string))
    || ((mime_type == *) && (raw_text [contains] search_string))

would perform the necessary search.

Another thing that comes to mind: how to handle files for which the MIME type is not yet set? I suppose the sniffer should just run on the file before deferring the parsing task to the necessary Translation Kit plug-in?

(!!Feature creep alert!!) It would also be nice if the indexer could index data of different "types" (but still possibly represented as text). We already need to differentiate other types of text data (e.g., ID3 tag fields from raw text of a document) to allow the user to search for, e.g., a particular song name as well as a general string. However, if the indexer is flexible enough, maybe it could also handle other types of data, such as MIDI note data and tempo derived using a pitch detection algorithm from MP3 files. This would enable the user to recall a particular song by whistling the tune into the search application, or it could allow a music player to better assemble an automatic playlist by matching tempos, etc.

Images could also get the same treatment: machine vision could be used to identify the class of image (drawing, photograph, etc.), search for images similar to a reference image (possibly by deriving a text-based "fingerprint" from the image), or search for text obtained using OCR.

Needless to say, all these features are creeping to the extreme, but maybe it wouldn't be too much effort to make an indexer which could be extended by third parties using Translation Kit plug-ins in similarly exotic directions?
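
To make the two query ideas from earlier in this mail a bit more concrete, here is a rough sketch in Python. It is only an in-memory model and does not touch BFS or any real query API: Entry, build_index, lacking, matches and search are all made-up names, and the sketch assumes that an attribute index only records files that actually carry the attribute (which is why a "lacks attribute" query has to be computed as everything on the volume minus the index members). The "mime_type == *" branch of the proposed query is treated as "any file", so it reduces to a test on the hypothetical "raw_text" attribute.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical stand-in for a file: path, contents, and attributes."""
    path: str
    data: str = ""
    attrs: dict = field(default_factory=dict)


def build_index(entries, attr):
    """Model an attribute index: only files that HAVE the attribute appear in it."""
    return {e.path for e in entries if attr in e.attrs}


def lacking(entries, attr):
    """Files without `attr`: every file on the volume minus the index members."""
    indexed = build_index(entries, attr)
    return sorted(e.path for e in entries if e.path not in indexed)


def matches(entry, needle):
    """Evaluate the compound query from the mail against a single entry."""
    is_plain_text = entry.attrs.get("mime_type") == "text/plain"
    in_file_data = is_plain_text and needle in entry.data
    # "mime_type == *" is assumed to mean "any file", so only raw_text is tested.
    in_raw_text = needle in entry.attrs.get("raw_text", "")
    return in_file_data or in_raw_text


def search(entries, needle):
    """Return the paths of all entries matching the compound query."""
    return sorted(e.path for e in entries if matches(e, needle))


# Invented sample data, purely for illustration.
VOLUME = [
    Entry("notes.txt", "buy whistle",
          {"mime_type": "text/plain", "checksum": "ab12"}),
    Entry("report.pdf", "%PDF-binary",
          {"mime_type": "application/pdf", "raw_text": "whistle sales report"}),
    Entry("song.mp3", "ID3...", {"mime_type": "audio/mpeg"}),
    Entry("unknown.bin", "whistle"),  # MIME type not set yet, no raw_text
]

if __name__ == "__main__":
    print(lacking(VOLUME, "checksum"))
    print(search(VOLUME, "whistle"))
```

In this model "unknown.bin" contains the search string but is never found, because it has neither a MIME type nor a "raw_text" attribute. That is the unset-MIME-type case raised above, and lacking(VOLUME, "mime_type") is one way the indexer could pick out such files for sniffing.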