> > The problem is BFS cannot index string attributes larger than 255
> > bytes; so this cannot be used for full content indexing.
>
> Full text indexing does not work by storing the full content directly
> in the index. Each term must be indexed independently, otherwise the
> lookup won't benefit much from the index. Of course, repeated terms
> are only added once and words that are too common or too short are
> ignored.
>
> This means that the size limit is not the big problem. What is needed
> is a mechanism for storing multiple strings in each attribute.

Yes, but that mechanism would have to be added first.

> > Now, I don't see full content indexing as really mandatory. I
> > believe well weighted keyword extraction should be enough (there is
> > already a META:keyw attribute defined somewhere, for People files
> > IIRC).
>
> Keyword and label-type attributes also need multiple strings. Simply
> setting META:keyw to "christmas holiday france" is not good enough.
> Queries then need to use wildcards, which will give terrible
> performance. Generally when searching an index, wildcards at the end
> are ok, but prefix wildcards mean that you have to sequentially scan
> the whole index.

Well, using wildcards when searching mails (and I have a lot of them)
isn't that slow. Of course it's not the perfect way, but at least it
works.

> > A "spotlight" like app would then just start several queries at
> > once and merge relevant results.
>
> Yes, for each extractor plugin there would be a search plugin. For
> instance, the one for mp3 files knows that the attributes for artist,
> album, title, year, etc should be included in the query, and that
> year is a number.

No need. The MIME database already knows the type of each attribute
for each MIME type. Besides, indexes are typed as well.

François.
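
To make the per-term indexing and wildcard points discussed above concrete, here is a minimal sketch in Python. It is not BFS or query code; the names (TermIndex, extract_terms, STOP_WORDS, MIN_LEN) and the stop-word list are hypothetical, chosen only for illustration. It stores each term once, skips short and common words, and shows why a wildcard at the end can use the sorted index while a wildcard at the start has to scan every term.

```python
from bisect import bisect_left, insort

# Hypothetical filter settings, for illustration only.
STOP_WORDS = {"the", "and", "for"}
MIN_LEN = 3


def extract_terms(text):
    """Split text into unique lowercase terms, dropping short/common words."""
    terms = set()
    for word in text.lower().split():
        word = word.strip('.,;:!?"()')
        if len(word) >= MIN_LEN and word not in STOP_WORDS:
            terms.add(word)
    return terms


class TermIndex:
    """Toy index: one entry per term, each mapping to the files containing it."""

    def __init__(self):
        self.postings = {}      # term -> set of file names
        self.sorted_terms = []  # all terms, kept in sorted order

    def add(self, name, text):
        for term in extract_terms(text):
            if term not in self.postings:
                self.postings[term] = set()
                insort(self.sorted_terms, term)
            self.postings[term].add(name)

    def trailing_wildcard(self, prefix):
        """Query like 'holi*': binary search, then walk adjacent terms."""
        hits = set()
        i = bisect_left(self.sorted_terms, prefix)
        while (i < len(self.sorted_terms)
               and self.sorted_terms[i].startswith(prefix)):
            hits |= self.postings[self.sorted_terms[i]]
            i += 1
        return hits

    def leading_wildcard(self, suffix):
        """Query like '*mas': sort order does not help, so scan every term."""
        hits = set()
        for term in self.sorted_terms:
            if term.endswith(suffix):
                hits |= self.postings[term]
        return hits


index = TermIndex()
index.add("a.jpg", "christmas holiday france")
index.add("b.jpg", "Holiday in the Alps")
print(index.trailing_wildcard("holi"))
print(index.leading_wildcard("mas"))
```

With one term per entry, an exact keyword lookup needs no wildcard at all; the single string "christmas holiday france" would instead force a leading-wildcard scan for anything but the first word.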