[haiku] Re: Need Some GSoC Advice

  • From: "François Revol" <revol@xxxxxxx>
  • To: haiku@xxxxxxxxxxxxx
  • Date: Tue, 24 Mar 2009 15:20:23 +0100 CET

> Clarification: "Although I didn't follow most of the discussion about
> the finer points of OpenBeFS ..." from my previous email means I
> didn't understand most of the stuff. Thanks to Matt Madia for pointing
> out this ambiguity.
> 
> > No.
> > Simply because BFS indexes (some of) them.
> > Running a query in the end means reading the indices and only if an
> > attribute isn't indexed reading them directly.
> 
> So far I've been thinking along the lines of a userland process that
> runs in the background waiting for files to change and, when they do,
> performs some analysis on them and updates their entries in a
> database. So you can get fast indexing simply by improving the
> indexing features already present in BFS?

BFS already indexes attributes when told to.
The missing part is indeed the tool/server/whatever that fills those
attributes from the metadata inside the file.
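
To make the split concrete, here is a rough toy model in Python. It is
not the BFS or Haiku API: ToyVolume, feed_attrs and the "META:author" /
"META:title" attribute names are all made up for illustration, and the
metadata is assumed to have been extracted from the file already. It
only shows the idea that a query on an indexed attribute is answered
from the index, a query on an unindexed one has to look at every file,
and the feeder's whole job is writing attributes.

```python
from collections import defaultdict


class ToyVolume:
    """Toy stand-in for a volume with attribute indices (not the real BFS API)."""

    def __init__(self):
        self.attrs = {}    # path -> {attribute name: value}
        self.indices = {}  # attribute name -> {value: set of paths}

    def create_index(self, name):
        # Only attributes that have been given an index are tracked.
        # The toy back-fills existing entries to stay simple.
        index = defaultdict(set)
        for path, attrs in self.attrs.items():
            if name in attrs:
                index[attrs[name]].add(path)
        self.indices[name] = index

    def write_attr(self, path, name, value):
        # Writing an attribute keeps its index (if any) up to date.
        old = self.attrs.setdefault(path, {}).get(name)
        self.attrs[path][name] = value
        index = self.indices.get(name)
        if index is not None:
            if old is not None:
                index[old].discard(path)
            index[value].add(path)

    def query(self, name, value):
        index = self.indices.get(name)
        if index is not None:
            # Indexed attribute: answer from the index alone.
            return sorted(index.get(value, ()))
        # No index: fall back to reading the attribute of every file.
        return sorted(p for p, a in self.attrs.items() if a.get(name) == value)


def feed_attrs(volume, path, metadata):
    """Hypothetical feeder: copy already-extracted metadata into attributes."""
    for key, value in metadata.items():
        volume.write_attr(path, "META:" + key, value)


volume = ToyVolume()
volume.create_index("META:author")
feed_attrs(volume, "/docs/a.pdf", {"author": "alice", "title": "Notes"})
feed_attrs(volume, "/docs/b.pdf", {"author": "bob", "title": "Notes"})

by_author = volume.query("META:author", "alice")  # answered from the index
by_title = volume.query("META:title", "Notes")    # no index, scans all files
```

In this sketch the feeder never touches the index itself; it only
writes attributes, and the volume does the rest.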

> What about full content indexing? For a 2000 word PDF, it's possible
> to perform some analysis on the data and reduce the amount of content
> that has to be indexed, but it's still a substantial amount of
> information. Can BFS deal with that?

No, not as is, and that's what we were discussing.

> I think I should read up on BFS before I read about IR techniques.
> What would be a good resource? There's a PDF called "Practical File
> System Design with the Be File System" mentioned on Wikipedia. Is it
> useful or an overkill for what I'm trying to do?

Yes, this book was written by Dominic Giampaolo, who designed BFS
and now works at Apple (in the Spotlight group; it's a small world...).

You can easily skip the parts about physical layout and other file systems.

François.
