[haiku] Re: Need Some GSoC Advice

  • From: Ingo Weinhold <ingo_weinhold@xxxxxx>
  • To: haiku@xxxxxxxxxxxxx
  • Date: Tue, 24 Mar 2009 16:31:27 +0100

On 2009-03-24 at 15:01:01 [+0100], Ankur Sethi <get.me.ankur@xxxxxxxxx> 
wrote:
> Clarification: "Although I didn't follow most of the discussion about
> the finer points of OpenBeFS ..." from my previous email means I
> didn't understand most of the stuff. Thanks to Matt Madia for pointing
> out this ambiguity.
> 
> > No.
> > Simply because BFS indexes (some of) them.
> > Running a query ultimately means reading the indices; attributes are
> > read directly only if they aren't indexed.
> 
> So far I've been thinking along the lines of a userland process that
> runs in the background waiting for files to change and, when they do,
> performs some analysis on them and updates their entries in a
> database.

That's the way to go, yes.
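
A minimal sketch of that idea in Python, purely for illustration: it walks a
directory tree, re-indexes files whose modification time has changed, and
keeps the result in memory. The Indexer class, the tokenize helper and the
in-memory dictionaries are all hypothetical names made up for this example.
It polls only to stay portable; a real indexer would more likely react to
change notifications from the system and keep its database on disk.

```python
import os
import re


def tokenize(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Indexer:
    """Hypothetical background indexer: one scan() call is one polling pass."""

    def __init__(self, root):
        self.root = root
        self.mtimes = {}  # path -> modification time seen at last scan
        self.words = {}   # path -> set of words found in the file

    def scan(self):
        """Re-index new or changed files; return the paths that were updated."""
        changed = []
        seen = set()
        for dirpath, _dirs, files in os.walk(self.root):
            for name in files:
                path = os.path.join(dirpath, name)
                seen.add(path)
                mtime = os.stat(path).st_mtime_ns
                if self.mtimes.get(path) != mtime:
                    with open(path, encoding="utf-8", errors="replace") as f:
                        self.words[path] = tokenize(f.read())
                    self.mtimes[path] = mtime
                    changed.append(path)
        # Forget files that have disappeared since the previous pass.
        for path in set(self.mtimes) - seen:
            del self.mtimes[path]
            del self.words[path]
        return changed

    def search(self, word):
        """Return the sorted paths of all files containing the given word."""
        word = word.lower()
        return sorted(p for p, ws in self.words.items() if word in ws)
```

Calling scan() in a loop with a sleep in between gives the background
behaviour; search() then answers from the in-memory data without touching
the files again.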

> So you can get fast indexing simply by improving the
> indexing features already present in BFS?

Not really.

> What about full content indexing? For a 2000 word PDF, it's possible
> to perform some analysis on the data and reduce the amount of content
> that has to be indexed, but it's still a substantial amount of
> information. Can BFS deal with that?

Nope.

> I think I should read up on BFS before I read about IR techniques.
> What would be a good resource? There's a PDF called "Practical File
> System Design with the Be File System" mentioned on Wikipedia. Is it
> useful, or is it overkill for what I'm trying to do?

Please don't bother with BFS. BFS indices aren't well suited to the kind of
queries one would want to run for content searches. They work fine for
prefix matches of the form "foo*", but not for patterns like
"*[fF][oO][oO]*".
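
To illustrate the difference, here is a rough Python sketch in which a
sorted list stands in for an index ordered by key. This is an assumption
made for the example, not BFS code, and prefix_matches and glob_matches are
made-up names. A prefix query can jump to the first candidate with a binary
search and stop at the first key that no longer matches; a pattern with a
leading wildcard gives the ordering nothing to work with, so every key has
to be examined.

```python
import bisect
import fnmatch


def prefix_matches(sorted_keys, prefix):
    """Answer a "foo*" style query using the ordering of the keys."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match either
        matches.append(key)
    return matches


def glob_matches(sorted_keys, pattern):
    """Answer a "*[fF][oO][oO]*" style query: the ordering does not help."""
    return [key for key in sorted_keys if fnmatch.fnmatchcase(key, pattern)]


# Example keys, invented for this sketch.
keys = sorted(["Foobar", "bigFOOt", "food", "foo", "fop", "zebra"])
prefix_hits = prefix_matches(keys, "foo")            # visits a small range
glob_hits = glob_matches(keys, "*[fF][oO][oO]*")     # visits every key
```

With a handful of keys the cost is the same either way, but on an index
covering the words of every document on a volume the full scan is what
makes the second kind of query impractical.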

CU, Ingo
