[openbeos] OpenBFS Progress - Filesystem Grep Replacement

  • From: "Alexander G. M. Smith" <agmsmith@xxxxxxxxxxxxxxxxxxx>
  • To: openbeos@xxxxxxxxxxxxx
  • Date: Mon, 18 Feb 2002 11:46:52 EST (-0500)

Michael Pfeiffer <michael.pfeiffer@xxxxxxxxx> wrote on Mon, 18 Feb 2002 17:15:24 +0100:
> Now I have two feature requests:
> * querying files under a certain folder (+ sub folders)

That's a bit awkward, since the indices currently apply to the
whole disk.  ReiserFS seems to have a plan where they have indices
in each directory (so your query is a path composed of multiple
queries).  We could do something similar.  Or just filter out the
files that aren't under a particular directory (would be slow,
but good enough for a first attempt).
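
The filtering fallback can be sketched roughly as below. This is only an illustration, assuming the whole-disk query hands back a list of absolute path strings; `filter_under` and the sample paths are made-up names, not part of any BFS or OpenBFS API.

```python
from pathlib import PurePosixPath


def filter_under(results, root):
    """Keep only the query results that live under root, at any depth."""
    root_path = PurePosixPath(root)
    kept = []
    for path in results:
        # Compare whole path components, so "/a/projects-old" is not
        # mistaken for something under "/a/projects".
        if root_path in PurePosixPath(path).parents:
            kept.append(path)
    return kept


# Hypothetical hits from a query run against the whole disk.
whole_disk_hits = [
    "/boot/home/mail/in/msg1",
    "/boot/home/projects/bfs/notes.txt",
    "/boot/home/projects-old/notes.txt",
]
print(filter_under(whole_disk_hits, "/boot/home/projects"))
```

Every hit on the disk still has to be visited, which is why this approach is slow when the query matches many files outside the directory of interest.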

> * searching in the file data
> 
> This would make "grep" obsolete.

Along those lines, I was thinking of a quick and simple trick for
adding keyword indices as one of my file system experiments.  A new
keyword attribute type (different from the current plain string
attribute type) would automatically break up the attribute data
into separate words and individually add each word to the index.
Of course, if the attribute changes, the file system removes all the
words of the old attribute value from the index and then adds the words
of the new value.  Word processors and other tools would create this
attribute, making a list of all the unique words used in the file
(or non-unique at a slight performance and space cost to add the
same word to the index extra times).  To save on storing the text
twice (once in the file, once in the attribute), a special
redirection marker attribute (telling the file system to index the
file data itself) could be used instead (but then every time you
slightly change the file, it gets reindexed).
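
A minimal in-memory sketch of the keyword attribute idea follows. It assumes words are separated by whitespace and that each word is indexed once per file; the `KeywordIndex` class and its method names are invented for illustration and do not correspond to any existing file system interface.

```python
from collections import defaultdict


class KeywordIndex:
    """Toy keyword index: maps each word to the set of files containing it."""

    def __init__(self):
        self.word_to_files = defaultdict(set)
        self.file_to_words = {}  # remembers what was indexed for each file

    def remove_attribute(self, file_id):
        """Take every word of the file's old attribute value out of the index."""
        for word in self.file_to_words.pop(file_id, set()):
            files = self.word_to_files[word]
            files.discard(file_id)
            if not files:
                del self.word_to_files[word]

    def set_attribute(self, file_id, text):
        """Replace the file's keyword attribute: drop old words, add new ones."""
        self.remove_attribute(file_id)
        # str.split() breaks on Unicode whitespace; text written without
        # spaces between words would need a real word segmenter instead.
        words = set(text.split())
        for word in words:
            self.word_to_files[word].add(file_id)
        self.file_to_words[file_id] = words

    def query(self, word):
        """Return the files whose keyword attribute contains the word."""
        return set(self.word_to_files.get(word, set()))


index = KeywordIndex()
index.set_attribute("a.txt", "quick brown fox")
index.set_attribute("b.txt", "lazy brown dog")
index.set_attribute("a.txt", "quick red fox")  # attribute changed
print(index.query("brown"))
```

Storing only the unique words per file keeps the index small; adding every occurrence would work too, at the extra cost in time and space already mentioned.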

So, does anyone know which characters are spaces in all alphabets?
What does Chinese/Arabic writing do to separate words, if they
even have words?

It's not the same as grep, but probably useful enough.  For pure
greppiness, a trie (tree of letters forming words as you traverse
it) or some other kind of indexing system would be needed.
Perhaps the Patricia trees used in the Oxford dictionary project
at Waterloo?
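
For reference, a plain (uncompressed) trie can be sketched as below; a Patricia tree is a space-saving variant that merges chains of single-child nodes. The class and method names are illustrative assumptions. Note that this version only answers prefix lookups: grep-style matching in the middle of a word would also require inserting every suffix of each word.

```python
class TrieNode:
    def __init__(self):
        self.children = {}  # next letter -> TrieNode
        self.files = set()  # files containing the word that ends here


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, file_id):
        """Walk down one letter at a time, creating nodes as needed."""
        node = self.root
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.files.add(file_id)

    def files_with_prefix(self, prefix):
        """Return every file holding a word that starts with prefix."""
        node = self.root
        for letter in prefix:
            node = node.children.get(letter)
            if node is None:
                return set()
        # Collect the files of all words in the subtree below the prefix.
        found = set()
        pending = [node]
        while pending:
            current = pending.pop()
            found |= current.files
            pending.extend(current.children.values())
        return found


trie = Trie()
trie.insert("index", "a.txt")
trie.insert("indices", "b.txt")
trie.insert("inode", "c.txt")
print(trie.files_with_prefix("ind"))
```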

- Alex


