[haiku] Re: Need Some GSoC Advice

  • From: "Cyan" <cyanh256@xxxxxxxxxxxx>
  • To: haiku@xxxxxxxxxxxxx
  • Date: Tue, 24 Mar 2009 04:29:05 GMT

"François Revol" <revol@xxxxxxx> wrote:
> Another problem is when someone manually fixes the attributes
> (because they are wrong, or to add more info), reindexing them would
> lose the changes.

What about introducing "private" or "internal" attributes: attributes
with a bit set to indicate they shouldn't normally be exposed for
viewing/editing in the same manner as user-attached attributes?
(except determined users who use QuickRes, etc., but they deserve
everything they get)

I've often wished for a similar thing, because there seems to be
quite a divide in how attributes are used. On one hand there
are applications that attach all sorts of attributes to files,
e.g. last window position, zoom settings, StyledEdit formatting
data, etc. But on the other hand, users deliberately attach their
own attributes to files -- and they might be surprised to see
countless attributes they don't recognize attached to a file.

Maybe simply using binary data types for "internal" attributes would
be sufficient to stop careless editing?
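
To make the "private bit" idea a bit more concrete, here is a rough
sketch in Python (not the real BFS or Haiku API). The Attribute class,
its "internal" flag and the attribute names are all made up for
illustration; the point is only that a viewer/editor would filter on
the flag by default.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    """Hypothetical in-memory model of a file attribute (not a real API)."""
    name: str
    value: bytes
    internal: bool = False  # assumed "private" bit, set by applications


def user_visible(attrs):
    """Return only the attributes a normal viewer/editor would expose."""
    return [a for a in attrs if not a.internal]


# Illustrative attribute names only.
attrs = [
    Attribute("keywords", b"gsoc haiku"),
    Attribute("app:window_frame", b"\x00\x10\x00\x20", internal=True),
    Attribute("app:zoom", b"\x02", internal=True),
]

print([a.name for a in user_visible(attrs)])
```

A determined user could still get at the flagged attributes with a
lower-level tool; the flag only changes what is shown by default.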

> The problem is BFS cannot index string attributes larger than 255 
> bytes; so this cannot be used for full content indexing.

Ouch, as I suspected then... is this a fundamental limit that can't
be lifted, even by breaking on-disk compatibility?

If it is liftable, I think it should be done -- full-text indexing
is the future, and if the user wants that enabled, they should format
the drive in the new format. Read+write should be retained for the
current version of BFS for those who want to use their existing BeOS
partitions (albeit without searching inside files).

If it's not, I think that's going to be quite a pain. People differ
in how they use search engines, but I've often retrieved things on
the Web by searching for certain memorable sentences, and I've seen
other people do the same. If this is a common approach (I've really
no idea -- I'm just speaking from my experience here, and I hope
others give theirs too!) then a keyword indexer would fall well short
of expectations.

That said, keyword indexing would be better than nothing -- it just
doesn't strike me as very future-proof. However, it is quite
backward-compatible -- I'm using a keyword attribute already for
some files, though to be honest I'm usually too lazy to fill it in.
An indexer would help immensely with that, but it would need to
co-operate with user-edited keywords. For instance, have a
"HAIKU:keywords" attribute which is maintained by the indexer, and
a user-generated "keywords" attribute which is maintained by the
user. Run queries on both attributes, so both indexed attributes and
user-added attributes are picked up. Also use an
Altavista/Google-style exclusion mechanism, whereby if a keyword is
added to the user "keywords" attribute with a "-" at the start, it
is ignored in the automatically-indexed list. Thus allowing the user
to both add and remove keywords from the OS-generated attribute
without actually editing it directly.
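
Roughly, the merge rule above could look like the following Python
sketch. It assumes both attributes hold whitespace-separated keywords
and that matching is case-insensitive; neither is settled, and the
function name effective_keywords is just something I made up here.

```python
def effective_keywords(indexed: str, user: str) -> list[str]:
    """Merge indexer-maintained and user-maintained keyword strings.

    indexed: contents of the indexer's attribute (e.g. "HAIKU:keywords")
    user:    contents of the user's "keywords" attribute, where a
             leading "-" marks a keyword to exclude from the indexed set
    """
    auto = {k.lower() for k in indexed.split()}
    user_words = user.split()

    # "-foo" removes "foo" from the automatically indexed keywords.
    excluded = {k[1:].lower() for k in user_words
                if k.startswith("-") and len(k) > 1}
    # Anything without a leading "-" is a keyword the user added.
    added = {k.lower() for k in user_words if not k.startswith("-")}

    return sorted((auto - excluded) | added)


print(effective_keywords("haiku bfs index query", "gsoc -query"))
# prints ['bfs', 'gsoc', 'haiku', 'index']
```

The indexer's attribute is never modified here, so reindexing can
overwrite it freely without losing anything the user typed.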

Just some random thoughts anyway...
