[haiku] Re: Need Some GSoC Advice

  • From: "Jonas Sundström" <jonas@xxxxxxxxxxx>
  • To: haiku@xxxxxxxxxxxxx
  • Date: Tue, 24 Mar 2009 09:50:12 +0100 CET

"Cyan" <cyanh256@xxxxxxxxxxxx> wrote:
 ...
> What about introducing "private" or "internal"
> attributes: attributes with a bit set to indicate
> they shouldn't normally be exposed for
> viewing/editing in the same manner as user-attached
> attributes? (except determined users who use 
> QuickRes, etc., but they deserve everything they get)

Isn't that more or less what we already have? Files can
carry any set of attributes, but Tracker only shows the
attributes set to be visible and/or editable.

Even if they could be hidden deeper, there would be ways
to change them and to introduce inconsistencies between
file contents and file metadata.
 
> I've often wished for a similar thing, because there seems to be
> quite a divide between how attributes are used. On one hand there
> are applications that attach all sorts of attributes to files,
> e.g. last window position, zoom settings, StyledEdit formatting
> data, etc. But on the other hand, attributes are used by the user
> for deliberately attaching to files -- and they might be surprised
> to see countless attributes they don't recognize attached to a file.

What attributes you're exposed to ultimately depends
on the editor you are using, and on whether it makes
the same distinction Tracker does. It might be a good idea
to uphold that convention in a casual attribute editor.

> Maybe simply using binary data types for "internal"
> attributes would be sufficient to stop careless editing?

Probably.
  
> > The problem is BFS cannot index string attributes
> > larger than 255 bytes; so this cannot be used for
> > full content indexing.
> 
> Ouch, as I suspected then... is this a fundamental
> limit that can't be lifted, even by breaking on-disk
> compatibility?
> 
> If it is liftable, I think it should be done -- 
> full-text indexing is the future, 

There is another way. You can store the full text in a
separate file and link the two files by giving them a
unique, shared attribute value. For example:

"HAIKU:TEXTUAL_CONTENTS" (large integer value, indexed)

File "GreatDocument" has the attribute with value 
123456789. Somewhere out of sight, skipped when searching,
there is a text file named whatever, containing the
textual extracts of GreatDocument, sharing the 123456789
attribute value.

So, Hai-full-content-search (Tracker?) would simply grep
through its textual extract files (stored out of sight)
for a match, check the attribute value, do a query on that
attribute, which is indexed of course, and find the file, 
in this case GreatDocument.
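
A rough sketch of that lookup, in Python for brevity. The two
dicts are hypothetical in-memory stand-ins: one plays the role of
the hidden extract files, the other the role of a query on the
indexed HAIKU:TEXTUAL_CONTENTS attribute. None of the names below
are an existing Haiku API.

```python
# Hypothetical stand-ins; a real version would read hidden extract
# files and run a BFS query on the indexed attribute instead.

# Attribute value -> path of the original file (plays the index).
index = {123456789: "/boot/home/GreatDocument"}

# Attribute value -> extracted text (plays the hidden extract files).
extracts = {123456789: "It was the best of times, it was the worst of times"}


def full_text_search(term, extracts, index):
    """Return paths of original files whose textual extract contains term."""
    hits = []
    for link_id, text in extracts.items():
        if term.lower() in text.lower():   # the "grep" step
            path = index.get(link_id)      # the "query on the attribute" step
            if path is not None:           # skip extracts with no original left
                hits.append(path)
    return hits
```

The point is only that the expensive text scan touches the extract
files, while the step from extract to original file is a cheap
lookup on an indexed integer.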

This would be about as robust as storing metadata
directly in attributes on the files, on BFS. It would
allow larger amounts of text to be stored, keep searches
fast, and scale much better, since it would not load BFS
as heavily as textual indices would.

For non-BFS filesystems, you either exclude them, rely on
an attribute overlay, or provide an alternative mechanism.
Nothing is lost really, but much is gained.

Storing the original file's timestamp (or a checksum) as an
attribute on the textual extract file would let you verify
that the extract is still relevant.
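
A minimal sketch of the checksum variant, assuming the checksum is
recorded when the extract is made. SHA-256 and the function names
are my own choices for illustration, nothing that exists in Haiku.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Checksum of the original file's contents (SHA-256, an assumption)."""
    return hashlib.sha256(data).hexdigest()


def extract_is_current(original_bytes: bytes, recorded_checksum: str) -> bool:
    """True if the original file still matches what the extract was made from."""
    return checksum(original_bytes) == recorded_checksum
```

A timestamp comparison would be cheaper, since it avoids reading the
whole original file, but a checksum also catches edits that leave the
modification time untouched.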

As for finding a unique integer value, either you rely on
very large random numbers, or you query for the largest
used value (binary search based on last known largest?).
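
Both options could look roughly like this. The `used` set is a
hypothetical stand-in for whatever a query on the indexed attribute
would report as already taken, and the 63-bit width is just an
assumption so the value fits a signed 64-bit integer attribute.

```python
import secrets


def new_random_id(used):
    """Pick a large random value, retrying on the (unlikely) collision."""
    while True:
        candidate = secrets.randbits(63)
        if candidate not in used:
            return candidate


def next_sequential_id(used):
    """One past the largest value in use; starts at 1 when nothing is used."""
    return max(used, default=0) + 1
```

The random variant needs no knowledge of the largest value, only a
check that the candidate is free. The sequential variant depends on
finding the current maximum, which is where the binary search would
come in.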

To weed out other people's spotlight attributes
(when filesharing) there could be a system-id,
similar in purpose to the Deskbar security code.

/Jonas.

