[haiku-gsoc] Re: [hcd09] Assorted Questions About the Indexing Daemon

> 1. To what extent can timestamps on files be trusted? What happens
> when the user tinkers with the system time?

Generally those aren't touched much; IIRC they cannot be changed from
Tracker, except by regular copy/move operations.
However, things like unzipping or copying usually preserve the original
timestamps, so files get created and then have their creation/... times
set back to those of the original.
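
Since restored timestamps can be older than the last indexing pass, a
"newer than last run" test would miss such files. A rough sketch of a
check that reacts to any difference, assuming the daemon records what it
saw for each file (IndexRecord and NeedsReindex are hypothetical names,
not part of any Haiku API):

```cpp
#include <cstdint>

// Hypothetical per-file record kept by the indexing daemon.
struct IndexRecord {
	int64_t modifiedAt;	// modification time seen when last indexed
	int64_t size;		// file size seen when last indexed
};

// Re-index when the stored values differ in either direction, so a file
// whose time was set back (e.g. by unzip) is still picked up.
bool
NeedsReindex(const IndexRecord& record, int64_t currentModifiedAt,
	int64_t currentSize)
{
	return record.modifiedAt != currentModifiedAt
		|| record.size != currentSize;
}
```

Comparing for inequality also means a changed system clock only matters
if it actually changes the stamps written to files.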

> 2. Writing data translators to extract text from PDF, ODF etc. seems
> like a nice idea. That way, other apps may also benefit from the
> code.
> Would it be a good permanent solution or should the indexing daemon
> implement its own plugin API?

IIRC Axel wanted to redesign the Translation Kit, but that probably
won't happen in the next few months.

You could try something like the attached patch: I added a flag to the
BaseTranslator class's Identify() that tells it to extract metadata by
calling a new virtual method.
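
To show the control flow without the Haiku headers, here is a small
stand-alone sketch. Extension and SketchTranslator are stand-ins I made
up for BMessage and BaseTranslator, and only the "/metadata" flag name
comes from the patch; treat it as an illustration, not the real API.

```cpp
#include <map>
#include <string>

// Stand-in for the ioExtension message: named flags in, named results out.
struct Extension {
	std::map<std::string, bool> flags;
	std::map<std::string, std::string> metadata;
};

// Same flag name as B_TRANSLATOR_EXT_METADATA_ONLY in the patch.
static const char* const kMetadataOnly = "/metadata";

class SketchTranslator {
public:
	virtual ~SketchTranslator() {}

	// Returns 0 on success, negative on error (mirrors status_t).
	int Identify(const std::string& source, Extension* ext)
	{
		int status = DerivedIdentify(source);
		if (status < 0 || ext == nullptr)
			return status;

		// Only extract metadata when the caller asked for it.
		std::map<std::string, bool>::const_iterator it
			= ext->flags.find(kMetadataOnly);
		if (it == ext->flags.end() || !it->second)
			return status;

		// Best effort: a failed extraction does not fail Identify().
		IndexMetaData(source, ext);
		return status;
	}

protected:
	// Placeholder identification: accept any non-empty source.
	virtual int DerivedIdentify(const std::string& source)
	{
		return source.empty() ? -1 : 0;
	}

	// Subclasses would override this to fill in real metadata.
	virtual void IndexMetaData(const std::string& source, Extension* ext)
	{
		ext->metadata["Length"] = std::to_string(source.size());
	}
};
```

Callers that never set the flag see no change in behaviour, which is the
point of routing this through the extension message.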

> 3. To store the indices, the daemon will create a folder called .index
> on every volume it indexes. This way, old indices are not lost when
> the user reinstalls Haiku and multiple Haiku installations on a single
> computer can use the same indices. I hope this is acceptable?

Should fit the Unix model of dotfiles, but maybe we'll find a better
name.
Just #define it at some single place :)
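
Something along these lines would do, so a rename only touches one line.
INDEX_DIRECTORY_NAME and IndexDirectoryFor are names I picked for the
example, not existing Haiku symbols:

```cpp
#include <string>

// The only place that knows the folder name.
#define INDEX_DIRECTORY_NAME ".index"

// Hypothetical helper: builds the index folder path for a volume root,
// avoiding a doubled slash when the root already ends in one.
std::string
IndexDirectoryFor(const std::string& volumeRoot)
{
	if (!volumeRoot.empty() && volumeRoot[volumeRoot.size() - 1] == '/')
		return volumeRoot + INDEX_DIRECTORY_NAME;
	return volumeRoot + "/" + INDEX_DIRECTORY_NAME;
}
```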

> 4. I feel it's best if we do not index removable media by default. In
> case the user does want to index his removable devices, the indices
> for those go in /boot/home/config/index/. So, no polluting the USB
> devices with junk.

OTOH he might not want to waste CPU on another machine again?
(like, I had this USB key on my main computer, and plug it into a
battery-powered eeePC...)

> 5. Rene thinks storing all indices in /boot/home/config/index/ should
> be fine, regardless of whether the volume is removable or not. Would
> this be a better option?

Not sure.

> 6. Indexing 100KB of data from any file should be more than enough.
> 250KB tops. Thoughts?

Well, the idea was to do full indexing, but I suppose we don't want to
act like Windows and fill up one's drive :)

François.

Index: src/add-ons/translators/shared/BaseTranslator.cpp
===================================================================
--- src/add-ons/translators/shared/BaseTranslator.cpp   (revision 31035)
+++ src/add-ons/translators/shared/BaseTranslator.cpp   (working copy)
@@ -32,8 +32,16 @@
 
 #include <string.h>
 #include <stdio.h>
+#include <Bitmap.h>
+#include <BitmapStream.h>
+#include <String.h>
+
 #include "BaseTranslator.h"
 
+char B_TRANSLATOR_EXT_METADATA_ONLY[] = "/metadata";
+char B_TRANSLATOR_EXT_METADATA[] = "metadata";
+
+
 // ---------------------------------------------------------------
 // Constructor
 //
@@ -448,15 +456,39 @@
        const translation_format *inFormat, BMessage *ioExtension,
        translator_info *outInfo, uint32 outType)
 {
+       status_t status;
+
        switch (fTranGroup) {
                case B_TRANSLATOR_BITMAP:
-                       return BitsIdentify(inSource, inFormat, ioExtension,
+                       status = BitsIdentify(inSource, inFormat, ioExtension,
                                outInfo, outType);
-                       
+                       break;
+
                default:
-                       return DerivedIdentify(inSource, inFormat, ioExtension,
+                       status = DerivedIdentify(inSource, inFormat, ioExtension,
                                outInfo, outType);
        }
+
+       // check if we are asked to extract meta data from the source
+       // if not just return now
+       if ((status < B_OK) || !ioExtension)
+               return status;
+       bool extract = false;
+       if (ioExtension->FindBool(B_TRANSLATOR_EXT_METADATA_ONLY, 0, &extract) <
+               B_OK || !extract)
+               return status;
+       return status;
+
+       BMessage metadata('meta');
+       status = IndexMetaData(inSource, outInfo, ioExtension, outType, &metadata);
+       
+       if (!metadata.IsEmpty())
+               ioExtension->AddMessage(B_TRANSLATOR_EXT_METADATA, &metadata);
+
+       if (status < B_OK)
+               return B_OK;
+
+       return B_OK;
 }
 
 // ---------------------------------------------------------------
@@ -683,6 +715,77 @@
        return NULL;
 }
 
+status_t
+BaseTranslator::IndexMetaData(BPositionIO *inSource,
+       const translator_info *inInfo, BMessage *ioExtension,
+       uint32 outType, BMessage *outMetaData, float howMuch)
+{
+       status_t status = B_NO_TRANSLATOR;
+
+       // > 0.5 means try even by reading the whole file and translating it
+       if (howMuch < 0.5)
+               return B_OK;
+
+       // copy, we don't want to touch the passed one
+       BMessage ioExt(*ioExtension);
+       // to be filled in
+       BMessage metaData;
+       // just make sure to go back to where we were
+       off_t offset = inSource->Position();
+       BMallocIO mallocIO;
+
+       switch (fTranGroup) {
+               case B_TRANSLATOR_BITMAP:
+               {
+                       BBitmapStream bitmapStream;
+                       BBitmap *bitmap = NULL;
+                       status = Translate(inSource, inInfo, &ioExt, B_TRANSLATOR_BITMAP,
+                               &bitmapStream);
+                       inSource->Seek(offset, SEEK_SET);
+                       if (status < B_OK)
+                               return B_OK;
+                       bitmapStream.DetachBitmap(&bitmap);
+                       if (!bitmap)
+                               return B_OK;
+                       if (bitmap->IsValid()) {
+                               BRect bounds = bitmap->Bounds();
+                               // Refraction docs use this
+                               metaData.AddInt32("GRAFX:Width", bounds.IntegerWidth() + 1);
+                               metaData.AddInt32("GRAFX:Height", bounds.IntegerHeight() + 1);
+                               // some use this...
+                               BString s;
+                               s << (bounds.IntegerWidth() + 1);
+                               s << "x";
+                               s << (bounds.IntegerHeight() + 1);
+                               metaData.AddString("Dimensions", s.String());
+                               // base mime type
+                               outMetaData->AddMessage("image", &metaData);
+                               // exact mime type
+                               outMetaData->AddMessage(inInfo->MIME, &metaData);
+                       }
+                       delete bitmap; 
+                       return B_OK;
+               }       
+               case B_TRANSLATOR_TEXT:
+                       status = Translate(inSource, inInfo, &ioExt, B_TRANSLATOR_TEXT,
+                               &mallocIO);
+                       if (status >= B_OK) {
+                               // make sure it's \0 terminated
+                               mallocIO.Write("", 1);
+                               // split words and sort them by occurrences
+                               // 
+                       }
+                       inSource->Seek(offset, SEEK_SET);
+                       return B_OK;
+                       
+               default:
+                       break;
+       }
+       return B_NO_TRANSLATOR;
+}
+
+
+
 void
 translate_direct_copy(BPositionIO *inSource, BPositionIO *outDestination)
 {
Index: src/add-ons/translators/shared/BaseTranslator.h
===================================================================
--- src/add-ons/translators/shared/BaseTranslator.h     (revision 31035)
+++ src/add-ons/translators/shared/BaseTranslator.h     (working copy)
@@ -50,6 +50,9 @@
 #define max(a,b) ((a > b) ? (a) : (b))
 #endif
 
+extern char B_TRANSLATOR_EXT_METADATA_ONLY[];
+extern char B_TRANSLATOR_EXT_METADATA[];
+
 class BaseTranslator : public BTranslator {
 public:
        BaseTranslator(const char *name, const char *info,
@@ -111,8 +114,12 @@
                uint32 outType, BPositionIO *outDestination, int32 baseType);
                
        virtual BView *NewConfigView(TranslatorSettings *settings);
-       
 
+       virtual status_t IndexMetaData(BPositionIO *inSource,
+               const translator_info *inInfo, BMessage *ioExtension,
+               uint32 outType, BMessage *outMetaData, float howMuch=1.0);
+
+
 protected:
        status_t BitsCheck(BPositionIO *inSource, BMessage *ioExtension,
                uint32 &outType);
