[recoll-user] Re: help writing custom filter

  • From: Mike Roark <msroark@xxxxxxxxx>
  • To: recoll-user@xxxxxxxxxxxxx
  • Date: Wed, 17 Aug 2011 18:05:50 -0500

On Wed, Aug 17, 2011 at 2:43 AM, <jfd@xxxxxxxxxx> wrote:

> Mike Roark writes:
>  > Hi,
>
> Hello, and sorry it took so long, I was away on holidays.
>
>  >      I'm having trouble registering my custom filter with recoll. I
>  > wrote a filter for rar files using rclzip as a template (incredibly easy
>  > using python rarfile module, literally just a substitution from 'zip' to
>  > 'rar'). It seems to work ok at command prompt...
>  >
>  >      I'm using the debian recoll 1.15.9-1 package from testing.
>  >
>  >      I added the following line in ~/.recoll/mimemap:
>  >
>  > .rar = application/x-rar
>  >
>  >      I added the following line in ~/.recoll/mimeconf
>  >
>  > application/x-rar = execm rclrar
>  >
>  >     I put the rclrar script in /usr/share/recoll/filters, with same
>  > perms as rclzip:
>  >
>  > [mike@timetraveler ~]$ ls -l /usr/share/recoll/filters/rcl{zip,rar}
>  > -rwxr-xr-x 1 root root 3503 Aug  7 12:25 /usr/share/recoll/filters/rclrar
>  > -rwxr-xr-x 1 root root 3503 Jun 15 00:17 /usr/share/recoll/filters/rclzip
>  >
>  >      I set my loglevel = 6 in recoll.conf, and I have a small testdir
>  > with a rar file and a zip file in it which I'm indexing. I run
>  > recollindex -z > recoll.log 2>&1 to reindex...
>  >
>  >      I expect to see some mention of rclrar in the log file. For
>  > comparison I see stuff like this for rclzip and for my zip file:
>  >
>  > :4:../rcldb/rcldb.cpp:1215:Db::needUpdate:yes (new): [Q/home/mike/tmp/testindex/puppet.zip|]
>  > :5:../index/fsindexer.cpp:360:processone: processing: [5 MB ] /home/mike/tmp/testindex/puppet.zip
>  > :4:../internfile/internfile.cpp:224:FileInterner:: [/home/mike/tmp/testindex/puppet.zip] mime [(null)] preview 0
>  > :4:../internfile/internfile.cpp:298:FileInterner:: init ok application/zip [/home/mike/tmp/testindex/puppet.zip]
>  > :4:../internfile/internfile.cpp:767:FileInterner::internfile. ipath []
>  > :4:../internfile/mh_execm.cpp:142:MimeHandlerExecMultiple::next_document(): [/home/mike/tmp/testindex/puppet.zip]
>  > :4:../internfile/mh_execm.cpp:42:MimeHandlerExecMultiple::startCmd
>  > :4:../utils/execmd.cpp:185:ExecCmd::startExec: (1|1) /usr/share/recoll/filters/rclzip
>  > :4:../internfile/mh_execm.cpp:214:MHExecMultiple: got ipath [Apress.Pro.Puppet.May.2011.pdf]
>  > ...
>  >
>  >       However, I see no mention of rclrar and no ipaths getting found.
>  > The rar file seems to only be indexed by filename (which I would expect
>  > with no filter), since I cannot search on any of the content of the pdf
>  > file inside of it... (log output for the rar file below)...
>  >
>  >      Please let me know if you see anything wrong with my approach... I
>  > feel I'm missing something obvious.
>
> What you do looks quite ok.
>
> Did you add an [index] section at the top of your mimeconf though ? Lines
> like "application/x-rar = execm rclrar" should be inside an "index" section
> (there are other sections in mimeconf, to define what icon to use, how to
> group doc types into broader genres etc.)
>
> Otherwise I can see really no reason why this wouldn't work, I'm quite
> confident that rar will soon be a supported type :)
>
> Cheers,
>
> jf
>
>
That was exactly the problem, and the script seems to work. I've added it as
an attachment here:

https://bitbucket.org/medoc/recoll/issue/52/requests-for-additional-format-support
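
For anyone who runs into the same thing: the handler line in
~/.recoll/mimeconf has to sit under an [index] section header, as jf
pointed out. Below is a rough sketch of a check for that. The function
name find_section is made up for illustration, and the parser is
deliberately simplified (it only understands "[section]" headers,
"key = value" lines and "#" comments), so it is not a faithful
reimplementation of recoll's own config reader.

```python
def find_section(conf_text, key):
    """Return the section in which `key` is first assigned.

    Returns "" if the key appears before any [section] header,
    or None if the key is not assigned at all.
    Simplified on purpose: no line continuations, first match only.
    """
    section = ""  # lines before the first header belong to no section
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        name, sep, _value = line.partition("=")
        if sep and name.strip() == key:
            return section
    return None


# The mimeconf as first written: handler line with no section header.
broken = "application/x-rar = execm rclrar\n"

# The mimeconf after the fix: same line, under [index].
fixed = "[index]\napplication/x-rar = execm rclrar\n"

print(repr(find_section(broken, "application/x-rar")))  # ''
print(repr(find_section(fixed, "application/x-rar")))   # 'index'
```

To check a real file, read ~/.recoll/mimeconf into a string and pass it
in; anything other than 'index' for application/x-rar suggests the
handler line is in the wrong place.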
