[recoll-user] help writing custom filter

  • From: Mike Roark <msroark@xxxxxxxxx>
  • To: recoll-user@xxxxxxxxxxxxx
  • Date: Sun, 07 Aug 2011 13:30:46 -0500

Hi,
I'm having trouble registering my custom filter with recoll. I wrote a filter for rar files using rclzip as a template (incredibly easy with the Python rarfile module: literally just a substitution of 'rar' for 'zip'). It seems to work OK at the command prompt.

I'm using the Debian recoll 1.15.9-1 package from testing.

I added the following line to ~/.recoll/mimemap:

.rar = application/x-rar

I added the following line to ~/.recoll/mimeconf:

application/x-rar = execm rclrar

I put the rclrar script in /usr/share/recoll/filters, with same perms as rclzip:

[mike@timetraveler ~]$ ls -l /usr/share/recoll/filters/rcl{zip,rar}
-rwxr-xr-x 1 root root 3503 Aug  7 12:25 /usr/share/recoll/filters/rclrar
-rwxr-xr-x 1 root root 3503 Jun 15 00:17 /usr/share/recoll/filters/rclzip

I set loglevel = 6 in recoll.conf, and I am indexing a small test directory that contains one rar file and one zip file. I run recollindex -z > recoll.log 2>&1 to reindex.

I expect to see some mention of rclrar in the log file. For comparison I see stuff like this for rclzip and for my zip file:

:4:../rcldb/rcldb.cpp:1215:Db::needUpdate:yes (new): [Q/home/mike/tmp/testindex/puppet.zip|]
:5:../index/fsindexer.cpp:360:processone: processing: [5 MB ] /home/mike/tmp/testindex/puppet.zip
:4:../internfile/internfile.cpp:224:FileInterner:: [/home/mike/tmp/testindex/puppet.zip] mime [(null)] preview 0
:4:../internfile/internfile.cpp:298:FileInterner:: init ok application/zip [/home/mike/tmp/testindex/puppet.zip]
:4:../internfile/internfile.cpp:767:FileInterner::internfile. ipath []
:4:../internfile/mh_execm.cpp:142:MimeHandlerExecMultiple::next_document(): [/home/mike/tmp/testindex/puppet.zip]
:4:../internfile/mh_execm.cpp:42:MimeHandlerExecMultiple::startCmd
:4:../utils/execmd.cpp:185:ExecCmd::startExec: (1|1) /usr/share/recoll/filters/rclzip
:4:../internfile/mh_execm.cpp:214:MHExecMultiple: got ipath [Apress.Pro.Puppet.May.2011.pdf]
...

However, I see no mention of rclrar in the log, and no ipaths are found. The rar file seems to be indexed by filename only (which is what I would expect with no filter), since I cannot search on any of the content of the PDF file inside it. The log output for the rar file is at the end of this message.
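
In case it helps anyone reproduce this, here is a rough sketch of how the log can be checked mechanically instead of by eye. It is only an illustration: the function names (split_entries, summarize) are made up for this example and are not part of recoll, and the regular expression simply assumes the ":level:../path/file.cpp:line:" prefix visible in the log excerpts in this message. It splits the log into entries (several can share one physical line, and long ones may be hard-wrapped), then reports whether the filter name appears anywhere, which entries say "ignored:" for a given file, and which entries report an ipath.

```python
import re

# Assumed entry prefix, based only on the excerpts in this post,
# e.g. ":4:../internfile/internfile.cpp:224:"
ENTRY_START = re.compile(r":\d:\.\./[\w/]+\.cpp:\d+:")


def split_entries(log_text):
    """Cut raw log text into one string per log entry.

    Newlines inside an entry are dropped, so hard-wrapped entries are
    rejoined and multi-line records end up on a single line.
    """
    starts = [m.start() for m in ENTRY_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    return [
        log_text[begin:end].replace("\n", "").strip()
        for begin, end in zip(starts, ends)
    ]


def summarize(log_text, filter_name, file_name):
    """Collect the entries that matter when checking a filter."""
    entries = split_entries(log_text)
    about_file = [e for e in entries if file_name in e]
    return {
        # True if the filter script is mentioned anywhere in the log.
        "filter_mentioned": any(filter_name in e for e in entries),
        # Entries where the indexer reports the file as ignored.
        "ignored": [e for e in about_file if "ignored:" in e],
        # Entries where a multi-document handler returned an ipath.
        "ipaths": [e for e in entries if "got ipath" in e],
    }
```

With the log captured as above, something like summarize(open("recoll.log").read(), "rclrar", "Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar") would show at a glance that rclrar is never mentioned and that the rar file has an "ignored:" entry.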

Please let me know if you see anything wrong with my approach... I feel I'm missing something obvious.
            Thanks!
                -Mike


:4:../rcldb/rcldb.cpp:1215:Db::needUpdate:yes (new): [Q/home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar|]
:5:../index/fsindexer.cpp:360:processone: processing: [9 MB ] /home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar
:4:../internfile/internfile.cpp:224:FileInterner:: [/home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar] mime [(null)] preview 0
:3:../internfile/internfile.cpp:277:FileInterner:: ignored: [/home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar] mime [application/x-rar]
:4:../internfile/internfile.cpp:298:FileInterner:: init ok application/x-rar [/home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar]
:4:../internfile/internfile.cpp:767:FileInterner::internfile. ipath []
:4:../internfile/internfile.cpp:683:FileInterner::addHandler: next_doc is text/plain
:4:../rcldb/rcldb.cpp:893:Db::add: udi [/home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar|] parent []
:5:../rcldb/rcldb.cpp:1142:Rcl::Db::add: new doc record:
url=file:///home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar
mtype=application/x-rar
fmtime=01312732351
origcharset=
fbytes=9759832
sig=97598321312732351
dbytes=0
caption=Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar
filename=Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar

:4:../rcldb/rcldb.cpp:1156:Db::add: docid 2 added [/home/mike/tmp/testindex/Oreilly.Building.and.Testing.with.Gradle.Jul.2011.rar|]



