[recoll-user] indexing and searching customized metadata fields in pdf

  • From: Ciaran Farrell <ciaran@xxxxxxxxxx>
  • To: recoll-user@xxxxxxxxxxxxx
  • Date: Thu, 20 Mar 2014 21:20:06 +0100


I have a directory with lots of PDFs (scanned, so they contain no
text). To manage these PDFs I'd like to add custom metadata fields.
This was quite easy to do with the pdfrw module in Python, and exiftool
shows that the fields were indeed added to the PDFs. I can extract the
metadata with exiftool -CustomFieldName1 -CustomFieldName2.
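
For reference, the extraction step can be sketched roughly as follows.
This is only an illustration: the function names are made up for this
example, it assumes exiftool is on the PATH, and it relies on exiftool's
-j (JSON output) option. Names are matched case-insensitively because
exiftool may report a tag with different capitalisation than the one
that was written.

```python
import json
import subprocess


def build_exiftool_cmd(pdf_path, fields):
    """Build an exiftool command that prints the named tags as JSON."""
    # "-j" asks exiftool for JSON output; each "-TagName" selects one tag.
    return ["exiftool", "-j"] + ["-" + name for name in fields] + [pdf_path]


def parse_exiftool_json(text, fields):
    """Return {field: value} for the requested fields present in the JSON."""
    records = json.loads(text)  # exiftool -j prints a list, one object per file
    if not records:
        return {}
    # Index the first record by lower-cased key for case-insensitive lookup.
    by_lower = {key.lower(): value for key, value in records[0].items()}
    return {
        name: str(by_lower[name.lower()])
        for name in fields
        if name.lower() in by_lower
    }


def read_custom_fields(pdf_path, fields):
    """Run exiftool on one PDF and return the requested custom fields."""
    cmd = build_exiftool_cmd(pdf_path, fields)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_exiftool_json(result.stdout, fields)
```

Calling read_custom_fields("scan.pdf", ["CustomFieldName1"]) would then
return a dictionary holding whichever of the requested fields exiftool
found in that (hypothetical) file.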

I'd like to have recoll do the heavy lifting for me. However, I see that
by default it isn't possible to index or search on custom fields. I read
through https://bitbucket.org/medoc/recoll/wiki/HandleCustomField and
followed the instructions there, using exiftool instead of pdfinfo to
extract the metadata (I couldn't get pdfinfo to do it on the command
line). However, although CustomFieldName appears in the GUI (e.g. in the
advanced search window), no results are returned, whatever I do. For
example, on the command line, recoll -t effDate:2012-12-01 should
certainly have returned something (I do get results for something like
recoll -t fileType:pdf).
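
To make the filter side concrete, here is a rough sketch of what I
understand the customized filter needs to produce: one HTML meta tag per
custom field in the document it hands to the indexer. That
understanding is an assumption on my part, the function name is
invented for this example, and the sketch does not cover declaring the
field in recoll's own configuration.

```python
from html import escape


def fields_to_meta_tags(fields):
    """Render {name: value} as one HTML <meta> line per field."""
    lines = []
    for name, value in fields.items():
        # Escape both parts so quotes or ampersands cannot break the tag.
        lines.append(
            '<meta name="%s" content="%s">'
            % (escape(name, quote=True), escape(str(value), quote=True))
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical field value, matching the search example above.
    print(fields_to_meta_tags({"effDate": "2012-12-01"}))
```

If the tags that come out of the filter do not look like this for
effDate, the problem would be on the extraction side rather than in the
search.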

Is there any simpler way of doing it than having a customized rclpdf in
e.g. ~/.recoll and editing mimeconf to exec that? If not, what could I
be doing wrong (or not doing) that would stop the indexing/searching on
the custom field?


