[recoll-user] Re: full text searching for parts of a known date

  • From: <jfd@xxxxxxxxxx>
  • To: recoll-user@xxxxxxxxxxxxx
  • Date: Sat, 28 Jul 2012 08:30:12 +0200

Alexander writes:
 > Hi JF,
 > 
 > thanks again for the script! I tried it, it works and
 > this is what it adds to the head tag:
 > 
 > <head>
 > <title></title>
 > <meta name="Producer" content="ABBYY FineReader 8.0 Professional Edition"/>
 > <meta name="CreationDate" content=""/>
 > <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
 > <meta name="date" content="2012-02-18T00:00:00">
 > </head>
 > 
 > It seems to be picked up by recoll correctly, since I was 
 > subsequently able to find the docs using a keyword in combination 
 > with the date filter of the advanced search. :-)

Good!

I am attaching the updated script because the initial one had a few
problems. 

While this script is specific to the date/document format at hand,
modifying it to process some other date format, or field, would be quite
easy.
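
For readers without the attachment, the general idea can be sketched as
follows. This is not the attached script, only a minimal illustration: it
assumes (hypothetically) that the document text contains a date written as
DD.MM.YYYY, and the function name add_date_meta is made up for the example.

```python
import re

# Assumption for this sketch: dates appear in the text as DD.MM.YYYY.
DMY = re.compile(r"\b(\d{2})\.(\d{2})\.(\d{4})\b")

def add_date_meta(html):
    """Insert a <meta name="date"> tag before </head>, using the first
    DD.MM.YYYY date found in the input. Returns the input unchanged if
    no date or no </head> tag is present."""
    m = DMY.search(html)
    if m is None or "</head>" not in html:
        return html
    day, month, year = m.groups()
    # Same layout as the tag shown in the quoted output above.
    tag = '<meta name="date" content="%s-%s-%sT00:00:00">\n' % (year, month, day)
    return html.replace("</head>", tag + "</head>", 1)
```

Adapting it to another date format or another field name would mean
changing the regular expression and the name attribute of the tag.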

 > If you find the time to answer I have two further questions 
 > regarding the customisation of recoll.
 > 
 > 1. Will recollindex pick up more than one date tag?
 > (I was thinking of adding all contained dates to the head)

Recoll will only handle one field as the document date, meaning only the
"date" field can be queried using the specific date/time interval syntax.

However, it might be useful to add other dates from the document, giving
them other field names (e.g. date1, date2, etc.). They will be processed as
ordinary text fields, but if they are in YMD format (as opposed to DMY),
you should be able to search them with wildcards. A pattern that begins
with literal characters (even just '1' or '2' as the first character) will
dramatically reduce the processing time compared to a leading wildcard.

You'll need to add your fields to the "fields" configuration file to get
them to be indexed. See:

http://www.lesbonscomptes.com/recoll/usermanual/rcl.program.html#rcl.program.filters.html
http://www.lesbonscomptes.com/recoll/usermanual/rcl.program.fields.html
http://www.lesbonscomptes.com/recoll/usermanual/rcl.install.config.html#rcl.install.config.fields

and the comments in the fields file itself.

I was thinking that I needed to write a tutorial, but one already exists :)
 https://bitbucket.org/medoc/recoll/wiki/HandleCustomField

 > 2. Does recoll take the original CreationDate from pdftotext into
 >   account? (I was thinking of putting the real file creation date
 >   reported by stat there)

No, just one date field for now. You could add the file mtime as one of the
custom fields above, but, as noted, it will be processed as ordinary text,
not as a date.

Actually, it would not be impossible to modify recoll to handle multiple
date fields (but not trivial either). If other people think that this would
be an interesting feature, please speak up.

Cheers,

jf

