[recoll-user] Re: Fwd: Exclude all file but pdf?

  • From: Krisoijn Chan <ksc@xxxxxx>
  • To: recoll-user@xxxxxxxxxxxxx
  • Date: Sat, 17 Sep 2011 00:29:43 +1000

After playing with recoll and doing some research on Google, I found

http://stackoverflow.com/questions/4643438/how-to-search-contents-of-multiple-pdf-files

A simple script based on the link above is all I need.

#######################################################################
# requires pdftotext
# usage: pass the search pattern as the first argument

search_dir=$HOME/tmp/pdf
cache_dir=$HOME/tmp/cache/pdf2text

mkdir -pv "$cache_dir"

find "$search_dir" -type f -name '*.pdf' | while IFS= read -r file; do
  # key the cache on the file's content so a changed PDF is re-extracted
  md5sum=$(md5sum "$file" | cut -d' ' -f1)
  file_sed=$(basename "$file" | sed -e 's/[^a-zA-Z0-9]/-/g')
  cache_file="$cache_dir/$file_sed-$md5sum"

  # run pdftotext only if the cache file does not exist yet
  # (the earlier quoted glob in the ls test never expanded, so the cache was never hit)
  [ -f "$cache_file" ] || pdftotext "$file" "$cache_file"
  grep --color=always -- "$1" "$cache_file"
done
########################################################################
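
One limitation of the script above is that grep prints the matching lines
without saying which PDF they came from. Below is a minimal sketch of one way
around that. The helper names cache_path and search_cached are made up for
illustration and do not come from the linked answer; md5sum is assumed to be
available, as in the script.

```shell
# Sketch only: cache_path and search_cached are hypothetical helpers.

# Print the cache path for a PDF: sanitized base name plus content hash.
# Each function body runs in a subshell, so its variables stay private.
cache_path() (
  file=$1
  cache_dir=$2
  sum=$(md5sum < "$file" | cut -d' ' -f1)
  name=$(basename "$file" | sed -e 's/[^a-zA-Z0-9]/-/g')
  printf '%s/%s-%s\n' "$cache_dir" "$name" "$sum"
)

# Search a cached text file and prefix every match with the original PDF
# path, so the output shows where each hit came from.
search_cached() (
  pattern=$1
  pdf=$2
  cache_file=$3
  grep -- "$pattern" "$cache_file" | while IFS= read -r line; do
    printf '%s:%s\n' "$pdf" "$line"
  done
)
```

Inside the loop, the last two commands would then become something like
cache_file=$(cache_path "$file" "$cache_dir") followed by
search_cached "$1" "$file" "$cache_file", with the pdftotext step unchanged.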

Marking this as solved then, thanks!

- kris

On Sat, Sep 17, 2011 at 00:21, Krisoijn Chan <ksc@xxxxxx> wrote:

> I am trying to index ONLY *.pdf files.
>
> thanks for your help.
>
> - kris
>
> On Sat, Sep 17, 2011 at 00:03, <jfd@xxxxxxxxxx> wrote:
>
>> Krisoijn Chan writes:
>>  > ---------- Forwarded message ----------
>>  > From: Krisoijn Chan <ksc@xxxxxx>
>>  > Date: Fri, Sep 16, 2011 at 22:00
>>  > Subject: Exclude all file but pdf?
>>  > To: ecartis@xxxxxxxxxxxxx
>>  >
>>  >
>>  > After looking at recoll's doc, I don't think there is a way to do it.
>>  >
>>  > Could someone point me in the right direction, please?
>>  >
>>  > - Kris
>>
>> I'm not sure that I understand the issue fully here. I'll assume that you
>> want to select all pdf files, with no search terms ?
>>
>> I think that ext:pdf should work with the query language.
>>
>> A pure "mime:application/pdf" does not work currently, without other query
>> terms, but you could use a very common term, like
>> "the mime:application/pdf"  or "a mime:application/pdf" (to be adjusted
>> according to the document's language).
>>
>> Cheers,
>>
>> jf
>>
>>
>
