[recoll-user] Re: Fwd: Exclude all file but pdf?
- From: Krisoijn Chan <ksc@xxxxxx>
- To: recoll-user@xxxxxxxxxxxxx
- Date: Sat, 17 Sep 2011 00:29:43 +1000
After playing with recoll and doing some research on Google, I found
http://stackoverflow.com/questions/4643438/how-to-search-contents-of-multiple-pdf-files
A simple script based on the link above is all I need.
#######################################################################
# Requires pdftotext. The search pattern is passed as the first argument.
search_dir=$HOME/tmp/pdf
cache_dir=$HOME/tmp/cache/pdf2text
mkdir -pv "$cache_dir"
# IFS= and -r keep leading spaces and backslashes in file names intact
find "$search_dir" -type f -name '*.pdf' | while IFS= read -r file; do
    md5sum=$(md5sum "$file" | cut -d' ' -f1)
    file_sed=$(basename "$file" | sed -e 's/[^a-zA-Z0-9]/-/g')
    cache_file="$cache_dir/$file_sed-$md5sum"
    # Run pdftotext only if the cache file does not exist already
    [ -f "$cache_file" ] || pdftotext "$file" "$cache_file"
    grep --color=always -- "$1" "$cache_file"
done
########################################################################
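The cache-naming step of the script above can also be pulled out into a small function, which makes it easier to check on its own. This is only a sketch: the name cache_name and its two-argument form are my own, not part of the original script, and it assumes md5sum, cut, sed and basename are available.

```shell
# Hypothetical helper (not in the original script): print the cache file
# path for one PDF, built from its sanitized basename and its MD5 checksum.
# $1 = path to the PDF, $2 = cache directory
cache_name() {
    # Read the file on stdin so the checksum output never contains the name
    sum=$(md5sum < "$1" | cut -d' ' -f1)
    # Replace every character outside [a-zA-Z0-9] with '-'
    name=$(basename "$1" | sed -e 's/[^a-zA-Z0-9]/-/g')
    printf '%s/%s-%s\n' "$2" "$name" "$sum"
}
```

Inside the loop it would be used as cache_file=$(cache_name "$file" "$cache_dir"). Because the checksum is part of the name, a PDF whose content changes gets a new cache file instead of reusing stale text.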
Please mark it as solved then, thanks!
- kris
On Sat, Sep 17, 2011 at 00:21, Krisoijn Chan <ksc@xxxxxx> wrote:
> I am trying to index ONLY *.pdf files.
>
> thanks for your help.
>
> - kris
>
> On Sat, Sep 17, 2011 at 00:03, <jfd@xxxxxxxxxx> wrote:
>
>> Krisoijn Chan writes:
>> > ---------- Forwarded message ----------
>> > From: Krisoijn Chan <ksc@xxxxxx>
>> > Date: Fri, Sep 16, 2011 at 22:00
>> > Subject: Exclude all file but pdf?
>> > To: ecartis@xxxxxxxxxxxxx
>> >
>> >
>> > After looking at recoll's doc, I don't think there is a way to do it.
>> >
>> > Could someone point me in the right direction, please?
>> >
>> > - Kris
>>
>> I'm not sure that I understand the issue fully here. I'll assume that you
>> want to select all PDF files, with no search terms?
>>
>> I think that ext:pdf should work with the query language.
>>
>> A pure "mime:application/pdf" does not work currently, without other query
>> terms, but you could use a very common term, like
>> "the mime:application/pdf" or "a mime:application/pdf" (to be adjusted
>> according to the document's language).
>>
>> Cheers,
>>
>> jf
>>
>>
>