Krisoijn Chan writes:
> After playing with recoll and doing some research on google.
>
> I found
> http://stackoverflow.com/questions/4643438/how-to-search-contents-of-multiple-pdf-files
>
> A simple script is all I need which is based on the link above.

Just for the record, there are several ways to index only PDF files with
Recoll. One is to use the "indexedmimetypes" configuration variable, which
exists just for this purpose. In ~/.recoll/recoll.conf:

    indexedmimetypes = application/pdf

Another approach, which would work but doesn't make much sense, would be:

    find $topdir -name '*.pdf' | recollindex -i

Using recoll would probably be simpler and offer richer functionality than
the "simple" script / grep combination.

Cheers,
jf

> #######################################################################
> # requires pdftotext
>
> search_dir=$HOME/tmp/pdf
> cache_dir=$HOME/tmp/cache/pdf2text
>
> mkdir -pv "$cache_dir"
>
> find "$search_dir" -type f -name \*.pdf | while IFS= read -r file; do
>     md5sum=$(md5sum "$file" | cut -d' ' -f1)
>     file_sed=$(basename "$file" | sed -e 's/[^a-zA-Z0-9]/-/g')
>     cache_file="$cache_dir/$file_sed-$md5sum"
>
>     # run pdftotext only if the cache file does not exist already
>     [ -f "$cache_file" ] || pdftotext "$file" "$cache_file"
>     grep --color=always "$1" "$cache_file"
> done
> ########################################################################
>
> marked it as solved then, thanks!
>
> - kris
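
For anyone who wants to try the "indexedmimetypes" route without touching their main index, here is a rough sketch. It assumes a separate configuration directory selected with the -c option, and that "topdirs" is set to the PDF directory. The helper names write_pdf_only_conf and index_and_query are made up for this example, and the directories in the usage line are only placeholders.

```shell
#!/bin/sh
# Sketch: build a separate Recoll configuration that indexes only PDF files.
# write_pdf_only_conf and index_and_query are hypothetical helper names.

# Write a minimal recoll.conf into confdir ($1) that indexes topdir ($2)
# and keeps only documents whose MIME type is application/pdf.
write_pdf_only_conf() {
    mkdir -p "$1" || return 1
    cat > "$1/recoll.conf" <<EOF
topdirs = $2
indexedmimetypes = application/pdf
EOF
}

# Update the index for confdir ($1), then run query ($2) in text mode.
# Only prints a warning if Recoll is not installed.
index_and_query() {
    if ! command -v recollindex >/dev/null 2>&1; then
        echo "recollindex not found; skipping indexing" >&2
        return 0
    fi
    recollindex -c "$1" && recoll -t -c "$1" -q "$2"
}
```

A possible invocation, with placeholder paths and search terms:

    write_pdf_only_conf "$HOME/.recoll-pdfonly" "$HOME/tmp/pdf" &&
        index_and_query "$HOME/.recoll-pdfonly" "some words"

Keeping the PDF-only settings in their own directory means the default ~/.recoll index is left alone.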