After playing with Recoll and doing some research on Google, I found
http://stackoverflow.com/questions/4643438/how-to-search-contents-of-multiple-pdf-files

A simple script based on the link above is all I need:

#######################################################################
# Requires pdftotext.
# Greps the text of every PDF under $search_dir for the pattern given as $1,
# caching the extracted text so each PDF is converted only once.
[ $# -eq 1 ] || { echo "usage: $0 PATTERN" >&2; exit 1; }

search_dir=$HOME/tmp/pdf
cache_dir=$HOME/tmp/cache/pdf2text
mkdir -pv "$cache_dir"

find "$search_dir" -type f -name '*.pdf' | while IFS= read -r file; do
    # cache key: sanitized file name plus MD5 of the file's contents
    md5sum=$(md5sum < "$file" | cut -d' ' -f1)
    file_sed=$(basename "$file" | sed -e 's/[^a-zA-Z0-9]/-/g')
    cache_file="$cache_dir/$file_sed-$md5sum"
    # run pdftotext only if the cache file does not exist yet
    [ -f "$cache_file" ] || pdftotext "$file" "$cache_file"
    grep --color=always -e "$1" "$cache_file"
done
########################################################################

Marked it as solved, then. Thanks!

- kris

On Sat, Sep 17, 2011 at 00:21, Krisoijn Chan <ksc@xxxxxx> wrote:
> I am trying to index ONLY *.pdf files.
>
> Thanks for your help.
>
> - kris
>
> On Sat, Sep 17, 2011 at 00:03, <jfd@xxxxxxxxxx> wrote:
>
>> Krisoijn Chan writes:
>> > ---------- Forwarded message ----------
>> > From: Krisoijn Chan <ksc@xxxxxx>
>> > Date: Fri, Sep 16, 2011 at 22:00
>> > Subject: Exclude all file but pdf?
>> > To: ecartis@xxxxxxxxxxxxx
>> >
>> > After looking at Recoll's docs, I don't think there is a way to do it.
>> >
>> > Could someone point me in the right direction, please?
>> >
>> > - Kris
>>
>> I'm not sure that I understand the issue fully here. I'll assume that you
>> want to select all PDF files, with no search terms?
>>
>> I think that ext:pdf should work with the query language.
>>
>> A bare "mime:application/pdf" query does not currently work without other
>> query terms, but you could add a very common term, like
>> "the mime:application/pdf" or "a mime:application/pdf" (to be adjusted
>> according to the documents' language).
>>
>> Cheers,
>>
>> jf
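The cache naming used by the script (sanitized file name plus a content hash) can be pulled out into a small function, which makes it easy to check which cache file a given PDF maps to. This is a minimal sketch, assuming bash and GNU coreutils `md5sum`; the function name `cache_path` is hypothetical and not part of the original script.

```shell
# cache_path PDF CACHE_DIR  (hypothetical helper, not in the original script)
# Prints the cache file path for PDF: its basename with every
# non-alphanumeric character replaced by '-', then '-' and the MD5
# hash of the file's contents.
cache_path() {
    local pdf=$1 dir=$2 hash name
    # read via stdin so md5sum never escapes unusual file names
    hash=$(md5sum < "$pdf" | cut -d' ' -f1)
    name=$(basename "$pdf" | sed -e 's/[^a-zA-Z0-9]/-/g')
    printf '%s/%s-%s\n' "$dir" "$name" "$hash"
}
```

For example, `cache_path "$HOME/tmp/pdf/My Paper (v2).pdf" "$HOME/tmp/cache/pdf2text"` prints a path ending in `My-Paper--v2--pdf-` followed by the file's MD5 hash. Because the hash is part of the name, an edited PDF maps to a new cache file instead of reusing stale text.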