[recoll-user] Re: Recoll 1.12.0 hangs
- From: Jean-Francois Dockes <jean-francois.dockes@xxxxxxxxxx>
- To: recoll-user@xxxxxxxxxxxxx
- Date: Mon, 23 Feb 2009 08:06:29 +0100
aehtytd02@xxxxxxxxxxxxxx writes:
>
> It's my first time using recoll, so sorry if this is a known issue. I'm
> using recoll 1.12.0 from the SuSE rpm from the website on SuSE 11.1. I
> tried having recoll index a small directory mostly of PDFs as a test, and
> it keeps hanging on various PDFs, with 100% CPU use split between recoll
> and pstotext (1.9, from the SuSE build service). I thought xpdf was used
> for indexing PDFs, or? Maybe one in ten PDFs has this problem: I gave up
> after three of them. The first such file can be found here:
> http://www.kyb.mpg.de/publications/attachments/Luxburg06_TR_%5B0%5D.pdf
>
> Probably this is a pstotext bug more than a recoll bug, but how can I get
> around it (apart from moving every tenth file out of the directory to be
> indexed)?
This is not a known issue, thanks a lot for taking the time to investigate
it.
pstotext is normally not used at all for indexing pdf files. Also I tried
indexing the pdf you linked to on a Suse machine and it went through
without a problem. This makes me think that the problem may be a little
different than just recoll/pstotext choking on some pdfs.
How do you determine which file is being processed when the indexing
hangs?
The status line messages in the GUI are just indicative sample points and
can't be trusted for this.
Are there by any chance any PostScript files stored with the PDFs?
In order to better determine what command is actually performed, you could
use something like the following in a terminal window while the indexing is
hanging:
    ps awwx | egrep 'recoll|pdftotext|pstotext|awk' | grep -v grep
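A minimal sketch of how that check could be repeated a few times, so that a helper process which stays on the same file across snapshots stands out. The function names filter_helpers and watch_helpers, and the count and interval arguments, are hypothetical and not part of recoll; only standard ps, grep, date and sleep are assumed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of recoll.

# Read ps-style lines on stdin and keep those mentioning recoll or one of
# the external helpers named above, dropping the grep processes themselves.
filter_helpers() {
    grep -E 'recoll|pdftotext|pstotext|awk' | grep -v grep
}

# Print COUNT snapshots of the matching processes, INTERVAL seconds apart.
# Defaults (3 snapshots, 5 seconds) are arbitrary choices.
watch_helpers() {
    count=${1:-3}
    interval=${2:-5}
    i=0
    while [ "$i" -lt "$count" ]; do
        date
        ps awwx | filter_helpers || echo "(no matching process)"
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Calling "watch_helpers 3 5" while the indexing hangs would print three snapshots five seconds apart. A pdftotext or pstotext line showing the same file name in every snapshot would be a good candidate for the file causing trouble.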
Also I'd recommend using recollindex, not recoll, to do the indexing. The
result is normally the same, but having a separate process rather than a
thread inside the recoll GUI further simplifies things. Use "recollindex -z"
in a terminal window.
If anything is unclear, don't hesitate to ask questions (possibly off-list
if they are not directly recoll-related). I have no way to know at this
point how familiar you are with the command line.
Thanks again for reporting the issue.
Cheers,
JF