On Fri, 2 Sep 2011 16:01:44 +0200,jfd@xxxxxxxxxx wrote: > Ramon Diaz-Uriarte writes: > > Dear All, > > > > I've been a very happy user of recoll. Lately, however, I've started to > > notice that I periodically get peaks in IO that freeze my systems. > > > > > > Details: > > > > - I index my home directory (my recoll.conf is at the bottom) using real > > time indexing. (I also monitor /usr/share/doc with another recollindex, > > but that rarely causes trouble). > > > > - My current recoll version is 1.15.9-1 (as packed by Debian). > > > > - I am monitoring IO usage with iotop. > > > > - From time to time (e.g., when I delete/refile messages in my Maildir, or > > when offlineimap deletes messages) I can see recollindex doing lots of > > IO (iotop shows up to 99%). > > > > - This seems to lead to many other processes being left waiting (as shown > > by top). For instance, while answering email, Emacs freezes and you > > cannot see what your are typing. Or the command line freezes if you are > > doing a simple "ls". > > > > - This can last from barely less than a second to a few seconds, but it is > > very intrusive. > There is currently nothing in recoll which would prevent it from using all > the IO bandwidth available on the system (it does use setpriority() to > lower its cpu priority). I'll add an ionice() call as soon as this will be > available in libc. > This has actually already been reported, see: > https://bitbucket.org/medoc/recoll/issue/46/recoll-needs-ionice-and-nice > As far as I know, nothing has changed about the ionice availability as a > system call, so the advice remains the same, use the ionice command to > start recollindex, for example: > ionice -c3 recollindex -m > or (for a higher precedence, and as suggested by the previous reporter): > ionice -c2 -n7 recollindex -m Ah, thanks. I should have looked at the Issues. Interestingly, I have some of the same issues as reported there, such as the system being extremely slow to open (or close) a simple terminal. Anyway, I tried with ionice -c2 -n7 recollindex -m and things only improved a little bit. I then tried with ionice -c3 recollindex -m and for the last two days things seem mostly OK (though I still get the occasional slow down). Interestingly, in at least two cases, while experimenting with deleting large directories, recollindex died (without any trace of why in the log) after being "ioniced" with -c3. > > - The log output shows all sorts of messages, many of them "No such file > > or directory", but not all. In particular, it does not seem related (as > > far as I can tell) to processing some very huge text file. > After looking at the screen copy, I'd guess that these "no such file or > directory" messages are related to dangling symlinks, and have no > relationship to the IO problem. To know more about what's happening while > recollindex is thrashing the system, you could set a logfilename, raise the > loglevel to 4 and have a look at what recollindex is actually doing. > > - This problem happens in several machines, from a tiny Asus netbook to a > > quad-core workstation. All run Debian GNU/Linux and recent stock Debian > > kernels (e.g., 2.6.39). > > > > I am attaching a screen capture that shows (left to right, top to bottom): > > * tail -f recolltrace > > * iotop -o > > * iotop -o -a > > * top > > > > it shows a case where IO from recollindex seems to do lots of IO and a > > large fraction of processes (23%, but in other cases as many as 50%) are > > waiting, I think because of IO. > > > > > > What am I doing wrong? Is there a way to avoid these problems? > Thanks for reporting this and for all the detail, I don't think that you > are doing anything wrong. > I'm not too good at interpreting iotop output, but it seems to report 2M/s > of write output, which seems awfully low to saturate the IO system, except > if these are small and synchronous writes. It would be interesting to have > the average size of writes in addition to the aggregate transfer rate. > I think that another reason why a process might be blocked is when waiting > for a page-in, which may happen if recollindex is eating a lot of memory. > It would be interesting to know how big recollindex has become at this > point. Another user has reported that recollindex becomes very big when > deleting many documents, but I have not seen this happen myself. Some other issues related to memory, etc: 1. In a large workstation with 64 GB of RAM, I was seeing the same problems, so memory is unlikely to be the problem. 2. In that workstation, with two disks, it seems I eliminated the problem when I configured recoll to write the DBs to the second hard drive. So the problem seems to be one of writing of the indices. 3. (In the laptop) top rarely reports more than 1.5% of the memory being used (total virtual size around 195000 kb) during "normal" operation. 4. I tried several operations of creating/deleteing large directories. When indexing the newly added files, recollindex can get up to a virtual size of around 1000000 kb, which is not too bad (not more than 5% memory) which seems associated to processing very large text files and flusing (log shows something like :4:../rcldb/rcldb.cpp:1156:Db::add: docid 684513 added [/home/ramon/test-recoll/JorgeMartin/NormaDATA Expresion 5.zip|NormaDATA Expresion 5/An�lisis_MDA231-shRNALacZ-Doxy vs MDA231-shRKH2eDZhMxLxwTFgtM1EmLA] :4:../rcldb/rcldb.cpp:1180:Db::add: text size >= 10 Mb, flushing or output in the log such as :4:rclmonrcv.cpp:568:RclIntf::getEvent: unhandled event IN_IGNORED 0x8000 /home/ramon/.gconf/apps/nm-applet/ignore-ca-cer 5. I also see what you mention has been reported before: When deleting many files, recollindex can really use a lot of memory. For instance, this is the output from top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 29832 ramon 39 19 4071m 3.9g 2664 D 25 50.6 33:15.47 recollindex and this was in the log: log file :4:../rcldb/rcldb.cpp:1392:Db::purgeFile: delete subdoc 684513 Apparently, large memory consumption is related to deletion of directories that contain many large files rather than large directories with mostly small files. I am not sure, though; peaks in memory usage are often brief. > Also, depending on tuning. some systems will reclaim pages from > sleeping processes when doing IO, even when there would be enough available > memory to avoid it. This would signal itself by a "wakeup" time for > interactive processes which have been sleeping, after recollindex has been > active. I mention this for reference only, as you report having trouble > while actively working. > Unfortunately, I haven't looked at paging issues since I had to, when > sunos4 replaced sunos3 in the 90's :), and I declare myself incompetent > about the details of what to look at. > I'd be quite interested to learn if the ionice approach improves the issue > for you, I could then at least document it, until I can use a better > integrated approach. > And also the recollindex size. What other type of info would be useful to report here? Best, R. > Cheers, > jf -- Ramon Diaz-Uriarte Department of Biochemistry Universidad Autónoma de Madrid Spain http://ligarto.org/rdiaz Temporarily at: Structural Biology and Biocomputing Programme Spanish National Cancer Centre (CNIO) Phone: +34-91-732-8000 ext. 3019 Fax: +-34-91-224-6972