On Monday, 22 August 2011, jfd@xxxxxxxxxx wrote:
> Theo Wollenleben writes:
> > I moved a directory tree containing most of my indexed files. Updating
> > the index almost doubled its size. Now there are 28.6 GB of index files
> > under the directory xapiandb. At the end of the index update, while the
> > status bar shows "Indexing in progress: Purge", the recoll process
> > starts consuming all of the available memory until swapping to disk
> > begins (recoll apparently needs more than 3 GB for purging my index). I
> > tried to let it finish but eventually killed the recoll process after a
> > few hours. Is there a way to purge the index without excessive memory
> > usage?
>
> It is normal that renaming the main directory would double the index size
> as the renamed files will be indexed as new before the purge phase will
> delete the old data. Recoll has no concept for renaming or moving
> files.

Is it also normal that the index is still twice as large, even after the
purge has finished successfully?

> But I've really got no idea of why the purge phase is using a lot of
> memory. It is normally a simple loop to delete the documents that don't
> exist any more, just a repeated Xapian "delete" call.

I observed that the memory usage of the recoll process increased by a few
hundred kilobytes on average for every deleted document (that is, for every
"Db::purge: deleted document" message), which is roughly the amount of text
per file to be indexed.

> I'd like to have a better suggestion, but the only idea which comes to
> mind is to just delete the xapiandb directory and reindex. I do realize
> that regenerating a dozen GB of index is no fun, but I just have no other
> idea about what to do.

Since the easiest way is rarely the most fun, I instead hacked the file
rcldb/rcldb.cpp so that recoll deletes only a certain number of documents
from the index per run, and then ran the update procedure several times.
While doing so I made another observation.
While recoll walks the directory tree I now get messages of the form
"Indexing in progress: (Files [...]/46127) /[...]" on the status bar, so I
suppose there are 46127 documents in the index. This number used to be
larger and decreased with every index update run with the hacked rcldb.cpp.
But once the purge reaches document #38999, it always stops with the
message

:5:../rcldb/rcldb.cpp:1350:Db::purge: document #38999 not found

So I'm stuck at 46127 documents (even when using the original rcldb.cpp),
although I have fewer than 30000 files to be indexed.
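
For illustration, the batched purge described above might be sketched roughly as follows. This is not the actual change made to rcldb/rcldb.cpp and it does not use the Xapian API: FakeIndex, PurgeResult and purge_batch are hypothetical names, and a std::set of document ids stands in for the real index. The sketch also assumes that a document id which cannot be found should be counted and skipped so that the loop can continue, which is a design choice for this example and not a statement about how recoll behaves.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for the index: only the set of document ids it holds.
struct FakeIndex {
    std::set<unsigned> docids;

    // Returns false when the id is not present (our analogue of "not found").
    bool remove(unsigned id) { return docids.erase(id) > 0; }
};

// Counters for one purge run.
struct PurgeResult {
    std::size_t deleted = 0;  // documents actually removed in this run
    std::size_t missing = 0;  // stale ids that were not found in the index
};

// Delete at most max_deletes stale documents in one run.
// Ids that are not found are counted and skipped instead of ending the run.
PurgeResult purge_batch(FakeIndex& index,
                        const std::vector<unsigned>& stale_ids,
                        std::size_t max_deletes) {
    PurgeResult result;
    for (unsigned id : stale_ids) {
        if (result.deleted >= max_deletes) {
            break;  // batch limit reached; leave the rest for the next run
        }
        if (index.remove(id)) {
            ++result.deleted;
        } else {
            ++result.missing;
        }
    }
    return result;
}
```

Calling purge_batch repeatedly, each time with a freshly computed list of stale ids, removes the stale documents in chunks of at most max_deletes. Whether bounding the number of deletions per run also bounds memory use depends on how the real index handles pending changes, which this sketch does not model.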