Another project to look at might be PLucene, a Perl port of Lucene (without any Java dependencies, I believe):
http://search.cpan.org/dist/Plucene/
http://www.perl.com/pub/a/2004/02/19/plucene.html

Also recommend reading this by Tim Bray: http://www.tbray.org/ongoing/When/200x/2003/07/30/OnSearchTOC

Regarding the actual building of search indexes, I personally think it has to be done as a "batch" (e.g. using a cron job every X minutes). For those without access to cron, there's pseudocron: http://www.bitfolge.de/pseudocron-en.html - basically you use something like an image in the page to run a PHP script "in the background"; users aren't aware that the server is doing some serious number crunching, so they experience no delay.

The other approach is to have updates to a given page trigger an update to the search indices. This probably results in more efficient execution - instead of one process scanning massive amounts of data, you get incremental processing of a small set of data (from a single page). The problem is it can be very hard to implement without a chance of race conditions, where two sets of updates from different pages compete with each other to update the indices. That said, for a wiki where there are only a few updates going on, this may not be a real problem. It may also be avoidable depending on the actual design of the search indices and the data they contain; in particular, if there are relationships to maintain - if an update to page X means related updates have to be made for pages Y and Z - it gets hard.

A middle ground might be: when a page gets updated, it places some kind of "update message", containing instructions for how to update the indices, in a "queue" (which might simply be a directory ordered by filemtime).
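To make the shape of that concrete, here's a rough sketch. DokuWiki itself is PHP, so treat this Python as pseudocode for the design rather than anything drop-in; the queue directory name, message format, and function names are all made up for illustration. Page saves drop a message file into the queue (written to a temp name, then renamed, so the processor never sees a half-written message), and a single "offline" job drains the directory oldest-first:

```python
import json
import os
import time

QUEUE_DIR = "data/index-queue"  # hypothetical queue directory


def enqueue_update(page_id, action="changed"):
    """Called on page save: drop an 'update message' into the queue.

    Write to a temp name first, then rename - rename within one
    directory is atomic, so the processor never reads a partial file.
    """
    os.makedirs(QUEUE_DIR, exist_ok=True)
    msg = {"page": page_id, "action": action, "time": time.time()}
    tmp = os.path.join(QUEUE_DIR, ".tmp-%d-%s" % (os.getpid(), page_id))
    with open(tmp, "w") as f:
        json.dump(msg, f)
    # a unique final name keeps concurrent saves from clobbering each other
    final = os.path.join(QUEUE_DIR, "%f-%s.msg" % (msg["time"], page_id))
    os.rename(tmp, final)
    return final


def process_queue(update_index):
    """The single 'offline' consumer (run from cron or pseudocron).

    Messages are applied oldest-first, ordered by filemtime.  Because
    only this process ever touches the indices, no index-level file
    locking is needed - that's the whole point of the queue.
    """
    if not os.path.isdir(QUEUE_DIR):
        return
    paths = [os.path.join(QUEUE_DIR, n) for n in os.listdir(QUEUE_DIR)
             if n.endswith(".msg")]
    for path in sorted(paths, key=lambda p: (os.path.getmtime(p), p)):
        with open(path) as f:
            msg = json.load(f)
        update_index(msg["page"], msg["action"])  # e.g. re-tokenise the page
        os.remove(path)  # message handled; take it off the queue
```

`update_index` is whatever actually rewrites the search index for one page; the point of the sketch is only the hand-off, not the indexing itself.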
An "offline" (or "out-of-band", like pseudocron) job processes these messages and is responsible for updating the indices. Since it's the only process allowed to modify the indices, this avoids most of the trouble with file locking. That could work out pretty efficiently, although it will need careful design - it's potentially easy to break a system like this, and hard to debug when it is broken.

One other implementation point: if updates to the page are going to be used to trigger something, I'd strongly recommend aiming early for a code design that's easily "pluginable" - there could be demand for building other types of indexes when a page gets updated (e.g. a list of other pages it links to).

A side note - was reading this article: http://www.zend.com/pecl/tutorials/sdo.php - it isn't really ready for use except for those willing to install "bleeding edge" PHP versions, but it sounds like it would handle management of search indexes pretty well, helping avoid race conditions. But maybe I misunderstood.

--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist