On Tue, 2005-08-09 at 22:31 +0200, Andreas Gohr wrote:
> On Mon, 8 Aug 2005 16:11:37 +0200
> Harry Fuecks <hfuecks@xxxxxxxxx> wrote:
>
> > Another project to look at might be PLucene, which is a Perl port of
> > Lucene (without any Java dependencies I believe)
>
> Lucene and its brother Plucene were the first things that came to my
> mind when thinking about a search index. But requiring Java or Perl
> additionally isn't the way to go I think. So a port of Plucene to PHP
> would be great, but is I think a whole project on its own. I haven't had
> a look at the sources yet so I don't know if I could rip some parts from
> it for PHP.

I agree. I use DokuWiki mainly because it's pure PHP; I do not like to have
mixed PHP/Perl/Java/other stuff. :)

> > A middle ground might be when a page gets updated, it places some kind
> > of "update message", containing instructions for how to update the
> > indices, in a "queue" (which might simply be a directory ordered by
> > filemtime). An "offline" (or "out-of-band" like pseudocron) job
> > processes these changes and is responsible for updating the indices
> > and is the only process allowed to modify the indices, avoiding most
> > of the trouble with file locking.
>
> This is what I would prefer. The biggest problem I see currently with
> this approach is to design an efficient index which is updatable on a
> per-document basis. (Suggestions welcome)
>
> > That could work out pretty efficient
> > although will need careful design as it's potentially easy to break a
> > system like this and hard to debug when it is broken.
>
> Hmm, I don't think it would be more breakable than a "complete index"
> solution... but I may be missing some things.
>
> > One other implementation point there - if updates to the page are
> > going to be used to trigger something, would strongly recommend aiming
> > for a code design that's easily "pluginable" early - could be a demand
> > for building other types of indexes when a page gets updated (e.g. a
> > list of other pages it links to)
>
> I guess an index as I have in mind would be most useful for full text
> search. Other indexes could maybe be built by a simpler method (like
> your script for orphans)
>
> Andi
>
> --
> http://www.splitbrain.org

--
Redeeman <redeeman@xxxxxxxxxxx>

--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
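
For illustration only, here is a minimal sketch of the "update message queue plus single offline indexer" idea from the thread, together with one possible answer to the question of an index that can be updated per document. It is written in Python rather than PHP purely to keep it short, and nothing in it comes from DokuWiki: the names enqueue_update, process_queue and Index, the JSON message format, and the ".msg" file suffix are all assumptions made for this sketch. The per-document update relies on keeping a forward map (page -> terms) next to the inverted index, so that a page's old postings can be removed without rebuilding everything.

```python
import json
import os
import re
import uuid


def enqueue_update(queue_dir, page_id, text):
    """Called on page save (hypothetical hook): drop one message file into the queue."""
    os.makedirs(queue_dir, exist_ok=True)
    name = uuid.uuid4().hex
    tmp_path = os.path.join(queue_dir, name + ".tmp")
    with open(tmp_path, "w", encoding="utf-8") as fh:
        json.dump({"page": page_id, "text": text}, fh)
    # Rename last, so the worker never picks up a half-written message.
    final_path = os.path.join(queue_dir, name + ".msg")
    os.rename(tmp_path, final_path)
    return final_path


class Index:
    """In-memory inverted index that can be updated one page at a time."""

    def __init__(self):
        self.postings = {}   # term -> set of page ids
        self.doc_terms = {}  # page id -> set of terms (forward map)

    def update(self, page_id, text):
        new_terms = set(re.findall(r"\w+", text.lower()))
        old_terms = self.doc_terms.get(page_id, set())
        # Remove the page from terms it no longer contains.
        for term in old_terms - new_terms:
            self.postings[term].discard(page_id)
            if not self.postings[term]:
                del self.postings[term]
        # Add the page to terms that are new for it.
        for term in new_terms - old_terms:
            self.postings.setdefault(term, set()).add(page_id)
        self.doc_terms[page_id] = new_terms

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


def process_queue(queue_dir, index):
    """Offline job: the only code that modifies the index.

    Messages are applied oldest first (ordered by file mtime) and deleted
    once applied. Returns the number of messages processed.
    """
    paths = [
        os.path.join(queue_dir, name)
        for name in os.listdir(queue_dir)
        if name.endswith(".msg")
    ]
    paths.sort(key=os.path.getmtime)
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            msg = json.load(fh)
        index.update(msg["page"], msg["text"])
        os.remove(path)
    return len(paths)
```

Because only process_queue touches the index, writers never need to lock it; they only create files. Two caveats of this sketch: ordering by mtime can be ambiguous when two saves of the same page land within the filesystem's timestamp resolution, and a real indexer would persist the index to disk rather than keep it in memory.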