[dokuwiki] Re: Search Index

  • From: Redeeman <redeeman@xxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Tue, 09 Aug 2005 23:54:51 +0200

On Tue, 2005-08-09 at 22:31 +0200, Andreas Gohr wrote:
> On Mon, 8 Aug 2005 16:11:37 +0200
> Harry Fuecks <hfuecks@xxxxxxxxx> wrote:
> 
> > Another project to look at might be Plucene, which is a Perl port of
> > Lucene (without any Java dependencies I believe)
> 
> Lucene and its brother Plucene were the first things that came to my
> mind when thinking about a search index. But requiring Java or Perl
> additionally isn't the way to go, I think. So a port of Plucene to PHP
> would be great, but is, I think, a whole project of its own. I haven't
> had a look at the sources yet, so I don't know if I could rip some
> parts from it for PHP.

I agree. I use DokuWiki mainly because it's pure PHP; I do not like to
have mixed PHP/Perl/Java/other stuff. :)

> 
> > A middle ground might be when a page gets updated, it places some kind
> > of "update message", containing instructions for how to update the
> > indices, in a "queue" (which might simply be a directory ordered by
> > filemtime). An "offline" (or "out-of-band" like pseudocron) job
> > processes these changes and is responsible for updating the indices
> > and is the only process allowed to modify the indices, avoiding most
> > of the trouble with file locking.
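
The queue idea quoted above can be sketched roughly as follows. This is only
an illustration in Python, not DokuWiki code: the function names, the ".msg"
file suffix and the "action page_id" message format are all assumptions made
up for the example. Writers only ever add files, and a single offline job
reads them oldest first, so the index itself has exactly one writer.

```python
import os
import tempfile
from pathlib import Path


def enqueue_update(queue_dir, page_id, action):
    """Drop one update message into the queue directory (hypothetical format)."""
    queue_dir = Path(queue_dir)
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first and rename it afterwards, so the
    # offline job never picks up a half-written message.
    fd, tmp_name = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"{action} {page_id}\n")
    final = Path(tmp_name).with_suffix(".msg")
    os.replace(tmp_name, final)
    return final


def process_queue(queue_dir, apply_update):
    """Run by the single offline job: handle messages oldest first."""
    messages = sorted(
        Path(queue_dir).glob("*.msg"),
        key=lambda path: (path.stat().st_mtime_ns, path.name),
    )
    handled = 0
    for message in messages:
        action, page_id = message.read_text(encoding="utf-8").split(maxsplit=1)
        apply_update(action, page_id.strip())
        # Remove the message only after the update has been applied.
        message.unlink()
        handled += 1
    return handled
```

A page save would call enqueue_update(), and something like a cron job would
call process_queue() with a callback that actually touches the index.
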
> 
> This is what I would prefer. The biggest problem I see currently with
> this approach is designing an efficient index which is updatable on a
> per-document basis. (Suggestions welcome)
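
One possible shape for an index that is updatable per document, as a rough
in-memory sketch in Python: keep the usual word -> pages mapping, plus a
reverse page -> words mapping so the old entries of a page can be removed
without scanning the whole index. The class and method names are invented
for this example, and a real index would of course live on disk.

```python
import re
from collections import defaultdict


class PageIndex:
    """Tiny inverted index that can be updated one page at a time."""

    def __init__(self):
        self.postings = defaultdict(set)  # word -> ids of pages containing it
        self.words_by_page = {}           # page id -> words indexed for it

    def update_page(self, page_id, text):
        # Drop the page's old entries first, then index the new text.
        self.remove_page(page_id)
        words = set(re.findall(r"\w+", text.lower()))
        for word in words:
            self.postings[word].add(page_id)
        self.words_by_page[page_id] = words

    def remove_page(self, page_id):
        # The reverse mapping tells us exactly which postings to touch.
        for word in self.words_by_page.pop(page_id, set()):
            pages = self.postings[word]
            pages.discard(page_id)
            if not pages:
                del self.postings[word]

    def search(self, query):
        """Return the ids of pages that contain every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return set()
        result = None
        for word in words:
            pages = self.postings.get(word, set())
            result = set(pages) if result is None else result & pages
        return result
```

The cost of updating one page then depends on the number of distinct words
in that page, not on the size of the whole wiki.
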
> 
> > That could work out pretty efficiently,
> > although it will need careful design, as it's potentially easy to break
> > a system like this and hard to debug when it is broken.
> 
> Hmm, I don't think it would be more breakable than a "complete index"
> solution... but I may be missing some things.
> 
> > One other implementation point there - if updates to the page are
> > going to be used to trigger something, would strongly recommend aiming
> > for a code design that's easily "pluginable" early - there could be a demand
> > for building other types of indexes when a page gets updated (e.g. a
> > list of other pages it links to)
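
The "pluginable" design quoted above could be as small as a list of
callbacks that all get told about a page update. Again a Python sketch only:
PageUpdateHooks and extract_links are hypothetical names, and the link
pattern is a simplified take on the [[page|title]] syntax, not the real
DokuWiki parser.

```python
import re


class PageUpdateHooks:
    """Registry of callbacks that run whenever a page is updated."""

    def __init__(self):
        self._handlers = []

    def register(self, handler):
        # handler is any callable taking (page_id, text).
        self._handlers.append(handler)
        return handler

    def page_updated(self, page_id, text):
        # Every registered index builder sees the same update.
        for handler in self._handlers:
            handler(page_id, text)


def extract_links(text):
    """Return link targets from [[target]] or [[target|title]] markup (simplified)."""
    return [target.strip() for target in re.findall(r"\[\[([^\]|#]+)", text)]
```

A full text indexer and a link indexer would then each register their own
handler, and neither has to know that the other exists.
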
> 
> I guess an index as I have in mind would be most useful for full text
> search. Other indexes could maybe be built by a simpler method (like your
> script for orphans)
> 
> Andi
> 
> -- 
> http://www.splitbrain.org
-- 
Redeeman <redeeman@xxxxxxxxxxx>

-- 
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
