[dokuwiki] Re: Search Index

  • From: Andreas Gohr <andi@xxxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Tue, 9 Aug 2005 22:31:51 +0200

On Mon, 8 Aug 2005 16:11:37 +0200
Harry Fuecks <hfuecks@xxxxxxxxx> wrote:

> Another project to look at might be PLucene, which is a Perl port of
> Lucene (without any Java dependencies I believe)

Lucene and its brother Plucene were the first things that came to my
mind when thinking about a search index. But additionally requiring Java
or Perl isn't the way to go, I think. So a port of Plucene to PHP would
be great, but that is, I think, a whole project of its own. I haven't
looked at the sources yet, so I don't know if we could rip some parts
from them for PHP.

> A middle ground might be when a page gets updated, it places some kind
> of "update message", containing instructions for how to update the
> indices, in a "queue" (which might simply be a directory ordered by
> filemtime). An "offline" (or "out-of-band" like pseudocron) job
> processes these changes and is responsible for updating the indices
> and is the only process allowed to modify the indices, avoiding most
> of the trouble with file locking.

This is what I would prefer. The biggest problem I currently see with
this approach is designing an efficient index which is updatable on a
per-document basis. (Suggestions welcome)
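
To make the "updatable on a per-document basis" requirement concrete,
here is a minimal in-memory sketch. It is written in Python purely for
illustration (not PHP), and the class and method names are hypothetical.
The assumption is that keeping a forward map (page -> words) next to the
inverted map (word -> pages) lets one page be dropped and re-added
without rebuilding everything. On-disk storage is left out entirely.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index that can be updated one page at a time."""

    def __init__(self):
        # word -> {page_id: number of occurrences on that page}
        self.postings = defaultdict(dict)
        # page_id -> set of words currently indexed for that page
        self.forward = {}

    def _tokenize(self, text):
        # Naive tokenizer: lowercase runs of word characters.
        return re.findall(r"\w+", text.lower())

    def remove(self, page_id):
        # The forward map tells us exactly which postings to touch.
        for word in self.forward.pop(page_id, ()):
            pages = self.postings[word]
            pages.pop(page_id, None)
            if not pages:
                del self.postings[word]

    def update(self, page_id, text):
        # Drop the old entries for this page, then add the new ones.
        self.remove(page_id)
        counts = {}
        for word in self._tokenize(text):
            counts[word] = counts.get(word, 0) + 1
        for word, n in counts.items():
            self.postings[word][page_id] = n
        self.forward[page_id] = set(counts)

    def search(self, word):
        # Return the ids of all pages containing the word, sorted.
        return sorted(self.postings.get(word.lower(), {}))
```

The cost of an update depends only on the words of the changed page, at
the price of storing the word list of every page a second time.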

> That could work out pretty efficiently,
> although it will need careful design as it's potentially easy to break a
> system like this and hard to debug when it is broken.

Hmm, I don't think it would be more fragile than a "complete index"
solution... but I may be missing something.
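
For reference, the queue idea quoted above could look roughly like the
following. This is only a sketch in Python (not PHP), and the function
names, the ".msg" suffix and the message format (just the page id) are
all assumptions. Page saves only ever add files to a directory; a single
out-of-band job reads them oldest-first by mtime and is the only code
that would touch the indices.

```python
import os
import tempfile


def enqueue(queue_dir, page_id):
    """Called on page save: drop a small message file into the queue."""
    os.makedirs(queue_dir, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=queue_dir, prefix=".tmp-")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(page_id)
    # Write under a temporary name first, then rename. On POSIX a rename
    # within one filesystem is atomic, so the worker never sees a
    # half-written message.
    name = os.path.basename(tmp)[len(".tmp-"):] + ".msg"
    final = os.path.join(queue_dir, name)
    os.rename(tmp, final)
    return final


def process_queue(queue_dir, handle):
    """Run by the single offline job: handle messages oldest first."""
    paths = [
        os.path.join(queue_dir, name)
        for name in os.listdir(queue_dir)
        if name.endswith(".msg")
    ]
    paths.sort(key=os.path.getmtime)
    done = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            page_id = fh.read()
        handle(page_id)   # e.g. re-index this page
        os.remove(path)   # only remove once handling succeeded
        done += 1
    return done
```

If handle() raises, the message file stays in the directory and is
retried on the next run, which should at least make a broken queue
visible on disk.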

> One other implementation point there - if updates to the page are
> going to be used to trigger something, would strongly recommend aiming
> for a code design that's easily "pluginable" early - could be a demand
> for building other types of indexes when a page gets updated (e.g. a
> list of other pages it links to)

I guess an index as I have in mind would be most useful for full text
search. Other indexes could maybe be built by a simpler method (like
your script for orphans)
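
A rough sketch of what such a "pluginable" update hook with a simple
link index might look like, again in Python for illustration only. The
names register_indexer, page_updated and orphans are hypothetical, and
the regular expression is a simplified assumption about the
[[page|title]] link syntax (it does not tell external links apart).

```python
import re

_indexers = []  # callables invoked on every page update


def register_indexer(func):
    """Register a callable taking (page_id, text); usable as decorator."""
    _indexers.append(func)
    return func


def page_updated(page_id, text):
    # Single entry point: every registered indexer sees the update.
    for indexer in _indexers:
        indexer(page_id, text)


links = {}  # page_id -> set of link targets found on that page


@register_indexer
def index_links(page_id, text):
    # Capture the target of [[target]], [[target|title]], [[target#sec]].
    links[page_id] = set(re.findall(r"\[\[([^\]|#]+)", text))


def orphans(all_pages):
    """Pages that no indexed page links to."""
    linked = set().union(*links.values())
    return sorted(set(all_pages) - linked)
```

A full text indexer would then be just one more function registered the
same way.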

Andi

-- 
http://www.splitbrain.org
-- 
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
