[dokuwiki] Re: Search Result

  • From: Christopher Smith <chris@xxxxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Tue, 26 Feb 2008 20:49:18 +0000


On 26 Feb 2008, at 19:30, Jacob Steenhagen wrote:

On Tue, Feb 26, 2008 at 1:11 PM, Gerry Weißbach <gerry.w@xxxxxxxxxxxxxxxxxx> wrote:

Hum ... well it's a suggestion ... but it's really the last thing before disabling ;) I'd prefer something that includes the DW search engine ... if there's nothing, I'll invent it ;)

I'm really new to DokuWiki, so I can't say it doesn't exist, though I'd imagine if it did exist it'd be fairly easy to find... perhaps even the default. It seems what you'd really want is not necessarily rendered text, but rather text without the markup entities. Rendered text would make the bold text bold and italic text italic (not a big deal) but also make heading text into headings, lists into lists, etc. If you look at the synopsis on Google search results (http://www.google.com/search?q=dokuwiki) it's basically the text that's on the web page minus any HTML-applied styling.

Inventing it may not be terribly difficult. It may be as easy as running the text through a few regular expressions:
s/\{\{.*?\}\}//g
s/\*\*(.*?)\*\*/$1/g
s/\[\[[^\]|]*\|(.*?)\]\]/$1/g
s/\[\[(.*?)\]\]/$1/g
s/(={2,6})(.*?)\1/$2/g
etc

There's a lot of backslashes here because most of the entities in DokuWiki also have special meanings in regular expressions. It may end up being more complicated than that, or maybe there's even a sanitize function already in the DokuWiki source that gives the "plain text rendering" of the page.
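
As a rough illustration of the regular-expression idea above, here is a minimal sketch in Python. The function name `strip_markup` and the rule table are hypothetical and not part of DokuWiki; the patterns cover only the five constructs listed, use non-greedy matching so that two marked-up spans on one line are handled separately, and would need more rules for real wiki text.

```python
import re

# Hypothetical helper, not part of DokuWiki: removes a handful of inline
# markup constructs from raw wiki text, one regular expression per rule.
_RULES = [
    (re.compile(r"\{\{.*?\}\}"), ""),                # {{media}} is dropped
    (re.compile(r"\*\*(.*?)\*\*"), r"\1"),           # **bold** -> bold
    (re.compile(r"\[\[[^\]|]*\|(.*?)\]\]"), r"\1"),  # [[target|label]] -> label
    (re.compile(r"\[\[(.*?)\]\]"), r"\1"),           # [[target]] -> target
    (re.compile(r"(={2,6})(.*?)\1"), r"\2"),         # == Heading == -> Heading
]


def strip_markup(text: str) -> str:
    """Apply each rule in order and return the remaining plain text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


print(strip_markup("**bold** and [[wiki:syntax|the syntax page]]"))
```

The rule for links with a label runs before the rule for plain links, so that the target part of `[[target|label]]` is discarded rather than kept.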

--
http://jacob.steenhagen.us


Without wishing to put anyone off who is motivated to extend DW in this direction, here are some thoughts...

The issue is that only a snippet of raw wiki text is displayed, a few characters on either side of the highlighted search term to give that term context. The snippet itself is not guaranteed to be well-formed wiki text, making it futile to attempt to render it in the same way that a page is normally rendered. Grabbing a snippet in this way, while perhaps not pretty, is fast.
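
To make the point concrete, here is a minimal sketch in Python of cutting a snippet out of raw wiki text by character offsets. The function `raw_snippet` and its `context` parameter are hypothetical and are not taken from the DokuWiki source; the sketch only shows why a window around the search term can cut through markup.

```python
from typing import Optional


def raw_snippet(text: str, term: str, context: int = 20) -> Optional[str]:
    """Return `context` characters either side of the first match, or None."""
    # Case-insensitive search; assumes lowercasing does not change string
    # length, which holds for the ASCII example below.
    pos = text.lower().find(term.lower())
    if pos == -1:
        return None
    start = max(0, pos - context)
    end = min(len(text), pos + len(term) + context)
    return text[start:end]


page = "See the **[[wiki:syntax|syntax page]]** for details on formatting."
print(raw_snippet(page, "syntax", context=8))
# prints: *[[wiki:syntax|syntax 
```

The result starts with half of a bold marker and contains a link that is opened but never closed, so there is nothing well-formed to hand to a renderer.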

A couple of potential alternatives:

- To search on rendered content rather than (or in addition to) raw wiki text would require a new/different search mechanism within DokuWiki. While that may be desirable, it's probably not trivial. A simpler alternative may be to offer two search mechanisms: Google (or other SE) search using "<search terms> site:mysite.com" syntax, and wiki search using the current DW mechanism.

- To grab the entire rendered output for each page in the search results and then to take a snippet of that output surrounding the search term (if it still exists) is likely to be infeasible in terms of page response time, and is also likely to be a non-trivial task.
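
For the external search engine option in the first alternative, a minimal sketch in Python of building a site-restricted query URL might look like the following. It assumes Google's `site:` operator and the `http://www.google.com/search?q=` URL form quoted earlier in the thread; the function name `site_search_url` and the host `mysite.com` are placeholders.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, site: str) -> str:
    """Build a search URL restricted to one site via the site: operator."""
    query = f"{terms} site:{site}"
    # urlencode escapes spaces as '+' and ':' as '%3A'.
    return "http://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("dokuwiki search", "mysite.com"))
# prints: http://www.google.com/search?q=dokuwiki+search+site%3Amysite.com
```

A wiki template could link to such a URL next to the built-in search box, leaving the current DW search untouched.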


-Chris
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
