[dokuwiki] Re: Search Result

On Tue, Feb 26, 2008 at 2:30 PM, Jacob Steenhagen <jacob@xxxxxxxxxxxxx>
wrote:

> It may end up being more complicated than that or maybe there's even a
> sanitize function already in the DokuWiki source that gives the "plain text
> rendering" of the page.
>

OK, so it took me a little while, but I did finally track down the
ft_snippet function in inc/fulltext.php that appears to give the small
snippet of text in the search results. It also doesn't appear that
there's currently a plain text renderer (there's raw, which is kinda what we
get now). I'm unsure if the better route is a few regular expressions to try
to clean the entities out of the snippet or creating a plain text renderer,
rendering the page in plain text, and taking the snippet from that
rendering.
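
To make the first option concrete, here is a rough sketch of what "a few
regular expressions" might look like. It is written in Python purely for
illustration (DokuWiki itself is PHP), the name strip_wiki_markup is made up,
and it only handles a handful of markup forms, so treat it as an assumption
about the approach rather than anything taken from the DokuWiki source.

```python
import re

# Hypothetical helper, not part of DokuWiki. Each rule pairs a pattern with
# the replacement that keeps only the visible text.
_RULES = [
    # [[target|label]] -> label
    (re.compile(r"\[\[[^\]|]*\|([^\]]*)\]\]"), r"\1"),
    # [[target]] -> target
    (re.compile(r"\[\[([^\]|]*)\]\]"), r"\1"),
    # ====== Heading ====== -> Heading (one heading per line)
    (re.compile(r"^[ \t]*=+[ \t]*(.*?)[ \t]*=+[ \t]*$", re.M), r"\1"),
    # **bold** -> bold
    (re.compile(r"\*\*(.*?)\*\*"), r"\1"),
    # //italic// -> italic, ignoring a "//" that directly follows ":"
    (re.compile(r"(?<!:)//(.*?)//"), r"\1"),
    # __underline__ -> underline
    (re.compile(r"__(.*?)__"), r"\1"),
]


def strip_wiki_markup(text):
    """Apply every rule in order and return the cleaned text."""
    for pattern, replacement in _RULES:
        text = pattern.sub(replacement, text)
    return text


cleaned = strip_wiki_markup("**Bold** and //italic// [[wiki:syntax|Syntax]] page")
```

Even this short list needs one pass over the text per rule, and it still
ignores tables, lists, footnotes and plugin syntax, which is part of why the
renderer route looks attractive.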

From a performance standpoint, the second option sounds like a lot of steps.
But (as usual, I'm not really certain) I think DokuWiki does do some caching
with rendered pages so in most circumstances, pulling the snippet from the
rendered plain text version wouldn't be much worse than pulling the snippet
from the raw wiki code like it is now. And regular expressions can be slow,
so putting a bunch of them into ft_snippet to filter out wiki entities might
not be any better for performance. Filtering after the fact would also affect
the length of the snippet, as it appears DokuWiki tries to give 100 characters
(well, bytes, which I guess could be anywhere from 25 to 100 chars... still
new to UTF8, too :) ) on either side of the keyword. If you take the snippet
after it's formed and start hacking away at it, it'll obviously end up shorter.
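
A small sketch of the length problem, again in Python for illustration only.
The snippet and strip_bold functions are hypothetical stand-ins (this one
counts characters and only removes "**" markers), not the real ft_snippet
logic. It compares cutting first and cleaning afterwards with cleaning first
and cutting afterwards, and shows that byte and character counts differ once
non-ASCII text is involved (UTF-8 uses 1 to 4 bytes per character).

```python
def strip_bold(text):
    # Hypothetical stand-in for a fuller cleaner: drops only "**" markers.
    return text.replace("**", "")


def snippet(text, keyword, window):
    # Take up to `window` characters on either side of the first match.
    pos = text.find(keyword)
    if pos < 0:
        return ""
    start = max(0, pos - window)
    end = min(len(text), pos + len(keyword) + window)
    return text[start:end]


raw = "The **quick** brown **fox** jumps over the **lazy** dog"

# Option 1: cut the snippet from raw wiki text, then clean it.
cut_then_clean = strip_bold(snippet(raw, "fox", 12))

# Option 2: clean (or render) the page first, then cut the snippet.
clean_then_cut = snippet(strip_bold(raw), "fox", 12)

# Bytes versus characters in UTF-8.
word = "naïve"
char_count = len(word)
byte_count = len(word.encode("utf-8"))
```

With this input the first option yields 21 characters and the second the full
27, because the markup removed after cutting is never replaced with real text.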

So, based on my incomplete understanding of the DokuWiki codebase, IMHO the
better option (even though it's more work) is to create a plain text
renderer and create snippets from that.


-- 
http://jacob.steenhagen.us
