[dokuwiki] Re: Scalability testing: content needed


On 21 Nov 2008, at 05:11, holmberg_jason@xxxxxxx wrote:

Hi list,

I'm evaluating DokuWiki for potential use with a 2000+ topics project.
I'm interested in learning how the Search would perform under those
conditions and would like to test it myself with my available hardware.
Can anyone recommend a good way to get 2000 topics of content without
too much scripting to convert existing HTML-based content? Is there a
difference between performance time of HTML content and native wiki
content?

I dream of a random wiki topic generator with a zip file download, but I
wake to find no such thing exists...  :)


What do you mean by a topic? A page?

Search indexing was revamped at the end of 2006. You should find some notes in this list's archive about the improvements made at that time. There may also be some messages on real-world experiences of the changes after the release in March 2007. Whole-word searching is quite quick; partial searching (e.g. doku* or *wiki) is not so fast.

From a search perspective, you don't require "wiki syntax", just content. Any group of 2000 text files should do. DokuWiki indexes (and therefore searches) the raw wiki text rather than the rendered output.
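
If no suitable pile of text files is at hand, something like the following could generate one. This is only a rough sketch: the word list, the topicNNNN.txt naming and the data/pages/test target directory are made up for illustration (DokuWiki keeps pages as .txt files under data/pages). A larger vocabulary, such as a system dictionary file, would give a more realistic index than the handful of words used here.

```python
import random
from pathlib import Path

# Hypothetical vocabulary; swap in a bigger word list for more realistic tests.
WORDS = ("wiki page search index syntax plugin template namespace "
         "media revision user group config cache render parser").split()


def random_page(rng, paragraphs=5, words_per_paragraph=80):
    """Return one page of pseudo-random text under a DokuWiki-style heading."""
    title = " ".join(rng.choice(WORDS) for _ in range(3))
    body = [
        " ".join(rng.choice(WORDS) for _ in range(words_per_paragraph))
        for _ in range(paragraphs)
    ]
    return "====== " + title + " ======\n\n" + "\n\n".join(body) + "\n"


def generate_pages(target_dir, count=2000, seed=0):
    """Write `count` files named topic0001.txt, topic0002.txt, ... into target_dir."""
    rng = random.Random(seed)  # fixed seed so a test run can be repeated
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for i in range(1, count + 1):
        path = target / f"topic{i:04d}.txt"
        path.write_text(random_page(rng), encoding="utf-8")
    return count


if __name__ == "__main__":
    # Assumed location: a "test" namespace inside the wiki's page directory.
    generate_pages("data/pages/test")
```

The pages would still need to be indexed before the search can find them, so allow for that step when timing anything.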

A Google search for site:www.dokuwiki.org suggests there are 3390 pages in the wiki. That's probably inaccurate, as it will include each plugin tag, but it puts www.dokuwiki.org at around the scale you're after. It shouldn't be too difficult to construct a spider to grab the wiki content for each page. Or maybe if you ask nicely Andi will send you an archive :)
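
A spider along those lines might look roughly like this. It is a sketch, not something tested against the live site: it assumes pages are reachable as doku.php?id=...&do=export_raw (DokuWiki's raw export), it treats every [[link]] target as an absolute page id (relative namespace links are not resolved), and the function names, the limit and the delay are all invented here. Please keep the delay in place so the server isn't hammered.

```python
import re
import time
import urllib.parse
import urllib.request

# Captures the target of [[target|title]] or [[target#section]] links.
LINK_RE = re.compile(r"\[\[([^\]|#]+)")


def raw_export_url(base_url, page_id):
    """Build the URL that returns a page's raw wiki text."""
    query = urllib.parse.urlencode({"id": page_id, "do": "export_raw"})
    return f"{base_url}/doku.php?{query}"


def internal_links(raw_text):
    """Return linked page ids, skipping external, interwiki, mail and share links."""
    ids = []
    for target in LINK_RE.findall(raw_text):
        target = target.strip().lower().lstrip(":")
        if not target or "://" in target or ">" in target or "@" in target:
            continue
        if target.startswith("\\\\"):
            continue
        ids.append(target)
    return ids


def http_fetch(url):
    """Default fetcher: download a URL and decode it as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def crawl(base_url, start_id="start", limit=2000, delay=1.0, fetch=http_fetch):
    """Breadth-first walk over internal links; returns {page_id: raw_text}."""
    seen, queue, pages = {start_id}, [start_id], {}
    while queue and len(pages) < limit:
        page_id = queue.pop(0)
        try:
            text = fetch(raw_export_url(base_url, page_id))
        except OSError:  # covers urllib's URLError and HTTPError
            continue
        pages[page_id] = text
        for linked in internal_links(text):
            if linked not in seen:
                seen.add(linked)
                queue.append(linked)
        time.sleep(delay)  # be polite to the server
    return pages
```

Each entry in the returned dictionary could then be written out as a .txt file in the test wiki's page directory.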

- Chris




--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
