[dokuwiki] Some performance questions

  • From: "Balazs Attila-Mihaly (Cd-MaN)" <x_at_y_or_z@xxxxxxxxx>
  • To: Dokuwiki Devel List <dokuwiki@xxxxxxxxxxxxx>
  • Date: Sat, 17 May 2008 10:42:24 -0700 (PDT)

Hello all.

DokuWiki has been great as an internal documentation platform; recently, 
however, I've been running into some performance problems with it. Because 
this concerns both DokuWiki in general and the blog plugin in particular, and 
I know that chi reads this mailing list, I decided to post it here. Please 
excuse me if the content of the mail isn't 100% DokuWiki-core related.

I loaded up XDebug and profiled a request (setup sketched after the list 
below). One problematic function was _uniqueKey from the blog plugin, which 
was taking up around 30% of the request time. As far as I can tell the purpose 
of this function is the following:
- to generate unique keys (the returned keys must be distinct even if the 
input parameters are the same)
- the generated keys must sort alphabetically (or numerically if the inputs 
are numbers), and if two inputs coincide, the key generated by the first call 
must sort "less" than the key generated by the second call
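
For reference, the profile was captured with something like the following 
XDebug settings in php.ini (the output directory is just an example); the 
resulting cachegrind.out.* files can be opened with KCacheGrind or 
WinCacheGrind:

    ; enable the XDebug profiler for every request
    xdebug.profiler_enable = 1
    ; where the cachegrind.out.* files get written
    xdebug.profiler_output_dir = /tmp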

The attached patch makes the function run in O(1) per call instead of O(n) 
while still satisfying these criteria (a commented version is included below). 
Please consider it for inclusion if I understood the requirements correctly 
(on our DokuWiki installation we sort by date and it seems to work).
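
For readability, here is the patched function written out in full, as a plain 
function so the example is self-contained (in the diff below it is a method 
and the memoize array is passed in by the caller; comments are mine):

    // Returns a unique, alphabetically sortable key in O(1) per call.
    // $unique_keys_memoize maps each base key to the number of times it
    // has been handed out so far.
    function _uniqueKey($key, &$unique_keys_memoize){
      // zero-pad numeric keys so that alphabetic order matches numeric order
      if (is_numeric($key))
        $key = sprintf('%08x', $key);

      if (!array_key_exists($key, $unique_keys_memoize))
        $unique_keys_memoize[$key] = 0;

      // equal inputs get the suffixes 0, 1, 2, ... in call order
      return sprintf('%s_%s', $key, $unique_keys_memoize[$key]++);
    }

For example:

    $memo = array();
    $a = _uniqueKey('2008-05-17', $memo);  // '2008-05-17_0'
    $b = _uniqueKey('2008-05-17', $memo);  // '2008-05-17_1'
    // sorting alphabetically keeps equal dates in call order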

Now for the more general issue: DokuWiki performs a lot of file accesses 
during page rendering. I don't know whether this is specific to pages using 
the blog (include) plugin or a more general problem, but after I applied the 
attached patch and re-ran the XDebug profiler, the most "costly" functions 
(the ones with the largest product of call count and time per call) were 
file_exists and file_get_contents. Is there any way to reduce this? One 
direction I could imagine is sketched below. There was a mail some time ago 
where somebody complained that DokuWiki generated too much hard disk access 
and suggested some solutions; however, as far as I remember there were no 
follow-ups.
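
Purely as a sketch (the helper name is made up; this is not DokuWiki code): a 
per-request cache in front of file_exists would at least collapse repeated 
checks of the same path into a single stat call:

    // Hypothetical per-request cache in front of file_exists().
    // PHP already keeps a stat cache for repeated calls on the same path,
    // but an explicit array survives clearstatcache() and could be
    // pre-seeded for files the code itself just created or deleted.
    function cached_file_exists($path){
      static $cache = array();
      if (!array_key_exists($path, $cache))
        $cache[$path] = file_exists($path);
      return $cache[$path];
    }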

In conclusion, my questions would be:
- are there some "silver bullets" ( :-) ) to cut down on DokuWiki's hard disk 
access needs?
- is the include plugin (and the way it works) the source of all these 
accesses, or does it simply magnify the issue because it accesses multiple 
pages?
- is a cache-friendly version of the include plugin planned? (I assume this 
would need a way to be notified when any of the pages in a given namespace 
changes so that the cache can be purged; a crude alternative is sketched after 
this list.)
- I have a lot of files in the attic directory, but I've recently lost the 
changelog, meaning that the current changelog doesn't reference those old 
versions. Are they safe to delete (assuming that I don't need to revert to the 
versions stored there)? Would deleting them improve the performance?
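
Regarding the cache question: instead of explicit change notifications, the 
include plugin could treat the newest modification time under the namespace 
directory as a cache dependency. A sketch (the helper name and the paths are 
made up; it only looks one directory level deep):

    // Hypothetical: newest mtime of the namespace directory and the page
    // files directly inside it. The directory mtime catches added or
    // removed pages; the per-file mtimes catch edits.
    function namespace_mtime($dir){
      $newest = filemtime($dir);
      $files  = glob($dir . '/*.txt');
      if ($files)
        foreach ($files as $file)
          $newest = max($newest, filemtime($file));
      return $newest;
    }

    // rebuild the cached include output if anything in the namespace changed:
    // if (namespace_mtime('data/pages/blog') > filemtime($cachefile)) { ... }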

Best regards and thank you for your patience.



73a74
>     $unique_keys_memoize = array();
111c112
<       $key = $this->_uniqueKey($key, $result);
---
>       $key = $this->_uniqueKey($key, $unique_keys_memoize);
221c222
<   function _uniqueKey($key, &$result){
---
>   function _uniqueKey($key, &$unique_keys_memoize){
224,237c225,230
<     if (is_numeric($key)){
<       while (array_key_exists($key, $result)) $key++;
<       return $key;
<       
<     // append a number to literal keys
<     } else {
<       $num     = 0;
<       $testkey = $key;
<       while (array_key_exists($testkey, $result)){
<         $testkey = $key.$num;
<         $num++;
<       }
<       return $testkey;
<     }
---
>     if (is_numeric($key))
>       $key = sprintf('%08x', $key);
>     if (!array_key_exists($key, $unique_keys_memoize))
>       $unique_keys_memoize[$key] = 0;
>     
>     return sprintf('%s_%s', $key, $unique_keys_memoize[$key]++);
