[dokuwiki] Performance issues

  • From: Yann <yann.hamon@xxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Sun, 12 Mar 2006 19:55:26 +0100

Hi,

  In my last mail I told you we were encountering heavy performance issues
with dokuwiki. The bug I presented happened again today, resulting in heavy
load on the servers for a few hours:
http://isfates.mandragor.org/toto/dokuwikibug.png . Emptying changes.log
resolves the problem. I went through the code and noticed some points that
might need to be improved, as they become a problem with large wikis.

// common.php, line 943
function getRevisions($id){


It contains this loop, which collects all revisions of a given file:

    while (($file = readdir($dh)) !== false) {
      if (is_dir($revd.'/'.$file)) continue;
      if (preg_match('/^'.$clid.'\.(\d+)\.txt(\.gz)?$/',$file,$match)){
        $revs[]=$match[1];
      }
    }

/htdocs/data/attic$ ls -l | wc -l
2766

That part of the code runs preg_match about 2800 times whenever I want the
list of revisions of a file that is in the root of my wiki. Maybe it would be
more efficient to group the revisions of a given file in a subdirectory? That
would certainly be better :) There is probably also a better way than the
regexp to find the right entries, for example some kind of binary search?
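To make the subdirectory idea concrete, here is a minimal sketch. It assumes a
hypothetical layout attic/<page id>/<timestamp>.txt[.gz], which is not what
dokuwiki uses today, and getRevisionsFromSubdir is a made-up name. Sorting the
result newest first is also just a choice made for this sketch.

```php
<?php
// Hypothetical helper, not part of dokuwiki.
// Assumed layout: <atticDir>/<page id>/<timestamp>.txt[.gz]
function getRevisionsFromSubdir($atticDir, $clid) {
    $revs = array();
    $dir  = $atticDir . '/' . $clid;
    if (!is_dir($dir)) return $revs;      // page has no old revisions

    $dh = opendir($dir);
    if ($dh === false) return $revs;

    // Only this page's revisions live here, so the loop runs once per
    // revision of the page instead of once per file in the whole attic.
    while (($file = readdir($dh)) !== false) {
        if (preg_match('/^(\d+)\.txt(\.gz)?$/', $file, $match)) {
            $revs[] = (int)$match[1];
        }
    }
    closedir($dh);

    rsort($revs);                          // newest revision first
    return $revs;
}
```

With such a layout the cost depends on the number of revisions of the page
asked for, not on the total number of files in the attic.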

Then there is:

// common.php, line 785
function getRevisionInfo($id,$rev){

which contains:

  $loglines = file($conf['changelog']);
  $loglines = preg_grep("/$rev\t\d+\.\d+\.\d+\.\d+\t$id\t/",$loglines);

And getRevisionInfo is called in html.php, line 447:

  foreach($revisions as $rev){
    $date = date($conf['dformat'],$rev);
    $info = getRevisionInfo($ID,$rev);

That means that if I have a page which has been modified 50 times, and a
changelog of, say, 5 megabytes, I will open changes.log and grep it (it may
be pretty big, and grepping 5 MB is slow :/), and do that 50 times.
So here is how it could be better, imo:
  - display by default only the last $conf['recent'] modifications, with
another page to see all the modifications;
  - replace getRevisions and the loop calling getRevisionInfo with a
single function, say getLastRevisionsInfo (?), which would open
changes.log a single time and, instead of grepping the whole changes.log
n times, grep the lines concerning the requested page all at once.
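
A rough sketch of what such a getLastRevisionsInfo could look like. The field
order (revision, IP, page id, then the rest) is only what I read out of the
regexp above, so anything after the third field is kept as a raw string. The
signature and the $limit parameter are my own assumptions, not existing
dokuwiki code.

```php
<?php
// Hypothetical helper, not part of dokuwiki: reads the changelog once and
// returns info for every revision of $id, keyed by revision timestamp.
function getLastRevisionsInfo($changelog, $id, $limit = 0) {
    $infos = array();
    $fh = fopen($changelog, 'r');
    if ($fh === false) return $infos;

    // Read line by line so the whole file is never held in memory at once.
    while (($line = fgets($fh)) !== false) {
        // Assumed field order: rev <TAB> ip <TAB> id <TAB> rest of line
        $fields = explode("\t", rtrim($line, "\r\n"), 4);
        if (count($fields) < 3 || $fields[2] !== $id) continue;
        $infos[$fields[0]] = array(
            'date' => $fields[0],
            'ip'   => $fields[1],
            'id'   => $fields[2],
            'rest' => isset($fields[3]) ? $fields[3] : '',
        );
    }
    fclose($fh);

    krsort($infos);                        // newest revision first
    if ($limit > 0) {
        // keep only the most recent $limit entries, preserving the keys
        $infos = array_slice($infos, 0, $limit, true);
    }
    return $infos;
}
```

The loop in html.php could then iterate over the returned array directly, for
example with $limit set to $conf['recent'], so changes.log is read once per
page view instead of once per revision.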


What do you think? I can try to implement these ideas if you think they are
ok.
Sorry for the bad English ;)


Yann Hamon
