[dokuwiki] Re: utf8 functions could be faster

  • From: TNHarris <telliamed@xxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Fri, 11 Mar 2011 18:13:48 -0500

On 03/11/2011 10:11 AM, Andreas Gohr wrote:

> Do you have your scripts you used to evaluate the speeds available
> somewhere, so we can reproduce your tests? What input data did you
> use?


I profiled using xdebug.

My test data is the Pro Git book.
http://whoopdedo.org/doku/_media/progit-data.tar.gz

Individual functions were timed with a simple loop like this:

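    // Force the pure-PHP code paths in inc/utf8.php (no mbstring).
    define("UTF8_MBSTRING",0);
    // Assumes DOKU_INC is already defined and points at the DokuWiki root.
    require(DOKU_INC.'inc/utf8.php');
    $s = file_get_contents("data/pages/de/getting-started.txt");
    $t1 = xdebug_time_index();
    for($i=0;$i<10000;$i++){
        unset($b);
        // Swap the commented line in to time the other candidate.
        //$b = strtr($s,'.','@');
        $b = str_replace('.','@',$s);
    }
    $t2 = xdebug_time_index();
    // Elapsed seconds for 10000 iterations.
    print($t2-$t1);
    print("\n");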

I measured the indexer via a cachegrind dump. Starting with an empty data/index directory, I rendered each page's metadata, then ran bin/indexer.php over the English and German pages with profiling enabled.

Indexing spends most of its time in the tokenizer. With utf8_stripspecials using preg_replace, it took an average of 387ms per call. Using str_replace, that dropped to 26ms.
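
To illustrate the kind of comparison behind those numbers, here is a minimal sketch that times a preg_replace character class against a plain str_replace over the same text. It is not the code from inc/utf8.php: the $specials list, the helper names (strip_with_preg, strip_with_str_replace, time_calls) and the sample text are all made up for illustration, and the real utf8_stripspecials handles a much larger character set. It relies on the fact that ASCII bytes never occur inside a multi-byte UTF-8 sequence, so a byte-wise str_replace of ASCII characters is safe.

```php
<?php
// Hypothetical, ASCII-only list of "special" characters (an assumption,
// not the list DokuWiki actually uses).
$specials = array('.', ',', ';', ':', '!', '?', '(', ')', '[', ']');

// Variant A: one regular expression with a character class.
function strip_with_preg($text, array $specials) {
    $class = preg_quote(implode('', $specials), '/');
    return preg_replace('/[' . $class . ']/u', ' ', $text);
}

// Variant B: byte-wise replacement, no regex engine involved.
function strip_with_str_replace($text, array $specials) {
    return str_replace($specials, ' ', $text);
}

// Runs $fn $rounds times over $text and returns the elapsed seconds.
function time_calls($fn, $text, array $specials, $rounds) {
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $out = call_user_func($fn, $text, $specials);
    }
    return microtime(true) - $start;
}

// Made-up sample input with some multi-byte characters in it.
$text = str_repeat("Größe (klein), aber: fein! Was [ist] das?\n", 200);
foreach (array('strip_with_preg', 'strip_with_str_replace') as $fn) {
    printf("%-24s %.4fs\n", $fn, time_calls($fn, $text, $specials, 1000));
}
```

Both variants should produce identical output for this input; the absolute timings will depend on the PHP version and the machine, so treat them as relative only.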

Here is the summary for rebuilding the index, printed by a shutdown function that calls xdebug_time_index and xdebug_peak_memory_usage:
original indexer:
Execution time: 92.4311s Max Memory Usage: 17166.97KB
new indexer:
Execution time: 92.8585s Max Memory Usage: 14840.30KB
new indexer with the string improvements:
Execution time: 90.3947s Max Memory Usage: 15119.81KB
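
For anyone wanting to reproduce summaries like these, a shutdown function along the following lines would do. This is only a sketch, not the exact function I used: the names profile_summary, report_profile and $profileStart are made up, and the fallback to microtime and memory_get_peak_usage when xdebug is not loaded is an addition for convenience.

```php
<?php
// Formats one summary line in the same shape as the figures above.
function profile_summary($seconds, $peakBytes) {
    return sprintf("Execution time: %.4fs Max Memory Usage: %.2fKB",
                   $seconds, $peakBytes / 1024);
}

// Only used by the fallback path when xdebug is not available.
$profileStart = microtime(true);

// Hypothetical shutdown handler: prints time and peak memory at script end.
function report_profile() {
    global $profileStart;
    if (function_exists('xdebug_time_index')
        && function_exists('xdebug_peak_memory_usage')) {
        $seconds = xdebug_time_index();
        $peak    = xdebug_peak_memory_usage();
    } else {
        // Assumption: without xdebug, fall back to PHP's own counters.
        $seconds = microtime(true) - $profileStart;
        $peak    = memory_get_peak_usage();
    }
    print(profile_summary($seconds, $peak) . "\n");
}

register_shutdown_function('report_profile');
```

Including this at the top of the script being measured is enough; the summary line is printed once when the script exits.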

All the utf8 unit tests passed.

This machine is an Intel U2700 1.3GHz single-core running Debian 6.0 with PHP 5.3.3.

--
- tom
telliamed@xxxxxxxxxxxxx
--
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist
