On 03/11/2011 10:11 AM, Andreas Gohr wrote:
Do you have the scripts you used to evaluate the speeds available somewhere, so we can reproduce your tests? What input data did you use?
I profiled using xdebug. My test data is the Pro Git book: http://whoopdedo.org/doku/_media/progit-data.tar.gz

Individual functions were evaluated with a simple timing loop like this:

    define("UTF8_MBSTRING",0);
    require(DOKU_INC.'inc/utf8.php');

    $s = file_get_contents("data/pages/de/getting-started.txt");

    $t1 = xdebug_time_index();
    for($i=0;$i<10000;$i++){
        unset($b);
        //$b = strtr($s,'.','@');
        $b = str_replace('.','@',$s);
    }
    $t2 = xdebug_time_index();
    print($t2-$t1);
    print("\n");

The indexer I measured via a cachegrind dump. Starting with an empty data/index directory, I rendered each page's metadata, then ran bin/indexer.php over the English and German pages with profiling enabled.
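For reference, one way to get such a cachegrind dump is to run the indexer with xdebug's (version 2) profiler options enabled; this is not necessarily my exact invocation, and the output directory is just a placeholder:

    php -d xdebug.profiler_enable=1 \
        -d xdebug.profiler_output_dir=/tmp/cachegrind \
        bin/indexer.php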
Indexing spends most of its time in the tokenizer. With utf8_stripspecials() using preg_replace(), it took an average of 387ms per call. Using str_replace(), that dropped to 26ms.
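To illustrate the kind of change being measured there, here is a minimal sketch, not the actual utf8_stripspecials() patch; the character list is invented for the example (the real function works from the UTF-8 special-character tables):

    // Hypothetical list of "special" characters to strip out.
    $specials = array('.', ',', ';', ':', '!', '?');

    // preg_replace() variant, roughly the original approach:
    $clean = preg_replace('/['.preg_quote(implode('', $specials), '/').']+/u', ' ', $text);

    // str_replace() variant, the faster approach measured above:
    $clean = str_replace($specials, ' ', $text);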
Here is a summary of rebuilding the index, taken from a shutdown function that calls xdebug_time_index() and xdebug_peak_memory_usage():
original indexer:
    Execution time: 92.4311s
    Max Memory Usage: 17166.97KB

new indexer:
    Execution time: 92.8585s
    Max Memory Usage: 14840.30KB

new indexer with the string improvements:
    Execution time: 90.3947s
    Max Memory Usage: 15119.81KB

All the utf8 unit tests passed.

This machine is an Intel U2700 1.3GHz single-core running Debian 6.0, PHP 5.3.3.
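A rough sketch of that shutdown hook (my reconstruction, not the exact code; xdebug_peak_memory_usage() reports bytes, so it is converted to KB here):

    register_shutdown_function(function () {
        // Wall-clock time since the script started, per xdebug.
        printf("Execution time: %.4fs\n", xdebug_time_index());
        // Peak memory usage in bytes, converted to KB.
        printf("Max Memory Usage: %.2fKB\n", xdebug_peak_memory_usage() / 1024);
    });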
--
- tom
telliamed@xxxxxxxxxxxxx

--
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist