On 03/14/2011 08:54 PM, TNHarris wrote:
> I measured utf8_stripspecials with English, German, and Japanese
> strings. (See attachment.) String lengths were:
>
>   English:  14, 83, 167, 507, 69969
>   German:   16, 83, 170, 529, 69877
>   Japanese: 15, 88, 168, 533, 72266
>
> Timings (old vs. new):
>
>   old en: 0.17372, 0.82951, 1.71201, 5.29036, 530.94489
>   old de: 0.19019, 0.83116, 1.69232, 5.28697, 528.94762
>   old ja: 0.07924, 0.31162, 0.67059, 3.54457, 305.57479
>   new en: 0.33082, 0.46088, 0.61384, 0.82114,  89.03135
>   new de: 0.33938, 0.44337, 0.54182, 0.79181,  85.15497
>   new ja: 0.33939, 0.40425, 0.50494, 0.68881,  49.70783
>
> It may be worth putting in a length test to use the faster function
> when the string is short. I should also try it using strtr.
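The attached test script isn't reproduced here, but a minimal harness along these lines should give comparable per-length timings (a sketch only: the sample text and iteration count are placeholders, not what was actually used):

    <?php
    // Hypothetical harness: time utf8_stripspecials() over inputs of
    // increasing length. Assumes a DokuWiki checkout for inc/utf8.php.
    require_once 'inc/utf8.php';

    function bench($input, $iterations = 1000) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            utf8_stripspecials($input);
        }
        return microtime(true) - $start;
    }

    $base = 'A short piece of sample text. ';  // placeholder sample text
    foreach (array(1, 4, 8, 32, 4096) as $n) { // scale the input up
        $s = str_repeat($base, $n);
        printf("len %6d: %.5f s\n", strlen($s), bench($s));
    }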
I couldn't think of a better way to improve this. The crossover point between the old and new implementations seems to be between 24 and 32 characters.
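The length test could look roughly like this (a sketch: the 28-character threshold and the $UTF8_SPECIAL_CHARS_ARRAY name are illustrative, not what the branch actually contains):

    // Dispatch on input length: below the crossover the old regex path
    // is cheaper, above it the array-based str_replace() path wins.
    function utf8_stripspecials_fast($string, $repl = '') {
        global $UTF8_SPECIAL_CHARS;        // existing character string
        global $UTF8_SPECIAL_CHARS_ARRAY;  // hypothetical array variant

        if (strlen($string) < 28) {
            // byte length is a cheap proxy for character count here
            return preg_replace(
                '/[' . preg_quote($UTF8_SPECIAL_CHARS, '/') . ']/u',
                $repl, $string
            );
        }
        return str_replace($UTF8_SPECIAL_CHARS_ARRAY, $repl, $string);
    }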
My branch has this, along with the other changes I found most effective. The functions changed are tokenizer, cleanText, obfuscate, Doku_Lexer_Escape, and cleanID.
 6 files changed, 145 insertions(+), 47 deletions(-)

The high number of insertions comes from duplicating the UTF8_SPECIAL_CHARS string as an array.
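For illustration, the array form could also be derived at runtime like this (hypothetical, and not what the branch does; hard-coding the array avoids paying for this split on every load):

    // Split the existing special-characters string into one array
    // element per UTF-8 character, suitable for str_replace().
    $UTF8_SPECIAL_CHARS_ARRAY = preg_split(
        '//u', $UTF8_SPECIAL_CHARS, -1, PREG_SPLIT_NO_EMPTY
    );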
--
- tom
telliamed@xxxxxxxxxxxxx

--
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist