[dokuwiki] Re: utf8 functions could be faster

  • From: TNHarris <telliamed@xxxxxxxxxxx>
  • To: dokuwiki@xxxxxxxxxxxxx
  • Date: Tue, 05 Apr 2011 00:51:58 -0400

On 03/14/2011 08:54 PM, TNHarris wrote:

> I measured utf8_stripspecials with English, German, and Japanese
> strings. (See attachment.) String lengths were:
> English: 14, 83, 167, 507, 69969
> German: 16, 83, 170, 529, 69877
> Japanese: 15, 88, 168, 533, 72266
>
> old en: 0.17372, 0.82951, 1.71201, 5.29036, 530.94489
> old de: 0.19019, 0.83116, 1.69232, 5.28697, 528.94762
> old ja: 0.07924, 0.31162, 0.67059, 3.54457, 305.57479
> new en: 0.33082, 0.46088, 0.61384, 0.82114, 89.03135
> new de: 0.33938, 0.44337, 0.54182, 0.79181, 85.15497
> new ja: 0.33939, 0.40425, 0.50494, 0.68881, 49.70783
>
> It may be worth putting in a length test to use the faster function when
> the string is short. I should also try it using strtr.


I couldn't think of a better way to improve this. The crossover point
seems to be between 24 and 32 characters.
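
Roughly what I mean by the length test, as a sketch only (the _old/_new
helper names below are just placeholders for the two code paths, not
functions that actually exist in the branch):

  // Sketch only: pick the implementation by input size, since the
  // crossover sits somewhere between 24 and 32 characters.  strlen()
  // counts bytes, which over-counts multi-byte text, but that only
  // pushes long strings toward the table-driven path, which is the
  // path that wins for long input anyway.
  function utf8_stripspecials_dispatch($string, $repl = '', $additional = '') {
      if (strlen($string) < 32) {
          // short input: the old preg_replace-based code is faster
          return utf8_stripspecials_old($string, $repl, $additional);
      }
      // long input: the new table-driven code is faster
      return utf8_stripspecials_new($string, $repl, $additional);
  }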

My branch has this, along with the other changes I found most effective.
The functions changed are tokenizer, cleanText, obfuscate,
Doku_Lexer_Escape, and cleanID.
 6 files changed, 145 insertions(+), 47 deletions(-)
Most of the insertions come from duplicating the UTF8_SPECIAL_CHARS
string as an array.
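
The idea is roughly the following (the function name and the array
contents here are only illustrative; the real table is the full
UTF8_SPECIAL_CHARS set, several hundred entries long):

  // Sketch only: keep the special characters in a PHP array so one
  // str_replace() call handles the whole string, instead of building a
  // character class for preg_replace.  $specials below is made up for
  // this example and lists just a few entries.
  function stripspecials_table($string, $repl = '') {
      static $specials = array(
          '!', '"', '#', '$', '%', '&', '(', ')', '*', '+',
          '«', '»', '…', '、', '。',   // ...and several hundred more
      );
      return str_replace($specials, $repl, $string);
  }

  // The strtr variant would use a search => replacement map instead;
  // strtr matches exact byte sequences, so multi-byte UTF-8 characters
  // are handled safely:
  //   $map = array_fill_keys($specials, $repl);
  //   return strtr($string, $map);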

--
- tom
telliamed@xxxxxxxxxxxxx
--
DokuWiki mailing list - more info at
http://www.dokuwiki.org/mailinglist
