[dokuwiki] Re: Search Index

  • From: "Daniel Mitchell" <DanielMitchell@xxxxxxxxxxxxx>
  • To: <dokuwiki@xxxxxxxxxxxxx>
  • Date: Thu, 11 Aug 2005 10:06:23 -0600

> Dan, what did you use to get these stats? I would appreciate 
> it if it's a little script that I could use as well.

 I forget exactly, but it was something along the lines of

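# Skip all*.txt so the scratch files below are not read back in on a rerun.
find . -name "*.txt" ! -name "all*.txt" -exec cat {} + > all.txt
tr '[:space:]' '\n' < all.txt > all1.txt
tr '[:lower:]' '[:upper:]' < all1.txt > all2.txt
# Delete everything except letters, digits and the newlines that separate words.
tr -cd '[:alnum:]\n' < all2.txt > all3.txt
sort all3.txt | uniq | wc -l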

 Something like that. I'm not 100% sure about the tr lines, the syntax
there was pretty fiddly, but the idea is: 1. join all files together.
2. translate whitespace to newlines so it's one word per line. 3. upper-case
it all. 4. remove punctuation. 5. sort it, trim out duplicates, count.
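
 The five steps above can probably be collapsed into a single pipeline
with no temporary files. This is only a minimal sketch, assuming a POSIX
shell and a tr that understands character classes. The function name
count_unique_words is made up for illustration. Unlike step 4, it treats
punctuation as a word separator instead of deleting it (so "don't" counts
as DON and T), which means the total may differ slightly.

```shell
#!/bin/sh
# Hypothetical helper: the name and the optional directory argument are
# assumptions for illustration, not part of the original commands.
count_unique_words() {
    dir="${1:-.}"
    # 1. join all .txt files under the directory
    find "$dir" -type f -name '*.txt' -exec cat {} + |
        # 3. upper-case everything
        tr '[:lower:]' '[:upper:]' |
        # 2+4. turn each run of non-alphanumeric characters into one newline
        tr -cs '[:alnum:]' '\n' |
        # 5. sort and drop duplicates
        sort -u |
        # count the non-empty lines, i.e. the unique words
        grep -c .
}
```

 Calling count_unique_words with no argument scans the current
directory. Passing a directory scans that one instead.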

 -- dan
--
DokuWiki mailing list - more info at
http://wiki.splitbrain.org/wiki:mailinglist
