[juneau-lug] recursively changing the case of hyperlinks in html documents

  • From: "James Zuelow" <jamesz@xxxxxxxxxxxxxxxx>
  • To: "Juneau-Lug" <juneau-lug@xxxxxxxxxxxxx>
  • Date: Tue, 6 Aug 2002 16:19:15 -0800

These go along with the previous script I posted.  Once you rename your
files & directories, your existing hyperlinks may not be valid anymore on a
*nix server.  (Index.html is not index.html)

So this paired shell script and sed script will rewrite <A (everything
here)> to all lowercase.  It will leave other HTML tags alone.  Because the
regexp is greedy, only the last <A ...> on each line is changed.  You could
expand the regexp to be <[Aa] [Hh][Rr][Ee][Ff] to get JUST the hyperlinks if
you like, or remove the [Aa] to get all tags and any text that happens to be
in <> brackets.
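
As a rough sketch of the HREF-only variant mentioned above: the script
below writes a modified copy of the sed commands to a temporary file and
runs it on one made-up line.  The temporary file name and the sample HTML
are only for illustration, and this is one possible way to extend the
regexp, not a tested replacement for changetags.sed.

```shell
#!/bin/sh
# Sketch only: HREF-only variant of the sed commands, run on a sample line.
# The temp file name and the sample HTML are invented for this example.
tmp=${TMPDIR:-/tmp}/hrefonly.$$.sed
cat > "$tmp" <<'EOF'
/<[Aa] [Hh][Rr][Ee][Ff][^>]*>/{
h
s/.*<[Aa] \([Hh][Rr][Ee][Ff][^>]*\)>.*/\1/
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
G
s/\([^>]*\)\n\(.*<[Aa] \)[Hh][Rr][Ee][Ff][^>]*\(>.*\)/\2\1\3/
}
EOF
# One hyperlink followed by a named anchor that should stay untouched.
sample='<A HREF="Docs/Index.HTML">Docs</A> <A NAME="Top">'
result=$(printf '%s\n' "$sample" | sed -f "$tmp")
printf '%s\n' "$result"
# prints: <A href="docs/index.html">Docs</A> <A NAME="Top">
rm -f "$tmp"
```

Note that the whole tag body is lowercased, so anything case-sensitive
inside the HREF (a query string, for instance) is changed as well.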

Note that in the current form, it does not test the directory input supplied
by the user (see previous script) and you have to manually replace your
existing *.html or *.htm with the new lowercase link *.html.2 or *.htm.2
files.  This script is mostly copied out of the O'Reilly Sed & Awk book - I
just wrote the regexps.
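
The two gaps described above (no directory test, manual replacement of
the originals) could be filled along these lines.  This is a minimal
sketch: check_dir and replace_originals are hypothetical helper names
that are not part of the script below, and it assumes the *.htm.2 and
*.html.2 files have already been generated.

```shell
#!/bin/sh
# Sketch only: check_dir and replace_originals are hypothetical helpers,
# not part of the original changetags script.

check_dir() {
        # succeed only for a non-empty name that is a readable directory
        [ -n "$1" ] && [ -d "$1" ] && [ -r "$1" ]
}

replace_originals() {
        # $1 = top level directory that was already processed
        find "$1" \( -name '*.htm.2' -o -name '*.html.2' \) -type f |
        while IFS= read -r NEWFL
        do
                # strip the trailing .2 to get the original file name
                mv -- "$NEWFL" "${NEWFL%.2}"
        done
}
```

You would call check_dir "$TL" right after the read, and
replace_originals "$TL" once you have looked over the *.2 files and are
happy with them.  The mv overwrites the originals, so keep a backup.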

Cheers,

James

---changetags---

#!/bin/sh
# James Zuelow 5/6 Aug 2002
# changetags.sed needs to be in the same directory
printf "Enter top level directory (e.g. /var/www): "
read TL
# here is where you would test for a valid directory
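# quote the patterns so the shell does not expand them before find runs
find "$TL" -name '*.htm' > htfile.txt
# append here; a second > would overwrite the *.htm list
find "$TL" -name '*.html' >> htfile.txt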
# note that this does not account for shtml, etc.
# read one file name per line so names with spaces survive
while IFS= read -r HTFL
do
        cat "$HTFL"
        sed -f changetags.sed "$HTFL" > "$HTFL.2"
# here is where you would copy $HTFL.2 over $HTFL if you like.
        cat "$HTFL.2"
done < htfile.txt

---changetags.sed---

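# James Zuelow 5/6 August 2002
# See O'Reilly "Sed & Awk" (ISBN 1-56592-225-5) pp. 121-122
# Only act on lines that contain an <A ...> tag.
/.*<[Aa] [^>]*>.*/{
# Save the original line in the hold space.
h
# Reduce the pattern space to the text inside the tag.
s/.*<[Aa] \([^>]*\)>.*/\1/
# Lowercase it.
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# Append the original line after a newline.
G
# Put the lowercased text back between "<A " and ">".
s/\([^>]*\)\n\(.*<[Aa] \)[^>]*\(>.*\)/\2\1\3/
}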


