[juneau-lug] recursively changing the case of hyperlinks in html documents
- From: "James Zuelow" <jamesz@xxxxxxxxxxxxxxxx>
- To: "Juneau-Lug" <juneau-lug@xxxxxxxxxxxxx>
- Date: Tue, 6 Aug 2002 16:19:15 -0800
These go along with the previous script I posted. Once you rename your
files & directories, your existing hyperlinks may not be valid anymore on a
*nix server. (Index.html is not index.html)
So this paired shell script and sed script will rewrite everything inside an
<A ...> tag to all lowercase. It will leave other HTML tags alone. You could
expand the regexp to be <[Aa] [Hh][Rr][Ee][Ff] to get JUST the hyperlinks if
you like, or remove the "[Aa] " (including the trailing space) to get all tags
and any text that happens to be in <> brackets.
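
As a rough sketch of the HREF-only variant mentioned above (not tested against real pages): the function name lower_href_tags is made up for illustration, and the sed commands are inlined rather than kept in changetags.sed so the example stands alone. The [Hh][Rr][Ee][Ff] test is added to the address and to both substitutions so that all three agree on which tag they mean.

```shell
#!/bin/sh
# Hypothetical helper: lowercase only <A HREF ...> tags on each line of stdin.
# Tags such as <A NAME="..."> are left untouched.
lower_href_tags() {
    sed '
/<[Aa] [Hh][Rr][Ee][Ff][^>]*>/{
h
s/.*<[Aa] \([Hh][Rr][Ee][Ff][^>]*\)>.*/\1/
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
G
s/\([^>]*\)\n\(.*<[Aa] \)[Hh][Rr][Ee][Ff][^>]*\(>.*\)/\2\1\3/
}'
}

# Sample line: a named anchor followed by a hyperlink.
printf '%s\n' '<A NAME="Top">x</A> <A HREF="Page.HTML">y</A>' | lower_href_tags
```

On the sample line only the HREF tag should change; the NAME anchor keeps its original case.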
Note that in the current form, it does not test the directory input supplied
by the user (see previous script) and you have to manually replace your
existing *.html or *.htm with the new lowercase link *.html.2 or *.htm.2
files. This script is mostly copied out of the O'Reilly Sed & Awk book - I
just wrote the regexps.
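
The two manual steps above could be sketched roughly as follows. This is only one way to do it: check_dir and promote_copies are hypothetical names, the code assumes no path contains a newline, and promote_copies overwrites the originals without keeping a backup, so try it on a copy of the tree first.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument names an existing directory.
check_dir() {
    [ -n "$1" ] && [ -d "$1" ]
}

# Hypothetical helper: move every *.htm.2 / *.html.2 file under $1 over the
# original it was generated from. Assumes no path contains a newline.
promote_copies() {
    find "$1" -type f \( -name '*.htm.2' -o -name '*.html.2' \) |
    while IFS= read -r COPY
    do
        # ${COPY%.2} strips the trailing ".2" to recover the original name.
        mv -- "$COPY" "${COPY%.2}"
    done
}

# Usage (TL as read by the changetags script):
#   check_dir "$TL" || { echo "not a directory: $TL" >&2; exit 1; }
#   promote_copies "$TL"
```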
Cheers,
James
---changetags---
#!/bin/sh
# James Zuelow 5/6 Aug 2002
# changetags.sed needs to be in the directory you run this from
printf "Enter top level directory (e.g. /var/www): "
read -r TL
# here is where you would test for a valid directory
# Quote the patterns so the shell does not expand them; a single find
# collects both extensions into htfile.txt.
find "$TL" -type f \( -name '*.htm' -o -name '*.html' \) > htfile.txt
# note that this does not account for shtml, etc.
# Read one path per line so file names containing spaces survive.
while IFS= read -r HTFL
do
    cat "$HTFL"
    sed -f changetags.sed "$HTFL" > "$HTFL.2"
    # here is where you would copy $HTFL.2 over $HTFL if you like.
    cat "$HTFL.2"
done < htfile.txt
---changetags.sed---
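# James Zuelow 5/6 August 2002
# See O'Reilly "Sed & Awk" (ISBN 1-56592-225-5) pp. 121-122
# Act only on lines that contain an <A ...> tag.
/.*<[Aa] [^>]*>.*/{
# Save the original line in the hold space.
h
# Reduce the pattern space to the contents of the (last) <A ...> tag.
s/.*<[Aa] \([^>]*\)>.*/\1/
# Lowercase it.
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
# Append the saved line: "lowercased text\noriginal line".
G
# Swap the lowercased text into the original line.
s/\([^>]*\)\n\(.*<[Aa] \)[^>]*\(>.*\)/\2\1\3/
}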
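
One thing to be aware of: because the leading .* in the substitutions is greedy, the sed script above changes only the last <A ...> tag on any given line. The sketch below shows that on a sample line and compares it with a possible alternative. The alternative is an assumption on my part, not from the book: it relies on the GNU sed \L replacement extension, so it will not work with other seds, and it lowercases the tag name as well. The function names are made up for the demo.

```shell
#!/bin/sh
# The commands from changetags.sed, inlined so the demo is self-contained.
original_script() {
    sed '
/.*<[Aa] [^>]*>.*/{
h
s/.*<[Aa] \([^>]*\)>.*/\1/
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
G
s/\([^>]*\)\n\(.*<[Aa] \)[^>]*\(>.*\)/\2\1\3/
}'
}

# Assumed alternative, GNU sed only: \L lowercases the whole match (&), and
# the g flag applies it to every <A ...> tag on the line, tag name included.
gnu_all_anchors() {
    sed 's/<[Aa] [^>]*>/\L&/g'
}

# Sample line with two hyperlinks.
LINE='<A HREF="Index.html">Home</A> <A HREF="About.HTML">About</A>'
printf '%s\n' "$LINE" | original_script
printf '%s\n' "$LINE" | gnu_all_anchors
```

With the original commands only the second link on the sample line is lowercased; the GNU variant lowercases both opening tags and leaves the closing </A> tags alone. If your pages keep one link per line, the original script is enough.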
------------------------------------
This is the Juneau-LUG mailing list.
To unsubscribe, send an e-mail to juneau-lug-request@xxxxxxxxxxxxx with the
word unsubscribe in the subject header.