Large, busy websites like Yahoo! operate a number of mirrors, separate servers that are functionally identical to the main site but are running on different hardware. While it's unlikely that you can duplicate all of their fancy setup, the basic mirroring of a website isn't too difficult with a shell script or two.
The first step is to automatically pack up, compress, and transfer a snapshot of the master website to the mirror server. This is easily done with the remotebackup script shown in Script #87, invoked nightly by cron.
Instead of sending the archive to your own mail address, however, send it to a special address named unpacker. Then add a sendmail alias in /etc/aliases (or the equivalent in other mail transport agents) that points to the unpacker script given here, which unpacks and installs the archive:
unpacker:"|/home/taylor/bin/archive-unpacker"
After editing /etc/aliases, run newaliases so that sendmail rebuilds its alias database. You'll also want to ensure that the script is executable, and be sensitive to which applications are in the default PATH used by sendmail. The /var/log/messages log should reveal whether there are any problems invoking the script as you debug it.
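One way to find PATH problems early is to report which of the required commands cannot be found. The following is a minimal sketch, not part of the original script: missing_commands is a hypothetical helper, and the command list is simply read off the unpacker script.

```shell
#!/bin/sh
# missing_commands (hypothetical helper): print each named command
# that cannot be found in the current PATH, one per line.
missing_commands()
{
  for cmd in "$@" ; do
    command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
  done
}

# The commands that the unpacker script relies on.
missing="$(missing_commands cat grep cut awk uudecode gunzip tar sed mail)"

if [ -n "$missing" ] ; then
  echo "Not found in PATH ($PATH):" $missing
fi
```

Placed near the top of the script while debugging, a check like this shows the PATH that the mail system actually supplies, which can differ from the PATH of an interactive login.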
#!/bin/sh

# unpacker - Given an input stream with a uuencoded archive from
#   the remotebackup script, unpacks and installs the archive.

temp="/tmp/$(basename $0).$$"
home="${HOME:-/usr/home/taylor}"
mydir="$home/archive"
webhome="/usr/home/taylor/web"
notify="taylor@intuitive.com"

(
  cat - > $temp      # shortcut to save stdin to a file

  # Everything after "Subject: " is the target (or the backup label).
  target="$(grep "^Subject: " $temp | cut -d' ' -f2-)"

  echo $(basename $0): Saved as $temp, with $(wc -l < $temp) lines
  echo "message subject=\"$target\""

  # Move into the temporary unpacking directory...

  if [ ! -d $mydir ] ; then
    echo "Warning: archive dir $mydir not found. Unpacking into $home"
    cd $home
    mydir=$home      # for later use
  else
    cd $mydir
  fi

  # Extract the resultant filename from the uuencoded file...

  fname="$(awk '/^begin / {print $3}' $temp)"

  uudecode $temp

  if [ ! -z "$(echo $target | grep 'Backup archive for')" ] ; then
    # All done. No further unpacking needed.
    echo "Saved archive as $mydir/$fname"
    exit 0
  fi

  # Otherwise, we have a uudecoded file and a target directory

  if [ "$(echo $target|cut -c1)" = "/" -o "$(echo $target|cut -c1-2)" = ".." ]
  then
    echo "Invalid target directory $target. Can't use '/' or '..'"
    exit 0
  fi

  targetdir="$webhome/$target"

  if [ ! -d $targetdir ] ; then
    echo "Invalid target directory $target. Can't find in $webhome"
    exit 0
  fi

  gunzip $fname
  fname="$(echo $fname | sed 's/.tgz$/.tar/g')"

  # Are the tar archive filenames in a valid format?
  # "tar tf" lists one member name per line, so test the names directly.

  if [ ! -z "$(tar tf $fname | grep '^/')" ] ; then
    echo "Can't unpack archive: filenames are absolute."
    exit 0
  fi

  echo ""
  echo "Unpacking archive $fname into $targetdir"

  cd $targetdir

  tar xvf $mydir/$fname | sed 's/^/  /g'

  echo "done!"

) 2>&1 | mail -s "Unpacker output $(date)" $notify

exit 0
The first thing to notice about this script is that it is set up to mail its results to the address specified in the notify variable. While you may opt to disable this feature, it's quite helpful to get a confirmation of the receipt and successful unpacking of the archive from the remote server. To disable the email feature, simply remove the wrapping parentheses (from the initial cat to the end of the script), the entire last line in which the output is fed into the mail program, and the echo invocations throughout the script that output its status.
This script handles two types of input. If the subject of the email message contains the string Backup archive for, the uudecoded, but still compressed (with gzip), archive is simply stored in the mydir directory. Otherwise, the subject is treated as the name of a subdirectory of the webhome directory, and if that subdirectory exists, the archive is unpacked into it.
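The decision the script makes from the subject line can be summarized in a small function. This is an illustrative sketch only: classify_subject is a hypothetical name, and it mirrors the order of the tests in the script rather than replacing them.

```shell
#!/bin/sh
# classify_subject (hypothetical helper): report which branch the
# unpacker script would take for a given subject line.
classify_subject()
{
  case "$1" in
    *"Backup archive for"*) echo "save"    ;;  # keep the .tgz in $mydir
    /*|..*)                 echo "reject"  ;;  # begins with / or ..
    *)                      echo "install" ;;  # candidate subdir of $webhome
  esac
}

classify_subject "Backup archive for Wed Sep 17 22:48:11 GMT 2003"
classify_subject "mirror"
classify_subject "../etc"
```

An install result is still only a candidate: the script goes on to check that the directory exists under webhome before unpacking anything.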
One challenge with this script is that the file it works with keeps changing names as the script unwraps and unpacks the archive data. Initially, the email input stream is saved in $temp, but when this input is run through uudecode, the extracted file gets the name it had before uuencode was run by the remotebackup script (Script #87, Avoiding Disaster with a Remote Archive). The script extracts this new filename as fname:
fname="$(awk '/^begin / {print $3}' $temp)"
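To see what that awk command picks out, it can be run against a stand-in for the saved message. In this sketch the sample file and its contents are invented for illustration, and the payload line is a placeholder rather than real uuencoded data.

```shell
#!/bin/sh
# Build a stand-in for the saved message: a mail header, then a
# uuencoded body. Only the "begin" line matters to the awk command.
sample="$(mktemp)"
cat > "$sample" << 'EOF'
Subject: mirror

begin 644 backup.030918.tgz
(encoded data would appear here)
end
EOF

# The "begin" line has three fields: the word begin, the file mode,
# and the name that uudecode will give the extracted file.
fname="$(awk '/^begin / {print $3}' "$sample")"
echo "uudecode would create: $fname"

rm -f "$sample"
```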
Because the tar archive is compressed, $fname is something.tgz. If a valid subdirectory of the main web directory is specified in the subject line of the email, and thus the archive is to be installed, the value of $fname is modified yet again during the process to have a .tar suffix:
fname="$(echo $fname | sed 's/.tgz$/.tar/g')"
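In that sed expression the leading dot is unescaped, so it matches any character, not only a literal period. With the filenames the backup script produces this makes no difference, but a slightly stricter variant is easy to write. The helper name to_tar_name below is hypothetical.

```shell
#!/bin/sh
# to_tar_name (hypothetical helper): map name.tgz to name.tar.
# The backslash makes the dot literal; no "g" flag is needed because
# a pattern anchored with $ can match at most once per line.
to_tar_name()
{
  echo "$1" | sed 's/\.tgz$/.tar/'
}

to_tar_name "backup.030918.tgz"
```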
As a security precaution, unpacker won't unpack a tar archive that contains filenames with absolute paths. A worst case would be /etc/passwd; you really don't want that overwritten because of an email message received! Take care when building the archive on the local system to ensure that all filenames are relative, not absolute. A subject line that begins with / or .., such as ../../../../etc, is rejected as well, by the target directory test.
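To also reject member names inside the archive that contain a .. component, a filter along the following lines could be applied to the output of tar tf. This is a sketch under stated assumptions, not part of the original script: unsafe_names is a hypothetical helper, and it reads one member name per line on standard input.

```shell
#!/bin/sh
# unsafe_names (hypothetical helper): read archive member names on
# stdin and print those that are absolute or that contain a ".."
# path component. Names that merely contain two dots in the middle
# of a word (file..name) are left alone.
unsafe_names()
{
  grep -E '^/|(^|/)\.\.(/|$)'
}

# Possible use inside the script, in place of the absolute-path test:
#   if [ -n "$(tar tf $fname | unsafe_names)" ] ; then
#     echo "Can't unpack archive: unsafe filenames."
#     exit 0
#   fi
```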
Because this script is intended to be run from within the lowest levels of the email system, it takes no parameters and writes nothing to the command line. All output is sent via email to the address specified as notify.
The results of this script aren't visible on the command line, but we can look at the email produced when an archive is sent without a target directory specified:
archive-unpacker: Saved as /tmp/unpacker.38198, with 1081 lines
message subject="Backup archive for Wed Sep 17 22:48:11 GMT 2003"
Saved archive as /home/taylor/archive/backup.030918.tgz
When a target directory is specified but does not exist under the web directory, the following error is sent via email:
archive-unpacker: Saved as /tmp/unpacker.48894, with 1081 lines
message subject="mirror"
Invalid target directory mirror. Can't find in /web
And finally, here is the message sent when everything is configured properly and the archive has been received and unpacked:
archive-unpacker: Saved as /tmp/unpacker.49189, with 1081 lines
message subject="mirror"

Unpacking archive backup.030918.tar into /web/mirror
  ourecopass/
  ourecopass/index.html
  ourecopass/nq-map.gif
  ourecopass/nq-map.jpg
  ourecopass/contact.html
  ourecopass/mailform.cgi
  ourecopass/cgi-lib.pl
  ourecopass/lists.html
  ourecopass/joinlist.cgi
  ourecopass/thanks.html
  ourecopass/thanks-join.html
done!
Sure enough, if we peek in the /web/mirror directory, everything is created as we hoped:
$ ls -Rs /web/mirror
total 1
   1 ourecopass/

/web/mirror/ourecopass:
total 62
   4 cgi-lib.pl        2 lists.html        2 thanks-join.html
   2 contact.html      2 mailform.cgi*     1 thanks.html
   2 index.html       20 nq-map.gif
   2 joinlist.cgi*    26 nq-map.jpg