Team LiB
Previous Section Next Section

#86 Exploring the Apache error_log

Just as Script #84, Exploring the Apache access_log, reveals the interesting and useful statistical information found in the regular access log of an Apache or Apache-compatible web server, this script extracts the critical information from the error_log.

For those web servers that don't automatically split their log file into separate access_log and error_log components, you can sometimes split a central log file into access and error components by filtering based on the return code (field 9) of each entry in the log:

awk '{if (substr($9,0,1) <= "3") { print $0 } }' apache.log > access_log
awk '{if (substr($9,0,1)  > "3") { print $0 } }' apache.log > error_log

A return code that begins with a 4 or a 5 is a failure (the 400s are client errors, the 500s are server errors), and a return code beginning with a 2 or a 3 is a success (the 200s are success messages, the 300s are redirects):

Other servers that produce a single central log file containing both successes and errors denote the error message entries with an [error] field value. In that case, the split can be done with a grep '[error]' to create the error_log and a grep -v '[error]' to create the access_log.

Whether your server automatically creates an error_log or you have to create your own error log by searching for entries with the '[error]' string, in the error log just about everything is different, including the way the date is specified:

$ head -1 error_log
[Thu Jan  2 10:07:07 2003] [error] [client 208.180.31.244] File does
not exist: /usr/local/etc/httpd/htdocs/intuitive/favicon.ico

In the access_log, dates are specified as a compact one-field value with no spaces, but the error_log takes five fields instead. Further, rather than a consistent scheme in which the word/string position in a space-delimited entry consistently identifies a particular field, entries in the error_log have a meaningful error description that varies in length. An examination of just those description values reveals surprising variation:

$ awk '{print $9" "$10" "$11" "$12 }' error_log | sort -u
File does not exist:
Invalid error redirection directive:
Premature end of script
execution failure for parameter
premature EOF in parsed
script not found or
malformed header from script

Some of these errors should be examined by hand because they can be difficult to track backward to the offending web page once identified. Others are just transient problems:

[Thu Jan 16 20:03:12 2003] [error] [client 205.188.209.101] (35)
Resource temporarily unavailable: couldn't spawn include command
/usr/home/taylor/web/intuitive/library/header.cgi: Cannot fork:
Resource temporarily unavailable

This script focuses on the most common problems — in particular, File does not exist errors — and then produces a dump of all other error_log entries that don't match well-known error situations.

The Code

#!/bin/sh

# weberrors - Scans through an Apache error_log file and reports the
#    most important errors, then lists additional entries.

temp="/tmp/$(basename $0).$$"

# The following three lines will need to be customized for your own
# installation for this script to work best.

htdocs="/usr/local/etc/httpd/htdocs/"
myhome="/usr/home/taylor/"
cgibin="/usr/local/etc/httpd/cgi-bin/"

sedstr="s/^/  /g;s|$htdocs|[htdocs]  |;s|$myhome|[homedir] |;s|$cgibin|[cgi-bin] |"

screen="(File does not exist|Invalid error redirect|premature EOF|Premature end of
script|script not found)"

length=5                # entries per category to display
checkfor()
{
  grep "${2}:" "$1" | awk '{print $NF}' |\
    sort | uniq -c | sort -rn | head -$length | sed "$sedstr" > $temp

  if [ $(wc -l < $temp) -gt 0 ] ; then
    echo ""
    echo "$2 errors:"
    cat $temp
  fi
}

trap "/bin/rm -f $temp" 0

if [ "$1" = "-l" ] ; then
  length=$2; shift 2
fi

if [ $# -ne 1 -o ! -r "$1" ] ; then
  echo "Usage: $(basename $0) [-l len] error_log" >&2
  exit 1
fi

echo Input file $1 has $(wc -l < "$1") entries.

start="$(grep -E '\[.*:.*:.*\]' "$1" | head -1 | awk '{print $1" "$2" "$3" "$4" "$5
}')"
end="$(grep -E '\[.*:.*:.*\]' "$1" | tail -1 | awk '{print $1" "$2" "$3" "$4" "$5
}')"
echo -n "Entries from $start to $end"

echo ""

### Check for various common and well-known errors:

checkfor "$1" "File does not exist"
checkfor "$1" "Invalid error redirection directive"
checkfor "$1" "premature EOF"
checkfor "$1" "script not found or unable to stat"
checkfor "$1" "Premature end of script headers"

grep -vE "$screen" "$1" | grep "\[error\]" | grep "\[client " | \
  sed 's/\[error\]/\`/' | cut -d\` -f2 | cut -d\ -f4- | \
  sort | uniq -c | sort -rn | sed 's/^/ /' | head -$length > $temp

if [ $(wc -l < $temp) -gt 0 ] ; then
  echo ""
  echo "Additional error messages in log file:"
  cat $temp
fi
echo ""
echo "And non-error messages occurring in the log file:"

grep -vE "$screen" "$1" | grep -v "\[error\]" | \
  sort | uniq -c | sort -rn | \
  sed 's/^/ /' | head -$length

exit 0

How It Works

This script works by scanning the error_log for the five errors specified in the calls to the checkfor function, extracting the last field on each error line with an awk call for $NF (NF represents the number of fields in that particular input line). This output is then fed through the common sort | uniq -c | sort -rn sequence to allow the extraction of the most commonly occurring errors for that category of problem.

To ensure that only those error types with matches are shown, each specific error search is saved to the temporary file, which is then tested for contents before a message is output. This is all neatly done with the checkfor function that appears near the top of the script.

The last few lines of the script are perhaps the most complex. First they identify the most common errors not otherwise checked for by the script that are still in standard Apache error log format. The following grep invocations are part of a longer pipe:

grep -vE "$screen" "$1" | grep "\[error\]"

Then the script identifies the most common errors not otherwise checked for by the script that don't occur in standard Apache error log format. Again, the following grep invocations are part of a longer pipe:

grep -vE "$screen" "$1" | grep -v "\[error\]"

Running the Script

This script should be fed a standard Apache-format error log as its only argument. If invoked with an -l length argument, it'll display length number of matches per error type checked rather than the default of five entries per error type.

The Results

$ weberrors error_log
Input file error_log has 1040 entries.
Entries from [Sat Aug 23 18:10:21 2003] to [Sat Aug 30 17:23:38 2003]
File does not exist errors:
    24 [htdocs]  intuitive/coolweb/Graphics/Graphics/off.gif
    19 [htdocs]  intuitive/taylor/Graphics/biohazard.gif
    19 [homedir] public_html/tyu/tyu-toc.html
    14 [htdocs]  intuitive/Graphics/bottom-menu.gif
    12 [htdocs]  intuitive/tmp/rose-ceremony/spacer.gif

Invalid error redirection directive errors:
    23 index.html

script not found or unable to stat errors:
    55 [htdocs]  intuitive/coolweb/apps/env.cgi
     4 [htdocs]  intuitive/cgi-local/apps/env.cgi
     4 [cgi-bin] FormMail.pl
     3 [htdocs]  intuitive/origins/playgame.cgi

Additional error messages in log file:
     5 (35)Resource temporarily unavailable: couldn't spawn include command
     4 unknown parameter "src" to tag include in
/usr/local/etc/httpd/htdocs/intuitive/tmp/ECR0803b.shtml
     4 execution failure for parameter "cmd" to tag exec in file
/usr/local/etc/httpd/htdocs/intuitive/library/footer.shtml
     1 execution failure for parameter "cmd" to tag exec in file
/usr/local/etc/httpd/htdocs/intuitive/library/WindWillows.shtml

And non-error messages occurring in the log file:
    39 /usr/home/taylor/web/intuitive/library/header.cgi: Cannot fork: Resource
temporarily unavailable
    20 identify: Missing an image file name.
    17 sort: -: write error: Broken pipe
    16 /web/bin/lastmod: not found
    16 /web/bin/counter: not found

Team LiB
Previous Section Next Section
This HTML Help has been published using the chm2web software.