8.2. Log Manipulation

Apache does a good job with log format definition, but some features are missing, such as log rotation and log compression. Some reasons given for their absence are technical, and some are political:
Of course, nothing prevents third-party modules from implementing any kind of logging functionality, including rotation. After all, the default logging is done through a module (mod_log_config) without special privileges. However, at the time of this writing no modules exist that log to files and support rotation. There has been some work done on porting Cronolog (see Section 8.2.2.2) to work as a module, but the beta version available on the web site has not been updated recently.

8.2.1. Piped Logging

Piped logging is a mechanism used to offload log manipulation from Apache onto external programs. Instead of giving a configuration directive the name of the log file, you give it the name of a program that will handle logs in real time. A pipe character is used to specify this mode of operation:

    CustomLog "|/usr/local/apache/bin/piped.pl /var/www/logs/piped_log" combined

All logging directives mentioned so far support piped logging. Many third-party modules also try to support this way of logging.

External programs used this way are started by the web server and restarted later if they die. They are started early, while Apache is still running as root, so they run as root, too. Bugs in these programs can have significant security consequences.
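The data flow is easy to try outside Apache, because a piped log handler is just a program that reads log lines on standard input. The following is a minimal shell sketch of that idea, not a production handler. The function name append_log is hypothetical and exists only for this illustration.

```shell
#!/bin/sh
# append_log LOGFILE
# Hypothetical helper: copy every line arriving on standard input
# to the end of LOGFILE, which is what a piped log handler has to
# do with the entries the web server sends it.
append_log() {
    logfile=$1
    # IFS= and -r keep leading whitespace and backslashes intact
    while IFS= read -r line; do
        # Append one entry at a time so each line reaches the
        # file as soon as it is received
        printf '%s\n' "$line" >> "$logfile"
    done
}
```

To simulate the web server, pipe a few lines into the function, for example: printf 'one\ntwo\n' | append_log /tmp/piped_log. A script meant for use in a CustomLog directive would end by calling append_log "$1".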
If you intend to experiment with piped logging, you will find the following proof-of-concept Perl program helpful to get you started:

    #!/usr/bin/perl

    use strict;
    use warnings;
    use IO::Handle;

    # check input parameters: exactly one argument is expected
    if (@ARGV != 1) {
        print STDERR "Usage: piped.pl <log filename>\n";
        exit 1;
    }

    # open the log file for appending, configuring
    # autoflush to avoid potential data loss
    my $logfile = shift(@ARGV);
    open(my $log, '>>', $logfile)
        || die "Failed to open $logfile for writing: $!";
    $log->autoflush(1);

    # handle log entries until the end
    while (my $logline = <STDIN>) {
        print $log $logline;
    }

    close($log);

If you prefer C to Perl, every Apache distribution comes with C-based piped logging programs in the support/ directory. Use these programs for skeleton source code.

Though the piped logging functionality serves the purpose of offloading the logging task to an external program, it has some drawbacks:
8.2.2. Log Rotation

Because no one has unlimited storage space available, logs must be rotated on a regular basis. No matter how large your hard disk, if you do not implement log rotation, your log files will fill the partition.

Log rotation is also very important to ensure no loss of data. Log data loss is one of those things you only notice when you need the data, and then it is too late.

There are two ways to handle log rotation:
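8.2.2.1 Periodic rotation

The correct procedure to rotate a log from a script is:

1. Move the log file to another name.
2. Gracefully restart Apache.
3. Wait long enough for the old child processes to finish the requests they are serving.
4. Compress the moved log file.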
Here is the same procedure given in a shell script, with the added logic to keep several previous log files at the same location:

    #!/bin/sh
    # stop if the log directory is missing, so that the
    # commands below never run in the wrong place
    cd /var/www/logs || exit 1
    # shift the older compressed logs up by one
    mv access_log.3.gz access_log.4.gz
    mv access_log.2.gz access_log.3.gz
    mv access_log.1.gz access_log.2.gz
    # move the current log away and make Apache re-open it
    mv access_log access_log.1
    /usr/local/apache/bin/apachectl graceful
    # give old children time to finish before compressing
    sleep 600
    gzip access_log.1

Without the use of piped logging, there is no way to get around restarting the server; it has to be done for it to re-open the log files. A graceful restart (that's when Apache patiently waits for a child to finish with the request it is processing before it shuts it down) is recommended because it does not interrupt request processing. But with a graceful restart, the wait in step 3 becomes somewhat tricky. An Apache process doing its best to serve a client may hang around for a long time, especially when the client is slow and the operation is long (e.g., a file download). If you proceed to step 4 too soon, some requests may never be logged. A waiting time of at least 10 minutes is recommended.

Many Linux distributions come with a utility called logrotate, which can be used to rotate all log files on a machine. This handy program takes care of most of the boring work. To apply the Apache log rotation principles to logrotate, place the configuration code given below into a file /etc/logrotate.d/apache and replace /var/www/logs/* with the location of your log files, if different:

    /var/www/logs/* {
        # rotate monthly
        monthly

        # keep nine copies of the log
        rotate 9

        # compress logs, but with a delay of one rotation cycle
        compress
        delaycompress

        # restart the web server only once, not for
        # every log file separately
        sharedscripts

        # gracefully restart Apache after rotation
        postrotate
            /usr/local/apache/bin/apachectl graceful > /dev/null 2> /dev/null
        endscript
    }

Use logrotate with the -d switch to make it tell you what it wants to do to log files without doing it.
This is a very handy way to verify that log rotation is configured properly.

8.2.2.2 Real-time rotation

The rotatelogs utility shipped with Apache uses piped logging and rotates the file after a specified time period (given in seconds) elapses:

    CustomLog "|/usr/local/apache/bin/rotatelogs /var/www/logs/access_log 300" custom

The above rotates the log every five minutes. The rotatelogs utility appends the system time (in seconds) to the log name to keep filenames unique. For the configuration directive given above, you will get filenames such as these:

    access_log.1089207300
    access_log.1089207600
    access_log.1089207900
    ...

Alternatively, you can use strftime-compatible (see man strftime) format strings to create a custom log filename format. The following is an example of automatic daily log rotation:

    CustomLog "|/usr/local/apache/bin/rotatelogs \
        /var/www/logs/access_log.%Y%m%d 86400" custom

Cronolog (http://cronolog.org) serves the same purpose as rotatelogs and offers additional functionality. It is especially useful because it can be configured to keep a symbolic link to the latest copy of the logs. This allows you to find the current log quickly without having to work out its date-based name.

    CustomLog "|/usr/local/apache/bin/cronolog \
        /var/www/logs/access_log.%Y%m%d --link=/var/www/logs/access_log" custom

A different approach is used in Cronolog to determine when to rotate. There is no need to specify the time period. Instead, Cronolog rotates the logs when the filename changes. Therefore, it is up to you to design the filename format, and Cronolog will do the rest.

8.2.3. Issues with Log Distribution

There are two schools of thought regarding Apache log configurations. One is to use the CustomLog and ErrorLog directives in each virtual host container, which creates two files per virtual host. This is a commonsense approach that works well but has two drawbacks:
To overcome these problems, the second school of thought regarding configuration was formed. The idea is to have only two files for all virtual hosts and to split the logs (creating one file per virtual host) once a day. Log post-processing can be performed just before the splitting. This is where the vcombined access log format comes into play. The first field on the log line, the hostname, is used to determine to which virtual host the entry belongs.

The problem is that the format of the error log is fixed; Apache does not allow its format to be customized, and we have no way of knowing to which host an entry belongs. One way to overcome this problem is to patch Apache to put a hostname at the beginning of every error log entry. One such patch is available for download from the Glue Logic web site (http://www.gluelogic.com/code/apache/). Apache 2 offers facilities to third-party modules to get access to the error log, so I have written a custom module, mod_globalerror, to achieve the same functionality. (Download it from http://www.apachesecurity.net/.)
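The once-a-day split of a shared access log can be sketched in shell. The following is a minimal illustration, assuming only that the first whitespace-separated field of every line is the virtual host name, as in the vcombined format. The function name split_vhost_log and the .access_log suffix of the output files are hypothetical choices made for this example.

```shell
#!/bin/sh
# split_vhost_log INPUT_LOG OUTPUT_DIR
# Hypothetical helper: write each line of INPUT_LOG to
# OUTPUT_DIR/<hostname>.access_log, where <hostname> is the
# first field of the line.
split_vhost_log() {
    input=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v outdir="$outdir" '
        {
            host = $1
            # Skip blank lines and any first field containing
            # characters that are not safe in a filename
            if (host == "" || host ~ /[^A-Za-z0-9._-]/) next
            out = outdir "/" host ".access_log"
            # Lines are kept unchanged, hostname included
            print >> out
            # Close after each write so that a large number of
            # virtual hosts cannot exhaust open file handles
            close(out)
        }
    ' "$input"
}
```

A daily job could rotate the shared log first and then run, for example, split_vhost_log /var/www/logs/access_log.1 /var/www/logs/vhosts (both paths are illustrative). Closing the output file after every line trades speed for safety; on a busy server it may be worth keeping the files open instead.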