8.3. Remote LoggingLogging to the local filesystem on the same server is fine when it is the only server you have. Things get complicated as the number of servers rises. You may find yourself in one or more of the following situations:
The solution is usually to introduce a central logging host to the system, but there is no single ideal solution. I cover several approaches in the following sections. 8.3.1. Manual CentralizationThe most natural way to centralize logs is to copy them across the network using the tools we already have, typically FTP, Secure File Transfer Program (SFTP), part of the Secure Shell package, or Secure Copy (SCP), also part of the SSH package. All three can be automated. As a bonus, SFTP and SCP are secure and allow us to transfer the logs safely across network boundaries. This approach is nice, secure (assuming you do not use FTP), and simple to configure. Just add the transfer script to cron, allowing enough time for logs to be rotated. The drawback of this approach is that it needs manual configuration and maintenance and will not work if you want the logs placed on the central server in real time. 8.3.2. Syslog LoggingLogging via syslog is the default approach for most system administrators. The syslog protocol (see RFC 3164 at http://www.ietf.org/rfc/rfc3164.txt) is simple and has two basic purposes:
Since all Unix systems come with syslog preinstalled, it is fairly easy to start using it for logging. A free utility, NTsyslog (http://ntsyslog.sourceforge.net), is available to enable logging from Windows machines. The most common path a message will take starts with the application, through the local daemon, and across the network to the central logging host. Nothing prevents applications from sending UDP packets across the network directly, but it is often convenient to funnel everything to the localhost and decide what to do with log entries there, at a single location. Apache supports syslog logging directly only for the error log. If the special keyword syslog is specified, all error messages will go to the syslog: ErrorLog syslog:facility The facility is an optional parameter, but you are likely to want to use it. Every syslog message consists of three parts: priority, facility, and the message. Priority can have one of the following eight values: debug, info, notice, warning, error, crit, alert, and emerg. Apache will set the message priority according to the seriousness of the message. Message facility is of interest to us because it allows messages to be grouped. Possible values for facility are the following: auth, authpriv, cron, daemon, kern, lpr, mail, mark, news, security, syslog, user, uucp, and local0 tHRough local7. You can see many Unix legacy names on the list. Local facilities are meant for use by user applications. Because we want only Apache logs to go to the central server, we will choose an unused facility: ErrorLog syslog:local4 We then configure syslog to single out Apache messages (that is, those with facility local4) and send them to the central logging host. You need to add the following lines at the bottom of /etc/syslog.conf (assuming the central logging host occupies the address 192.168.0.99): # Send web server error messages to the central host local4.*: 192.168.0.99 At the remote server, the following addition to /etc/syslog.conf makes local4 log entries go into a single file: local4.*: /var/www/logs/access_log
To send access log entries to syslog, you must use piped logging. One way of doing this is through the logger utility (normally available on every Unix system): CustomLog "|/usr/bin/logger -p local5.info" combined I have used the -p switch to assign the priority and the facility to the syslog messages. I have also used a different facility (local5) for the access log to allow syslog to differentiate the access log messages from the error log messages. If more flexibility is needed, send the logs to a simple Perl script that processes them and optionally sends them to syslog. You can write your own script using the skeleton code given in this chapter, or you can download, from this book's web site, the one I have written. Not everyone uses syslog, because the syslog transport protocol has three drawbacks:
On top of all this, the default daemon (syslogd) is inadequate for anything but the simplest configurations. It supports few transport modes and practically no filtering options. Attempts have been made to improve the protocol (RFC 3195, for example) but adoption of such improvements has been slow. It seems that most administrators who decide on syslog logging choose to resolve the problems listed above by using Syslog-NG (http://www.balabit.com/products/syslog_ng/) and Stunnel (http://www.stunnel.org). Syslog-NG introduces reliable logging via TCP, which is nonstandard but does the job when Syslog-NG is used on all servers. Adding Stunnel on top of that solves the authentication and confidentiality problems. The combination of these two programs is the recommended solution for automated, reliable, and highly secure logging. Chapter 12 of Linux Server Security by Michael D. Bauer, which covers system log management and monitoring and includes detailed coverage of Syslog-NG, is available for free download from O'Reilly (http://www.oreilly.com/catalog/linuxss2/ch12.pdf). 8.3.3. Database LoggingRemember how I said that some developers do not believe the web server should be wasting its time with logging? Well, some people believe in the opposite. A third-party module, mod_log_sql, adds database-logging capabilities to Apache. The module supports MySQL, and support for other popular databases (such as PostgreSQL) is expected. To obtain this module, go to http://www.outoforder.cc/projects/apache/mod_log_sql. The module comes with comprehensive documentation and I urge you to read through it before deciding whether to use the module. There are many reasons to choose this type of logging but there are also many reasons against it. The advantage of having the logs in the database is you can use ad-hoc queries to inspect the data. If you have a need for that sort of thing, then go for it. After you configure the database to allow connections from the web server, the change to the Apache configuration is simple: # Enable the required modules LoadModule log_sql_module modules/mod_log_sql.so LoadModule log_sql_mysql_module modules/mod_log_sql_mysql.so # The location of the database where logs will be stored LogSQLLoginInfo mysql://user;pass@192.168.0.99/apachelogs # Automatically create tables in the database LogSQLCreateTables on # The name of the access_log table LogSQLTransferLogTable access_log # Define what is logged to the database table LogSQLTransferLogFormat AbHhmRSsTUuv After restarting the server, all your logs will go into the database. I find the idea of putting the logs into a database very interesting, but it also makes me uneasy; I am not convinced this type of data should be inserted into the database in real-time. mod_log_sql is a fast module, and it achieves good performance by having each child open its own connection to the database. With the Apache process model, this can turn into a lot of connections. Another drawback is that you can create a central bottleneck out of the database logging server. After all, a web server can serve pages faster than any database can log them. Also, none of the web statistics applications can access the data in the database, and you will have to export the logging data as text files to process it. The mod_log_sql module comes with a utility for doing this export. Though I am not quite convinced this is a good solution for all uses, I am intrigued by the possibility of using database logging only for security purposes. Continue logging to files and log only dynamic requests to the database: LogSQLRequestAccept .html .php With this restriction, the load on the database should be a lot smaller. The volume of data will also be smaller, allowing you to keep the information in the database longer. 8.3.4. Distributed Logging with the Spread ToolkitEvery once in a while, one encounters a technology for which the only word to describe it is "cool." This is the case with the Spread Toolkit (http://www.spread.org), a reliable messaging toolkit. Specifically, we are interested in one application of the toolkit, mod_log_spread (http://www.backhand.org/mod_log_spread/). The Spread Toolkit is cool because it allows us to create rings of servers that participate in reliable conversation. It is not very difficult to set up, and it almost feels like magic when you see the effects. Though Spread is a generic messaging toolkit, it works well for logs, which are, after all, only messages. Though the authors warn about complexity, the installation process is easy provided you perform the steps in the correct order:
In our example Spread configuration, we will have four instances of spread, three web servers with mod_log_spread running and one instance of spreadlogd. We specify the ring of machines using their names and IP addresses in the spread.conf file: Spread_Segment 192.168.0.255:4803 { www1 192.168.0.1 www2 192.168.0.2 www3 192.168.0.3 loghost 192.168.0.99 } In the Apache configuration on each web server, we let the modules know the port the Spread daemon is listening on. We send the logs to a spread group called access: SpreadDaemon 4803 CustomLog $access vcombined The purpose of the spreadlogd daemon is to collect everything sent to the access group into a file. The configuration (spreadlogd.conf) is self-explanatory: BufferSize = 65536 Spread { Port = 4803 Log { RewriteTimestamp = CommonLogFormat Group = access File = access_log } } With this configuration in place, the three web servers send their logs to the Spread ring over the network. All members of the ring receive all messages, and the group names are used to differentiate one class of messages from another. One member of the ring is the logging daemon, and it writes the logs into a single file. The problem of cluster logging is elegantly solved. The beauty of Spread is its flexibility. I have used only one logging group in the configuration above, but there can be any number of groups, each addressed to a different logging daemon. And it is not required to have only one logging daemon; two or more such daemons can be configured to log the same group, providing redundancy and increasing availability. On top of all this, the authors mention speed improvements in the range of 20 to 30 percent for busy web servers. Though Spread does offer virtual hosting support, it does not work well with a large number of messaging groups. I do not see this as a problem since a sensible logging strategy is to use a logging format where the hostname is a part of the logging entry, and split logs into per-virtual host files on the logging server. The module does not support error logging (because it cannot be done on Apache 1 without patching the core of the server) but a provided utility script error_log_spread.pl can be used, together with piped logging. mod_log_spread only works with Apache 1 at the moment. This is not a problem since we have the piped logging route as a choice. Besides, as just mentioned, mod_log_spread does not support error logging, so you would have to use piped logging on a production system anyway. To support Apache 2, I have slightly improved the error_log_spread.pl utility script, adding a -c switch to force a copy of the logs to be stored on a local filesystem. This is necessary because error logs are often needed there on the server for diagnostic purposes. The switch makes sense only when used for the error log: CustomLog "|/usr/local/apache/bin/log_spread.pl -g access" vcombined ErrorLog "|/usr/local/apache/bin/log_spread.pl -g error -c /var/www/ logs/error_log" |