So far, we have seen how to create simple services via shell scripts and how to associate them with templates so that they get executed on new connections. Unfortunately, this approach has a performance drawback: Every connection to a service causes Honeyd to execute the corresponding script as a new process. If Honeyd is expected to receive hundreds of connections at the same time, this would create hundreds of processes and could slow down the system noticeably.
addition= "add" template-name "subsystem" cmd-string ["shared"] ["restart"] |
Honeyd supports another way to run services that offers much better performance. These services are called subsystems because they run constantly in the background as a single process and can handle multiple connections at once. Depending on your application, a subsystem can be more than a hundred times faster than a script-based service. You might expect this to be very complicated, but a subsystem is really just a regular Unix application that gets started by Honeyd and runs in Honeyd's virtual address space. In theory, any Unix application that supports networking can be used as a subsystem for Honeyd.
Before we discuss subsystems in more detail, let us take a quick look at the advantages that subsystems promise to have over service scripts:
Subsystems do not require one process per connection. Instead, a subsystem requires just one process per template or, when one process is shared, one process per template group. As a result, performance and scalability are much higher.
Subsystems can initiate their own connections and may be used to simulate network activity from your virtual honeypots.
Almost any Unix networking application can be used as a subsystem. In most cases, that means you do not have to write your own service emulation but can use existing applications instead.
Let's look at a simple example to make this idea concrete. We would like to provide a web server for our honeypots and might already have configured an unallocated class C network with templates for 100 or so virtual honeypots. If we wanted to run a web server like thttpd on our honeypots, we could run it as a shared subsystem. Once thttpd starts to listen for requests on port 80, all honeypots sharing the subsystem would respond to HTTP requests.
An example configuration of a shared subsystem is shown in Figure 5.3. The shared flag means that the application (thttpd in this example) is going to be started only once and that all templates that inherit from the base template are going to be available to the web server.
create base
add base subsystem "thttpd -d /var/www/" shared restart
clone host1 base
set host1 personality "Linux 2.4.18"
...
clone host121 base
set host121 personality "NetBSD 1.6"
bind 10.1.0.2 host1
bind 10.1.0.3 host2
...
bind 10.1.0.223 host121
It is also possible to start a single process for each IP address that Honeyd emulates. To achieve this, we just omit the shared flag. However, shared subsystems have the advantage of requiring fewer processes, and as a result, their performance is often superior. On the other hand, a single process decreases the stability of the system. If the web server process were to die, all connections currently established to it would be terminated. To improve stability in such a scenario, we could use multiple shared subsystems and group the templates so that they do not all rely on the same process. If one of the web server processes were to die, then only the connections associated with it would suffer, while the connections to the remaining processes would be unaffected.
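For example, a grouped configuration along these lines (the group and host names below are made up for illustration) would split the honeypots across two independently shared web server processes:

create groupA
add groupA subsystem "thttpd -d /var/www/" shared restart
create groupB
add groupB subsystem "thttpd -d /var/www/" shared restart
clone host1 groupA
clone host2 groupB
...
bind 10.1.0.2 host1
bind 10.1.0.3 host2

If the process serving the groupA clones were to die, only their connections would be affected, and the restart flag would bring the process back up.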
Another advantage of running one subsystem per IP address is that there is no confusion over which IP address to use for outgoing connections. You might want to initiate outgoing connections to simulate traffic originating from a honeypot, such as outbound web surfing, or perhaps even FTP data connections. A subsystem that is shared across a few hundred IP addresses does not know which IP address it should use when making a connection, so Honeyd just chooses an IP address. In some cases, this might allow an adversary to guess that he is not dealing with real machines. A solution to this problem is discussed in Section 5.3.1. With a little bit of coding, it is possible to allow shared subsystems to choose the IP address from which they initiate connections.
Using the restart flag, it is possible to detect when a subsystem fails and to restart the process automatically. Honeyd has built-in checks to prevent a constantly crashing process from being restarted too often. In most cases, you probably want to use both the shared and the restart flag together.
One valid concern is that running a shared subsystem for all your honeypots might create a service monoculture because every single IP address would run the same subsystem. Fortunately, it is possible to configure individual ports via the regular configuration language:
clone host3 base
# we do not want a webserver for this IP
add host3 tcp port 80 reset
In this case, the web server would not be able to listen on port 80 of host3. If all your virtual honeypots had a specific configuration for port 80, the web server would fail with an error message that the port is already in use. But as long as there is at least one IP address that the web server can listen on, everything should work as described.
Subsystems are implemented in Honeyd by using dynamic library preloading. Usually, when you run a Unix application, it calls functions in libc to establish network connections, to listen on TCP ports, and so on. However, when such an application is run as a Honeyd subsystem, Honeyd substitutes the networking functions with its own code. So instead of talking to the operating system kernel, the application ends up talking to Honeyd. This is completely transparent to the application and essentially allows most Unix networking applications to run under Honeyd. The main exceptions are statically linked programs, because they do not use dynamic libraries, and networking code that creates packets at the lowest layer of the operating system.
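To give a flavor of how preloading works in general (this is a generic sketch of the technique, not Honeyd's actual interception code), a preloaded shared library can override a libc call such as bind and forward to the real implementation with dlsym:

/* Hypothetical sketch of function interposition via library preloading.
 * Built as a shared object (e.g., gcc -shared -fPIC -o interpose.so
 * interpose.c -ldl) and activated with LD_PRELOAD, this wrapper is
 * called instead of libc's bind(). */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <sys/socket.h>

int
bind(int fd, const struct sockaddr *sa, socklen_t salen)
{
        /* Look up the next (real) bind in the library search order. */
        int (*real_bind)(int, const struct sockaddr *, socklen_t) =
            (int (*)(int, const struct sockaddr *, socklen_t))
            dlsym(RTLD_NEXT, "bind");

        fprintf(stderr, "bind() intercepted on fd %d\n", fd);

        /* Honeyd's preloaded library would talk to Honeyd here rather
         * than simply forwarding the call to the kernel. */
        return real_bind(fd, sa, salen);
}

Honeyd applies the same mechanism to the rest of the networking calls, which is also why statically linked programs cannot be intercepted.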
Subsystems have the additional benefit that you can use them to implement more complicated protocols. FTP is one example of a protocol that cannot be implemented via a service script. The reason is that FTP uses a data channel that is separate from the control channel. When you log into an FTP server and type the get command, the FTP server dynamically allocates a new port to exchange data. A script-based service cannot request a new port from Honeyd or change Honeyd's configuration on the fly. For a subsystem, this is different: Because it has direct access to Honeyd's virtual name space, opening a new port is simple.
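From the subsystem's point of view, this could look roughly like the following (a generic IPv4 sketch, not code taken from Honeyd or an actual FTP daemon); under Honeyd, the socket calls are intercepted, so the port is opened in Honeyd's virtual address space:

/* Hypothetical sketch: open an FTP-style data channel on a port chosen
 * dynamically by the system. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
open_data_channel(void)
{
        struct sockaddr_in sin;
        socklen_t len = sizeof(sin);
        int fd;

        if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
                return -1;

        memset(&sin, 0, sizeof(sin));
        sin.sin_family = AF_INET;
        sin.sin_port = 0;       /* let a free port be chosen */

        if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
            listen(fd, 1) == -1 ||
            getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
                close(fd);
                return -1;
        }

        /* The chosen port would be announced to the client, e.g., in a
         * PASV reply. */
        fprintf(stderr, "data channel listening on port %d\n",
            ntohs(sin.sin_port));
        return fd;
}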
One advantage of subsystems is their ability to initiate connections from the honeypots. If you suspect that somebody is monitoring the network activity to your honeypots, it might look very suspicious if they only ever receive connections and never initiate any. Figure 5.4 shows a simple example of creating some more natural background activity. As you can see, it's even possible to run shell scripts as subsystems. With some tricks and programs like nc,[1] it's even possible to get a shell on the virtual honeypots and use it to experiment with many network applications.
[1] nc stands for netcat, and it is installed by default on most operating systems.
#!/bin/sh
cd /tmp
for url in http://slashdot.org/ http://www.cnn.com/
do
sleep $(( $RANDOM % 150 ))
wget $url
done
In the following, we are going to describe some very technical information for anyone who plans to write her own subsystem software. If you do not expect to be doing so, you might want to skip this discussion.
The preceding FTP example is a case where shared subsystems need to be fine-tuned. The FTP daemon is unlikely to know that it is running under Honeyd, and when it allocates a data channel on a new port, it will end up opening the port on all virtual honeypots, which is clearly not what we want.
Ideally, we would allocate the port only on the IP address that received the FTP connection, which leads us to the problem that a Unix application needs to know which IP address was used to contact it. Honeyd solves this problem by intercepting the getsockname function, which can be used to get information about the local IP address that received the connection.
struct sockaddr_storage ss, lss;
socklen_t addrlen = sizeof(ss), laddrlen = sizeof(lss);
int nfd, res;

if ((nfd = accept(fd, (struct sockaddr *)&ss, &addrlen)) == -1) {
        fprintf(stderr, "%s: bad accept\n", __func__);
        return;
}

res = getsockname(nfd, (struct sockaddr *)&lss, &laddrlen);
if (res == -1)
        fprintf(stderr, "Cannot get local address.\n");
After running getsockname, the local IP address and port are stored in
struct sockaddr_storage lss
With knowledge of the local IP address, it is now possible to bind a socket to the correct IP address before making a connection. The results are going to be much more realistic than letting Honeyd make the choice of which IP address to use.
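A minimal sketch of that last step, assuming an IPv4 peer and reusing the lss value filled in by getsockname above (the helper name is made up for illustration), might look like this:

/* Hypothetical sketch: originate a connection that appears to come from
 * the honeypot IP address stored in lss. */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int
connect_from_local(struct sockaddr_storage *lss, struct sockaddr *dst,
    socklen_t dstlen)
{
        struct sockaddr_in src;
        int fd;

        if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
                return -1;

        /* Keep the local IP address but let a free source port be picked. */
        memcpy(&src, lss, sizeof(src));
        src.sin_port = 0;

        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) == -1 ||
            connect(fd, dst, dstlen) == -1) {
                close(fd);
                return -1;
        }
        return fd;
}

Because the intercepted bind and connect calls end up inside Honeyd, the outgoing traffic then appears to originate from the honeypot that was contacted in the first place.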