
7. Hybrid Systems

7.1 Collapsar

7.2 Potemkin

7.3 RolePlayer

7.4 Research Summary

7.5 Building Your Own Hybrid Honeypot System

7.6 Summary

When low-interaction systems are not powerful enough and high-interaction systems are too expensive, hybrid solutions offer the best of both worlds. Let's say we want to capture real worms on a class B network under our control. It would be too expensive to set up 65,000 real machines, but by combining the principles of low-interaction honeypots with those of high-interaction honeypots, we can use the low-interaction honeypots as gateways to a few high-interaction machines. The low-interaction honeypots filter out noise and scanning attempts and ensure that only interesting connections are forwarded to a set of high-interaction machines. These high-interaction machines can run different operating systems, and by selectively forwarding connections from the low-interaction honeypots, we can mix and match the different services available on the high-interaction systems.

This chapter explains high-performance honeypot applications. Unfortunately, honeypots are governed by three contending goals: security, performance, and fidelity.^[1] By security, we mean that an adversary is well isolated from the real world and cannot cause collateral damage. Performance is an indicator of how much traffic a honeypot can handle or with how many adversaries it can interact at the same time. When applied to honeypots, fidelity means the realism provided by a honeypot to an adversary. A high-interaction honeypot based on a dedicated physical machine that places no limitations on an adversary has the highest fidelity possible. Conversely, low-interaction honeypots usually exhibit low fidelity because they do not allow an adversary to completely interact with an operating system. The contention between security, performance, and fidelity means that it is difficult to have a honeypot that does well in all three areas at the same time. Usually, raising the performance of a honeypot lowers its fidelity, and vice versa. In this chapter, we explain different techniques for making intelligent tradeoffs between these three areas.

^[1] These goals are very similar to the requirements for virtual machine monitors established by Popek and Goldberg in 1974 [65]. See Section 9.2.4 for a more detailed discussion of them.

Many of the examples provided in this chapter assume honeypot deployments that span at least several class C networks. However, even if you do not have access to large address spaces yourself, you might still be curious about more advanced applications of honeypots. In any case, the underlying techniques and optimizations will help you understand how to get the most out of your own honeypot installations.

Although there are currently no open source hybrid solutions, researchers have published a number of interesting systems lately. To give you an idea of how powerful hybrid solutions can be, we present an overview of current research into hybrid honeypot systems and explain how some of these ideas can be achieved by combining existing honeypot technologies. Although it would be quite challenging to create such high-performance hybrid systems ourselves, it is still possible to use technologies like NAT, low-interaction honeypots like Honeyd, and high-interaction honeypots to create hybrid systems that perform well and still exhibit high fidelity. We explore some potential solutions at the end of this chapter. To make the best use of the material in this chapter, be prepared to take apart open source code and do a lot of your own hacking.

In recent years, honeypots have received a lot of attention from the research community. We are going to give an overview of three relevant research papers that take three different approaches to deploying honeypots to a large number of IP addresses.

7.1. Collapsar

Collapsar is a virtual-machine-based architecture for network attack detection [43]. It was developed by Xuxian Jiang and Dongyan Xu from Purdue University. Their main motivation was to improve the coverage of honeypots by processing traffic from a large number of IP addresses that may be distributed across multiple networks. Getting traffic from more than one network is important because each network may be biased in the traffic it receives. Events that are visible at one network may not be observable at another. Events that are observed by all networks might indicate large-scale activity across the whole Internet, whereas events that are observed at only a single network might indicate a targeted attack.
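To make this distinction concrete, here is a minimal sketch in Python of how observations from several monitored networks could be grouped. It is not Collapsar's code: the function name classify_events, the three labels, and the signature strings are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def classify_events(observations, total_networks):
    """Group event signatures by how many monitored networks saw them.

    observations: iterable of (network_id, event_signature) pairs.
    total_networks: number of networks being monitored.
    Returns a dict mapping each signature to a rough label.
    """
    seen_at = defaultdict(set)
    for network_id, signature in observations:
        seen_at[signature].add(network_id)

    labels = {}
    for signature, networks in seen_at.items():
        if len(networks) == total_networks:
            # Seen everywhere: likely Internet-wide activity.
            labels[signature] = "widespread"
        elif len(networks) == 1:
            # Seen at one network only: possibly a targeted attack.
            labels[signature] = "possibly targeted"
        else:
            labels[signature] = "partial"
    return labels


observations = [
    ("netA", "scan:445"),
    ("netB", "scan:445"),
    ("netC", "scan:445"),
    ("netB", "exploit:apache-chunked"),
]
labels = classify_events(observations, total_networks=3)
```

With these sample observations, the port scan is labeled as widespread and the Apache exploit as possibly targeted. A real system would also need time windows and far richer signatures; the sketch shows only the counting idea.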

Figure 7.1 shows a high-level overview of the Collapsar architecture. It consists of five different components: traffic redirectors, a transparent firewall-like frontend, virtual-machine-based honeypots, a management station, and a correlation engine. The traffic redirectors are installed in different production networks, whereas the rest of the infrastructure is consolidated in a Collapsar center.

Figure 7.1. This figure shows an overview of Collapsar's architecture. In each monitored network, redirectors tunnel a subset of IP addresses to the Collapsar frontend. The frontend in turn redirects traffic to a number of high-interaction honeypots running on virtual machines. The correlation engine is used to detect anomalies such as new attacks.


Traffic redirectors are responsible for forwarding traffic destined for selected addresses in their network to Collapsar's frontend. The traffic is redirected via GRE tunnels. In the simplest case of redirection, a router is configured to GRE tunnel a part of a production network over to Collapsar. Another traffic redirection approach is to selectively tunnel certain IP addresses, similar to Honeyd's routing topologies described in Section 5.6. Tunneling traffic to a central location makes management and detection of attacks much easier because all the infrastructure and data are in one place. However, tunneling packets also adds latency to network connections that might be detectable by an adversary. In particular, if IP addresses are selectively tunneled, their latency could be very different from that of neighboring untunneled addresses. For large-scale detection of activity, the increased latency is probably an acceptable complication. In Collapsar, the redirectors, implemented on top of UML virtual machines, filter and forward traffic as specified by the policy configuration of the redirector.
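The two redirection styles, tunneling a whole part of a network or tunneling selected addresses, can be sketched as a small policy check. This is only an illustration in Python using the standard ipaddress module; the class RedirectPolicy and its parameters are hypothetical and do not reflect the format of Collapsar's actual policy configuration.

```python
import ipaddress


class RedirectPolicy:
    """Decide whether a packet's destination address should be tunneled.

    tunneled_prefixes: whole subnets handed over to the honeypot center.
    tunneled_hosts: individual addresses that are selectively tunneled.
    """

    def __init__(self, tunneled_prefixes=(), tunneled_hosts=()):
        self.prefixes = [ipaddress.ip_network(p) for p in tunneled_prefixes]
        self.hosts = {ipaddress.ip_address(h) for h in tunneled_hosts}

    def should_tunnel(self, destination):
        address = ipaddress.ip_address(destination)
        if address in self.hosts:
            return True
        return any(address in prefix for prefix in self.prefixes)


# Documentation addresses (192.0.2.0/24) are used as placeholders.
policy = RedirectPolicy(
    tunneled_prefixes=["192.0.2.128/25"],
    tunneled_hosts=["192.0.2.17"],
)
```

Here policy.should_tunnel("192.0.2.200") and policy.should_tunnel("192.0.2.17") are true, while the neighboring production address 192.0.2.18 is left alone. That neighbor is exactly the kind of untunneled address whose latency an adversary could compare against.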

The frontend acts like a firewall gating off the Collapsar center. The frontend receives the GRE tunneled packets, extracts the original traffic, and forwards it to the high-interaction honeypots. On the reverse path, the frontend takes the honeypot replies, analyzes them for outgoing attacks, and forwards them via GRE to their corresponding production network. To scale to higher loads, multiple frontends may be used. To prevent the honeypots from contributing to attacks, the frontend uses multiple modules to examine the outgoing traffic. These modules can reduce bandwidth consumption or make attacks inefficient by corrupting critical portions of their payload.
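To illustrate what "extracts the original traffic" involves, the following Python sketch strips a GRE header from a payload. It assumes the outer IP header has already been removed and follows the basic GRE header layout with the optional checksum, key, and sequence number fields. The function names are our own, and this is not the code used by the Collapsar frontend.

```python
import struct

GRE_CHECKSUM_PRESENT = 0x8000
GRE_KEY_PRESENT = 0x2000
GRE_SEQUENCE_PRESENT = 0x1000
GRE_VERSION_MASK = 0x0007
ETHERTYPE_IPV4 = 0x0800


def gre_encapsulate(inner_packet, protocol=ETHERTYPE_IPV4):
    """Prepend a minimal GRE header (no optional fields, version 0)."""
    return struct.pack("!HH", 0, protocol) + inner_packet


def gre_decapsulate(gre_payload):
    """Return (protocol_type, inner_packet) from a GRE payload."""
    if len(gre_payload) < 4:
        raise ValueError("truncated GRE header")
    flags, protocol = struct.unpack("!HH", gre_payload[:4])
    if flags & GRE_VERSION_MASK:
        raise ValueError("unsupported GRE version")

    # Each optional field that is present adds four bytes to the header.
    offset = 4
    for bit in (GRE_CHECKSUM_PRESENT, GRE_KEY_PRESENT, GRE_SEQUENCE_PRESENT):
        if flags & bit:
            offset += 4
    if len(gre_payload) < offset:
        raise ValueError("truncated GRE optional fields")
    return protocol, gre_payload[offset:]


inner = b"\x45\x00\x00\x14" + bytes(16)  # placeholder bytes, not a valid packet
protocol, recovered = gre_decapsulate(gre_encapsulate(inner))
```

On the reverse path, a frontend would apply the equivalent of gre_encapsulate to the honeypot's replies after inspecting them. The sketch does not verify checksums or track sequence numbers.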

The high-interaction honeypots themselves are implemented on virtual machines and can be fully attacked and compromised by an adversary. To make each honeypot appear authentic, it is configured for the production network that it pretends to belong to. This configuration includes the local gateway, mail servers, and DNS servers. By using virtual machines, multiple honeypots can be hosted on the same physical hardware, resulting in more efficient resource utilization. The virtual machines also allow for secure introspection from the host system, which can be used for tamper-proof logging, system snapshots, and so on.

As mentioned earlier in the book, high-interaction honeypots give adversaries unlimited access to the operating system running on the honeypot. Once compromised, the honeypot can be used to launch attacks against other targets. Collapsar takes great care to make this more difficult. It includes three assurance modules that mitigate the risk of running the honeypots and also facilitate attack analysis:

Logging Module: The logging module records how adversaries exploit vulnerabilities and gain access to the machine. It also records what an adversary does on the honeypot after it has been compromised. The logging module has hooks directly inside the guest OS to record activities that cannot be observed at the network level. For example, when encryption is used, network traces are less useful than application-layer information from the OS. The logs are stored outside the guest OS and are thus inaccessible to the adversary.
Tarpitting Module: The tarpitting module is responsible for mitigating outgoing attacks. As the name suggests, it slows down outgoing network connections. It does this by both reducing the rate of outgoing TCP-SYN packets used for connection establishment and putting a bandwidth limit on the outbound traffic volume. To render outgoing attacks ineffective, the module also corrupts exploit payloads that match known attack signatures. The tarpitting is based on Snort-Inline and employed in both the redirectors and the frontends.
Correlation Module: The correlation module is a pure analysis system. It is capable of detecting network scanning, ongoing DDoS attacks, worm propagation, and overlay networks such as IRC command and control channels. Because the correlation module has simultaneous insight into multiple production networks, the analysis can span individual networks and detect malicious activities that spread over the whole Internet.
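The rate limiting performed by the tarpitting module can be illustrated with a token bucket, a common technique for capping a rate while allowing short bursts. Collapsar's tarpitting is based on Snort-Inline; the Python sketch below is a stand-in that shows only the idea, and the class name, rates, and burst sizes are assumptions made for this example.

```python
class TokenBucket:
    """Allow `rate` units per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = float(rate)
        self.capacity = float(burst)
        self.tokens = float(burst)   # start with a full bucket
        self.last = 0.0              # time of the previous call, in seconds

    def allow(self, now, cost=1.0):
        # Refill according to the time elapsed since the last call.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                 # caller drops or delays the packet


# One bucket counts outgoing TCP SYN packets, another counts outgoing bytes.
syn_limiter = TokenBucket(rate=1, burst=2)              # hypothetical: 1 SYN/s
bandwidth_limiter = TokenBucket(rate=16000, burst=32000)  # hypothetical: bytes/s

decisions = [syn_limiter.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.5, 2.0)]
```

With these settings, the first two SYN packets at time 0 pass, the third is held back, and roughly one more passes each second after that. For the bandwidth limit, the caller would pass the packet length as cost. Time is passed in explicitly so that the behavior is easy to reproduce; a real deployment would read a monotonic clock.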

As we mentioned earlier, the main motivation for hybrid honeypot systems is increased scalability and performance. Xuxian Jiang and Dongyan Xu show that raw TCP throughput is not much degraded by their solution. When transmitting a 100 MB file over a 100 Mbit/s network, the UML-based redirectors degrade performance by about 10 to 30 percent, depending on the socket buffer size. Surprisingly, the VMware-based honeypots don't fare quite as well. Their maximum throughput is limited to about 20 Mbit/s. To get a better idea of real-world performance, Collapsar would have to be deployed on a larger scale. However, judging by the architecture, there is no reason that Collapsar should not be able to scale to very large address spaces; for example, running Collapsar on a class B network seems entirely feasible.
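To put these figures in perspective, the following back-of-the-envelope calculation in Python converts them into transfer times. It rests on several assumptions that are ours rather than the paper's: a megabyte is taken as 8 megabits, protocol overhead is ignored, and "degrade by 10 to 30 percent" is read as a 10 to 30 percent reduction in throughput.

```python
def transfer_seconds(file_megabytes, throughput_mbit_per_s):
    """Time to move a file at a given sustained throughput."""
    return file_megabytes * 8 / throughput_mbit_per_s


line_rate = 100.0  # Mbit/s
ideal = transfer_seconds(100, line_rate)                  # 8.0 seconds
redirector_best = transfer_seconds(100, line_rate * 0.9)  # about 8.89 seconds
redirector_worst = transfer_seconds(100, line_rate * 0.7) # about 11.43 seconds
vmware_honeypot = transfer_seconds(100, 20.0)             # 40.0 seconds
```

Under these assumptions, the redirectors add between roughly one and three and a half seconds to an eight-second transfer, whereas a honeypot limited to 20 Mbit/s needs five times as long as the unencumbered link.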

Although large-scale studies of Collapsar are not available, Purdue University has been running a small test bed and has reported on its findings. We will relate only a single incident here, but more examples are presented in the research paper. One of their virtual honeypots was running an Apache server (version 1.3.20-16) on RedHat 7.2, using Linux kernel version 2.4.7-10. The particular versions are very important for the attack, as you will see below. This particular version of Apache contained a vulnerability in the parsing of chunked HTTP/1.1 streams. A carefully crafted request could trigger a buffer overflow leading to code execution on the stack. This particular honeypot was first deployed on November 24, 2003, and was compromised ten hours later on November 25, 2003.

Thanks to Collapsar's logging module, all interactions of the intruder with the honeypot were recorded. The intruder used the vulnerability in Apache to run a simple shell on the Linux system using the privileges of the web server. Using the shell access, the intruder downloaded another exploit aimed at a ptrace vulnerability in that version of Linux. Using the exploit, the intruder got root access to the system and installed an SSH backdoor on port 1985. Curiously, the password for the backdoor was rooter. From then on, the intruder used his SSH connection to further interact with the system. This demonstrates one of the strengths of Collapsar, as no purely network-based solution would have been able to see inside the encrypted traffic. The intruder then continued to install an application called ircoffer, which is a file server for IRC that is often used to distribute stolen software or movies. The researchers from Purdue disabled the honeypot once they noticed the installation of ircoffer but made a complete image of the compromised honeypot available at http://www.cs.purdue.edu/homes/jiangx/collapsar/cases/index.html.

By carefully studying the provided image, you can find several other binaries that have been backdoored. Collapsar was also used to detect and analyze several compromises of Windows XP by worms such as MSBlast and Nachi.