6.1 A Primer on Malicious Software
6.2 Nepenthes — A Honeypot Solution to Collect Malware
6.3 Honeytrap
6.4 Other Honeypot Solutions for Learning About Malware
6.5 Summary
Software that serves malicious purposes is usually called malware, short for malicious software. The most destructive type of malware spreads automatically over the network from machine to machine by exploiting known or unknown vulnerabilities. Such malware is a constant threat not only to the integrity of individual computers on the Internet but also to uninfected sites: in the form of botnets, for example, the combined power of many compromised machines can bring down almost any server through a Distributed Denial of Service (DDoS) attack.
Collecting malware in the wild and analyzing it is not easy. In practice, much malware is collected and analyzed through detailed forensic examination of infected machines, where the actual malware has to be extracted by hand from the compromised system. With the ever-increasing rate at which new malware appears, this can be done for only a small proportion of system compromises. Moreover, sophisticated worms and viruses spread so fast today that manual intervention almost always comes too late. Both problems call for a very high degree of automation.
In this chapter, we describe an approach to collecting malware with the help of honeypots. Why would we want to collect malware? There are several reasons, and almost all of them are based on the tenet "Know your enemy." First, investigating individual pieces of malware helps us build better defenses against them and against similar artifacts. For example, intrusion detection and antivirus systems can refine the signatures against which files and network traffic are matched. In general, the more we know about which malware is spreading, the better we can organize our defenses. Second, we can use the underlying technique to protect a network: the collection system can serve as another building block of an intrusion detection system. In Section 10.1 we present an example of how such a honeypot-based intrusion detection system can be implemented. Third, if we collect malware on a large scale, we can generate statistics about the patterns, trends, and rates of malicious network traffic today, based on live and authentic data. Some statistics we have collected by running different kinds of honeypots are presented at the end of this chapter.
Imagine that you have to reinstall the operating system on your computer, say because the system seems to be running slower and slower. Other reasons could be that your antivirus program has detected a piece of malware on your machine, or that your Internet Service Provider (ISP) has sent you an e-mail reporting that somebody complained about your behavior on the Internet (an abuse report).
Imagine further that the only installation medium you have at hand is the CD-ROM that came with the computer, containing Windows 2000 or Windows XP without any service pack. So you install this operating system on your PC. As a responsible user with some security background, you know that you have to patch your computer to be safe. To retrieve these updates, you connect your computer to the Internet and immediately start downloading the latest service pack and monthly updates to fix all known vulnerabilities. These patches are often several megabytes in size, and service packs larger than 100 megabytes are not uncommon. Even with a fast broadband connection, the download will take a couple of minutes.
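To make this download time concrete, here is a back-of-the-envelope estimate. It assumes a 100-megabyte service pack (S, with 1 MB = 10^6 bytes) and a broadband link of B = 2 Mbit/s. Both values are illustrative assumptions, not measurements from our experiments, and protocol overhead is ignored:

```latex
t = \frac{S}{B}
  = \frac{100\ \text{MB} \times 8\ \text{bit/byte}}{2\ \text{Mbit/s}}
  = \frac{800\ \text{Mbit}}{2\ \text{Mbit/s}}
  = 400\ \text{s} \approx 6.7\ \text{min}
```

A faster link shortens this window proportionally. However, the unpatched system stays exposed for the entire download, plus the time it takes to install the patches and reboot.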
Presumably you will notice during the download that your computer is behaving strangely. It might, for example, show an error message saying that a certain service must be restarted. Or you may notice a web page popping up that you did not open. Another sign could be that the download becomes slower and slower because another process, whose purpose you do not know, is consuming bandwidth.
One explanation for this behavior is that while you were downloading the patches, your operating system was extremely vulnerable to attacks because it still contained many unpatched security holes. Since autonomously spreading malware constantly tries to propagate further across the Internet, chances are high that your computer will be probed for common vulnerabilities, and since it is vulnerable, it will be infected within a short time! Our empirical results with several virtual honeypots show that the average time to compromise for an unpatched Windows XP system is less than ten minutes. Some of our machines were even compromised in less than a minute: only a few seconds after we plugged the network cable into the honeypot, it was compromised by a species of autonomously spreading malware. In another experiment, we connected a honeypot running an unpatched Windows 2000 system to the Internet. After 24 hours, we found 19 different kinds of malware on the system. We also noticed that the honeypot rebooted quite often. Some of these reboots were caused by exploitation attempts with a wrong offset: the exploit targeted a system running a different version of the operating system and therefore disrupted the execution flow. Such a failed exploit can crash the targeted process, which in turn causes the system to reboot.
Can we use the idea behind honeypots to learn more about this autonomously spreading malware? Can we perhaps develop a tool that automatically collects the binaries behind this threat? After all, a honeypot is a system designed to be probed, attacked, and compromised. We can certainly apply this methodology to learn more about malware, but how do we do so effectively and efficiently? In the remainder of this chapter, we answer these questions and present two tools that automatically collect malware with the help of honeypots: nepenthes and honeytrap.
The collected malware is typically a bot, a worm, or another kind of software that tries to propagate further by exploiting well-known vulnerabilities on machines running Windows. We also present empirical results that show what you can expect when running these kinds of virtual honeypots.