12.1 CWSandbox Overview
12.2 Behavior-Based Malware Analysis
12.3 CWSandbox — System Description
12.4 Results
12.5 Summary
In the old days of honeypots (back in the year 2000), most of the activity a honeypot captured was manual. Attackers would actually get on the system, type in keystrokes, install rootkits, and abuse the honeypot in different ways. Nowadays, most attacks are automated to improve efficiency and return on investment for the attacker. This automation mostly happens with the help of malware, so quite often you will capture automated threats with your honeypot. For example, a honeypot running an unpatched version of Windows will most likely be compromised within a couple of minutes. In the previous chapters we have seen several examples of such automated threats, most prominently bots and other kinds of autonomously spreading malware. As a result, to leverage honeypots you need to understand how to analyze malware. In this chapter we introduce one possible way to learn about malware: we explain the concept of behavior analysis and show how such a system can be implemented based on a tool called CWSandbox.
Malware is notoriously difficult to combat. Usually, security products such as virus scanners look for characteristic byte sequences (signatures) to identify malicious code. However, malware has become more and more adept at avoiding detection by changing its appearance, for example in the form of polymorphic or metamorphic worms. The rate at which new malware appears on the Internet is also still very high. Furthermore, flash worms [90] pose a novel threat in that they stealthily perform reconnaissance for vulnerable machines for a long time without infecting them, and then all of a sudden pursue a strategic and coordinated spreading plan by infecting a large number of vulnerable machines within seconds.
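To make the weakness of signature matching concrete, here is a minimal sketch in Python. The signature name, the byte pattern, and the XOR "encoder" are all hypothetical stand-ins chosen for illustration; real scanners and real polymorphic engines are far more elaborate.

```python
from typing import List

# Hypothetical signature database: name -> characteristic byte sequence.
SIGNATURES = {
    "Example.Worm.A": b"\xde\xad\xbe\xef\x13\x37",
}

def scan(data: bytes) -> List[str]:
    """Return the names of all signatures whose bytes occur in data."""
    return [name for name, pattern in SIGNATURES.items() if pattern in data]

def xor_encode(data: bytes, key: int) -> bytes:
    """Toy stand-in for a polymorphic encoder: XOR every byte with key."""
    return bytes(b ^ key for b in data)

# A made-up "sample" that contains the signature bytes.
original = b"header" + b"\xde\xad\xbe\xef\x13\x37" + b"payload"
# The same content after a trivial re-encoding.
variant = xor_encode(original, 0x5A)

print(scan(original))  # the plain sample matches the signature
print(scan(variant))   # the re-encoded sample no longer matches
```

Even this trivial change of appearance defeats the byte-sequence match, although the variant can be decoded back to exactly the same content. This is the gap that behavior-based analysis tries to close.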
In the face of such automated threats, we cannot combat malicious software using traditional methods of decompilation and reverse engineering by hand. Automated malware must be analyzed (1) automatically, (2) effectively, and (3) correctly. Automation means that the analysis tool should create a detailed analysis report of a malware sample quickly and without user intervention. A machine-readable report can in turn be used to initiate automated response procedures, such as updating signatures in an intrusion detection system, thus protecting networks from new malware samples on the fly. Effectiveness means that all relevant behavior of the malware should be logged; no executed functionality should be overlooked. This is important for realistically assessing the threat posed by the malware sample. Finally, a tool should produce a correct analysis of the malware: every logged action should in fact have been initiated by the malware sample, so that no false claims are made about it.
In this chapter we introduce a tool called CWSandbox that can be used to analyze the malware you collect with your honeypots. CWSandbox executes the malware in a controlled environment and observes what the malware is doing. Based on these observations, you receive an analysis report that is a good starting point for assessing the threat the sample poses. This is similar to a honeypot: we execute the binary in an instrumented environment and perform data control and data analysis to learn more about the threat.
CWSandbox is a tool for malware analysis that fulfills the three design criteria of automation, effectiveness, and correctness for the Win32 family of operating systems.
Automation is achieved by performing a dynamic analysis of the malware. This means that malware is analyzed by executing it within a simulated environment (sandbox), which works for any type of malware in almost all circumstances. A drawback of dynamic analysis is that it only analyzes a single execution of the malware. This is in contrast to static analysis, in which the source code is analyzed, thereby allowing observation of all executions of the malware at once. Static analysis of malware, however, is rather difficult, since the source code is commonly not available. Even if the source code were available, one could never be sure that no modifications of the binary executable happened that were not documented by the source. Static analysis at the machine code level is often extremely cumbersome, since malware often uses code-obfuscation techniques like compression, encryption, or self-modification to evade decompilation and analysis.
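The drawback of observing only a single execution can be illustrated with a small Python sketch. The function below is a harmless, hypothetical stand-in for a sample whose behavior depends on a trigger condition; the action names are invented for illustration.

```python
from typing import List

def sample(trigger_reached: bool, trace: List[str]) -> None:
    """Hypothetical sample: records the actions it would perform."""
    trace.append("copy_self")
    if trigger_reached:
        trace.append("start_attack")   # only reached once the trigger fires
    else:
        trace.append("sleep")

# One dynamic run, performed before the trigger condition holds.
observed: List[str] = []
sample(False, observed)
print(observed)
```

The trace of this run contains no hint of the attack branch. An analyst who could read the code would see both branches at once, which is exactly the advantage the text attributes to static analysis.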
Effectiveness is achieved by using the technique of API hooking. API hooking means that calls to the Windows application programming interface (API) are rerouted to the monitoring software before the actual API code is called, thereby creating insight into the sequence of system operations performed by the malware sample. Every aspect of the malware's behavior that passes through a hooked API call is monitored. Because system-level behavior must at some point use an API call, it cannot be overlooked unless the corresponding API call is not hooked.
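The rerouting idea can be sketched by analogy in Python, where a library function is replaced by a wrapper that logs the call and then forwards it. This is only an illustration of the principle and not how CWSandbox hooks the Windows API; the helper `hook` and the function `sample_under_test` are hypothetical names introduced here.

```python
import functools
import os
import tempfile

call_log = []  # chronological record of intercepted calls

def hook(module, name):
    """Replace module.name with a wrapper that logs the call, then forwards it."""
    original = getattr(module, name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        call_log.append((name, args))      # monitoring code runs first
        return original(*args, **kwargs)   # then the real function is called

    setattr(module, name, wrapper)
    return original  # returned so the hook can be removed again

def sample_under_test(workdir):
    """Harmless stand-in for a sample: creates a directory and removes it."""
    target = os.path.join(workdir, "dropped")
    os.mkdir(target)
    os.rmdir(target)

with tempfile.TemporaryDirectory() as workdir:
    # Hook os.mkdir only; os.rmdir is deliberately left unhooked.
    original_mkdir = hook(os, "mkdir")
    try:
        sample_under_test(workdir)
    finally:
        os.mkdir = original_mkdir  # always restore the real function

print([name for name, _ in call_log])
```

The log records the directory creation but not the removal, because only `os.mkdir` was hooked. The same limitation applies to real API hooking: behavior is visible only where a hook is in place.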
API hooking can be bypassed by programs that directly call kernel code to avoid using the Windows API. However, this is rather uncommon in malware, as the malware author needs to know the target operating system, its service pack level and some other information in advance. Our empirical results show that most autonomous spreading malware is designed to attack a large user base and thus commonly uses the Windows API.
Correctness of the tool is achieved through the technique of DLL code injection. Roughly speaking, DLL code injection allows API hooking to be implemented in a modular and reusable way, thereby raising confidence in the implementation and the correctness of the reported analysis results.
The combination of these three techniques within CWSandbox makes it possible to trace and monitor all relevant system calls and to generate an automated, machine-readable report that describes the following:
Which changes the malware sample performed on the Windows registry
Which dynamic link libraries (DLLs) were loaded before executing
Which virtual memory areas were accessed
Which network connections were opened and what information was sent over such connections
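To give a feel for what a machine-readable report covering the items above might look like, here is a minimal Python sketch. The class names, field names, JSON output, and all sample values are assumptions made for illustration; they do not reproduce the actual CWSandbox report format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class NetworkConnection:
    host: str
    port: int
    bytes_sent: int

@dataclass
class AnalysisReport:
    # Hypothetical fields mirroring the report items listed above.
    sample_name: str
    registry_changes: List[str] = field(default_factory=list)
    loaded_dlls: List[str] = field(default_factory=list)
    memory_areas_accessed: List[str] = field(default_factory=list)
    network_connections: List[NetworkConnection] = field(default_factory=list)

    def to_json(self) -> str:
        """Serialize the report so that other tools can consume it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)

# Placeholder values only; these are not results from a real analysis.
report = AnalysisReport(sample_name="sample.exe")
report.loaded_dlls.append("kernel32.dll")
report.registry_changes.append(r"HKLM\Software\Example\Run")
report.network_connections.append(NetworkConnection("192.0.2.10", 6667, 128))

print(report.to_json())
```

A structured report of this kind is what allows follow-up steps, such as updating intrusion detection signatures, to be automated rather than performed by hand.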
Obviously, the reporting features of the CWSandbox cannot be perfect — that is, they can only report on the visible behavior of the malware and not on how the malware is programmed. Using the CWSandbox also entails some danger that arises from executing dangerous malware on a machine that is connected to a network. However, the information derived from executing malware for even very short periods of time in the CWSandbox is surprisingly rich and in most cases sufficient to assess the danger originating from the malware.
CWSandbox was developed by Carsten Willems as part of his thesis and Ph.D. studies. You can access a web frontend to the tool at http://www.cwsandbox.org: simply submit a binary at that website, and a couple of minutes later you will receive an analysis report via e-mail.