With the help of CWSandbox, we are able to automatically generate a report of the behavior of a given malware binary. To measure the accuracy of this report, we analyzed several current malware binaries and compared the result with reports generated by Norman Sandbox [63] and by Symantec via manual code analysis [95].
The results generated by CWSandbox contain more details than those provided by Norman Sandbox. This is mainly because CWSandbox performs the analysis on a live system, so the malware binary can interact with other processes (for example, by creating a remote thread in another process's context). We can thus also observe how the malware process interferes with the rest of the system. Apart from this, the reports generated by both tools are very similar: Changes to the file system, modifications of the registry, creation of mutexes, and actions regarding process management are detected by both approaches. There are only small differences. For example, if the malware binary uses a random file name when copying itself to another location, the analysis reports differ in this respect, but such differences are marginal. Moreover, Norman Sandbox has the disadvantage that, by default, no real Internet connection is available; the network is simulated as well. If the malware process tries to download additional content from a remote location, Norman Sandbox only detects the attempt and cannot automatically analyze the remote file. In contrast, CWSandbox also observes the download request, and if, for example, the downloaded file is executed, CWSandbox performs DLL injection to enable API hooking on the new process as well.
Compared with the reports from the manual code analysis, the sandbox reports all the important actions, but some small details and behavior variants (e.g., the creation of certain event objects) are not detected. This is because the corresponding API calls are not hooked in the current implementation. By adding hooks to these API calls, it is possible to extend the analysis capabilities of CWSandbox. Our analysis report contains no details that were not also reported by Symantec. Since code analysis leads to a complete result, this is not surprising. Moreover, code analysis can uncover additional behavior of the malware binary. With the sandbox, we observe only one particular path of the malware execution. Thus, we may miss certain aspects that are, for example, triggered by time events.
Since we execute the malware sample for a certain amount of time, we can use this time interval as a mechanism to tune the throughput of CWSandbox. The main question is, how long should the binary be executed to achieve a report as accurate as possible? The best-practice value we found during tests was two minutes. It turned out that after this period of time, the malware binary had enough time to interact with the system — for example, copy itself to another location, spawn new processes, or connect to a remote server.
As an example, we want to present a sample report received via CWSandbox. With the help of nepenthes, we captured a malware binary with the MD5 sum 2ff9766f32f0c1bc3f06b93945a497f6. This file was automatically submitted to CWSandbox for an automated analysis. The resulting report is available in XML format, and we quickly walk through the main sections of such a report.
The beginning of the analysis report is always the same. At first, several pieces of static information about the analysis — for example, the time and the name of the file to be analyzed — are reported. Afterward, the call tree is shown. This corresponds to the logical order in which processes and children were executed. In the following example, we see that the malware creates one additional process and interacts with services.exe, the service manager of Windows. This is very common, but there are also malware binaries that create ten or more processes.
<?xml version="1.0"?>
<!-- This analysis was created by CWSandbox (c) Carsten Willems 2006-->
<analysis cwsversion="1.107" time="05.02.2007 00:28:38"
    file="2ff9766f32f0c1bc3f06b93945a497f6.exe"
    logpath="C:\analysis\log\2ff9766f32f0c1bc3f06b93945a497f6.exe\run_1\">
  <calltree>
    <process_call index="1" pid="888"
        filename="c:\2ff9766f32f0c1bc3f06b93945a497f6.exe"
        starttime="00:00.188" startreason="AnalysisTarget">
      <calltree>
        <process_call index="2" pid="576"
            filename="C:\WINDOWS\system32\soundsman.exe C:\WINDOWS\system32\soundsman.exe 1436 c:\2ff9766f32f0c1bc3f06b93945a497f6.exe"
            starttime="00:02.235" startreason="CreateProcess"/>
      </calltree>
    </process_call>
    <process_call index="3" pid="664" filename="services.exe"
        starttime="00:05.891" startreason="SCM"/>
  </calltree>
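To make the structure of the call tree concrete, the following minimal sketch walks the nested calltree and process_call elements with Python's standard library. It is only an illustration, not part of CWSandbox: the file and logpath attributes are left out for brevity, and a closing analysis tag is added so that the excerpt parses as a complete document.

```python
import xml.etree.ElementTree as ET

# Excerpt of the call tree shown above. The file and logpath attributes are
# omitted, and the closing </analysis> tag is added only so that this
# fragment is a complete, parseable document.
REPORT = r"""<?xml version="1.0"?>
<analysis cwsversion="1.107" time="05.02.2007 00:28:38">
  <calltree>
    <process_call index="1" pid="888" filename="c:\2ff9766f32f0c1bc3f06b93945a497f6.exe" starttime="00:00.188" startreason="AnalysisTarget">
      <calltree>
        <process_call index="2" pid="576" filename="C:\WINDOWS\system32\soundsman.exe C:\WINDOWS\system32\soundsman.exe 1436 c:\2ff9766f32f0c1bc3f06b93945a497f6.exe" starttime="00:02.235" startreason="CreateProcess"/>
      </calltree>
    </process_call>
    <process_call index="3" pid="664" filename="services.exe" starttime="00:05.891" startreason="SCM"/>
  </calltree>
</analysis>"""


def walk(calltree, depth=0):
    """Yield (depth, pid, startreason, filename) for each process, parents first."""
    for call in calltree.findall("process_call"):
        yield depth, int(call.get("pid")), call.get("startreason"), call.get("filename")
        child = call.find("calltree")
        if child is not None:
            yield from walk(child, depth + 1)


root = ET.fromstring(REPORT)
processes = list(walk(root.find("calltree")))

# Print one process per line, indented by its depth in the call tree.
for depth, pid, reason, filename in processes:
    print("  " * depth + f"{pid} [{reason}] {filename}")
```

The indentation of the printed lines mirrors the parent/child relationship: process 576 appears below process 888, which created it, while services.exe is listed at the top level.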
In the following, we switch to the text output of the analysis report. Since the report is in XML format, we are pretty flexible and can transform it with the help of XSL to another format — in this case, a simple text format that can be more easily read by a human.
Following the process call header, the next part reports the observed behavior for each process. The structure is always the same: At the beginning of such a section, several static pieces of information, like filesize, start and termination time, or the MD5 sum, are reported.
The virusscan section then reports the output from several antivirus engines. Currently, three different engines are supported: ClamAV, Bitdefender, and AntiVir Workstation. In the running example, Bitdefender and AntiVir have a generic detection for the malware binary, but ClamAV does not detect it at all.
[ Virusscans ]
--------------
* ClamAV:
  Application version: 0.88.2, Signature file version: 2523
  Classification: OK
* BDC/Linux-Console:
  Application version: 7.0.2492, Signature file version 418483
  Classification: Generic.Sdbot.BAC4B0C4
* AntiVir Workstation:
  Application version: 2.1.9-33, Signature file version: 6.37.1.28
  Classification: TR/Crypt.XPACK.Gen
The DLLs loaded by the binary during execution are reported next. These DLLs commonly include kernel32.dll, user32.dll, and all other important Windows DLLs. We can get valuable information from this section. If, for example, wsock32.dll or ws2_32.dll is loaded, we can be sure that the binary also has the capability to access the network. The following excerpt of the running example shows the beginning of the DLL section:
[ Loaded dlls ]
---------------
* Loads dll c:\2ff9766f32f0c1bc3f06b93945a497f6.exe from address (729088 bytes). [successful]
* Loads dll C:\WINDOWS\system32\ntdll.dll from address (749568 bytes). [successful]
* Loads dll C:\WINDOWS\system32\kernel32.dll from address (1073152 bytes). [successful]
* Loads dll C:\WINDOWS\system32\user32.dll from address (589824 bytes). [successful]
* Loads dll C:\WINDOWS\system32\wsock32.dll from address (40960 bytes). [successful]
* Loads dll C:\WINDOWS\system32\WS2_32.dll from address (94208 bytes). [successful]
* Loads dll C:\WINDOWS\system32\pstorec.dll from address (53248 bytes). [successful]
* Loads dll C:\WINDOWS\system32\Wship6.dll from address (28672 bytes). [successful]
[...]
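A check for networking DLLs like the one described above is easy to automate on the text output. The following sketch assumes the line format shown in the excerpt; the set NETWORK_DLLS contains only the two DLLs named in the text and is our own illustrative choice, not a list used by CWSandbox.

```python
import ntpath
import re

# A few lines copied from the DLL section of the running example.
DLL_LINES = r"""
* Loads dll C:\WINDOWS\system32\ntdll.dll from address (749568 bytes). [successful]
* Loads dll C:\WINDOWS\system32\kernel32.dll from address (1073152 bytes). [successful]
* Loads dll C:\WINDOWS\system32\wsock32.dll from address (40960 bytes). [successful]
* Loads dll C:\WINDOWS\system32\WS2_32.dll from address (94208 bytes). [successful]
"""

# Assumption: an illustrative set with just the two DLLs named in the text.
NETWORK_DLLS = {"wsock32.dll", "ws2_32.dll"}

LOAD_RE = re.compile(r"^\* Loads dll (?P<path>.+?) from address", re.MULTILINE)


def loaded_dlls(report_text):
    """Return the lowercased file names of all DLLs in a 'Loaded dlls' section."""
    return [ntpath.basename(m.group("path")).lower()
            for m in LOAD_RE.finditer(report_text)]


def network_dlls(report_text):
    """Return the loaded DLLs that hint at network capability, sorted by name."""
    return sorted(set(loaded_dlls(report_text)) & NETWORK_DLLS)


print(network_dlls(DLL_LINES))
```

File names are compared in lowercase because Windows paths are case-insensitive and the report itself mixes spellings such as WS2_32.dll and wsock32.dll.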
In the filesystem section, CWSandbox reports all observed changes to the filesystem. These include, for example, newly created or deleted files, but also checks of file attributes and file copies. In our example report, we see that the binary first opens several devices related to networking and then copies itself to the Windows system32 folder. The attributes of the copied file are then changed so that it is hidden and read-only, and some other options are set. This is the actual installation phase of the malware: It copies itself to a known location and later starts this copied binary:
[ Changes to filesystem ]
-------------------------
* Creates open file \Device\Tcp.
* Creates open file \Device\Ip.
* Creates open file \Device\Ip.
[...]
* Copies file c:\2ff9766f32f0c1bc3f06b93945a497f6.exe to C:\WINDOWS\system32\soundsman.exe.
* Finds file soundsman.exe.
* Gets file attributes C:\WINDOWS\system32\soundsman.exe.
* Sets file attributes C:\WINDOWS\system32\soundsman.exe.
* Sets file time C:\WINDOWS\system32\soundsman.exe.
[...]
* Deletes file c:\2ff9766f32f0c1bc3f06b93945a497f6.exe.
[...]
</filesystem_section>
A so-called mutex object (mutual exclusion) is a synchronization object under Windows whose state is set to signaled when it is not owned by any thread. Conversely, it is set to nonsignaled when it is owned by a thread. This mechanism is typically used to prevent several threads from writing to shared objects, such as memory regions, at the same time. Each thread waits for ownership of a mutex object before executing the code that accesses the shared object. After writing to the shared object, the thread releases the mutex object, and another thread can access the shared object if needed. Malware also uses this mechanism to synchronize several threads:
[ Mutex section ]
-----------------
* Creates mutex n1c05. [not owned]
An interesting aspect of mutexes is that they sometimes allow us to detect certain variants of a particular bot. If two different binaries use the same name for the mutex object and it is characteristic (e.g., By MeGaByTeS2lk or spybot1.2c), then the chances are high that both samples are just minor modifications of each other.
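The idea of spotting variants by their mutex names can be sketched as a simple grouping step. The sample identifiers below are hypothetical placeholders; only the mutex names n1c05 and spybot1.2c come from the text.

```python
from collections import defaultdict

# (sample id, mutex name) pairs as they might be collected from many reports.
# The sample ids are hypothetical; the mutex names appear in the text above.
observed = [
    ("sample-a", "n1c05"),
    ("sample-b", "spybot1.2c"),
    ("sample-c", "n1c05"),
]


def group_by_mutex(pairs):
    """Return the mutex names shared by more than one sample, with those samples."""
    groups = defaultdict(list)
    for sample, mutex in pairs:
        groups[mutex].append(sample)
    return {mutex: samples for mutex, samples in groups.items() if len(samples) > 1}


candidates = group_by_mutex(observed)
print(candidates)
```

A shared name is only a hint. Generic mutex names would produce false matches, so in practice the grouping is meaningful only for characteristic names like the ones mentioned above.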
Next, the analysis report provides us with more information about registry access. We see all opened, queried, deleted, and changed registry keys. This way, we can see what the malware does with the Windows registry keys, and we can track all these modifications:
[ Changes to registry ]
-----------------------
* Opens key HKEY_CURRENT_USER "Software\Microsoft\OLE".
* Opens key HKEY_LOCAL_MACHINE "Software\Microsoft\Rpc\SecurityService".
* Opens key HKEY_LOCAL_MACHINE "System\CurrentControlSet\Control\SecurityProviders".
* Creates key HKEY_CURRENT_USER "Software\Microsoft\OLE".
[...]
* Enumerates value HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\SecurityProviders\SaslProfiles ".
The binary under analysis often creates an additional process, starts Windows services, creates remote threads, or performs other actions regarding process management. CWSandbox monitors all of these functions and reports which new processes were created. Moreover, the tool tracks all newly created processes and services in order to observe their behavior as well. In the running example, the malware binary creates a new process with certain parameters using the API function CreateProcessA:
[ Process/window information ]
------------------------------
* Creates process 576 as C:\WINDOWS\system32\soundsman.exe 1436 "c:\2ff9766f32f0c1bc3f06b93945a497f6.exe". [successful]
* Kills process 888.
* Enumerates running processes.
* Enumerates modules 576.
Since this was the last section for process 1, the next section basically repeats all this information for the next process, which was created by process 1. Again, we see some statistical information about the binary, followed by information about loaded DLLs and filesystem activity.
The interesting aspect about the second process is that it creates certain registry keys to survive a reboot. It adds itself to the Run and RunServices sections, and thus Windows executes the malware binary upon the next startup:
[ Changes to registry ]
-----------------------
[...]
* Sets value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run "Microsoft Sounds" to "soundsman.exe".
* Sets value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\RunServices "Microsoft Sounds" to "soundsman.exe".
* Sets value HKEY_CURRENT_USER\Software\Microsoft\OLE "Microsoft Sounds" to "soundsman.exe".
[...]
A very valuable aspect of CWSandbox is its ability to observe the network behavior of malware. It monitors the Winsock (Windows Sockets) functions, which are the API functions related to Windows network access. Therefore, we can automatically extract all information that is sent to other machines with the help of these functions. For example, we see when the malware binary does a DNS lookup:
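The autostart entries in this excerpt can be picked out of the text report mechanically. The sketch below assumes the "Sets value" line format shown above and checks only for the two locations named in the text, Run and RunServices; the function name autostart_entries is our own.

```python
import re

# The three "Sets value" lines from the registry section of the second process.
REGISTRY_LINES = r"""
* Sets value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run "Microsoft Sounds" to "soundsman.exe".
* Sets value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\RunServices "Microsoft Sounds" to "soundsman.exe".
* Sets value HKEY_CURRENT_USER\Software\Microsoft\OLE "Microsoft Sounds" to "soundsman.exe".
"""

SET_RE = re.compile(
    r'^\* Sets value (?P<key>\S+) "(?P<name>[^"]*)" to "(?P<data>[^"]*)"\.',
    re.MULTILINE,
)

# Only the two autostart locations named in the text; compared in lowercase.
AUTOSTART_SUFFIXES = (r"\currentversion\run", r"\currentversion\runservices")


def autostart_entries(report_text):
    """Return (key, value name, data) for every value set under Run or RunServices."""
    hits = []
    for m in SET_RE.finditer(report_text):
        if m.group("key").lower().endswith(AUTOSTART_SUFFIXES):
            hits.append((m.group("key"), m.group("name"), m.group("data")))
    return hits


for key, name, data in autostart_entries(REGISTRY_LINES):
    print(f"{name!r} -> {data!r} under {key}")
```

The third line of the excerpt, which writes to Software\Microsoft\OLE, is deliberately not reported, since the text names only Run and RunServices as the keys that Windows evaluates at startup.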
* Gethostbyname request for pepe84.mooo.com returned 201.212.107.195. [successful]
Finally, we see in the analysis report that the malware binary connects to an external machine. The sample is a bot, so it connects to the central C&C server, from which it receives further commands. In this example, it uses the user- and nickname nx-680222376 and joins the channel #a2. This channel has the topic :xvvv asn139 200 0 0 -b -r, which means that the bots propagate further and try to find other vulnerable machines:
[ Network services ]
--------------------
* Gethostbyname request for pepe84.mooo.com returned 201.212.107.195. [successful]
* Connects to 201.212.107.195 on port 7005 (IRC, TCP). [successful]
* Enters channel #a2 (password: ) with nick nx-680222376 (user: nx-680222376, password: ). [rfc conform]
As you have seen, the analysis report by CWSandbox is a good starting point for malware analysis. We can automatically extract information regarding the filesystem, the registry, and network connections, along with other useful details. Based on this data, we can estimate whether we need a more detailed (manual) analysis. Thus, you can enhance the Data Analysis phase after a successful compromise of your Windows honeypots to a certain degree with the help of this tool.
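Because the network section follows a regular line format, the C&C details can be pulled out automatically. The following sketch is our own illustration that assumes exactly the three line shapes shown in the excerpt; the function name extract_cnc and the dictionary keys are hypothetical.

```python
import re

# The network section of the running example, one report line per text line.
NETWORK_LINES = """
* Gethostbyname request for pepe84.mooo.com returned 201.212.107.195. [successful]
* Connects to 201.212.107.195 on port 7005 (IRC, TCP). [successful]
* Enters channel #a2 (password: ) with nick nx-680222376 (user: nx-680222376, password: ). [rfc conform]
"""

DNS_RE = re.compile(r"Gethostbyname request for (\S+) returned (\d+(?:\.\d+){3})\.")
CONNECT_RE = re.compile(r"Connects to (\d+(?:\.\d+){3}) on port (\d+) \((\w+), (\w+)\)")
CHANNEL_RE = re.compile(r"Enters channel (\S+) \(password: (\S*)\) with nick (\S+)")


def extract_cnc(report_text):
    """Collect C&C details from the text lines of a network section."""
    info = {}
    dns = DNS_RE.search(report_text)
    if dns:
        info["host"], info["ip"] = dns.group(1), dns.group(2)
    conn = CONNECT_RE.search(report_text)
    if conn:
        info["port"] = int(conn.group(2))
        info["protocol"] = conn.group(3)
    chan = CHANNEL_RE.search(report_text)
    if chan:
        info["channel"] = chan.group(1)
        info["channel_password"] = chan.group(2)  # empty in this example
        info["nick"] = chan.group(3)
    return info


print(extract_cnc(NETWORK_LINES))
```

Working on the XML report instead of the text rendering would be more robust, but since the element names of the network section are not shown here, the sketch sticks to the text lines that are.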
The example report in this section just shows some of the capabilities of CWSandbox. It's best if you test it on your own. Send malware samples captured by your honeypots to CWSandbox and take a look at the analysis reports yourself!
We did a larger test to evaluate the throughput and the quality of the reports generated by CWSandbox. For this, we analyzed 6148 malware binaries we collected with the help of nepenthes, a honeypot tool we introduced in Chapter 6. We had collected these malware binaries in a five-month period between June and October 2006 while running nepenthes on about 16,000 IP addresses. This test corpus is thus real malware, spreading in the wild. We can be sure that all of these binaries are malicious, since we downloaded them after a successful exploitation attempt.
The antivirus engine ClamAV classified these samples as 1572 different kinds of malware. Most of them were different variants of bots, for example, many different Poebot or Padobot variants. Only 3863 of the 6148 samples were classified as malicious by ClamAV, most likely because no signature was available for the other binaries. As just stated, all samples are malicious due to the collection method, and thus an antivirus engine should classify 100 percent as malicious. In this case, only 62.8 percent were detected.
For the analysis process we used the following configuration: We executed CWSandbox on two commercial off-the-shelf systems, each with an Intel Pentium IV processor running at 2 GHz and 2 GB of RAM. Each of these systems was running Debian Linux Testing and hosted two virtual machines based on VMware Server with Windows XP as the guest system. CWSandbox was executed within the virtual machines, so four instances were effectively running in parallel. The malware binaries were stored in a MySQL database, and all reports were also written to this database.
CWSandbox was able to analyze all these binaries in about 67 hours, so the effective throughput was more than 500 binaries per day per instance. This is at least an order of magnitude faster than an analysis by a human. The resulting report can be used by a human analyst to get a first overview of the binary, and the analyst needs to examine it further only if necessary. You can, for example, submit a sample that you collected with your honeypots to http://www.cwsandbox.org/. A few minutes later, you should receive the analysis report in your inbox via e-mail. Based on this report, you can estimate whether a manual analysis is necessary or whether you have already collected enough information.
We also want to present some statistics regarding the analysis reports of CWSandbox to give an overview of the variety of results. Over 324 binaries contacted an IRC server, a clear sign that these malware binaries tried to contact the central server that is used for C&C within botnets. It turned out that 172 of these botnets were unique. Since we extract information like the IRC channel or passwords used to access the C&C server from the samples, the analysis can help to mitigate the risk posed by botnets. With the help of nepenthes and CWSandbox, we now have a framework to detect the presence of botnets without any human intervention. Nepenthes collects autonomously spreading malware, CWSandbox analyzes it, and if a botnet is detected, the system can inform a human analyst. The complete process is automated, and we are working on even better automation to handle the detected botnets.
Of the 6148 samples, 856 contacted an HTTP server and tried to download further data from the Internet. Since we also observe how this data is handled, we can learn more about these additional infection stages. The resulting actions ranged from the download of additional executable code and C&C for HTTP-based botnets to click fraud (i.e., automated visits to certain web pages). In addition, two samples used FTP to download further data from the Internet during the analysis process.
We also observed a number of malware samples (78 binaries) that tried to use SMTP as a communication protocol. Most often, this is used to send out spam e-mails or to send information about the compromised machine back to the attacker. The behavior-based approach behind CWSandbox can also detect these kinds of malicious actions, so that appropriate countermeasures can be developed. For SMTP, we record the destination e-mail address and the body of the message, so we get complete information about what the malware wants to do.
It is quite common for malware binaries to add a registry key that enables an autostart mechanism. More than 95 percent of our samples created such a registry key. Moreover, mutexes are commonly used to ensure that only one instance of the malware binary is running on a compromised host. A third frequently observed pattern is that the malware binary copies itself to the Windows system folder. Characteristics like these can hopefully help in the future to define suspect behavior automatically. CWSandbox could be extended to classify a binary automatically as either normal or malicious based on the observed behavior.
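The detection rate quoted above follows directly from the two sample counts, as this short calculation shows:

```python
# Figures taken from the text above.
total_samples = 6148
detected_by_clamav = 3863

detection_rate = detected_by_clamav / total_samples
missed = total_samples - detected_by_clamav

print(f"{detection_rate:.1%} detected, {missed} samples missed")
```

In other words, more than a third of the binaries, all of them known to be malicious, passed the scanner unnoticed.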
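The throughput figure can be checked with a short calculation. The sketch below uses only numbers from the text (6148 binaries, about 67 hours, four parallel instances, and the two-minute execution window mentioned earlier). Because "about 67 hours" is approximate, the results are estimates.

```python
# Figures taken from the text; "hours" is the approximate total run time.
samples = 6148
hours = 67
instances = 4
run_minutes = 2  # execution window per binary

# Effective throughput per instance and day.
per_day_per_instance = samples / hours * 24 / instances

# Average wall-clock time one instance spends on one binary.
minutes_per_sample = hours * 60 * instances / samples

# Upper bound if an instance did nothing but execute binaries back to back.
max_per_day = 24 * 60 / run_minutes

print(f"{per_day_per_instance:.0f} binaries per day per instance")
print(f"{minutes_per_sample:.2f} minutes per binary, of which {run_minutes} are execution")
print(f"upper bound with a {run_minutes}-minute window: {max_per_day:.0f} per day")
```

Each binary thus takes roughly 2.6 minutes in total. The time beyond the two-minute execution window is presumably spent on tasks around the actual run, such as preparing the virtual machine and storing the report, although the text does not break this down.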
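To put the protocol counts from the last three paragraphs into proportion, we can relate them to the size of the test corpus. The counts are taken from the text; note that the IRC figure is given as "over 324", so its share is a lower bound, and that one sample may use more than one protocol.

```python
# Counts taken from the text; 324 is a lower bound for IRC.
total_samples = 6148
protocol_counts = {"IRC": 324, "HTTP": 856, "FTP": 2, "SMTP": 78}

shares = {proto: count / total_samples for proto, count in protocol_counts.items()}

# List the protocols from most to least frequently observed.
for proto, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{proto:5s} {protocol_counts[proto]:4d} samples  {share:6.2%}")
```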
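To illustrate what such an extension might look like, the following sketch checks a text report for the three patterns just described. It is a hypothetical heuristic of our own, not a feature of CWSandbox: the indicator names and regular expressions are assumptions based on the report lines shown earlier in this section.

```python
import re

# Hypothetical indicators; the patterns assume the text-report line formats
# shown in the example report and are not part of CWSandbox.
INDICATORS = {
    "autostart_registry_key": re.compile(
        r"Sets value \S*\\CurrentVersion\\Run(Services)?\s", re.IGNORECASE),
    "creates_mutex": re.compile(r"^\* Creates mutex ", re.MULTILINE),
    "copies_to_system_folder": re.compile(
        r"Copies file \S+ to C:\\WINDOWS\\system32\\", re.IGNORECASE),
}

# Three lines taken from the example report.
SAMPLE_REPORT = r"""
* Copies file c:\2ff9766f32f0c1bc3f06b93945a497f6.exe to C:\WINDOWS\system32\soundsman.exe.
* Creates mutex n1c05. [not owned]
* Sets value HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run "Microsoft Sounds" to "soundsman.exe".
"""


def suspicious_indicators(report_text):
    """Return the names of all indicators whose pattern occurs in the report."""
    return [name for name, pattern in INDICATORS.items() if pattern.search(report_text)]


print(suspicious_indicators(SAMPLE_REPORT))
```

Legitimate installers also write autostart keys and create mutexes, so a real classifier would have to weigh such indicators against each other rather than treat any single one as proof of malice.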