
7.5. Building Your Own Hybrid Honeypot System

Although a lot of research has been conducted into scaling up the operations of honeynets, there are no easily usable solutions available on the Internet today. The problem of scaling up honeynet operations is really only encountered by people who have a lot of address space available. For those who have less than a /24 network, performance is not usually a problem. Nonetheless, we will outline some approaches that you might be able to use to create a more scalable system yourself.

7.5.1. NAT and High-Interaction Honeypots

The simplest way to serve a larger address space is to use network address translation (NAT). In this scenario, a router forwards the traffic for a network to a NAT device, which is connected to a small number of high-interaction honeypots on the other side. Ideally, as already discussed in Chapter 2, the high-interaction honeypots run on something like VMware so they can be easily reverted to a clean state after they have been compromised.

Let's say we have access to the class C network 10.1.1/24 and run a few high-interaction honeypots on another network, 192.168.2/28. We need to create a configuration for the NAT device that maps the class C network to our honeypots. In this example, our NAT device is an OpenBSD machine running the popular pf firewall. As is often the case with repetitive rule sets, there is no need to write the firewall configuration by hand. Instead, we generate it with a small Python script:

#!/usr/bin/env python
from functools import reduce   # a builtin on Python 2, an import on Python 3

intf = "xl0"                   # external interface on which we get traffic
big_network = "10.1.1.1"       # external network routed to us
big_hosts = 254                # number of addresses on external network
small_network = "192.168.2.1"  # internal network of honeypots
small_hosts = 10               # number of high interaction honeypots

# dense function to convert a number into an ip address
# (integer division, so it behaves the same on Python 2 and 3)
toip = lambda num: '.'.join(
    map(lambda x: str((num // 256 ** (3 - x)) % 256), range(4)))
# less dense function to convert an ip address into a number
tonum = lambda ip: reduce(lambda x, y: 256 * int(x) + int(y), ip.split('.'))

big_num = tonum(big_network)
small_num = tonum(small_network)
for off in range(big_hosts):
    src_ip = toip(big_num + off)
    # wrap around to the first honeypot after the last one
    dst_ip = toip(small_num + (off % small_hosts))
    print('rdr on %s proto {tcp, udp} from any to %s/32 port 1:65535 -> %s port 1:*'
          % (intf, src_ip, dst_ip))


This little Python script generates a NAT rule for each IP address on the network that we want to expose to the honeypots. Each rule maps an external address to the next internal honeypot IP address, wrapping around to the first honeypot after the last one has been used. Assuming that most attacks and scans are randomly distributed across our address space, the mapping generated between IP addresses and honeypots should effectively balance the load among them. The resulting configuration file looks as follows:

rdr on xl0 proto {tcp, udp} from any to 10.1.1.1/32 port 1:65535 -> 192.168.2.1 port 1:*
rdr on xl0 proto {tcp, udp} from any to 10.1.1.2/32 port 1:65535 -> 192.168.2.2 port 1:*
rdr on xl0 proto {tcp, udp} from any to 10.1.1.3/32 port 1:65535 -> 192.168.2.3 port 1:*
rdr on xl0 proto {tcp, udp} from any to 10.1.1.4/32 port 1:65535 -> 192.168.2.4 port 1:*
rdr on xl0 proto {tcp, udp} from any to 10.1.1.5/32 port 1:65535 -> 192.168.2.5 port 1:*
rdr on xl0 proto {tcp, udp} from any to 10.1.1.6/32 port 1:65535 -> 192.168.2.6 port 1:*
...

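To see how evenly this round-robin mapping spreads the external addresses, the following sketch rebuilds the same mapping with Python 3's standard ipaddress module and counts the addresses assigned to each honeypot. The function name build_mapping is our own; the addresses and counts are those of the example above.

```python
import ipaddress
from collections import Counter

def build_mapping(big_first, big_hosts, small_first, small_hosts):
    """Return (external_ip, honeypot_ip) pairs in round-robin order."""
    big = ipaddress.IPv4Address(big_first)
    small = ipaddress.IPv4Address(small_first)
    return [(str(big + off), str(small + (off % small_hosts)))
            for off in range(big_hosts)]

mapping = build_mapping("10.1.1.1", 254, "192.168.2.1", 10)

# Count how many external addresses each honeypot has to serve.
load = Counter(dst for _, dst in mapping)
for honeypot in sorted(load, key=ipaddress.IPv4Address):
    print(honeypot, load[honeypot])
```

Because 254 is not a multiple of 10, the first four honeypots serve 26 external addresses each and the remaining six serve 25, so the address load differs by at most one.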

Clearly, it is possible to create more sophisticated mappings. If the high-interaction honeypots have different configurations, you might want to map ports individually or give a higher likelihood that certain honeypots receive more traffic, and so on. As with all installations of high-interaction honeypots, you need to make sure that your honeypots cannot cause damage outside of your network. You may use Honeywall or Snort Inline for that; see Section 2.5.1 for more information on how to use them.
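As an illustration of such a mapping, here is a minimal sketch of a weighted variant of the generator script. It assumes Python 3.7 or later (for ordered dictionaries); the function name weighted_rules and the weights themselves are hypothetical, chosen only to show the idea.

```python
from itertools import cycle

def weighted_rules(intf, external_ips, weights):
    """Yield one pf rdr rule per external IP address.

    weights maps a honeypot IP to a positive integer; a honeypot with
    weight 3 receives three times as many external addresses as one
    with weight 1.
    """
    # Expand the weights into a repeating schedule of honeypot addresses.
    schedule = [ip for ip, w in weights.items() for _ in range(w)]
    for src_ip, dst_ip in zip(external_ips, cycle(schedule)):
        yield ('rdr on %s proto {tcp, udp} from any to %s/32 port 1:65535 '
               '-> %s port 1:*' % (intf, src_ip, dst_ip))

# Hypothetical weights: the first honeypot takes three shares, the others one.
weights = {"192.168.2.1": 3, "192.168.2.2": 1, "192.168.2.3": 1}
external = ["10.1.1.%d" % i for i in range(1, 11)]
rules = list(weighted_rules("xl0", external, weights))
for rule in rules:
    print(rule)
```

With these weights, six of the ten external addresses map to 192.168.2.1 and two to each of the other honeypots. Note that consecutive external addresses end up on the same honeypot; shuffling the schedule would spread them out if that matters for your deployment.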

A similar approach has been taken by Vinod Yegneswaran et al., researchers from the University of Wisconsin, to provide scalable abuse monitoring for their university network [111]. Their iSink system adds several unique features on top of our somewhat simple approach. Instead of only using a NAT gateway to redistribute traffic, iSink uses the NAT gateway as a filter for known attacks. Only unknown traffic is passed through. Figure 7.4 gives an overview of the iSink system. In addition to a honeynet with high-interaction honeypots, iSink adds another component, called the Active Sink. This design is able to respond to traffic for the 16 million addresses available via a class A network. The Active Sink is based on the modular Click router system, a software routing system that can be extended easily by adding new packet processors. Active Sink achieves its high performance by operating completely without state. Active Sink's design is based on the following assumptions:

Knowing about the different Internet protocols, it's almost always possible to create a suitable response packet just by looking at the request.
A packet exchange needs to be continued only until the payload contains a worm or virus that can be identified by content analysis.
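The first assumption can be illustrated with a small sketch. This is not Active Sink's code (which is built from Click elements); it is a simplified Python model, under our own assumptions, of how a TCP reply can be computed from the request alone. The Segment type, the stateless_reply function, and the way the initial sequence number is derived are all hypothetical.

```python
from collections import namedtuple

# Simplified view of a TCP segment; real packets carry many more fields.
Segment = namedtuple("Segment", "src dst sport dport seq ack flags payload")

def stateless_reply(req, body=b""):
    """Build a reply from the request alone, keeping no per-connection state."""
    if "S" in req.flags and "A" not in req.flags:
        # SYN: answer with SYN-ACK. The initial sequence number is derived
        # from the request itself, so nothing has to be remembered.
        isn = (req.seq ^ 0x5A5A5A5A) & 0xFFFFFFFF
        return Segment(req.dst, req.src, req.dport, req.sport,
                       isn, (req.seq + 1) & 0xFFFFFFFF, "SA", b"")
    # Data or bare ACK: our next sequence number is whatever the peer
    # acknowledges, and we acknowledge everything the peer just sent.
    ack = (req.seq + len(req.payload)) & 0xFFFFFFFF
    return Segment(req.dst, req.src, req.dport, req.sport,
                   req.ack, ack, "PA" if body else "A", body)

# A scanner (documentation address) opens a connection and sends a request.
syn = Segment("198.51.100.7", "10.1.1.5", 40000, 80, 1000, 0, "S", b"")
synack = stateless_reply(syn)
get = Segment("198.51.100.7", "10.1.1.5", 40000, 80, 1001,
              (synack.seq + 1) & 0xFFFFFFFF, "PA", b"GET / HTTP/1.0\r\n\r\n")
resp = stateless_reply(get, b"HTTP/1.0 200 OK\r\n\r\n")
print(synack)
print(resp)
```

Every field of the reply comes either from the incoming segment or from a fixed rule, which is what allows a responder of this kind to cover a very large address space without a connection table.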

Figure 7.4. Wisconsin's Internet Sink (iSink) uses a NAT gateway to filter known attacks. In addition to traditional high-interaction honeypots, iSink also features an Active Sink component that is a high-performance stateless responder.


Active Sink contains a responder for HTTP requests and also for more Windows-specific protocols like NetBIOS, SMB, CIFS, and DCE/RPC. Special support for backdoor ports left by MyDoom and Beagle has been integrated as well.

In our preceding experimental configuration, the NAT gateway forwards all connections on to our high-interaction honeypots. In iSink, the NAT gateway heavily filters requests and tries to forward only interesting payloads. To reduce traffic, iSink applies one of the following three strategies to each source IP address:

First N connections
First N connections per destination port
Connections to the first N destination IPs

In their experiments, the middle strategy did not perform as well as the other two strategies, each of which provided a reduction of two orders of magnitude in both packets and bandwidth. The last strategy was chosen over the first one because it provides a more consistent view of the network to adversaries. Rather than being stopped completely after having made N connections, the last strategy allows an adversary to continue talking to hosts she has already talked to.
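To make the difference between the three strategies concrete, here is a minimal Python sketch of a per-source filter. It is not iSink's implementation; the class name SourceFilter and the strategy names are our own, and a real gateway would also need to expire old entries.

```python
from collections import defaultdict

class SourceFilter:
    """Decide per source IP whether a new connection is forwarded."""

    def __init__(self, strategy, n):
        self.strategy = strategy            # "first_n", "per_port" or "first_n_dsts"
        self.n = n
        self.count = defaultdict(int)       # src -> connections seen
        self.port_count = defaultdict(int)  # (src, dport) -> connections seen
        self.dsts = defaultdict(set)        # src -> destination IPs admitted

    def allow(self, src, dst, dport):
        if self.strategy == "first_n":
            self.count[src] += 1
            return self.count[src] <= self.n
        if self.strategy == "per_port":
            self.port_count[(src, dport)] += 1
            return self.port_count[(src, dport)] <= self.n
        if self.strategy == "first_n_dsts":
            seen = self.dsts[src]
            if dst in seen:
                return True                 # keep talking to known hosts
            if len(seen) < self.n:
                seen.add(dst)
                return True
            return False
        raise ValueError("unknown strategy: %r" % self.strategy)

# The same source contacts three hosts and then returns to the first one.
targets = ("10.1.1.1", "10.1.1.2", "10.1.1.3", "10.1.1.1")
by_dst = SourceFilter("first_n_dsts", 2)
by_count = SourceFilter("first_n", 2)
print([by_dst.allow("198.51.100.7", d, 445) for d in targets])
print([by_count.allow("198.51.100.7", d, 445) for d in targets])
```

With N = 2, the destination-based filter still admits the fourth connection because it goes to a host the source has already reached, whereas the plain connection counter rejects it. This is the consistency property described above.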

In addition to Wisconsin's campus network, iSink has been deployed to a class A network that receives traffic for over 16 million IP addresses. The class A network was advertised via BGP to the world. To measure potential packet loss, the SNMP-enabled switch was monitored to see if the system could handle the bandwidth. During the deployment, the system handled about 5000 packets per second with about 6 MB/s of bandwidth, most of which was used by UDP traffic. Even though Active Sink does not provide high-interaction capabilities, the provided responders for Windows protocols were able to detect new worm outbreaks such as Sasser. As such, it provided similar capabilities to nepenthes, discussed in Chapter 6, without being specifically designed for that task.

7.5.2. Honeyd and High-Interaction Honeypots

As we have seen in the previous section, being able to filter traffic, for example at the NAT gateway, can take a significant load off the high-interaction honeypots. If you do not have access to the systems presented in the research paper and have no desire to hack the kernel of your operating system, there is another option that allows you to use some of these techniques. The Honeyd low-interaction honeypot system provides fairly reasonable performance (see Chapter 4). Honeyd is not a high-interaction honeypot, and even with the best service scripts, it cannot be compromised like a real machine. However, Honeyd provides various ways to configure a large number of IP addresses and even to selectively forward the traffic for some of them.

We use a similar address space as in the NAT example. This requires that your Honeyd machine either have two interfaces, one for each network, or run an operating system that supports IP aliases, which allow multiple networks to be assigned to the same interface. In either case, we again use a trusty Python script to generate the proper configuration.

#!/usr/bin/env python
from functools import reduce   # a builtin on Python 2, an import on Python 3

intf = "xl0"
big_network = "10.1.1.1"
big_hosts = 254
small_network = "192.168.2.1"
small_hosts = 10

# one proxy template per high-interaction honeypot
template = '''create honeypot-%(smallip)s
set honeypot-%(smallip)s default tcp action proxy %(smallip)s:$dport
set honeypot-%(smallip)s default udp action proxy %(smallip)s:$dport'''

# integer division, so toip behaves the same on Python 2 and 3
toip = lambda num: '.'.join(
    map(lambda x: str((num // 256 ** (3 - x)) % 256), range(4)))
tonum = lambda ip: reduce(
    lambda x, y: 256 * int(x) + int(y), ip.split('.'))

small_num = tonum(small_network)
for off in range(small_hosts):
    mydict = {'smallip': toip(small_num + off)}
    print(template % mydict)

# bind every external address to a template in round-robin order
big_num = tonum(big_network)
for off in range(big_hosts):
    src_ip = toip(big_num + off)
    dst_ip = toip(small_num + (off % small_hosts))
    print('bind %s honeypot-%s' % (src_ip, dst_ip))


The script creates a template for each high-interaction honeypot in our little honeynet. The template contains very little information: it only directs the virtual honeypot to forward all of its UDP and TCP traffic to the corresponding high-interaction honeypot on 192.168.2/28. We then bind the templates to the external IP addresses in round-robin fashion, which balances the load across the high-interaction honeypots much as before. The resulting configuration looks similar to this one:

create honeypot-192.168.2.1
set honeypot-192.168.2.1 default tcp action proxy 192.168.2.1:$dport
set honeypot-192.168.2.1 default udp action proxy 192.168.2.1:$dport
create honeypot-192.168.2.2
set honeypot-192.168.2.2 default tcp action proxy 192.168.2.2:$dport
set honeypot-192.168.2.2 default udp action proxy 192.168.2.2:$dport
...
bind 10.1.1.1 honeypot-192.168.2.1
bind 10.1.1.2 honeypot-192.168.2.2
bind 10.1.1.3 honeypot-192.168.2.3
bind 10.1.1.4 honeypot-192.168.2.4
bind 10.1.1.5 honeypot-192.168.2.5
bind 10.1.1.6 honeypot-192.168.2.6
...


Unfortunately, this does not quite replicate the NAT behavior. Honeyd will make proxy TCP connections from its own IP address, so some exploits might not work correctly. Honeyd also does not provide any of the filtering strategies supported by iSink. However, as Honeyd is open source, implementing these filtering strategies is a possibility. Honeyd's dynamic templates already provide a mechanism for conditional handling of connections. For example, we could have a template that blocks everything

create allblock
set allblock default tcp action block
set allblock default udp action block
set allblock default icmp action block


and then use dynamic templates to forward traffic only if the source IP address matches the filtering strategy:

dynamic filter-192.168.2.1
add filter-192.168.2.1 use honeypot-192.168.2.1 if source ip = <something>
add filter-192.168.2.1 otherwise use allblock


In this particular example, an adversary could reach the honeypot at 192.168.2.1 only if his source IP matched the IP address provided in the preceding configuration. The matching code is implemented in Honeyd's condition.c source file. It would be relatively straightforward to implement additional filter logic there. This is left as an exercise for the reader.^[4]
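Short of modifying condition.c, the existing source IP condition can at least be generated for all honeypots at once. The following sketch, in the style of the earlier generator scripts, emits the allblock template and one dynamic template per honeypot. The function name filter_config and the source address 192.0.2.10 (a documentation address) are placeholders of our own, and the sketch assumes that the honeypot-* proxy templates have already been created.

```python
def filter_config(honeypot_ips, allowed_source):
    """Return Honeyd configuration lines that wrap each proxy template
    in a dynamic template admitting only allowed_source."""
    lines = ["create allblock"]
    for proto in ("tcp", "udp", "icmp"):
        lines.append("set allblock default %s action block" % proto)
    for ip in honeypot_ips:
        name = "filter-%s" % ip
        lines.append("dynamic %s" % name)
        lines.append("add %s use honeypot-%s if source ip = %s"
                     % (name, ip, allowed_source))
        lines.append("add %s otherwise use allblock" % name)
    return lines

# 192.0.2.10 is a placeholder standing in for the real source address.
honeypots = ["192.168.2.%d" % i for i in range(1, 11)]
config = filter_config(honeypots, "192.0.2.10")
print("\n".join(config))
```

For the filters to take effect, the bind lines produced by the earlier script would have to reference the filter-* names instead of the honeypot-* names, on the assumption that dynamic templates are bound to addresses in the same way as ordinary ones.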

^[4] Adding this additional functionality to Honeyd requires a pretty decent understanding of C and being able to read and understand third-party source code, but it's not as difficult as it may sound.