1.1 Brief TCP/IP Introduction
1.2 Honeypot Background
1.3 Tools of the Trade
This chapter provides a brief background on Internet protocols. We describe the most important elements, such as TCP (Transmission Control Protocol), IP (Internet Protocol), and IP routing. Some link layer information about ARP (Address Resolution Protocol) is necessary to understand how packets reach the end host. Furthermore, we introduce the basic concept of honeypots: we present the basic notions of different honeypot solutions and give some brief background on each, along with their respective advantages and drawbacks. Readers who already know the basics can skip this chapter.
The so-called Internet protocol suite is the collection of communications protocols that implements the protocol stack on which the whole Internet runs. It is named after its two most important protocols: TCP and IP. In the following, we give a brief overview of these and related protocols. This should be enough to provide the basic concepts of networking and help you understand the network aspects we introduce in later chapters. If necessary, we will introduce other networking concepts at appropriate points throughout the book. For an in-depth overview of the Internet protocol suite, please take a look at one of the referenced books [8,92,97]. These books give you a detailed overview of all aspects of networking and also focus on practical implementations.
The Internet protocol suite can be viewed as a set of different layers. Each layer is responsible for a specific task, and by combining the individual layers, the whole network communication can take place. Each layer has a well-defined interface to the upper and lower level that specifies which data are expected. In total, the Internet protocol suite consists of five different layers:
Application Layer: This layer contains all the protocols that applications use to implement a service. Important examples are HTTP used in the World Wide Web, FTP for file transfer, POP3 and IMAP for receiving e-mails, SMTP for sending e-mails, and SSH for remote login. As a user, you normally interact with the application layer.
Transport Layer: This layer responds to service requests from the application layer and issues service requests to the network layer. The transport layer provides transparent data transfer between two hosts. Depending on the protocol, it provides end-to-end connections, flow control, and error recovery. The two most important protocols at this layer are TCP and UDP (User Datagram Protocol). We will examine both briefly later.
Network Layer: The network layer provides end-to-end packet delivery; in other words, it is responsible for packets being sent from the source to the destination. This task includes, for example, network routing, error control, and IP addressing. Important protocols include IP (version 4 and version 6), ICMP (Internet Control Message Protocol), and IPSec (Internet Protocol Security).
Data Link Layer: The link layer is necessary to bridge the last hop from the router to the end host. It is responsible for data transfer between nodes on the same local network and also for data transfer between neighboring nodes in wide area networks. It includes protocols like ARP, ATM (Asynchronous Transfer Mode), and Ethernet.
Physical Layer: The lowest layer in the Internet protocol suite is the physical layer, which specifies how raw bits are sent between two connected network nodes. Among other things, it specifies how the bitstream should be encoded and how the bits are converted to a physical signal. Some protocols and techniques at this layer are ISDN (Integrated Services Digital Network), Wi-Fi (wireless LAN), and modems.
IP, ARP, UDP, and TCP are the essential protocols you should know, and we give a brief overview of these four protocols.
IPv4[1] is a data-oriented protocol designed for use on packet-switched networks (e.g., Ethernet). It is a best-effort protocol: it does not guarantee that an IP packet sent by one host actually reaches the destination host, nor that it arrives intact. A packet may arrive out of order, be duplicated, or be lost altogether. These problems are addressed by transport layer protocols; TCP in particular implements several mechanisms that provide reliable data transfer on top of IP. IP also implements an addressing scheme based on so-called IP addresses. Each host on the Internet has an IP address, which you can think of as the address under which the host is reachable. An IP address is normally written in dot-decimal notation, that is, as four octets in decimal separated by periods, such as 192.0.2.1. Other hosts in the network use this address to reach the host. In addition, IP implements fragmentation: since different types of networks can have different limits on how much data they can send in one packet (the maximum transmission unit, or MTU), it may be necessary to break a packet into several smaller ones. This is what IP fragmentation does, and because the end host has to combine the fragments again, IP reassembly is also necessary.
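To make the fragmentation arithmetic concrete, here is a minimal Python sketch. It assumes a 20-byte IPv4 header without options and models only the two header fields that matter for reassembly: the fragment offset (counted in 8-byte units) and the more-fragments (MF) flag. The `Fragment` class and both functions are hypothetical names for illustration, not part of any real IP stack.

```python
from dataclasses import dataclass

IP_HEADER_LEN = 20  # assumption: minimal IPv4 header without options


@dataclass
class Fragment:
    offset: int           # fragment offset in 8-byte units, as in the IPv4 header
    more_fragments: bool  # the MF flag: False only on the last fragment
    data: bytes


def fragment(payload: bytes, mtu: int) -> list[Fragment]:
    """Split an IPv4 payload so that every fragment fits into the given MTU."""
    # All fragments except the last must carry a multiple of 8 data bytes,
    # because the offset field counts in 8-byte units.
    max_data = (mtu - IP_HEADER_LEN) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small for an IPv4 header plus data")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        last = start + max_data >= len(payload)
        fragments.append(Fragment(start // 8, not last, chunk))
    return fragments


def reassemble(fragments: list[Fragment]) -> bytes:
    """Rebuild the payload; fragments may arrive in any order."""
    ordered = sorted(fragments, key=lambda f: f.offset)
    if ordered[-1].more_fragments:
        raise ValueError("last fragment missing")
    buffer = bytearray()
    for frag in ordered:
        if frag.offset * 8 != len(buffer):
            raise ValueError("gap between fragments")
        buffer += frag.data
    return bytes(buffer)


payload = bytes(range(256)) * 16  # a 4096-byte payload
frags = fragment(payload, mtu=1500)
print([(f.offset, f.more_fragments, len(f.data)) for f in frags])
# → [(0, True, 1480), (185, True, 1480), (370, False, 1136)]
```

With an Ethernet MTU of 1500 bytes, each fragment carries at most 1480 data bytes, so the 4096-byte payload becomes three fragments. Because reassembly sorts by offset, `reassemble(list(reversed(frags)))` returns the original payload even when the fragments arrive in reverse order.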
[1] We simply focus on IPv4 for two main reasons: IPv6 is not yet widely deployed, and there are almost no honeypot solutions available for IPv6 networks.
The MAC address is the physical address of your network adapter. You can determine the IP and MAC address of your network adapter on Unix systems with the command /sbin/ifconfig and on Windows systems with the command ipconfig /all. Since the data link layer usually implements its own addressing scheme, a protocol is needed that maps a given IP address (network layer) to a MAC address (data link layer). If your computer wants to communicate with another computer on the local network, it has to go through the data link layer, because the network adapter of the remote machine only listens for frames addressed to its own MAC address (or to the broadcast address). The data link layer receives an IP packet from the network layer but knows only the destination IP address. Thus, it has to find out which MAC address belongs to that IP address. This is exactly what ARP does: it resolves an IP address to a MAC address. The following example shows what such a protocol dialog might look like. The host with the IP address 10.0.1.6 wants to send a packet to the host with the IP address 10.0.1.91. Since the sender does not know the physical address of the destination, it issues a broadcast, that is, it sends a request to all hosts on the local network. The host with the IP address 10.0.1.91 picks up this request and sends an answer back to the sender that contains its MAC address:
19:34:35.54 arp who-has 10.0.1.91 tell 10.0.1.6
19:34:35.54 arp reply 10.0.1.91 is-at 00:90:27:a0:77:9b
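Each line of this dialog is a textual rendering of one ARP packet. The following Python sketch builds and decodes the 28-byte ARP payload used for Ethernet and IPv4 with the standard `struct` and `socket` modules. The MAC address of 10.0.1.6 does not appear in the dialog, so `00:11:22:33:44:55` is a made-up placeholder; the helper names are hypothetical, and the sketch only builds bytes. It does not send anything on the network.

```python
import socket
import struct

ARP_FORMAT = "!HHBBH6s4s6s4s"  # ARP payload for Ethernet + IPv4: 28 bytes
OP_REQUEST, OP_REPLY = 1, 2


def mac_to_bytes(mac: str) -> bytes:
    return bytes(int(part, 16) for part in mac.split(":"))


def bytes_to_mac(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def build_arp(op: int, sender_mac: str, sender_ip: str,
              target_mac: str, target_ip: str) -> bytes:
    return struct.pack(
        ARP_FORMAT,
        1,       # hardware type: Ethernet
        0x0800,  # protocol type: IPv4
        6, 4,    # lengths of a MAC address and an IPv4 address
        op,
        mac_to_bytes(sender_mac), socket.inet_aton(sender_ip),
        mac_to_bytes(target_mac), socket.inet_aton(target_ip),
    )


def describe(packet: bytes) -> str:
    """Render an ARP packet in the same style as the dialog above."""
    *_, op, sha, spa, tha, tpa = struct.unpack(ARP_FORMAT, packet)
    if op == OP_REQUEST:
        return f"arp who-has {socket.inet_ntoa(tpa)} tell {socket.inet_ntoa(spa)}"
    return f"arp reply {socket.inet_ntoa(spa)} is-at {bytes_to_mac(sha)}"


REQUESTER_MAC = "00:11:22:33:44:55"  # hypothetical: not shown in the dialog
# In a request the target MAC is unknown, so it is left as all zeros.
request = build_arp(OP_REQUEST, REQUESTER_MAC, "10.0.1.6", "00:00:00:00:00:00", "10.0.1.91")
reply = build_arp(OP_REPLY, "00:90:27:a0:77:9b", "10.0.1.91", REQUESTER_MAC, "10.0.1.6")
print(describe(request))  # → arp who-has 10.0.1.91 tell 10.0.1.6
print(describe(reply))    # → arp reply 10.0.1.91 is-at 00:90:27:a0:77:9b
```

On a real network, the request travels in an Ethernet frame addressed to the broadcast address ff:ff:ff:ff:ff:ff, while the reply is sent directly to the requester's MAC address.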
For now, we know that ARP is used to map an IP to a physical address and that IP is responsible for routing IP packets from the source to the destination. ARP allows us to redirect traffic transparently from one host to the other. It also allows a single host to receive traffic for many different IP addresses. In later chapters, we will talk about Honeyd and how it uses ARP to create hundreds of virtual honeypots on a network.
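The following toy model in Python illustrates how a single host can receive traffic for many IP addresses: one machine with one (hypothetical) MAC address answers ARP requests for a whole range of IP addresses, so other hosts deliver all traffic for those addresses to its network adapter. `ToyArpResponder`, the MAC address, and the claimed range 10.0.1.100 to 10.0.1.199 are assumptions for illustration; the code is not taken from Honeyd.

```python
import ipaddress

# Hypothetical, locally administered MAC address of the single physical host.
HOST_MAC = "02:00:00:00:00:01"


class ToyArpResponder:
    """Answers ARP requests for every IP address it has claimed."""

    def __init__(self, mac: str, claimed_ips):
        self.mac = mac
        self.claimed = {ipaddress.IPv4Address(ip) for ip in claimed_ips}

    def handle_request(self, target_ip: str):
        """Return the MAC address to put into the ARP reply, or None to stay silent."""
        if ipaddress.IPv4Address(target_ip) in self.claimed:
            return self.mac
        return None


# One machine claims 100 addresses: 10.0.1.100 ... 10.0.1.199.
responder = ToyArpResponder(HOST_MAC, (f"10.0.1.{i}" for i in range(100, 200)))
print(responder.handle_request("10.0.1.150"))  # → 02:00:00:00:00:01
print(responder.handle_request("10.0.1.91"))   # → None (the real host answers itself)
```

Once other hosts have cached these replies, they send frames for all 100 addresses to the same adapter, where software can then answer on behalf of each address.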
Now we take a quick look at the two most important transport layer protocols: UDP and TCP. Using UDP, two applications running on different computers connected via a network can exchange messages, usually known as datagrams (using so-called datagram sockets). UDP sits one layer "above" IP, and it is stateless, that is, the sending host retains no state about UDP messages once they are sent. It is a very simple network protocol with almost no overhead. Basically, UDP only provides application multiplexing (i.e., it distinguishes data for multiple connections by concurrent applications running on the same computer) and checksumming of the header and payload. The main drawback of UDP is that it provides neither reliability nor ordering: datagrams may arrive out of order, appear duplicated, or not arrive at all, and UDP itself does nothing to detect or correct packet loss or reordering. Without the overhead of checking whether every packet actually arrived, UDP is usually faster and more efficient for many lightweight or time-sensitive purposes. Therefore, this protocol is commonly used for applications like streaming media (Voice over IP or video chats) and online games, for which the loss of some datagrams is not critical. Another important use case for UDP is the Domain Name System (DNS), which resolves a domain name such as www.example.com to an IP address.
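A minimal Python sketch of a UDP exchange using the standard `socket` module over the loopback interface. The operating system chooses the port (by binding to port 0), and that port number is what lets the receiving host hand the datagram to the right application. There is no connection setup: the first `sendto` already carries data. The two-second timeouts are an assumption of this sketch; because UDP never retransmits, an application that cares about lost datagrams has to implement its own timeout and retry logic.

```python
import socket

# The receiving side binds to an ephemeral port on the loopback interface.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
server.settimeout(2.0)
server_addr = server.getsockname()  # (ip, port) the client must send to

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(2.0)  # UDP never retransmits, so do not wait forever

# No handshake: the very first datagram already carries the data.
client.sendto(b"ping", server_addr)
data, client_addr = server.recvfrom(2048)  # client_addr includes the client's port
server.sendto(data.upper(), client_addr)
reply, _ = client.recvfrom(2048)
print(reply)  # → b'PING'

client.close()
server.close()
```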
TCP, on the other hand, is connection oriented and provides a multiplexed, reliable communication channel between two network hosts via so-called data streams. TCP guarantees reliable, in-order delivery of data from sender to receiver, whereas UDP guarantees neither property. TCP receives a stream of bytes from the application layer and divides it into appropriately sized segments. These segments are then handed over to the network layer (usually IP), which takes care of processing them further. To detect lost data, TCP assigns a sequence number to every byte it sends; each segment carries the sequence number of its first data byte. Later, we will take a closer look at sequence numbers when we discuss how a TCP session is established. The TCP implementation on the receiving host sends back acknowledgments for the data it has successfully received. Together with the sequence numbers, these acknowledgment numbers let both sides check whether all data has arrived, and let the receiver put segments back in order if necessary. A timer at the sending host triggers a timeout if an acknowledgment is not received within a reasonable amount of time, and lost segments are then retransmitted. In addition, TCP uses a checksum to check whether a given segment was received correctly. Furthermore, TCP implements congestion control to achieve high throughput without overloading the network. As you can see, TCP is rather complex, but it has many advantages compared to UDP. TCP is usually used when reliable network communication between two hosts is required, for example, for application protocols like HTTP used in the World Wide Web, SMTP and POP3/IMAP for e-mail, and FTP for file transfers.
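The receiver's side of this bookkeeping can be sketched in a few lines of Python. This simplified model, built around the hypothetical class `ToyTcpReceiver`, assumes segments never overlap, ignores the 32-bit wraparound of sequence numbers, and has no receive window. It only shows how byte-based sequence numbers let the receiver reorder data and compute a cumulative acknowledgment number.

```python
class ToyTcpReceiver:
    """Receiver-side sequence/acknowledgment bookkeeping (simplified)."""

    def __init__(self, initial_seq: int):
        self.next_expected = initial_seq           # value sent back as the ACK number
        self.out_of_order: dict[int, bytes] = {}   # early segments, keyed by sequence number
        self.delivered = bytearray()               # in-order bytes handed to the application

    def receive(self, seq: int, data: bytes) -> int:
        """Process one segment and return the cumulative acknowledgment number."""
        if seq >= self.next_expected:  # data below next_expected is a duplicate
            self.out_of_order[seq] = data
        # Deliver every buffered segment that continues the byte stream without a gap.
        while self.next_expected in self.out_of_order:
            chunk = self.out_of_order.pop(self.next_expected)
            self.delivered += chunk
            self.next_expected += len(chunk)
        return self.next_expected


receiver = ToyTcpReceiver(initial_seq=1000)
# The segment starting at byte 1006 overtakes the one starting at byte 1003.
arrivals = [(1000, b"Hel"), (1006, b"wor"), (1003, b"lo "), (1009, b"ld")]
acks = [receiver.receive(seq, data) for seq, data in arrivals]
print(acks)                       # → [1003, 1003, 1009, 1011]
print(bytes(receiver.delivered))  # → b'Hello world'
```

The second segment arrives early, so the receiver buffers it and repeats acknowledgment 1003; this repeated value tells the sender that data is still missing. Once the gap is filled, the acknowledgment jumps to 1009.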
In the following, we introduce the packet headers and explain how TCP connections are established, but we will not go into too much detail. You can find many books that focus on TCP/IP networking, all relevant protocols, and how these protocols interact with each other [8,92,97].
Figure 1.1 shows the layout of the TCP header. This is a simplified version, but it contains enough details to understand the main aspects of TCP. The header begins with two 16-bit fields that specify the source and destination port. Ports are used for multiplexing at the transport layer: thanks to ports, several applications can listen on a single IP address. For example, a web server typically listens on TCP port 80, and an SMTP server uses TCP port 25. Both servers "share" the IP address of the host via this multiplexing. The next two fields of the TCP header contain the 32-bit sequence and acknowledgment numbers. The sequence number has two roles. During connection setup (when the SYN flag is set), it carries the initial sequence number. Once the connection is established, it is the sequence number of the first data byte in the segment's payload. If the ACK flag is set, the acknowledgment number specifies the sequence number the sender of the segment expects to receive next.
The header length field specifies the length of the TCP header in 32-bit words. The minimum size is 5 words (20 bytes) and the maximum size 15 words (60 bytes). This field therefore also gives the offset from the start of the TCP segment to the data. The next six bits are reserved for future use in case TCP needs to be extended. Two of these reserved bits are already used by current TCP stacks (for Explicit Congestion Notification), but for simplicity's sake we skip them in this brief introduction. More important are the next six bits: the so-called TCP flags. They provide information about the state of the current TCP packet:
SYN: Used to synchronize sequence numbers during connection setup
ACK: Signals whether the acknowledgment number is significant. If the bit is set, the acknowledgment number is the sequence number the sender expects next.
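FIN: Used to tear down a connection, so no more data are sent
RST: Resets the connection, for example, when a packet arrives for a port on which no application is listening
PSH: Asks the receiver to pass the buffered data to the application immediately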
URG: Indicates whether the urgent pointer field is significant
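To connect the fields of Figure 1.1 to actual bytes, here is a Python sketch that decodes the fixed 20-byte part of a TCP header with the standard `struct` module. The function name `parse_tcp_header` is hypothetical; the sketch skips TCP options and decodes only the six classic flag bits, ignoring the two reserved bits that newer stacks use.

```python
import struct

# Flag bits in byte 13 of the header, from the least significant bit upward.
TCP_FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are skipped)."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_byte >> 4) * 4  # the length field counts 32-bit words
    flags = [name for bit, name in enumerate(TCP_FLAGS) if flag_byte & (1 << bit)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
        "payload": segment[header_len:],
    }


# A SYN from (hypothetical) client port 51000 to a web server on port 80.
# The checksum is left at 0 here, so a real host would reject this segment.
syn = struct.pack("!HHIIBBHHH", 51000, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header["dst_port"], header["flags"], header["header_len"])  # → 80 ['SYN'] 20
```

Setting the flag byte to 0x12 instead of 0x02 yields `['SYN', 'ACK']`, the combination the receiver sends during connection setup.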
The window size specifies the number of bytes the sender of this segment is willing to receive, starting from the value in the acknowledgment field. The TCP header also contains a checksum that is used to check whether the packet arrived unmodified at the destination. The urgent pointer is used only if the URG flag is set. It then gives an offset from the segment's sequence number that marks the end of the urgent data, that is, data that should be handed over to the application layer immediately. There are other, optional fields in the TCP header that we will not discuss for now, since they are mostly not relevant for honeypot deployments.
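The TCP checksum is the 16-bit ones' complement Internet checksum. For TCP over IPv4 it covers not only the segment itself but also a pseudo-header containing the source and destination IP addresses, the protocol number (6 for TCP), and the segment length. The following minimal Python sketch uses hypothetical function names and addresses from the documentation range 192.0.2.0/24; it is meant to show the arithmetic, not to be fast.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum with end-around carry."""
    if len(data) % 2:
        data += b"\x00"  # pad an odd-length segment with one zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header followed by the TCP segment."""
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
        0,             # zero byte
        6,             # protocol number of TCP
        len(segment),  # TCP header plus payload, in bytes
    )
    return ~ones_complement_sum(pseudo_header + segment) & 0xFFFF


# Sender: compute the checksum with the checksum field set to zero, then fill it in.
segment = bytearray(struct.pack("!HHIIBBHHH", 51000, 80, 1000, 0, 5 << 4, 0x02, 65535, 0, 0))
checksum = tcp_checksum("192.0.2.1", "192.0.2.2", bytes(segment))
struct.pack_into("!H", segment, 16, checksum)  # bytes 16-17 hold the checksum

# Receiver: recomputing over the whole segment yields 0 if nothing was modified.
print(tcp_checksum("192.0.2.1", "192.0.2.2", bytes(segment)))  # → 0
```

Because the pseudo-header includes the IP addresses, a segment delivered to the wrong host also fails the check, even if its bytes are intact.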
Since TCP is connection oriented, it needs to set up a connection between the sender and the receiver at the beginning of the communication. This is achieved with the so-called TCP handshake. The handshake synchronizes the state of the two hosts, mainly by exchanging their sequence and acknowledgment numbers. These numbers are later used to determine whether a given packet was correctly received at the destination, and also for retransmission and congestion control. The TCP handshake consists of three messages exchanged between sender S and receiver R:
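S → R: The sender initiates the connection with a TCP packet that has the SYN flag set and carries its initial sequence number, in this example x.
S ← R: The receiver answers with a TCP packet with the flags SYN and ACK set. The acknowledgment number is set to the next sequence number the host is expecting, which in this example is x + 1. In addition, the receiver sets its own sequence number to y, since it also wants to synchronize this number with the other party.
S → R: The sender completes the handshake with a packet that has the ACK flag set, sequence number x + 1, and acknowledgment number y + 1.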
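The number bookkeeping of the handshake can be replayed with a short Python simulation. It models only the SYN and ACK flags and the two 32-bit numbers; the `Segment` class and `three_way_handshake` function are hypothetical, and the initial sequence numbers x and y are drawn at random purely for illustration (real TCP stacks use dedicated ISN generation schemes). The arithmetic is done modulo 2^32 because sequence numbers wrap around.

```python
import random
from dataclasses import dataclass

MOD = 2 ** 32  # sequence and acknowledgment numbers are 32-bit and wrap around


@dataclass
class Segment:
    flags: set
    seq: int
    ack: int = 0  # only meaningful if "ACK" is in flags


def three_way_handshake(rng: random.Random) -> list:
    x = rng.randrange(MOD)  # initial sequence number of the sender S
    y = rng.randrange(MOD)  # initial sequence number of the receiver R
    syn = Segment({"SYN"}, seq=x)                                         # S -> R
    syn_ack = Segment({"SYN", "ACK"}, seq=y, ack=(syn.seq + 1) % MOD)     # S <- R
    ack = Segment({"ACK"}, seq=syn_ack.ack, ack=(syn_ack.seq + 1) % MOD)  # S -> R
    return [syn, syn_ack, ack]


for segment in three_way_handshake(random.Random(1)):
    print(sorted(segment.flags), segment.seq, segment.ack)
```

The SYN flag itself consumes one sequence number, which is why each side acknowledges the other's initial sequence number plus one.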
After this handshake, both parties know the current value of the sequence and acknowledgment number of the other side. This information is then used for all purposes of TCP — for example, error-free data transfer and congestion control.
Another important aspect of the Internet protocol suite is IP routing. IP routing is important to understand for several reasons: it is the mechanism that lets hosts on different networks communicate with one another, and it also provides insights into the topology of the Internet and of smaller networks, such as a corporate network. To successfully build sophisticated honeynets, a basic understanding of Internet routing is essential.