26.1 Introduction
In the traditional Unix model, when a process needs something performed by another entity, it forks a child process and lets the child perform the processing. Most network servers under Unix are written this way, as we have seen in our concurrent server examples: The parent accepts the connection, forks a child, and the child handles the client.
While this paradigm has served well for many years, there are problems with fork:
• fork is expensive. Memory is copied from the parent to the child, all descriptors are duplicated in the child, and so on. Current implementations use a technique called copy-on-write, which avoids a copy of the parent's data space to the child until the child needs its own copy. But, regardless of this optimization, fork is expensive.
• IPC is required to pass information between the parent and child after the fork. Passing information from the parent to the child before the fork is easy, since the child starts with a copy of the parent's data space and with a copy of all the parent's descriptors. But, returning information from the child to the parent takes more work.
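To make the IPC point concrete, here is a minimal sketch of returning a single integer from a child to its parent through a pipe. The helper name square_in_child and its calling convention are hypothetical, chosen only for illustration, and error handling is deliberately thin (for example, an interrupted read is not retried).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: square "input" in a forked child and hand the
 * result back to the parent through a pipe.
 * Returns 0 on success, -1 on error. */
int square_in_child(int input, int *result)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                     /* child */
        close(fds[0]);                  /* child only writes */
        /* "input" needs no IPC: the child began with a copy of the
         * parent's data space. */
        int value = input * input;
        ssize_t n = write(fds[1], &value, sizeof(value));
        _exit(n == (ssize_t)sizeof(value) ? 0 : 1);
    }

    /* parent */
    close(fds[1]);                      /* parent only reads */
    int value = 0;
    ssize_t n = read(fds[0], &value, sizeof(value));
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)   /* reap the child */
        return -1;
    if (n != (ssize_t)sizeof(value) ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    *result = value;
    return 0;
}
```

Getting input into the child costs nothing beyond the fork itself, while getting one int back requires a pipe, a write, a read, and a waitpid.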
Threads help with both problems. Threads are sometimes called lightweight processes since a thread is "lighter weight" than a process. That is, thread creation can be 10–100 times faster than process creation.
All threads within a process share the same global memory. This makes the sharing of information easy between the threads, but along with this simplicity comes the problem of synchronization.
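As a rough sketch of both the convenience and the synchronization problem, the following code has several threads increment one global counter. The names counter, counter_lock, add_to_counter, and count_with_threads are hypothetical, and the MAX_THREADS limit is an arbitrary assumption. A mutex guards each increment; without it, concurrent updates could be lost.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16                  /* arbitrary limit for this sketch */

static long counter;                    /* shared by every thread in the process */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Thread start function: add 1 to the shared counter "iterations" times. */
static void *add_to_counter(void *arg)
{
    int iterations = *(int *)arg;

    for (int i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);      /* serialize the update */
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Hypothetical helper: run nthreads threads and return the final count,
 * or -1 on error. */
long count_with_threads(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    counter = 0;
    for (; started < nthreads; started++) {
        /* Every thread reads the same "iterations" variable; it stays
         * valid because we join all threads before returning. */
        if (pthread_create(&tids[started], NULL, add_to_counter,
                           &iterations) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    return started == nthreads ? counter : -1;
}
```

No pipe or other IPC is needed here: the threads simply read and write the same variable, and the mutex is the price paid for that simplicity.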
More than just the global variables are shared. All threads within a process share the following:
• Process instructions
• Most data
• Open files (e.g., descriptors)
• Signal handlers and signal dispositions
• Current working directory
• User and group IDs
But each thread has its own:
• Thread ID
• Set of registers, including program counter and stack pointer
• Stack (for local variables and return addresses)
• errno
• Signal mask
• Priority
One analogy is to think of signal handlers as a type of thread as we discussed in Section 11.18. That is, in the traditional Unix model, we have the main flow of execution (one thread) and a signal handler (another thread). If the main flow is in the middle of updating a linked list when a signal occurs, and the signal handler also tries to update the linked list, havoc normally results. The main flow and signal handler share the same global variables, but each has its own stack.
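Returning to the shared and per-thread items listed above, the following sketch shows one global variable seen by every thread, alongside a local variable and a thread ID that are private to each thread. The names shared_global, struct probe, probe_thread, and run_two_probes are hypothetical. The two threads are run one after the other, so this particular sketch needs no mutex.

```c
#include <pthread.h>
#include <stddef.h>

static int shared_global;           /* one copy, visible to every thread */

struct probe {
    pthread_t creator;              /* ID of the thread that created us */
    int local_value;                /* final value of the thread's own local */
    int same_id_as_creator;         /* nonzero if our ID equals the creator's */
};

static void *probe_thread(void *arg)
{
    struct probe *p = arg;
    int local = 0;                  /* lives on this thread's own stack */

    local++;                        /* private: each thread starts from 0 */
    shared_global++;                /* shared: updates accumulate */

    p->local_value = local;
    p->same_id_as_creator = pthread_equal(pthread_self(), p->creator);
    return NULL;
}

/* Hypothetical helper: run two threads one after the other and return
 * the final value of shared_global, or -1 on error. */
int run_two_probes(struct probe out[2])
{
    shared_global = 0;
    for (int i = 0; i < 2; i++) {
        pthread_t tid;

        out[i].creator = pthread_self();
        if (pthread_create(&tid, NULL, probe_thread, &out[i]) != 0)
            return -1;
        if (pthread_join(tid, NULL) != 0)   /* wait before starting the next */
            return -1;
    }
    return shared_global;
}
```

After both threads finish, shared_global has been incremented twice, while each thread's local reached only 1 and each thread reported an ID different from that of its creator.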
In this text, we cover POSIX threads, also called Pthreads. These were standardized in 1995 as part of the POSIX.1c standard and most versions of Unix will support them in the future. We will see that all the Pthread functions begin with pthread_. This chapter is an introduction to threads, so that we can use threads in our network programs. For additional details see [Butenhof 1997].