Policy is the behavior of the scheduler that determines what runs when. A scheduler's policy often determines the overall feel of a system and is responsible for optimally utilizing processor time, so it deserves careful design.
I/O-Bound Versus Processor-Bound Processes
Processes can be classified as either I/O-bound or processor-bound. The former is characterized as a process that spends much of its time submitting and waiting on I/O requests. Consequently, such a process is often runnable, but for only short durations because it will eventually block waiting on more I/O (this is any type of I/O, such as keyboard activity, and not just disk I/O).
Conversely, processor-bound processes spend much of their time executing code. They tend to run until they are preempted because they do not block on I/O requests very often. Because they are not I/O-driven, however, system response does not dictate that the scheduler run them often. A scheduler policy for processor-bound processes, therefore, tends to run such processes less frequently but (optimally, to them) for longer durations. The ultimate example of a processor-bound process is one executing an infinite loop.
Of course, these classifications are not mutually exclusive. Processes can exhibit both behaviors simultaneously: The X Window server, for example, is both processor-intense and I/O-intense. Other processes may be I/O-bound but dive into periods of intense processor action. A good example of this is a word processor, which normally sits waiting for key presses but at any moment might peg the processor in a rabid fit of spell checking.
The scheduling policy in a system must attempt to satisfy two conflicting goals: fast process response time (low latency) and maximal system utilization (high throughput). To satisfy these at-odds requirements, schedulers often employ complex algorithms to determine the most worthwhile process to run while not compromising fairness to other, lower-priority processes. The scheduler policy in Unix variants tends to explicitly favor I/O-bound processes, thus providing good process response time. Linux, aiming to provide good interactive response, optimizes for process response (low latency), thus favoring I/O-bound processes over processor-bound processes. As you will see, this is done in a creative manner that does not neglect processor-bound processes.
A common type of scheduling algorithm is priority-based scheduling. The idea is to rank processes based on their worth and need for processor time. Processes with a higher priority run before those with a lower priority, whereas processes with the same priority are scheduled round-robin (one after the next, repeating). On some systems, Linux included, processes with a higher priority also receive a longer timeslice. The runnable process with timeslice remaining and the highest priority always runs. Both the user and the system may set a process's priority to influence the scheduling behavior of the system.
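The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the kernel's implementation: the `Task` class, the `schedule` function, the tick-sized turns, and the convention that a larger number means a higher priority are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    priority: int     # larger number = higher priority (assumed convention)
    ticks_left: int   # remaining timeslice, in abstract ticks


def schedule(tasks):
    """Return the order in which tasks run, one tick per turn."""
    # Group runnable tasks (those with timeslice remaining) by priority.
    queues = {}
    for task in tasks:
        if task.ticks_left > 0:
            queues.setdefault(task.priority, deque()).append(task)

    order = []
    # Drain the highest priority level first; lower levels wait.
    for priority in sorted(queues, reverse=True):
        queue = queues[priority]
        while queue:
            task = queue.popleft()
            order.append(task.name)
            task.ticks_left -= 1
            if task.ticks_left > 0:
                queue.append(task)  # round-robin: go to the back of the line
    return order


print(schedule([Task("A", 2, 2), Task("B", 2, 1), Task("C", 1, 1)]))
```

With A and B at the same priority and C below them, A and B alternate until their timeslices are gone, and only then does C run.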
Linux builds on this idea and provides dynamic priority-based scheduling. This concept begins with an initial base priority and then enables the scheduler to increase or decrease the priority dynamically to fulfill scheduling objectives. For example, a process that is spending more time waiting on I/O than running is clearly I/O-bound. Under Linux, it receives an elevated dynamic priority. As a counterexample, a process that continually uses up its entire timeslice is processor-bound; it would receive a lowered dynamic priority.
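One way to picture a dynamic priority is as a base value plus a bonus or penalty derived from how a process has recently spent its time. The sketch below is hypothetical: the function name, the `max_bonus` of 5, and the linear mapping are assumptions for illustration only, and it uses the nice-value convention in which a smaller number means a higher priority.

```python
def dynamic_priority(base, sleep_ms, run_ms, max_bonus=5):
    """Return an adjusted priority value; smaller means higher priority.

    A task that mostly sleeps (waits on I/O) gets a value below its base,
    and a task that mostly runs gets a value above its base.
    """
    total = sleep_ms + run_ms
    if total == 0:
        return base  # no history yet, so no adjustment
    # Scale the sleeping share of the time onto 0 .. 2 * max_bonus.
    sleep_credit = round(2 * max_bonus * sleep_ms / total)
    return base + max_bonus - sleep_credit


# Hypothetical tasks, both starting from the same base priority of 0.
io_bound = dynamic_priority(0, sleep_ms=900, run_ms=100)
cpu_bound = dynamic_priority(0, sleep_ms=0, run_ms=1000)
print(io_bound, cpu_bound)
```

The I/O-bound task ends up with the smaller value and therefore the higher dynamic priority, which is the direction of adjustment the text describes.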
The Linux kernel implements two separate priority ranges. The first is the nice value, a number from -20 to +19 with a default of 0. Larger nice values correspond to a lower priority: you are being nice to the other processes on the system. Processes with a lower nice value (higher priority) run before processes with a higher nice value (lower priority). The nice value also helps determine how long a timeslice the process receives. A process with a nice value of -20 receives the maximum possible timeslice, whereas a process with a nice value of 19 receives the minimum possible timeslice. Nice values are the standard priority range used in all Unix systems.
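To make the relationship between nice value and timeslice concrete, here is a small sketch of one possible mapping. The constants (5 ms minimum, 100 ms default) and the piecewise-linear shape are assumptions chosen for illustration, not values stated in this section; the only properties taken from the text are that nice -20 yields the longest timeslice and nice 19 the shortest.

```python
MIN_TIMESLICE_MS = 5     # assumed minimum, for illustration
DEF_TIMESLICE_MS = 100   # assumed timeslice at the default nice value of 0


def timeslice_ms(nice):
    """Map a nice value (-20..19) to a timeslice in milliseconds."""
    if not -20 <= nice <= 19:
        raise ValueError("nice must be between -20 and 19")
    levels = 20 - nice  # 40 for nice -20, 20 for nice 0, 1 for nice 19
    # Assumption: negative nice values scale from a base four times larger.
    base = DEF_TIMESLICE_MS * 4 if nice < 0 else DEF_TIMESLICE_MS
    return max(base * levels // 20, MIN_TIMESLICE_MS)


for nice in (-20, 0, 19):
    print(nice, timeslice_ms(nice))
```

Under these assumed constants the mapping never increases as the nice value rises, so being "nicer" never earns a process a longer timeslice.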
The second range is the real-time priority. The values are configurable, but by default range from 0 to 99. All real-time processes are at a higher priority than normal processes. Linux implements real-time priorities in accordance with POSIX standards on the matter. Most modern Unix systems implement a similar scheme.
The timeslice is the numeric value that represents how long a task can run until it is preempted. The scheduler policy must dictate a default timeslice, which is not a trivial exercise. Too long a timeslice causes the system to have poor interactive performance; the system will no longer feel as if applications are concurrently executed. Too short a timeslice wastes a significant percentage of processor time on the overhead of switching from one process with a short timeslice to the next. Furthermore, the conflicting goals of I/O-bound versus processor-bound processes again arise: I/O-bound processes do not need longer timeslices (although they do like to run often), whereas processor-bound processes crave long timeslices (to keep their caches hot, for example).
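The cost of a short timeslice can be estimated with simple arithmetic: if every timeslice is followed by one process switch, the fraction of time lost is the switch cost divided by the sum of the timeslice and the switch cost. The 100-microsecond switch cost below is an assumed figure chosen only to make the numbers concrete.

```python
def switch_overhead(timeslice_us, switch_cost_us=100):
    """Fraction of processor time spent switching processes.

    Assumes exactly one switch after every full timeslice; the default
    switch cost is a made-up value for illustration.
    """
    if timeslice_us <= 0 or switch_cost_us < 0:
        raise ValueError("timeslice must be positive and cost non-negative")
    return switch_cost_us / (timeslice_us + switch_cost_us)


for timeslice_us in (1_000, 20_000, 100_000):
    share = switch_overhead(timeslice_us)
    print(f"{timeslice_us // 1000:>4} ms timeslice: {share:.2%} overhead")
```

The overhead shrinks as the timeslice grows, which is the throughput half of the tradeoff; the latency half is that every other runnable process may wait correspondingly longer for its turn.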
With this argument, it would seem that any long timeslice would result in poor interactive performance. In many operating systems, this observation is taken to heart, and the default timeslice is rather low (for example, 20ms). Linux, however, takes advantage of the fact that the highest priority process always runs. The Linux scheduler bumps the priority of interactive tasks, enabling them to run more frequently. Consequently, the Linux scheduler offers a relatively high default timeslice (see Table 4.1, later in this chapter). Furthermore, the Linux scheduler dynamically determines the timeslice of a process based on priority. This enables higher-priority (allegedly more important) processes to run longer and more often. Implementing dynamic timeslices and priorities provides robust scheduling performance.
Figure 4.1. Process timeslice calculation.
Note that a process does not have to use all its timeslice at once. For example, a process with a 100-millisecond timeslice does not have to run for 100 milliseconds in one go or risk losing the remaining timeslice. Instead, the process can run on five different reschedules for 20 milliseconds each. Thus, a large timeslice also benefits interactive tasks: Although they do not need such a large timeslice all at once, it ensures they remain runnable for as long as possible.
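The bookkeeping behind this can be sketched as a simple counter that is charged only for time actually spent running. The `Timeslice` class and its method names are hypothetical and exist only for this illustration.

```python
class Timeslice:
    """Tracks how much of a task's timeslice is left, in milliseconds."""

    def __init__(self, total_ms):
        self.remaining_ms = total_ms

    def run(self, wanted_ms):
        """Run for up to wanted_ms and return the time actually used."""
        used = min(wanted_ms, self.remaining_ms)
        self.remaining_ms -= used
        return used

    @property
    def expired(self):
        return self.remaining_ms == 0


# A 100 ms timeslice consumed over five separate 20 ms runs.
ts = Timeslice(100)
for _ in range(5):
    ts.run(20)
print(ts.remaining_ms, ts.expired)
```

Between runs the task can block or be preempted without forfeiting what remains, so the timeslice is exhausted only after the fifth 20-millisecond run.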
When a process's timeslice runs out, the process is considered expired. A process with no timeslice is not eligible to run until all other processes have exhausted their timeslices (that is, they all have zero timeslice remaining). At that point, the timeslices for all processes are recalculated. The Linux scheduler employs an interesting algorithm for handling timeslice exhaustion that is discussed later in this chapter.
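The chapter defers the actual algorithm to a later section, so the following is only a naive sketch of the rule as stated here: a task with no timeslice waits until every task has none, and then all timeslices are recalculated. The function name, the tick granularity, and the fixed per-task timeslices are assumptions for the example.

```python
def run_epochs(timeslices, ticks):
    """Simulate `ticks` scheduling turns and return a log of what happened.

    timeslices: dict mapping task name to its full timeslice in ticks (> 0).
    """
    if not timeslices or any(t <= 0 for t in timeslices.values()):
        raise ValueError("every task needs a positive timeslice")
    remaining = dict(timeslices)
    log = []
    for _ in range(ticks):
        eligible = [name for name, left in remaining.items() if left > 0]
        if not eligible:
            # Every task has expired: recalculate all timeslices at once.
            remaining = dict(timeslices)
            log.append("recalculate")
            eligible = list(remaining)
        name = eligible[0]
        remaining[name] -= 1
        log.append(name)
    return log


print(run_epochs({"a": 2, "b": 1}, ticks=5))
```

Task `a` cannot run again after its two ticks until `b` has also expired, at which point both receive fresh timeslices.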
As mentioned, the Linux operating system is preemptive. When a process enters the TASK_RUNNING state, the kernel checks whether its priority is higher than the priority of the currently executing process. If it is, the scheduler is invoked to preempt the currently executing process and run the newly runnable process. Additionally, when a process's timeslice reaches zero, it is preempted and the scheduler is again invoked to select a new process.
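The two preemption triggers described above amount to a short check. This is a minimal sketch under stated assumptions: the `Task` fields and the `need_resched` function are illustrative names, and a larger number is taken to mean a higher priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int       # larger number = higher priority (assumed convention)
    timeslice_ms: int   # timeslice remaining


def need_resched(current: Task, woken: Optional[Task] = None) -> bool:
    """Decide whether the scheduler should be invoked.

    True if the running task has used up its timeslice, or if a task that
    just became runnable has a higher priority than the running task.
    """
    if current.timeslice_ms <= 0:
        return True
    return woken is not None and woken.priority > current.priority


encoder = Task("encoder", priority=1, timeslice_ms=50)
editor = Task("editor", priority=5, timeslice_ms=100)
print(need_resched(encoder, woken=editor))
```

A newly runnable task of equal or lower priority does not trigger preemption in this sketch; it simply waits for its turn.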
The Scheduling Policy in Action
Consider a system with two runnable tasks: a text editor and a video encoder. The text editor is I/O-bound because it spends nearly all its time waiting for user key presses (no matter how fast the user types, it is not that fast). Despite this, when the text editor does receive a key press, the user expects the editor to respond immediately. Conversely, the video encoder is processor-bound. Aside from reading the raw data stream from the disk and later writing the resulting video, the encoder spends all its time applying the video codec to the raw data, easily using 100% of the processor. The video encoder does not have any strong time constraints on when it runs: if it started running now or in half a second, the user could not tell and would not care. Of course, the sooner it finishes the better, but latency is not a primary concern.
In this scenario, ideally the scheduler gives the text editor a higher priority and larger timeslice than the video encoder receives because the text editor is interactive. This ensures that the text editor has plenty of timeslice available. Furthermore, because the text editor has a higher priority, it is capable of preempting the video encoder when needed (say, the instant the user presses a key). This guarantees that the text editor is capable of responding to user key presses immediately. This is to the detriment of the video encoder, but because the text editor runs only intermittently, when the user presses a key, the video encoder can monopolize the remaining time. This optimizes the performance of both applications.
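The editor-and-encoder scenario can be played out in a toy simulation with one-millisecond steps. Everything here is an assumption for illustration: the `simulate` function, the key press times, and the idea that handling one key press costs the editor 1 ms of processor time.

```python
def simulate(duration_ms, key_press_times, editor_work_ms=1):
    """Return which task runs during each millisecond.

    The editor has the higher priority, so it preempts the encoder the
    moment a key press makes it runnable; otherwise the encoder runs.
    """
    presses = set(key_press_times)
    pending_ms = 0  # editor work waiting to be done
    timeline = []
    for now in range(duration_ms):
        if now in presses:
            pending_ms += editor_work_ms  # the editor becomes runnable
        if pending_ms > 0:
            timeline.append("editor")
            pending_ms -= 1
        else:
            timeline.append("encoder")
    return timeline


timeline = simulate(10, key_press_times=[3, 7])
print(timeline)
print("encoder share:", timeline.count("encoder") / len(timeline))
```

In this run the editor responds in the same millisecond as each key press, yet the encoder still receives all of the remaining processor time.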