A Beast of a Different Nature

The kernel has several differences compared to normal user-space applications that, although not making it necessarily harder to program than user-space, certainly provide unique challenges to kernel development.

These differences make the kernel a beast of a different nature. Some of the usual rules are bent; other rules are entirely new. Although some of the differences are obvious (we all know the kernel can do anything it wants), others are not so obvious. The most important of these differences are

The kernel does not have access to the C library.
The kernel is coded in GNU C.
The kernel lacks memory protection like user-space.
The kernel cannot easily use floating point.
The kernel has a small fixed-size stack.
Because the kernel has asynchronous interrupts, is preemptive, and supports SMP, synchronization and concurrency are major concerns within the kernel.
Portability is important.

Let's briefly look at each of these issues because all kernel development must keep them in mind.

No libc

Unlike a user-space application, the kernel is not linked against the standard C library (or any other library, for that matter). There are multiple reasons for this, including some chicken-and-the-egg situations, but the primary reason is speed and size. The full C libraryor even a decent subset of itis too large and too inefficient for the kernel.

Do not fret: Many of the usual libc functions have been implemented inside the kernel. For example, the common string manipulation functions are in lib/string.c. Just include <linux/string.h> and have at them.

Header Files

When I talk about header files hereor elsewhere in this bookI am referring to the kernel header files that are part of the kernel source tree. Kernel source files cannot include outside headers, just as they cannot use outside libraries.

Of the missing functions, the most familiar is printf(). The kernel does not have access to printf(), but it does have access to printk(). The printk() function copies the formatted string into the kernel log buffer, which is normally read by the syslog program. Usage is similar to printf():

printk("Hello world! A string: %s and an integer: %d\n", a_string, an_integer);

One notable difference between printf() and printk() is that printk() allows you to specify a priority flag. This flag is used by syslogd(8) to decide where to display kernel messages. Here is an example of these priorities:

printk(KERN_ERR "this is an error!\n");

We will use printk() tHRoughout this book. Later chapters have more information on printk().

GNU C

Like any self-respecting Unix kernel, the Linux kernel is programmed in C. Perhaps surprisingly, the kernel is not programmed in strict ANSI C. Instead, where applicable, the kernel developers make use of various language extensions available in gcc (the GNU Compiler Collection, which contains the C compiler used to compile the kernel and most everything else written in C on a Linux system).

The kernel developers use both ISO C99^[1] and GNU C extensions to the C language. These changes wed the Linux kernel to gcc, although recently other compilers, such as the Intel C compiler, have sufficiently supported enough gcc features that they too can compile the Linux kernel. The ISO C99 extensions that the kernel uses are nothing special and, because C99 is an official revision of the C language, are slowly cropping up in a lot of other code. The more interesting, and perhaps unfamiliar, deviations from standard ANSI C are those provided by GNU C. Let's look at some of the more interesting extensions that may show up in kernel code.

^[1] ISO C99 is the latest major revision to the ISO C standard. C99 adds numerous enhancements to the previous major revision, ISO C90, including named structure initializers and a complex type. The latter of which you cannot use safely from within the kernel.

Inline Functions

GNU C supports inline functions. An inline function is, as its name suggests, inserted inline into each function call site. This eliminates the overhead of function invocation and return (register saving and restore), and allows for potentially more optimization because the compiler can optimize the caller and the called function together. As a downside (nothing in life is free), code size increases because the contents of the function are copied to all the callers, which increases memory consumption and instruction cache footprint. Kernel developers use inline functions for small time-critical functions. Making large functions inline, especially those that are used more than once or are not time critical, is frowned upon by the kernel developers.

An inline function is declared when the keywords static and inline are used as part of the function definition. For example:

static inline void dog(unsigned long tail_size)

The function declaration must precede any usage, or else the compiler cannot make the function inline. Common practice is to place inline functions in header files. Because they are marked static, an exported function is not created. If an inline function is used by only one file, it can instead be placed toward the top of just that file.

In the kernel, using inline functions is preferred over complicated macros for reasons of type safety.

Inline Assembly

The gcc C compiler enables the embedding of assembly instructions in otherwise normal C functions. This feature, of course, is used in only those parts of the kernel that are unique to a given system architecture.

The asm() compiler directive is used to inline assembly code.

The Linux kernel is programmed in a mixture of C and assembly, with assembly relegated to low-level architecture and fast path code. The vast majority of kernel code is programmed in straight C.

Branch Annotation

The gcc C compiler has a built-in directive that optimizes conditional branches as either very likely taken or very unlikely taken. The compiler uses the directive to appropriately optimize the branch. The kernel wraps the directive in very easy-to-use macros, likely() and unlikely().

For example, consider an if statement such as the following:

if (foo) {
        /* ... */
}

To mark this branch as very unlikely taken (that is, likely not taken):

/* we predict foo is nearly always zero ... */
if (unlikely(foo)) {
        /* ... */
}

Conversely, to mark a branch as very likely taken:

/* we predict foo is nearly always nonzero ... */
if (likely(foo)) {
        /* ... */
}

You should only use these directives when the branch direction is overwhelmingly a known priori or when you want to optimize a specific case at the cost of the other case. This is an important point: These directives result in a performance boost when the branch is correctly predicted, but a performance loss when the branch is mispredicted. A very common usage for unlikely() and likely() is error conditions. As one might expect, unlikely() finds much more use in the kernel because if statements tend to indicate a special case.

No Memory Protection

When a user-space application attempts an illegal memory access, the kernel can trap the error, send SIGSEGV, and kill the process. If the kernel attempts an illegal memory access, however, the results are less controlled. (After all, who is going to look after the kernel?) Memory violations in the kernel result in an oops, which is a major kernel error. It should go without saying that you must not illegally access memory, such as dereferencing a NULL pointerbut within the kernel, the stakes are much higher!

Additionally, kernel memory is not pageable. Therefore, every byte of memory you consume is one less byte of available physical memory. Keep that in mind next time you have to add one more feature to the kernel!

No (Easy) Use of Floating Point

When a user-space process uses floating-point instructions, the kernel manages the transition from integer to floating point mode. What the kernel has to do when using floating-point instructions varies by architecture, but the kernel normally catches a trap and does something in response.

Unlike user-space, the kernel does not have the luxury of seamless support for floating point because it cannot trap itself. Using floating point inside the kernel requires manually saving and restoring the floating point registers, among possible other chores. The short answer is: Don't do it; no floating point in the kernel.

Small, Fixed-Size Stack

User-space can get away with statically allocating tons of variables on the stack, including huge structures and many-element arrays. This behavior is legal because user-space has a large stack that can grow in size dynamically (developers of older, less intelligent operating systemssay, DOSmight recall a time when even user-space had a fixed-sized stack).

The kernel stack is neither large nor dynamic; it is small and fixed in size. The exact size of the kernel's stack varies by architecture. On x86, the stack size is configurable at compile-time and can be either 4 or 8KB. Historically, the kernel stack is two pages, which generally implies that it is 8KB on 32-bit architectures and 16KB on 64-bit architecturesthis size is fixed and absolute. Each process receives its own stack.

The kernel stack is discussed in much greater detail in later chapters.

Synchronization and Concurrency

The kernel is susceptible to race conditions. Unlike a single-threaded user-space application, a number of properties of the kernel allow for concurrent access of shared resources and thus require synchronization to prevent races. Specifically,

Linux is a preemptive multi-tasking operating system. Processes are scheduled and rescheduled at the whim of the kernel's process scheduler. The kernel must synchronize between these tasks.
The Linux kernel supports multiprocessing. Therefore, without proper protection, kernel code executing on two or more processors can access the same resource.
Interrupts occur asynchronously with respect to the currently executing code. Therefore, without proper protection, an interrupt can occur in the midst of accessing a shared resource and the interrupt handler can then access the same resource.
The Linux kernel is preemptive. Therefore, without protection, kernel code can be preempted in favor of different code that then accesses the same resource.

Typical solutions to race conditions include spinlocks and semaphores.

Later chapters provide a thorough discussion of synchronization and concurrency.

Portability Is Important

Although user-space applications do not have to aim for portability, Linux is a portable operating system and should remain one. This means that architecture-independent C code must correctly compile and run on a wide range of systems, and that architecture-dependent code must be properly segregated in system-specific directories in the kernel source tree.

A handful of rulessuch as remain endian neutral, be 64-bit clean, do not assume the word or page size, and so ongo a long way. Portability is discussed in extreme depth in a later chapter.