Team LiB
Previous Section Next Section

The Memory Descriptor

The kernel represents a process's address space with a data structure called the memory descriptor. This structure contains all the information related to the process address space. The memory descriptor is represented by struct mm_struct and defined in <linux/sched.h>[3].

[3] There is a rather tangled interdependency between the process descriptor, the memory descriptor, and their related functions. Consequently, struct mm_struct ends up in sched.h.

Let's look at the memory descriptor, with comments added describing each field:

struct mm_struct {
        struct vm_area_struct  *mmap;               /* list of memory areas */
        struct rb_root         mm_rb;               /* red-black tree of VMAs */
        struct vm_area_struct  *mmap_cache;         /* last used memory area */
        unsigned long          free_area_cache;     /* 1st address space hole */
        pgd_t                  *pgd;                /* page global directory */
        atomic_t               mm_users;            /* address space users */
        atomic_t               mm_count;            /* primary usage counter */
        int                    map_count;           /* number of memory areas */
        struct rw_semaphore    mmap_sem;            /* memory area semaphore */
        spinlock_t             page_table_lock;     /* page table lock */
        struct list_head       mmlist;              /* list of all mm_structs */
        unsigned long          start_code;          /* start address of code */
        unsigned long          end_code;            /* final address of code */
        unsigned long          start_data;          /* start address of data */
        unsigned long          end_data;            /* final address of data */
        unsigned long          start_brk;           /* start address of heap */
        unsigned long          brk;                 /* final address of heap */
        unsigned long          start_stack;         /* start address of stack */
        unsigned long          arg_start;           /* start of arguments */
        unsigned long          arg_end;             /* end of arguments */
        unsigned long          env_start;           /* start of environment */
        unsigned long          env_end;             /* end of environment */
        unsigned long          rss;                 /* pages allocated */
        unsigned long          total_vm;            /* total number of pages */
        unsigned long          locked_vm;           /* number of locked pages */
        unsigned long          def_flags;           /* default access flags */
        unsigned long          cpu_vm_mask;         /* lazy TLB switch mask */
        unsigned long          swap_address;        /* last scanned address */
        unsigned               dumpable:1;          /* can this mm core dump? */
        int                    used_hugetlb;        /* used hugetlb pages? */
        mm_context_t           context;             /* arch-specific data */
        int                    core_waiters;        /* thread core dump waiters */
        struct completion      *core_startup_done;  /* core start completion */
        struct completion      core_done;           /* core end completion */
        rwlock_t               ioctx_list_lock;     /* AIO I/O list lock */
        struct kioctx          *ioctx_list;         /* AIO I/O list */
        struct kioctx          default_kioctx;      /* AIO default I/O context */

The mm_users field is the number of processes using this address space. For example, if two threads share this address space, mm_users is equal to two. The mm_count field is the primary reference count for the mm_struct. All mm_users equate to one increment of mm_count. Thus, in the previous example, mm_count is only one. Only when mm_users reaches zero (when both threads exit) is mm_count decremented. When mm_count finally reaches zero, there are no remaining references to this mm_struct and it is freed. Having two counters enables the kernel to differentiate between the main usage counter (mm_count) and the number of processes using the address space (mm_users).

The mmap and mm_rb fields are different data structures that contain the same thing: all the memory areas in this address space. The former stores them in a linked list, whereas the latter stores them in a red-black tree. A red-black tree is a type of binary tree; like all binary trees, searching for a given element is an O(log n) operation. For further discussion on red-black trees, see "Lists and Trees of Memory Areas," later in this chapter.

Although the kernel would normally avoid the extra baggage of using two data structures to organize the same data, the redundancy comes in handy here. The mmap data structure, as a linked list, allows for simple and efficient traversing of all elements. On the other hand, the mm_rb data structure, as a red-black tree, is more suitable to searching for a given element. Memory areas are discussed in more detail later in this chapter.

All of the mm_struct structures are strung together in a doubly linked list via the mmlist field. The initial element in the list is the init_mm memory descriptor, which describes the address space of the init process. The list is protected from concurrent access via the mmlist_lock, which is defined in kernel/fork.c. The total number of memory descriptors is stored in the mmlist_nr global integer, which is defined in the same place.

Allocating a Memory Descriptor

The memory descriptor associated with a given task is stored in the mm field of the task's process descriptor. Thus, current->mm is the current process's memory descriptor. The copy_mm() function is used to copy a parent's memory descriptor to its child during fork(). The mm_struct structure is allocated from the mm_cachep slab cache via the allocate_mm() macro in kernel/fork.c. Normally, each process receives a unique mm_struct and thus a unique process address space.

Processes may elect to share their address spaces with their children by means of the CLONE_VM flag to clone().The process is then called a thread. Recall from Chapter 3, "Process Management," that this is essentially the only difference between normal processes and so-called threads in Linux; the Linux kernel does not otherwise differentiate between them. Threads are regular processes to the kernel that merely share certain resources.

In the case that CLONE_VM is specified, allocate_mm() is not called and the process's mm field is set to point to the memory descriptor of its parent via this logic in copy_mm():

if (clone_flags & CLONE_VM) {
         * current is the parent process and
         * tsk is the child process during a fork()
         tsk->mm = current->mm;

Destroying a Memory Descriptor

When the process associated with a specific address space exits, the exit_mm() function is invoked. This function performs some housekeeping and updates some statistics. It then calls mmput(), which decrements the memory descriptor's mm_users user counter. If the user count reaches zero, mmdrop() is called to decrement the mm_count usage counter. If that counter is finally zero, then the free_mm() macro is invoked to return the mm_struct to the mm_cachep slab cache via kmem_cache_free(), because the memory descriptor does not have any users.

The mm_struct and Kernel Threads

Kernel threads do not have a process address space and therefore do not have an associated memory descriptor. Thus, the mm field of a kernel thread's process descriptor is NULL. This is pretty much the definition of a kernel threadprocesses that have no user context.

This lack of an address space is fine, because kernel threads do not ever access any user-space memory (whose would they access?). Because kernel threads do not have any pages in user-space, they do not really deserve their own memory descriptor and page tables (page tables are discussed later in the chapter). Despite this, kernel threads need some of the data, such as the page tables, even to access kernel memory. To provide kernel threads the needed data, without wasting memory on a memory descriptor and page tables, or wasting processor cycles to switch to a new address space whenever a kernel thread begins running, kernel threads use the memory descriptor of whatever task ran previously.

Whenever a process is scheduled, the process address space referenced by the process's mm field is loaded. The active_mm field in the process descriptor is then updated to refer to the new address space. Kernel threads do not have an address space and mm is NULL. Therefore, when a kernel thread is scheduled, the kernel notices that mm is NULL and keeps the previous process's address space loaded. The kernel then updates the active_mm field of the kernel thread's process descriptor to refer to the previous process's memory descriptor. The kernel thread can then use the previous process's page tables as needed. Because kernel threads do not access user-space memory, they make use of only the information in the address space pertaining to kernel memory, which is the same for all processes.

    Team LiB
    Previous Section Next Section