Buffers and Buffer Heads

When a block is stored in memory (say, after a read or pending a write), it is stored in a buffer. Each buffer is associated with exactly one block. The buffer serves as the object that represents a disk block in memory. Recall that a block comprises one or more sectors, but is no more than a page in size. Therefore, a single page can hold one or more blocks in memory. Because the kernel requires some associated control information to accompany the data (such as from which block device and which specific block the buffer is), each buffer is associated with a descriptor. The descriptor is called a buffer head and is of type struct buffer_head. The buffer_head structure holds all the information that the kernel needs to manipulate buffers and is defined in <linux/buffer_head.h>.

Take a look at this structure, with comments describing each field:

struct buffer_head {
        unsigned long        b_state;          /* buffer state flags */
        atomic_t             b_count;          /* buffer usage counter */
        struct buffer_head   *b_this_page;     /* buffers using this page */
        struct page          *b_page;          /* page storing this buffer */
        sector_t             b_blocknr;        /* logical block number */
        u32                  b_size;           /* block size (in bytes) */
        char                 *b_data;          /* buffer in the page */
        struct block_device  *b_bdev;          /* device where block resides */
        bh_end_io_t          *b_end_io;        /* I/O completion method */
        void                 *b_private;       /* data for completion method */
        struct list_head     b_assoc_buffers;  /* list of associated mappings */
};

The b_state field specifies the state of this particular buffer. It can be one or more of the flags in Table 13.1. The legal flags are stored in the bh_state_bits enumeration, which is defined in <linux/buffer_head.h>.

Table 13.1. bh_state Flags
Status Flag
Meaning
BH_Uptodate
Buffer contains valid data
BH_Dirty
Buffer is dirty (the contents of the buffer are newer than the contents of the block on disk and therefore the buffer must eventually be written back to disk)
BH_Lock
Buffer is undergoing disk I/O and is locked to prevent concurrent access
BH_Req
Buffer is involved in an I/O request
BH_Mapped
Buffer is a valid buffer mapped to an on-disk block
BH_New
Buffer is newly mapped via get_block() and not yet accessed
BH_Async_Read
Buffer is undergoing asynchronous read I/O via end_buffer_async_read()
BH_Async_Write
Buffer is undergoing asynchronous write I/O via end_buffer_async_write()
BH_Delay
Buffer does not yet have an associated on-disk block
BH_Boundary
Buffer forms the boundary of contiguous blocksthe next block is discontinuous

The bh_state_bits enumeration also contains as the last value in the list a BH_PrivateStart flag. This is not a valid state flag, but instead corresponds to the first usable bit of which other code can make use. All bit values equal to and greater than BH_PrivateStart are not used by the block I/O layer proper, so these bits are safe to use by individual drivers who want to store information in the b_state field. Drivers can base the bit values of their internal flags off this flag and rest assured that they are not encroaching on an official bit used by the block I/O layer.

The b_count field is the buffer's usage count. The value is incremented and decremented by two inline functions, both of which are defined in <linux/buffer_head.h>:

static inline void get_bh(struct buffer_head *bh)
{
        atomic_inc(&bh->b_count);
}

static inline void put_bh(struct buffer_head *bh)
{
        atomic_dec(&bh->b_count);
}

Before manipulating a buffer head, you must increment its reference count via get_bh() to ensure that the buffer head is not deallocated out from under you. When finished with the buffer head, decrement the reference count via put_bh().

The physical block on disk to which a given buffer corresponds is the b_blocknr-th logical block on the block device described by b_bdev.

The physical page in memory to which a given buffer corresponds is the page pointed to by b_page. More specifically, b_data is a pointer directly to the block (that exists somewhere in b_page), which is b_size bytes in length. Therefore, the block is located in memory starting at address b_data and ending at address (b_data + b_size).

The purpose of a buffer head is to describe this mapping between the on-disk block and the physical in-memory buffer (which is a sequence of bytes on a specific page). Acting as a descriptor of this buffer-to-block mapping is the data structure's only role in the kernel.

Before the 2.6 kernel, the buffer head was a much more important data structure: It was the unit of I/O in the kernel. Not only did the buffer head describe the disk-block-to-physical-page mapping, but it also acted as the container used for all block I/O. This had two primary problems. First, the buffer head was a large and unwieldy data structure (it has shrunken a bit nowadays), and it was neither clean nor simple to manipulate data in terms of buffer heads. Instead, the kernel prefers to work in terms of pages, which are simple and allow for greater performance. A large buffer head describing each individual buffer (which might be smaller than a page) was inefficient. Consequently, in the 2.6 kernel, much work has gone into making the kernel work directly with pages and address spaces instead of buffers. Some of this work is discussed in Chapter 15, "The Page Cache and Page Writeback," where the address_space structure and the pdflush daemons are discussed.

The second issue with buffer heads is that they describe only a single buffer. When used as the container for all I/O operations, the buffer head forces the kernel to break up potentially large block I/O operations (say, a write) into multiple buffer_head structures. This results in needless overhead and space consumption. As a result, the primary goal of the 2.5 development kernel was to introduce a new, flexible, and lightweight container for block I/O operations. The result is the bio structure, which is discussed in the next section.

Buffers and Buffer Heads

Table 13.1. bh_state Flags

Table 13.1. `bh_state` Flags