capabilities — overview of Linux capabilities
For the purpose of performing permission checks,
traditional Unix implementations distinguish two categories
of processes: privileged
processes (whose
effective user ID is 0, referred to as superuser or root),
and unprivileged
processes (whose effective UID is nonzero). Privileged
processes bypass all kernel permission checks, while
unprivileged processes are subject to full permission
checking based on the process's credentials (usually:
effective UID, effective GID, and supplementary group
list).
Starting with kernel 2.2, Linux divides the privileges
traditionally associated with superuser into distinct units,
known as capabilities, which can be
independently enabled and disabled. Capabilities are a
per-thread attribute.
As at Linux 2.6.14, the following capabilities are implemented:
CAP_AUDIT_CONTROL
(since Linux 2.6.11) Enable and disable kernel auditing; change auditing filter rules; retrieve auditing status and filtering rules.
CAP_AUDIT_WRITE
(since Linux 2.6.11) Allow records to be written to kernel auditing log.
CAP_CHOWN
Allow arbitrary changes to file UIDs and GIDs (see chown(2)).
CAP_DAC_OVERRIDE
Bypass file read, write, and execute permission checks. (DAC = "discretionary access control".)
CAP_DAC_READ_SEARCH
Bypass file read permission checks and directory read and execute permission checks.
CAP_FOWNER
Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file (e.g., chmod(2), utime(2)), excluding those operations covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH; set extended file attributes (see chattr(1)) on arbitrary files; set Access Control Lists (ACLs) on arbitrary files; ignore directory sticky bit on file deletion; specify O_NOATIME for arbitrary files in open(2) and fcntl(2).
CAP_FSETID
Don't clear set-user-ID and set-group-ID bits when a file is modified; permit setting of the set-group-ID bit for a file whose GID does not match the file system GID or any of the supplementary GIDs of the calling process.
CAP_IPC_LOCK
Permit memory locking (mlock(2), mlockall(2), mmap(2), shmctl(2)).
CAP_IPC_OWNER
Bypass permission checks for operations on System V IPC objects.
CAP_KILL
Bypass permission checks for sending signals (see
kill(2)). This
includes use of the KDSIGACCEPT
ioctl.
CAP_LEASE
(Linux 2.4 onwards) Allow file leases to be established on arbitrary files (see fcntl(2)).
CAP_LINUX_IMMUTABLE
Allow setting of the EXT2_APPEND_FL
and EXT2_IMMUTABLE_FL
extended file
attributes (see chattr(1)).
CAP_MKNOD
(Linux 2.4 onwards) Allow creation of special files using mknod(2).
CAP_NET_ADMIN
Allow various network-related operations (e.g., setting privileged socket options, enabling multicasting, interface configuration, modifying routing tables).
CAP_NET_BIND_SERVICE
Allow binding to Internet domain reserved socket ports (port numbers less than 1024).
CAP_NET_BROADCAST
(Unused) Allow socket broadcasting, and listening to multicasts.
CAP_NET_RAW
Permit use of RAW and PACKET sockets.
CAP_SETGID
Allow arbitrary manipulations of process GIDs and supplementary GID list; allow forged GID when passing socket credentials via Unix domain sockets.
CAP_SETPCAP
Grant or remove any capability in the caller's permitted capability set to or from any other process.
CAP_SETUID
Allow arbitrary manipulations of process UIDs (setuid(2), setreuid(2), setresuid(2), setfsuid(2)); allow forged UID when passing socket credentials via Unix domain sockets.
CAP_SYS_ADMIN
Permit a range of system administration operations including: quotactl(2), mount(2), umount(2), swapon(2), swapoff(2), sethostname(2), setdomainname(2), and IPC_SET and IPC_RMID operations on arbitrary System V IPC objects; perform operations on trusted and security Extended Attributes (see attr(5)); call lookup_dcookie(2); use ioprio_set(2) to assign the IOPRIO_CLASS_RT and IOPRIO_CLASS_IDLE I/O scheduling classes; allow forged UID when passing socket credentials; exceed /proc/sys/fs/file-max, the system-wide limit on the number of open files, in system calls that open files (e.g., accept(2), execve(2), open(2), pipe(2); without this capability these system calls will fail with the error ENFILE if this limit is encountered); employ the CLONE_NEWNS flag with clone(2) and unshare(2); perform keyctl(2) KEYCTL_CHOWN and KEYCTL_SETPERM operations.
CAP_SYS_BOOT
Permit calls to reboot(2) and kexec_load(2).
CAP_SYS_CHROOT
Permit calls to chroot(2).
CAP_SYS_MODULE
Allow loading and unloading of kernel modules (see init_module(2) and delete_module(2)); allow modifications to the capability bounding set.
CAP_SYS_NICE
Allow raising process nice value (nice(2), setpriority(2)) and
changing of the nice value for arbitrary processes;
allow setting of real-time scheduling policies for
calling process, and setting scheduling policies and
priorities for arbitrary processes (sched_setscheduler(2),
sched_setparam(2));
set CPU affinity for arbitrary processes (sched_setaffinity(2));
set I/O scheduling class and priority for arbitrary
processes (ioprio_set(2));
allow migrate_pages(2) to be
applied to arbitrary processes and allow processes to
be migrated to arbitrary nodes; allow move_pages(2) to be
applied to arbitrary processes; use the MPOL_MF_MOVE_ALL
flag with
mbind(2) and
move_pages(2).
CAP_SYS_PACCT
Permit calls to acct(2).
CAP_SYS_PTRACE
Allow arbitrary processes to be traced using ptrace(2).
CAP_SYS_RAWIO
Permit I/O port operations (iopl(2) and ioperm(2)); access /proc/kcore.
CAP_SYS_RESOURCE
Permit: use of reserved space on ext2 file systems; ioctl(2) calls controlling ext3 journaling; disk quota limits to be overridden; resource limits to be increased (see setrlimit(2)); RLIMIT_NPROC resource limit to be overridden; msg_qbytes limit for a message queue to be raised above the limit in /proc/sys/kernel/msgmnb (see msgop(2) and msgctl(2)).
CAP_SYS_TIME
Allow modification of system clock (settimeofday(2), stime(2), adjtimex(2)); allow modification of real-time (hardware) clock.
CAP_SYS_TTY_CONFIG
Permit calls to vhangup(2).
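The capability names listed above are integer constants defined in <linux/capability.h>, and a capability set can be handled as a bit mask in which each constant names one bit position. The following is a minimal sketch of that idea, assuming a 32-bit set (enough for every capability listed here); the helper names cap_bit() and cap_is_set() are hypothetical and are not part of any kernel or library interface.

```c
#include <assert.h>
#include <linux/capability.h>   /* CAP_* capability numbers */
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: bit mask for one capability number (0..31). */
uint32_t cap_bit(int cap)
{
    assert(cap >= 0 && cap < 32);
    return (uint32_t)1 << cap;
}

/* Hypothetical helper: true if capability 'cap' is enabled in 'set'. */
bool cap_is_set(uint32_t set, int cap)
{
    return (set & cap_bit(cap)) != 0;
}
```

For example, a set built as cap_bit(CAP_KILL) | cap_bit(CAP_SYS_ADMIN) reports CAP_KILL as enabled and CAP_SETPCAP as disabled.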
Each thread has three capability sets containing zero or more of the above capabilities:
Effective: the capabilities used by the kernel to perform permission checks for the thread.
Permitted: the capabilities that the thread may assume (i.e., a limiting superset for the effective and inheritable sets). If a thread drops a capability from its permitted set, it can never re-acquire that capability (unless it execve(2)s a set-user-ID-root program).
Inheritable: the capabilities preserved across an execve(2).
A child created via fork(2) inherits copies of its parent's capability sets. See below for a discussion of the treatment of capabilities during execve(2).
Using capset(2), a thread may
manipulate its own capability sets, or, if it has the
CAP_SETPCAP
capability, those
of a thread in another process.
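As a rough illustration of capget(2) and capset(2), the sketch below reads the calling thread's three sets and clears one capability from the effective set only. It makes several assumptions: the raw system calls are invoked through syscall(2) because the C library may not provide wrappers; the header version macro _LINUX_CAPABILITY_VERSION_3 is used when the installed kernel headers define it, which is newer than the kernel version described on this page; and only the low 32 bits of each set are reported. The function names read_own_caps() and drop_effective() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/capability.h>   /* header/data structs, CAP_* numbers */
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>        /* SYS_capget, SYS_capset */
#include <unistd.h>             /* syscall() */

/* Assumption: prefer the newest header version the kernel headers offer. */
#ifdef _LINUX_CAPABILITY_VERSION_3
#define CAP_ABI_VERSION _LINUX_CAPABILITY_VERSION_3
#else
#define CAP_ABI_VERSION _LINUX_CAPABILITY_VERSION
#endif

/* Prepare a header that addresses the calling thread (pid 0). */
static void init_header(struct __user_cap_header_struct *hdr)
{
    memset(hdr, 0, sizeof(*hdr));
    hdr->version = CAP_ABI_VERSION;
    hdr->pid = 0;
}

/* Read the low 32 bits of the caller's three sets. Returns 0 or -1. */
int read_own_caps(uint32_t *effective, uint32_t *permitted,
                  uint32_t *inheritable)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[2];

    init_header(&hdr);
    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;
    *effective = data[0].effective;
    *permitted = data[0].permitted;
    *inheritable = data[0].inheritable;
    return 0;
}

/* Clear capability 'cap' (0..31) from the effective set only. */
int drop_effective(int cap)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[2];

    assert(cap >= 0 && cap < 32);
    init_header(&hdr);
    if (syscall(SYS_capget, &hdr, data) == -1)
        return -1;
    data[0].effective &= ~((uint32_t)1 << cap);
    init_header(&hdr);   /* the kernel may have rewritten the header */
    return (int)syscall(SYS_capset, &hdr, data);
}
```

Because drop_effective() leaves the permitted set untouched, the thread could later raise the capability again; removing it from the permitted set as well would make the drop permanent, as described above.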
When a program is execed, the permitted and effective
capabilities are ANDed with the current value of the
so-called capability bounding set, defined in the file
/proc/sys/kernel/cap-bound. This
parameter can be used to place a system-wide limit on the
capabilities granted to all subsequently executed programs.
(Confusingly, this bit mask parameter is expressed as a
signed decimal number in /proc/sys/kernel/cap-bound.)
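To make the signed decimal representation concrete, the sketch below converts such a number into a 32-bit mask. It parses a string rather than reading the file, and the function name parse_cap_bound() is hypothetical. The value -257 is used purely as an illustration: reinterpreted as an unsigned 32-bit quantity it is 0xFFFFFEFF, that is, every bit set except bit 8, which is the number assigned to CAP_SETPCAP in <linux/capability.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert the signed decimal text of cap-bound
 * into a 32-bit mask. Returns 0 on success, or -1 if the text is not
 * a number or does not fit in 32 bits.
 */
int parse_cap_bound(const char *text, uint32_t *mask)
{
    char *end;
    long long value;

    assert(text != NULL && mask != NULL);
    errno = 0;
    value = strtoll(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;
    if (value < INT32_MIN || value > UINT32_MAX)
        return -1;
    /* Conversion is modulo 2^32, so -257 becomes 0xFFFFFEFF. */
    *mask = (uint32_t)value;
    return 0;
}
```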
Only the init
process may set bits in the capability bounding set; other
than that, the superuser may only clear bits in this
set.
On a standard system the capability bounding set always
masks out the CAP_SETPCAP
capability. To remove this restriction (dangerous!), modify
the definition of CAP_INIT_EFF_SET
in include/linux/capability.h
and rebuild
the kernel.
The capability bounding set feature was added to Linux starting with kernel version 2.2.11.
A full implementation of capabilities requires:
that for all privileged operations, the kernel check whether the thread has the required capability in its effective set.
that the kernel provide system calls allowing a thread's capability sets to be changed and retrieved.
file system support for attaching capabilities to an executable file, so that a process gains those capabilities when the file is execed.
As at Linux 2.6.14, only the first two of these requirements are met.
Eventually, it should be possible to associate three capability sets with an executable file, which, in conjunction with the capability sets of the thread, will determine the capabilities of a thread after an execve(2):
Inheritable (formerly known as allowed): this set is ANDed with the thread's inheritable set to determine which inheritable capabilities are permitted to the thread after the execve(2).
Permitted (formerly known as forced): the capabilities automatically permitted to the thread, regardless of the thread's inheritable capabilities.
Effective: those capabilities in the thread's new permitted set that are also to be set in the new effective set. (F(effective) would normally be either all zeroes or all ones.)
In the meantime, since the current implementation does not support file capability sets, during an execve(2):
All three file capability sets are initially assumed to be cleared.
If a set-user-ID-root program is being execed, or the real user ID of the process is 0 (root) then the file inheritable and permitted sets are defined to be all ones (i.e., all capabilities enabled).
If a set-user-ID-root program is being executed, then the file effective set is defined to be all ones.
During an execve(2), the kernel calculates the new capabilities of the process using the following algorithm:
    P'(permitted) = (P(inheritable) & F(inheritable)) |
                    (F(permitted) & cap_bset)
    P'(effective) = P'(permitted) & F(effective)
    P'(inheritable) = P(inheritable)    [i.e., unchanged]
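where:
P denotes the value of a thread capability set before the execve(2)
P' denotes the value of a capability set after the execve(2)
F denotes a file capability set
cap_bset is the value of the capability bounding set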
In the current implementation, the upshot of this
algorithm is that when a process execve(2)s a
set-user-ID-root program, or when a process with an
effective UID of 0 execve(2)s a program, it
gains all capabilities in its permitted and effective
capability sets, except those masked out by the capability
bounding set (i.e., CAP_SETPCAP). This provides semantics
that are the same as those provided by traditional Unix
systems.
To preserve the traditional semantics for transitions between 0 and nonzero user IDs, the kernel makes the following changes to a thread's capability sets on changes to the thread's real, effective, saved set, and file system user IDs (using setuid(2), setresuid(2), or similar):
If one or more of the real, effective or saved set user IDs was previously 0, and as a result of the UID changes all of these IDs have a nonzero value, then all capabilities are cleared from the permitted and effective capability sets.
If the effective user ID is changed from 0 to nonzero, then all capabilities are cleared from the effective set.
If the effective user ID is changed from nonzero to 0, then the permitted set is copied to the effective set.
If the file system user ID is changed from 0 to
nonzero (see setfsuid(2)) then
the following capabilities are cleared from the
effective set: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
CAP_FOWNER, and CAP_FSETID. If the file system UID
is changed from nonzero to 0, then any of these
capabilities that are enabled in the permitted set
are enabled in the effective set.
If a thread that has a 0 value for one or more of its
user IDs wants to prevent its permitted capability set
being cleared when it resets all of its user IDs to nonzero
values, it can do so using the prctl(2) PR_SET_KEEPCAPS
operation.
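A minimal sketch of using this operation follows. The wrapper names set_keepcaps() and get_keepcaps() are hypothetical; the second one relies on the companion prctl(2) operation PR_GET_KEEPCAPS, which is not described on this page, to read the flag back.

```c
#include <assert.h>
#include <sys/prctl.h>   /* prctl(), PR_SET_KEEPCAPS, PR_GET_KEEPCAPS */

/* Turn the "keep capabilities" flag on (1) or off (0). Returns 0 or -1. */
int set_keepcaps(int enable)
{
    assert(enable == 0 || enable == 1);
    return prctl(PR_SET_KEEPCAPS, (unsigned long)enable, 0UL, 0UL, 0UL);
}

/* Return the current flag value (0 or 1), or -1 on error. */
int get_keepcaps(void)
{
    return prctl(PR_GET_KEEPCAPS, 0UL, 0UL, 0UL, 0UL);
}
```

A program would typically call set_keepcaps(1) immediately before the setuid(2) or setresuid(2) call that moves all of its user IDs to nonzero values.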
No standards govern capabilities, but the Linux capability implementation is based on the withdrawn POSIX.1e draft standard.
The libcap
package provides a suite of routines for setting and getting
capabilities that is more comfortable and less likely to
change than the interface provided by capset(2) and capget(2).
There is as yet no file system support allowing capabilities to be associated with executable files.
This page is part of release 2.79 of the Linux man-pages
project. A
description of the project, and information about reporting
bugs, can be found at
http://www.kernel.org/doc/man-pages/.
Copyright (c) 2002 by Michael Kerrisk <mtk.manpages@gmail.com>
Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.
Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.
Since the Linux kernel and libraries are constantly changing, this manual page may be incorrect or out-of-date. The author(s) assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
Formatted or processed versions of this manual, if unaccompanied by the source, must acknowledge the copyright and authors of this work.
6 Aug 2002 - Initial Creation
Modified 2003-05-23, Michael Kerrisk, <mtk.manpages@gmail.com>
Modified 2004-05-27, Michael Kerrisk, <mtk.manpages@gmail.com>
2004-12-08, mtk, Added O_NOATIME for CAP_FOWNER
2005-08-16, mtk, Added CAP_AUDIT_CONTROL and CAP_AUDIT_WRITE
FIXME serge@hallyn.com promises updates to this page in line with recent changes to capabilities code in kernel, Feb 2008.