http://www.ibm.com/developerworks/library/l-posixcap.html
Some programs need to perform privileged operations on behalf of an unprivileged user. For instance, the passwd program writes to the very sensitive /etc/passwd and /etc/shadow files. On UNIX® systems, you achieve this control by setting the setuid bit on the binary file. This bit tells the system that while the program is running, regardless of who executed it, the process runs with the effective user ID of the file's owner, typically the root user. Because the passwd program cannot be written to by the user, and is very constrained in what it allows the user to do, this setup is usually safe. More complicated programs make use of saved uids to switch back and forth between root and a non-root user.
POSIX capabilities break the root privileges into smaller pieces, and allow tasks to run with only a subset of the root user's privileges. File capabilities allow such privileges to be attached to a program, greatly simplifying the use of capabilities. POSIX capabilities have been available in Linux for years. Using capabilities has several advantages over being the root user:
Upon exec(3) of a regular executable file, all capabilities are lost. (The details are more complicated and are soon expected to change, as will be explained later in this article.)

This article shows you how programs can make use of POSIX capabilities, how to investigate which capabilities are needed by a program, and how to assign those capabilities to the program.
Process capabilities
For years, POSIX capabilities could be assigned to processes, but not to files. A program therefore had to be started by root (or be owned by root and have its setuid bit set) before it could drop some of its root privileges while keeping others. Additionally, the order in which capabilities had to be dropped was very specific and involved a call to prctl.

A process carries three capability sets: permitted (P), inheritable (I), and effective (E). When a process forks, the child's capability sets are copied from the parent. When a process executes a new program, its new capability sets are calculated according to a formula I will discuss in a moment.
The effective set consists of those capabilities that the process can currently use. The permitted set is the upper bound for the effective set: the effective set must always be a subset of the permitted set. The process can change the contents of the effective set at any time as long as it does not exceed the permitted set. It can also drop capabilities from its permitted set, but it cannot add new ones to it. The inheritable set is used only for calculating the new capability sets after exec().
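On Linux, you can inspect the three sets of a running process without libcap. The kernel reports them as hexadecimal bitmasks on the CapInh, CapPrm, and CapEff lines of /proc/<pid>/status, where bit n corresponds to capability number n. The sketch below parses those lines and checks the subset rule just described. The names parse_proc_caps(), effective_within_permitted(), read_my_caps(), and struct proc_caps are hypothetical helpers, not part of any library.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The three per-process sets; bit n set means capability number n. */
struct proc_caps {
    uint64_t inheritable; /* pI */
    uint64_t permitted;   /* pP */
    uint64_t effective;   /* pE */
};

/* Parse the CapInh/CapPrm/CapEff lines of a /proc/<pid>/status stream.
 * Returns 0 if all three lines were found, -1 otherwise. */
int parse_proc_caps(FILE *fp, struct proc_caps *out)
{
    char line[256];
    int found = 0;

    while (fgets(line, sizeof line, fp)) {
        if (sscanf(line, "CapInh: %" SCNx64, &out->inheritable) == 1)
            found |= 1;
        else if (sscanf(line, "CapPrm: %" SCNx64, &out->permitted) == 1)
            found |= 2;
        else if (sscanf(line, "CapEff: %" SCNx64, &out->effective) == 1)
            found |= 4;
    }
    return found == 7 ? 0 : -1;
}

/* The effective set may never contain a bit that is not permitted. */
int effective_within_permitted(const struct proc_caps *c)
{
    return (c->effective & ~c->permitted) == 0;
}

/* Convenience wrapper for the calling process itself. */
int read_my_caps(struct proc_caps *out)
{
    FILE *fp = fopen("/proc/self/status", "r");
    int ret;

    if (!fp)
        return -1;
    ret = parse_proc_caps(fp, out);
    fclose(fp);
    return ret;
}
```

For example, a program that has just called cap_set_proc(3) could call read_my_caps() and print the three masks to confirm that the change took effect.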
Listing 1 shows the three formulas that dictate a process's new capability sets after file execution according to the POSIX draft (see Resources for a link to IEEE Std 1003.1-2001).
pI' = pI
pP' = fP | (fI & pI)
pE' = pP' & fE
A value ending with a ' indicates the newly calculated value. A value beginning with a p indicates a process capability. A value beginning with an f indicates a file capability.
The inheritable set is taken unchanged from the parent process, so once a process drops a capability from its inheritable set, it should never be able to regain it (but read the discussion of SECURE_NOROOT below). The new permitted set is the union of the file's permitted set and the intersection of the file's and process's inheritable sets. The process's new effective set is the intersection of the new permitted set and the file effective set. Technically, in Linux fE is not a set but a boolean. If it is true, pE' is set to pP'. If it is false, pE' starts empty.
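You can restate the formulas as a few lines of C operating on bitmasks, with bit n standing for capability number n. This is a simplified user-space model, not kernel code. It deliberately leaves out the root special cases discussed next and any later refinements such as the bounding set. The names caps_after_exec(), capset_t, struct task_caps, and struct file_caps are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit n set means capability number n is in the set. */
typedef uint64_t capset_t;

struct task_caps {
    capset_t inh; /* pI */
    capset_t prm; /* pP */
    capset_t eff; /* pE */
};

struct file_caps {
    capset_t inh; /* fI */
    capset_t prm; /* fP, the "forced" set */
    bool eff;     /* fE: a single bit on Linux, not a set */
};

/* Task sets after exec(), following the formulas above:
 *   pI' = pI
 *   pP' = fP | (fI & pI)
 *   pE' = fE ? pP' : 0
 */
struct task_caps caps_after_exec(struct task_caps p, struct file_caps f)
{
    struct task_caps n;

    n.inh = p.inh;
    n.prm = f.prm | (f.inh & p.inh);
    n.eff = f.eff ? n.prm : 0;
    return n;
}
```

Treating fE as a bool rather than a mask is what makes "pE' = pP' & fE" collapse to "all of pP' or nothing".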
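For a process to keep any capabilities after executing a file, the capabilities must be in the file's permitted or inheritable set. Because Linux has not implemented file capabilities for most of its life, this posed an unworkable restriction. To get around it, a "secure mode" was implemented. It consists of two bits, SECURE_NOROOT and SECURE_NO_SETUID_FIXUP.

If SECURE_NOROOT is not set, then when a process executes a file, the new capability sets may be calculated as though some of the file's capability sets were fully populated. In particular:
- If the process's real or effective uid is 0, the file inheritable and permitted sets (fI and fP) are treated as fully populated.
- If the process's effective uid is 0 (for example, because the file is setuid root), the file effective bit (fE) is treated as set.

If SECURE_NO_SETUID_FIXUP is not set, then when a process switches its real or effective uids to or from 0, capability sets are further shifted around:
- If any of the real, effective, or saved uids was 0 and all of them become nonzero, the permitted and effective sets are cleared.
- If the effective uid changes from 0 to nonzero, the effective set is cleared.
- If the effective uid changes from nonzero to 0, the permitted set is copied into the effective set.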
This set of rules allows a process to have capabilities either by virtue of being root or by running a setuid root file. However, the SECURE_NO_SETUID_FIXUP rules prevent a process from keeping any capabilities after becoming non-root. But with SECURE_NOROOT unset, a root process having dropped some capabilities can simply execute another program to regain its capabilities. So in order for capabilities to be useful, a root process must be able to irrevocably switch its uid to non-zero while keeping a few capabilities.
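You can model the uid-change rules in the same way. The sketch below follows the rules as documented in the capabilities(7) man page for a kernel without the later ambient capability set. Its keepcaps argument stands in for the PR_SET_KEEPCAPS flag. The names setuid_fixup(), struct uids, and struct task_caps are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

struct uids { uid_t r, e, s; };              /* real, effective, saved */
struct task_caps { uint64_t inh, prm, eff; }; /* pI, pP, pE */

/* Capability fixup on a uid change when SECURE_NO_SETUID_FIXUP is unset.
 * keepcaps models prctl(PR_SET_KEEPCAPS, 1). */
void setuid_fixup(struct uids old_ids, struct uids new_ids, bool keepcaps,
                  struct task_caps *c)
{
    bool had_root = old_ids.r == 0 || old_ids.e == 0 || old_ids.s == 0;
    bool all_nonroot = new_ids.r != 0 && new_ids.e != 0 && new_ids.s != 0;

    /* Leaving uid 0 completely drops pP and pE, unless keepcaps is set. */
    if (had_root && all_nonroot && !keepcaps) {
        c->prm = 0;
        c->eff = 0;
    }
    /* Effective uid 0 -> nonzero always empties pE, even with keepcaps. */
    if (old_ids.e == 0 && new_ids.e != 0)
        c->eff = 0;
    /* Effective uid nonzero -> 0 refills pE from pP. */
    if (old_ids.e != 0 && new_ids.e == 0)
        c->eff = c->prm;
}
```

In this model, keepcaps preserves only the permitted set. The effective set is still emptied when the effective uid leaves 0, so a program that keeps capabilities this way must repopulate its effective set afterwards. The last rule is what lets a program that temporarily switches only its effective uid regain its capabilities on switching back.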
Using prctl(2), a process can request keeping its capabilities across its next setuid(2) call. This means that a process can:
1. Call prctl(2) to set PR_SET_KEEPCAPS, which asks the system to let it keep its capabilities across setuid(2).
2. Call setuid(2) or a related system call to change its userid.
3. Call cap_set_proc(3) to drop capabilities.

PR_SET_KEEPCAPS preserves only the permitted set across the uid change. The effective set is still cleared when the effective uid leaves 0, which is another reason the third step is needed before the kept capabilities can be used.

Now the process can continue running with a subset of root privileges. If it is compromised, the attacker can use only the capabilities present in the effective set, or, with a call to cap_set_proc(3), in its permitted set. And if the attacker should coerce the program into executing another file, all capabilities will be dropped and the file will be executed as an unprivileged user.
The function exec_with_caps() in Listing 2 can be used by a setuid root program to continue execution at a specified function as a specified userid and with a set of capabilities specified as a string.
#define _GNU_SOURCE            /* for setresuid() */
#include <sys/prctl.h>
#include <sys/capability.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int printmycaps(void *d)
{
    cap_t cap = cap_get_proc();
    char *txt;

    if (!cap) {
        perror("cap_get_proc");
        return -1;
    }
    txt = cap_to_text(cap, NULL);
    printf("Running with uid %u\n", (unsigned)getuid());
    printf("Running with capabilities: %s\n", txt);
    cap_free(txt);
    cap_free(cap);
    return 0;
}

int exec_with_caps(int newuid, char *capstr, int (*f)(void *data), void *data)
{
    int ret;
    cap_t newcaps;

    /* Keep the permitted set across the uid change below. */
    ret = prctl(PR_SET_KEEPCAPS, 1);
    if (ret) {
        perror("prctl");
        return -1;
    }

    /* Irrevocably leave root: real, effective, and saved uids all change. */
    ret = setresuid(newuid, newuid, newuid);
    if (ret) {
        perror("setresuid");
        return -1;
    }

    /* The uid change emptied the effective set; install the requested sets. */
    newcaps = cap_from_text(capstr);
    if (!newcaps) {
        perror("cap_from_text");
        return -1;
    }
    ret = cap_set_proc(newcaps);
    cap_free(newcaps);
    if (ret) {
        perror("cap_set_proc");
        return -1;
    }

    return f(data);
}

int main(int argc, char *argv[])
{
    if (argc < 2) {
        printf("Usage: %s <capability_list>\n", argv[0]);
        return 1;
    }
    return exec_with_caps(1000, argv[1], printmycaps, NULL);
}
To test this, paste the code into a file named execwithcaps.c, and compile and run it as root:
gcc -o execwithcaps execwithcaps.c -lcap
./execwithcaps cap_sys_admin=eip
File capabilities
File capabilities are currently implemented in the -mm kernel tree, and are expected in the mainline kernel by 2.6.24. With file capabilities, you can assign capabilities to a program. For example, the ping program requires CAP_NET_RAW in order to function. For this reason, it has historically been a setuid root program. With file capabilities, you can reduce the amount of privilege invested in the program by doing:
chmod u-s /bin/ping
setfcaps -c cap_net_raw=p -e /bin/ping
This requires the newest version of the libcap libraries and related programs, which are available at GoogleCode (see Resources for a link). This first removes the setuid bit from the binary, then assigns it the CAP_NET_RAW privilege it needs. Now any user can run ping with the CAP_NET_RAW privilege, but if the ping program is compromised, the attacker can exercise no other privileges.
The question arises how you would determine the minimal capability set required for an unprivileged user to run any particular program. If only one program were involved, a worthwhile approach would be to scour the application, its dynamically linked libraries, and the kernel sources. This would, however, need to be repeated for every setuid root program. Such a review may well be a good idea before allowing an unprivileged user to run an application as root, but across a whole system it is unfortunately an unrealistic prospect.
If a program were verbose and well behaved, it might be possible to simply run the program without privilege and have it complain about which privileges it lacks. Let's try that with ping.
chmod u-s /bin/ping
setfcaps -r /bin/ping
su - myuser
ping google.com
ping: icmp open socket: Operation not permitted
This message is a useful hint if we understand how icmp sockets are implemented, but it certainly does not spell out which capability is missing.
Next, we can try to run the program (again without the setuid bit) under strace. strace reports all system calls made by the program along with their return values, so we can look through its output for return values indicating lack of permission.
strace -o ping.out ping google.com
grep EPERM ping.out
socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = -1 EPERM (Operation not permitted)
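You can reproduce the failing call outside ping. The following sketch, which assumes a Linux system, attempts the same socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) call and returns the resulting errno. That makes it a quick check of whether a given process is allowed to open raw sockets. probe_raw_icmp() is a hypothetical helper, not part of ping.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Repeat the call that failed in the strace output above.
 * Returns 0 if the kernel allowed the raw socket (for an unprivileged
 * process this requires CAP_NET_RAW in its effective set), otherwise
 * the errno value, typically EPERM. */
int probe_raw_icmp(void)
{
    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP);

    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}
```

Compiled into a small program and run as an ordinary user, this should return EPERM. After the binary is given cap_net_raw in its permitted set with the effective bit on, it should return 0.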
The permission we lack is to create a socket of type SOCK_RAW. Reading through /usr/include/linux/capability.h, you'll see that:
/* Allow use of RAW sockets */
/* Allow use of PACKET sockets */
#define CAP_NET_RAW 13
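Cross-referencing numbers against capability.h by hand gets tedious when a program triggers many checks, so a small lookup table helps. The sketch below covers capabilities 0 through 31 (CAP_CHOWN through CAP_SETFCAP), transcribed from linux/capability.h; newer kernels define more. cap_name() is a hypothetical helper.

```c
#include <stddef.h>

/* Index = capability number, as defined in linux/capability.h. */
static const char *const cap_names[] = {
    "CAP_CHOWN",        "CAP_DAC_OVERRIDE",    "CAP_DAC_READ_SEARCH",  "CAP_FOWNER",
    "CAP_FSETID",       "CAP_KILL",            "CAP_SETGID",           "CAP_SETUID",
    "CAP_SETPCAP",      "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN",    "CAP_NET_RAW",         "CAP_IPC_LOCK",         "CAP_IPC_OWNER",
    "CAP_SYS_MODULE",   "CAP_SYS_RAWIO",       "CAP_SYS_CHROOT",       "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT",    "CAP_SYS_ADMIN",       "CAP_SYS_BOOT",         "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME",        "CAP_SYS_TTY_CONFIG",   "CAP_MKNOD",
    "CAP_LEASE",        "CAP_AUDIT_WRITE",     "CAP_AUDIT_CONTROL",    "CAP_SETFCAP",
};

/* Returns the symbolic name for a capability number, or NULL if unknown. */
const char *cap_name(int cap)
{
    if (cap < 0 || (size_t)cap >= sizeof cap_names / sizeof cap_names[0])
        return NULL;
    return cap_names[cap];
}
```

With this table, cap_name(13) yields "CAP_NET_RAW", the capability matching the SOCK_RAW failure.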
In this case, it is clear that CAP_NET_RAW is the capability needed in order to allow unprivileged users to use ping. However, some programs are likely to attempt, and be denied with -EPERM, many operations that they don't actually need. It's also likely that the capability a program needs won't always be this easy to guess.
Another, more practical approach may be to insert a probe into the kernel at the place where capabilities are checked. The probe will print debugging information about every capability check, whether or not it is granted.

kprobes allow developers to write small kernel modules that run code at the start of a function (jprobe), at the end of a function (kretprobe), or at any address (kprobe). With such a probe in place, you can find out which capabilities the kernel checks when running a given program. (The remainder of this section assumes that you have a kernel with both kprobes and file capabilities enabled. The jprobe interface was later removed from the mainline kernel, in version 4.15, so Listing 3 needs a kernel from roughly the era of this article.)
Listing 3 is a kernel module that inserts a jprobe to instrument the start of the cap_capable() kernel function.
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

static const char *probed_func = "cap_capable";

/*
 * jprobe handler: must have the same signature as the probed function,
 * cap_capable(struct task_struct *tsk, int cap) in kernels of this era.
 */
int cr_capable(struct task_struct *tsk, int cap)
{
    printk(KERN_NOTICE "%s: asking for capability %d for %s\n",
           __FUNCTION__, cap, tsk->comm);
    jprobe_return();    /* hand control back to cap_capable() */
    return 0;           /* never reached */
}

static struct jprobe jp = {
    .entry = JPROBE_ENTRY(cr_capable)
};

static int __init kprobe_init(void)
{
    int ret;

    jp.kp.symbol_name = (char *)probed_func;
    ret = register_jprobe(&jp);
    if (ret < 0) {
        printk(KERN_ERR "%s: register_jprobe failed, returned %d\n",
               __FUNCTION__, ret);
        return ret;
    }
    return 0;
}

static void __exit kprobe_exit(void)
{
    unregister_jprobe(&jp);
    printk(KERN_INFO "capable kprobes unregistered\n");
}

module_init(kprobe_init);
module_exit(kprobe_exit);
MODULE_LICENSE("GPL");
When this kernel module is inserted, every call to cap_capable() first runs the cr_capable() function with the same arguments. This function prints the name of the program whose capability is being checked and the number of the capability being checked. It then continues executing the actual cap_capable() call through the call to jprobe_return().
Save Listing 3 as capable_probe.c (the name the makefile expects) and compile the module using the makefile in Listing 4:
obj-m := capable_probe.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KDIR) M=$(PWD) modules

clean:
	rm -f *.mod.c *.ko *.o
Then execute as root:
/sbin/insmod capable_probe.ko
Now in one window, watch the system logs using:
tail -f /var/log/messages
In another window, as non-root, execute the ping binary without the setuid bit set:
/bin/ping google.com
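While ping runs, each capability check produces a log line in the format of the printk() in Listing 3, such as "cr_capable: asking for capability 13 for ping", usually behind a syslog timestamp prefix. Rather than reading the lines by eye, you can tally them per capability number. This is a sketch. The names tally_cap_checks() and MAX_CAP are hypothetical, and the code assumes each log line fits in 512 bytes.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CAP 64

/* Count capability checks made by one program, using log lines written by
 * the printk() in the probe module, e.g.
 *   "... cr_capable: asking for capability 13 for ping"
 * counts[] must hold MAX_CAP entries and be zeroed by the caller.
 * Returns the number of matching lines. */
int tally_cap_checks(FILE *log, const char *prog, unsigned counts[MAX_CAP])
{
    char line[512];
    int matched = 0;

    while (fgets(line, sizeof line, log)) {
        const char *p = strstr(line, "asking for capability ");
        char comm[32];
        int cap;

        /* Skip lines that are not from the probe or are malformed. */
        if (!p || sscanf(p, "asking for capability %d for %31s", &cap, comm) != 2)
            continue;
        if (strcmp(comm, prog) != 0 || cap < 0 || cap >= MAX_CAP)
            continue;
        counts[cap]++;
        matched++;
    }
    return matched;
}
```

Passing an open /var/log/messages stream and "ping" gives a count per capability number, which still has to be mapped to names via linux/capability.h.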
The system logs now contain multiple entries for ping. These are the capabilities that the program attempted to use. Not all of them are needed. Cross-referencing /usr/include/linux/capability.h to convert the integers to capability names shows that ping requested 21, 13, and 7:
- 21 is CAP_SYS_ADMIN. Avoid granting this catch-all to any program.
- 7 is CAP_SETUID. Ping should not require this.
- 13 is CAP_NET_RAW. Ping should require this.

Let's grant it that capability and see whether it succeeds.
setfcaps -c cap_net_raw=p -e /bin/ping
(become non root user)
ping google.com
As we expected, ping succeeded.
Complications
Existing software is often written to be as secure as possible with few changes across many UNIX variants. On top of this, distributions sometimes apply their own patches, which can make it impossible to replace the root setuid bit with file capabilities in some situations.
One example of such a program on Fedora is at. The at program allows users to schedule jobs for execution at a later time. For instance, a cheap way to get a pop-up reminder to dial into a meeting at 2 p.m. would be:
echo "xterm -display :0.0 -e \
\"echo Call customer 555-5555; echo ^V^G; sleep 10m\" " | \
at 14:00
The at program is available for all UNIX systems and can be used by any user. Users share a common job spool under /var/spool, so security is of the utmost importance. But at is also coded to work across many systems, and so it does not make use of system-specific security mechanisms like capabilities. Nevertheless, it attempts to reduce privilege through the use of setuid(2). On top of this, the Fedora package adds patches to make use of PAM modules.
The quickest way to check whether at could be made to run by a non-root user without being setuid root is to remove the setuid bit and then grant it all capabilities.
chmod u-s /usr/bin/at
setfcaps -c all=p -e /usr/bin/at
su - (non root user)
/usr/bin/at
By specifying -c all=p, we asked for a fully populated permitted, or forced, capability set on /usr/bin/at. So any user running this program will do so with all of root's privileges. But on Fedora 7, running /usr/bin/at will now result in:
You do not have permission to run at.
The reason is evident if you download and study the source code, but the details are not helpful for this exercise. While it is certainly possible to change the source code to make at usable with file capabilities, on Fedora the setuid bit cannot be replaced simply by assigning file capabilities.
File capability details
So far, we have been using a very specific format for the capabilities we assign to executables. For ping we used:
setfcaps -c cap_net_raw=p -e /bin/ping
setfcaps is a program that sets the target file's capabilities by setting an extended attribute named security.capability. The -c flag is followed by a list of capabilities in a somewhat free-flowing format:
capability_list=capability_set(s)
capability_set can contain i and p, and capability_list can contain any valid capabilities. The i and p types represent the inheritable and permitted sets, respectively, and separate capability lists can be specified for each set. The -e or -d flag dictates whether or not, respectively, the capabilities in the permitted set are in the program's effective set on startup. If they are not, then the program must be capability aware and must activate the bits in its effective set itself in order to make use of the capabilities.
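You can also read the security.capability attribute that setfcaps writes back directly. The sketch below decodes it assuming the revision 2 (or 3) layout of struct vfs_cap_data used by current kernels. That layout is a little-endian 32-bit magic_etc word carrying the revision and the effective flag (fE), followed by permitted (fP) and inheritable (fI) words for capabilities 0 to 31 and then 32 to 63. Early setfcaps releases may have written the older revision 1 format, which this sketch rejects. decode_vfs_caps() and read_file_caps() are hypothetical names, and the constants mirror those in linux/capability.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Layout constants mirroring struct vfs_cap_data in linux/capability.h. */
#define VFS_CAP_REVISION_MASK   0xFF000000u
#define VFS_CAP_REVISION_2      0x02000000u
#define VFS_CAP_REVISION_3      0x03000000u
#define VFS_CAP_FLAGS_EFFECTIVE 0x00000001u
#define XATTR_CAPS_SZ_2         20

struct file_caps {
    uint64_t permitted;   /* fP, the "forced" set */
    uint64_t inheritable; /* fI */
    bool effective;       /* fE: set by setfcaps -e, clear with -d */
};

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a revision 2 or 3 attribute value. Returns 0 on success, -1 otherwise. */
int decode_vfs_caps(const unsigned char *buf, size_t len, struct file_caps *out)
{
    uint32_t magic, rev;

    if (len < XATTR_CAPS_SZ_2)
        return -1;
    magic = le32(buf);
    rev = magic & VFS_CAP_REVISION_MASK;
    if (rev != VFS_CAP_REVISION_2 && rev != VFS_CAP_REVISION_3)
        return -1;
    out->effective = (magic & VFS_CAP_FLAGS_EFFECTIVE) != 0;
    /* data[0] covers capabilities 0-31, data[1] covers 32-63. */
    out->permitted   = le32(buf + 4) | (uint64_t)le32(buf + 12) << 32;
    out->inheritable = le32(buf + 8) | (uint64_t)le32(buf + 16) << 32;
    return 0;
}

/* Read and decode a file's attribute. Returns -1 if absent or unreadable. */
int read_file_caps(const char *path, struct file_caps *out)
{
    unsigned char buf[64];
    ssize_t n = getxattr(path, "security.capability", buf, sizeof buf);

    if (n < 0)
        return -1;
    return decode_vfs_caps(buf, (size_t)n, out);
}
```

On a current kernel, the attribute written for the ping example (cap_net_raw=p with -e) should decode to a permitted mask of 1 << 13, an empty inheritable mask, and the effective flag set.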
Until now, we have asserted the desired capabilities in the permitted set but not the inheritable set. In fact, there are subtler and more powerful things we could do with capabilities. Recall Listing 1, repeated here:
pI' = pI
pP' = fP | (fI & pI)
pE' = pP' & fE
The file inheritable set specifies which of the process's inheritable capabilities can be in the process's new permitted set. If only cap_dac_override is in the file inheritable set, then only that capability can be inherited into the process's new permitted set.
The file permitted set, also known as the "forced" set, is the set that is forced on in the new permitted set, regardless of whether it was in the task's inheritable set or not.
Finally, the file effective bit dictates whether the bits in the task's new permitted set should be in its new effective set. In other words, it decides whether the program can actually exercise the capabilities without explicitly asking to do so using cap_set_proc(3).
Recall that the system makes a few changes for the root user when SECURE_NOROOT is not set. In particular, the system pretends that on the file being executed, the inheritable (fI), permitted (fP), and effective (fE) sets are fully populated. So the fI set on a binary is only useful for a non-root process with non-empty capability sets. For example, for a program that has kept capabilities while becoming a non-root user, the above formulas will apply without such finagling. It is likely that SECURE_NOROOT will become a per-process setting so that process trees can choose whether to use true capabilities or use a root-user-is-privileged model. But at the time of this writing, this is a system-wide setting that, for any practical system, is set such that the root user is always all-powerful by default.
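The root special case amounts to rewriting the file sets before the usual formulas are applied. The sketch below models that with 64-bit masks. noroot_adjust() and caps_after_exec() are hypothetical names. ALL_CAPS simply sets every bit rather than only the defined capabilities, and later refinements such as the capability bounding set are ignored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

struct task_caps { uint64_t inh, prm, eff; }; /* pI, pP, pE */
struct file_caps { uint64_t inh, prm; bool eff; }; /* fI, fP, fE */

#define ALL_CAPS (~(uint64_t)0) /* illustrative: every bit set */

/* File sets the kernel pretends to see when SECURE_NOROOT is unset.
 * ruid/euid are the ids the process has after exec, i.e. after any
 * setuid bit on the file has been applied. */
struct file_caps noroot_adjust(struct file_caps f, uid_t ruid, uid_t euid)
{
    if (ruid == 0 || euid == 0) {
        f.inh = ALL_CAPS;
        f.prm = ALL_CAPS;
    }
    if (euid == 0)
        f.eff = true;
    return f;
}

/* The exec formulas from Listing 1. */
struct task_caps caps_after_exec(struct task_caps p, struct file_caps f)
{
    struct task_caps n = { p.inh, f.prm | (f.inh & p.inh), 0 };

    n.eff = f.eff ? n.prm : 0;
    return n;
}
```

With noroot_adjust() applied, a process whose uid is 0 ends up with every bit in pP' whatever the file carries, because the forced set alone is full. This is the behavior the exercises at the end of this article demonstrate.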
To illustrate the interactions of these sets, let's assume that the administrator has used the following command to set file capabilities on /bin/some_program:
setfcaps -c cap_sys_admin=i,cap_dac_read_search=p -e \
    /bin/some_program
If a non-root user runs this program while running with full capabilities, its inheritable set pI is first masked against fI, so it is reduced to just cap_sys_admin. Next, fP is unioned with that set, so the interim result is cap_sys_admin+cap_dac_read_search. This set becomes the task's new permitted set.

Finally, since the effective bit is on, the task's new effective set will contain both bits in its new permitted set.
In contrast, if a completely unprivileged user runs this same program, the user's empty inheritable set is masked against fI, resulting in the empty set. This is unioned with fP, resulting in cap_dac_read_search. This becomes the task's new permitted set. Finally, since the effective bit is on, the new permitted set is copied to the new effective set, resulting again in cap_dac_read_search.
In either case, if the file effective bit were not set, then the task would need to use cap_set_proc(3) to copy any bits it wanted to use from its permitted set to its effective set.
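The same two cases can be worked through as bit arithmetic. This uses the capability numbers from linux/capability.h (CAP_DAC_READ_SEARCH is 2, CAP_SYS_ADMIN is 21). For the first case it assumes that "full capabilities" means all 32 capability bits defined at the time are set in pI.

```latex
\begin{align*}
  fI &= 2^{21} = \mathtt{0x200000}, \qquad fP = 2^{2} = \mathtt{0x4}, \qquad fE = 1 \\
\intertext{(a) non-root user running with full capabilities:}
  pI  &= \mathtt{0xffffffff} \\
  pP' &= fP \mid (fI \mathbin{\&} pI)
       = \mathtt{0x4} \mid \mathtt{0x200000}
       = \mathtt{0x200004} \\
  pE' &= pP' = \mathtt{0x200004}
\intertext{(b) completely unprivileged user:}
  pI  &= \mathtt{0x0} \\
  pP' &= \mathtt{0x4} \mid (\mathtt{0x200000} \mathbin{\&} \mathtt{0x0})
       = \mathtt{0x4} \\
  pE' &= pP' = \mathtt{0x4}
\end{align*}
```

The value 0x200004 is cap_sys_admin+cap_dac_read_search, and 0x4 is cap_dac_read_search alone, matching the prose description of each case.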
Summary and exercises
To summarize:
To illustrate what we've covered, let's experiment with the programs in Listings 5 and 6. In Listing 5, print_caps simply prints out the capability sets with which it is running. In Listing 6, exec_as_nonroot_priv is intended to be executed as the root user. It asks to keep its capabilities across the next setuid(2), becomes the non-root user specified as the first command-line argument, sets its capability sets to those indicated in the second command-line argument, and then executes the program specified as the third command-line argument.
#include <stdio.h>
#include <stdlib.h>
#include <sys/capability.h>

int main(int argc, char *argv[])
{
    cap_t cap = cap_get_proc();
    char *txt;

    if (!cap) {
        perror("cap_get_proc");
        exit(1);
    }
    txt = cap_to_text(cap, NULL);
    printf("%s: running with caps %s\n", argv[0], txt);
    cap_free(txt);      /* cap_to_text() allocates; release it too */
    cap_free(cap);
    return 0;
}
#define _GNU_SOURCE            /* for setresuid() */
#include <sys/prctl.h>
#include <sys/capability.h>
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

void printmycaps(void)
{
    cap_t cap = cap_get_proc();
    char *txt;

    if (!cap) {
        perror("cap_get_proc");
        return;
    }
    txt = cap_to_text(cap, NULL);
    printf("%s\n", txt);
    cap_free(txt);
    cap_free(cap);
}

int main(int argc, char *argv[])
{
    cap_t cur;
    int ret;
    uid_t newuid;

    if (argc < 4) {
        printf("Usage: %s <uid> <capset> <program_to_run>\n", argv[0]);
        exit(1);
    }

    /* Keep the permitted set across the coming uid change. */
    ret = prctl(PR_SET_KEEPCAPS, 1);
    if (ret) {
        perror("prctl");
        return 1;
    }

    newuid = (uid_t)atoi(argv[1]);
    printf("Capabilities before setuid: ");
    printmycaps();

    /* Leave root for good: real, effective, and saved uids. */
    ret = setresuid(newuid, newuid, newuid);
    if (ret) {
        perror("setresuid");
        return 1;
    }
    printf("Capabilities after setuid, before capset: ");
    printmycaps();

    /* Install the requested sets, including the inheritable set pI. */
    cur = cap_from_text(argv[2]);
    if (!cur) {
        perror("cap_from_text");
        return 1;
    }
    ret = cap_set_proc(cur);
    cap_free(cur);
    if (ret) {
        perror("cap_set_proc");
        return 1;
    }
    printf("Capabilities after capset: ");
    printmycaps();

    /* Flush buffered output, then exec; execl() returns only on failure. */
    fflush(stdout);
    execl(argv[3], argv[3], (char *)NULL);
    perror("exec");
    return 1;
}
Let's use these programs to verify the effect of the inheritable and permitted file capabilities. We will do this by placing file capabilities on print_caps, then executing print_caps with initial process capability sets carefully set up using exec_as_nonroot_priv. First, set some capabilities just in print_caps's permitted set:
gcc -o print_caps print_caps.c -lcap
gcc -o exec_as_nonroot_priv exec_as_nonroot_priv.c -lcap
setfcaps -c cap_dac_override=p -d print_caps
Now execute print_caps as a non-root user:
su - (username)
./print_caps
Next, as root, execute print_caps through exec_as_nonroot_priv:
./exec_as_nonroot_priv 1000 cap_dac_override=eip ./print_caps
In either case, print_caps ran with cap_dac_override=p. Note that the effective set is empty. That means that print_caps would have to use cap_set_proc(3) before it would actually be able to make use of the cap_dac_override capability. To change that, use the -e flag to setfcaps to set the effective bit.
setfcaps -c cap_dac_override=p -e print_caps
print_caps has an empty fI, so none of the process's pI is pulled into pP'. The single bit in pP' came from the file forced set, fP.
A more interesting test, though, is to test the effect of the inheritable file capability and run print_caps again both as a non-root user and through the exec_as_nonroot_priv program:
setfcaps -c cap_dac_override=i -e print_caps
su - (nonroot_user)
./print_caps
exit
./exec_as_nonroot_priv 1000 cap_dac_override=eip ./print_caps
This time, the non-root user has an empty capability set, while the process started as the root user has cap_dac_override in its permitted and effective sets.

Run print_caps one more time, this time simply as the root user without going through exec_as_nonroot_priv. Note that the capability set is full. The root user always receives a full capability set after executing a program, regardless of file capabilities. exec_as_nonroot_priv does not run print_caps as the root user. Rather, it uses the privileges of the root user to set up a non-root process with some inheritable capabilities.
Conclusion
Now you know how to determine which capabilities are needed by a program, how to set the capabilities, and how to do some other interesting things with file capabilities.
Always handle capabilities with care; they are still dangerous pieces of root privilege. On the other hand, experience with the sendmail capabilities bug (see Resources for a link) shows that providing too few capabilities can be dangerous as well. Nevertheless, file capabilities applied judiciously to system binaries in place of making them setuid root can help protect your systems.
Resources
Learn
- exec(): The exec() family of functions replaces the current process image with a new process image.
- prctl(): operations on a process. prctl() is called with a first argument describing what to do (with values defined in <linux/prctl.h>) and further parameters with a significance depending on the first one.
- setuid(): sets the effective user ID of the current process. If the effective UID of the caller is root, the real UID and saved set-user-ID are also set. Under Linux, setuid() is implemented like the POSIX version with the _POSIX_SAVED_IDS feature. This allows a set-user-ID (other than root) program to drop all of its user privileges, do some unprivileged work, and then re-engage the original effective user ID in a secure manner.
- cap_set_proc(): sets the values for all capability flags for all capabilities with the capability state identified by cap_p. The new capability state of the process will be completely determined by the contents of cap_p upon successful return from this function. If any flag in cap_p is set for any capability not currently permitted for the calling process, the function will fail, and the capability state of the process will remain unchanged.
Get products and technologies
Discuss
About the author
Serge Hallyn is a part of IBM's Linux Technology Center, focusing on Linux kernel and security. He obtained his Ph.D. in computer science from the College of William and Mary. He has written and contributed to several security modules. He currently focuses on adding support for virtual server functionality, application checkpoint/restart, and POSIX file capabilities.
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
Other resources
http://www.symantec.com/connect/articles/introduction-linux-capabilities-and-acls
http://linux.die.net/man/7/capabilities