cgroup
What it is for
Control groups, usually referred to as cgroups, are a Linux kernel feature
which allow processes to be organized into hierarchical groups whose usage of various types of resources can then be limited and monitored.
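In essence, cgroup resource limits are implemented through a set of hooks that the kernel attaches to processes. As a running process requests or is scheduled resources, the corresponding hooks fire, and that is how usage is tracked and limited.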
Basic concepts
Keyword: [filesystem]
The kernel's cgroup interface is provided through a pseudo-filesystem called cgroupfs.
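The [kernel] exposes cgroup functionality to users through [cgroupfs]. [cgroupfs] is the interface to cgroups: users drive cgroups by editing the files under cgroupfs.
(Using a filesystem as the interface to the kernel seems to be a common Linux idiom. Similarly, sysctl configures kernel parameters through the procfs filesystem, under /proc/sys; see sysctl.)
Key terms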
subsystem:
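Subsystems (cpu, memory, etc.) are also known as resource controllers (or simply, controllers). A subsystem represents a single resource, such as CPU time or memory; that is, one subsystem controls one kind of resource.
Various subsystems have been implemented, making it possible to do things such as limiting the amount of CPU time and memory available to a cgroup, accounting for the CPU time used by a cgroup, and freezing and resuming execution of the processes in a cgroup.
cgroup:
The logical unit of resource control, concretely represented by a directory under a subsystem. For example, after creating a test directory under /sys/fs/cgroup/blkio/, /sys/fs/cgroup/blkio/test is a cgroup. The directory contains a number of files; they act as configuration that controls resources and represent the attributes of that cgroup.
hierarchy:
A tree of cgroups with one or more subsystems attached to it; every node in the tree is a cgroup. A cgroup can have multiple children, and by default a child inherits the attributes of its parent. A system can have several hierarchies, which together form a forest of hierarchies.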
The cgroups for a controller are arranged in a hierarchy.
This hierarchy is defined by creating, removing, and renaming subdirectories within the cgroup filesystem. At each level of the hierarchy, attributes (e.g., limits) can be defined.
Grouping is implemented in the core cgroup kernel code, while resource tracking and limits are implemented in a set of per-resource-type subsystems (memory, CPU, and so on).
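The grouping logic and the limiting logic are two separate implementations; resource tracking and limiting are implemented by the subsystems.
Correspondingly, every cgroup (directory) in a hierarchy contains a set of files: a tasks file plus the resource control files. The tasks file corresponds to the grouping logic, and the other resource control files correspond to the resource tracking and limits logic.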
cgroup model
Next, a closer look at the cgroup model, with a picture to help visualize it.
System processes are called tasks in cgroup terminology.
cgroup model rule
- rule1. A single hierarchy can have one or more subsystems attached to it.
- rule2. Any single subsystem (such as cpu) cannot be attached to more than one hierarchy if one of those hierarchies has a different subsystem attached to it already.
- rule3. A task cannot be a member of two different cgroups in the same hierarchy.
- rule4. Any process (task) on the system which forks itself creates a child task. A child task automatically inherits the cgroup membership of its parent but can be moved to different cgroups as needed. Once forked, the parent and child processes are completely independent.
A picture of the cgroup model:
        -----------------|---------------
        |                |              |
    hierarchyA          ...         hierarchyN        | --> cgroup(s) are arranged in a hierarchy
      -------                         -------
one or more subsystem(s)       one or more subsystem(s)
note:
For any single hierarchy you create, each task on the system can be a member of exactly one cgroup in that hierarchy.
A single task may be in multiple cgroups, as long as each of those cgroups is in a different hierarchy.
As soon as a task becomes a member of a second cgroup in the same hierarchy, it is removed from the first cgroup in that hierarchy. At no time is a task ever in two different cgroups in the same hierarchy.
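The membership rules in the note above can be illustrated with a small toy model. This is only a sketch of the bookkeeping, not how the kernel implements it; the class name `Hierarchy` and its methods are hypothetical names chosen for this illustration.

```python
class Hierarchy:
    """Toy model of one cgroup hierarchy (hypothetical, for illustration only).

    Each task is a member of exactly one cgroup per hierarchy, so the
    membership is a plain mapping from PID to cgroup path.
    """

    def __init__(self, subsystems):
        self.subsystems = tuple(subsystems)
        self.membership = {}  # pid -> cgroup path

    def cgroup_of(self, pid):
        # A task that was never moved sits in the root cgroup of the hierarchy.
        return self.membership.get(pid, "/")

    def attach(self, pid, cgroup_path):
        """Move pid into cgroup_path and return the cgroup it left."""
        previous = self.cgroup_of(pid)
        self.membership[pid] = cgroup_path  # overwriting removes it from the old cgroup
        return previous

    def fork(self, parent_pid, child_pid):
        """A child starts in the same cgroup as its parent (rule 4)."""
        self.membership[child_pid] = self.cgroup_of(parent_pid)

    def tasks(self, cgroup_path):
        return sorted(p for p, c in self.membership.items() if c == cgroup_path)


cpu = Hierarchy(["cpu", "cpuacct"])
memory = Hierarchy(["memory"])

cpu.attach(100, "/a")
memory.attach(100, "/x")   # same task, different hierarchy: allowed
left = cpu.attach(100, "/b")  # second cgroup in the same hierarchy: task leaves /a
cpu.fork(100, 101)         # child inherits /b
cpu.attach(101, "/a")      # and can be moved independently of its parent
```

After these calls, task 100 is in `/b` of the cpu hierarchy and in `/x` of the memory hierarchy, while its child 101 has been moved to `/a` without affecting the parent.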
See also:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/sec-relationships_between_subsystems_hierarchies_control_groups_and_tasks
https://0xax.gitbooks.io/linux-insides/content/Cgroups/linux-cgroups-1.html
Summary:
A hierarchy has one or more subsystems attached to it and uses the filesystem to organize its cgroups (directories) into a tree.
How to create a cgroup
To use a control group, we must first create it. There are two ways to do so.
- The first way is to create a subdirectory under any subsystem in /sys/fs/cgroup and add the PID of a task (i.e. process) to the tasks file, which is created automatically right after the subdirectory is created. In other words, cd into the subsystem and run mkdir; this actually creates the cgroup in the hierarchy that the subsystem is attached to. After mkdir, a set of files, including tasks, appears automatically in the new directory. Writing a PID into the tasks file puts the corresponding process under the limits of that cgroup.
- The second way is to use cgcreate (create new cgroup(s)):
cgcreate -t xxx:xxx -a xxx:xxx -g cpu:test
-t:
defines the name of the user and the group that own the tasks file of the defined control group, i.e. this user and the members of this group have write access to the file. The default is the same as for the parent cgroup.
-a:
defines the name of the user and the group that own the rest of the defined control group's files (in other words, the resource control config files). These users are allowed to set subsystem parameters and create subgroups. The default is the same as for the parent cgroup.
-g:
defines the control groups to be added, in the form controllers:path. controllers is a list of controllers; the character "*" can be used as a shortcut for "all mounted controllers". path is the relative path to the control groups in the given controllers list. This option can be specified multiple times.
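The first approach described above (mkdir under a subsystem, then writing a PID into tasks) can be sketched in a few lines. This is a minimal sketch assuming a cgroup v1 layout like the one in these notes; the function names `create_cgroup` and `attach_task` are hypothetical, and on a real system both steps normally require root.

```python
import os


def create_cgroup(subsystem_root, name):
    """Create a cgroup by making a directory under a subsystem root.

    On a real cgroup v1 mount (for example /sys/fs/cgroup/cpu) the kernel
    populates the new directory with `tasks` and the control files.
    """
    path = os.path.join(subsystem_root, name)
    os.makedirs(path, exist_ok=True)
    return path


def attach_task(cgroup_path, pid):
    """Place a process in the cgroup by writing its PID to the tasks file.

    Assumption: one PID per write, which is what `echo PID > tasks` does.
    """
    with open(os.path.join(cgroup_path, "tasks"), "w") as f:
        f.write(str(pid))
```

A call such as `attach_task(create_cgroup("/sys/fs/cgroup/cpu", "test"), os.getpid())` would move the calling process into the `test` cgroup of the cpu hierarchy. Pointing `subsystem_root` at an ordinary directory only creates plain files, which is convenient for trying the functions without touching the real cgroupfs.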
Running a program in a given cgroup
cgexec -g *:test1 ls
runs command ls in control group test1 in all mounted controllers.
cgexec -g cpu,memory:test1 ls -l
runs command ls -l in control group test1 in controllers cpu and memory.
cgexec -g cpu,memory:test1 -g swap:test2 ls -l
runs command ls -l in control group test1 in controllers cpu and memory and control group test2 in controller swap.
In practice
View the default cgroup mount points
[root@VM-165-116-centos ~]# mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
[root@VM-165-116-centos ~]#
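- The type tmpfs on the first line means that /sys/fs/cgroup itself is a tmpfs, i.e. it lives in memory rather than on disk.
- The directories under /sys/fs/cgroup are the root directories of the individual subsystems (systemd is the exception: it is a named hierarchy, name=systemd, with no controller attached).
View the subsystems supported by the OS (cat /proc/cgroups):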
[root@VM-165-116-centos ~]# cat /proc/cgroups
#subsys_name hierarchy num_cgroups enabled
cpuset 4 6 1
cpu 5 109 1
cpuacct 5 109 1
blkio 10 91 1
memory 8 449 1
devices 6 91 1
freezer 11 6 1
net_cls 2 6 1
perf_event 9 6 1
net_prio 2 6 1
hugetlb 3 6 1
pids 7 117 1
[root@VM-165-116-centos ~]#
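Each row is one subsystem (aka controller).
A subsystem hosts a number of cgroups (i.e. directories), and several subsystems can be attached to the same hierarchy, e.g. cpu and cpuacct, which share hierarchy ID 5 above.
num_cgroups is the number of cgroups (directories) hosted in the subsystem's hierarchy.
Taking net_prio as an example: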
[root@VM-165-116-centos ~]# cd /sys/fs/cgroup/net_prio/
[root@VM-165-116-centos net_prio]# ls -lR | grep "^d" | wc -l
5
[root@VM-165-116-centos net_prio]#
/sys/fs/cgroup/net_prio/ itself + 5 subdirectories = 6, which matches the num_cgroups value for net_prio above.
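To make the relationship between subsystems and hierarchies easier to see, the /proc/cgroups output can be grouped by hierarchy ID. The following is a minimal sketch that parses the text shown above; the function names are hypothetical, and on a live system the text would come from reading /proc/cgroups.

```python
# Sample copied from the /proc/cgroups output shown in these notes.
SAMPLE = """\
#subsys_name hierarchy num_cgroups enabled
cpuset 4 6 1
cpu 5 109 1
cpuacct 5 109 1
blkio 10 91 1
memory 8 449 1
devices 6 91 1
freezer 11 6 1
net_cls 2 6 1
perf_event 9 6 1
net_prio 2 6 1
hugetlb 3 6 1
pids 7 117 1
"""


def parse_proc_cgroups(text):
    """Return one dict per subsystem row, skipping the '#' header line."""
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        rows.append({
            "name": name,
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        })
    return rows


def subsystems_by_hierarchy(rows):
    """Group subsystem names by the hierarchy ID they are attached to."""
    groups = {}
    for row in rows:
        groups.setdefault(row["hierarchy"], []).append(row["name"])
    return groups


rows = parse_proc_cgroups(SAMPLE)
groups = subsystems_by_hierarchy(rows)
```

Here `groups[5]` contains both cpu and cpuacct, and `groups[2]` contains net_cls and net_prio, matching the co-mounted directories seen in the mount output.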
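cgroup limits of a process
cgroups work as configuration.
For each process, /proc/<pid>/cgroup lists the path within each subsystem (aka controller) hierarchy whose settings apply to the corresponding resource.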
[root@VM-165-116-centos ~]# cat /proc/1/cgroup
# hierarchyid : subsystem : specified cgroup (a dir of subsystem)
11:freezer:/
10:blkio:/init.scope
9:perf_event:/
8:memory:/init.scope
7:pids:/init.scope
6:devices:/init.scope
5:cpu,cpuacct:/init.scope
4:cpuset:/
3:hugetlb:/
2:net_cls,net_prio:/
1:name=systemd:/init.scope
[root@VM-165-116-centos ~]#
[root@VM-165-116-centos ~]# ls /sys/fs/cgroup/blkio/init.scope/
blkio.bfq.io_service_bytes blkio.reset_stats blkio.throttle.write_bps_device
blkio.bfq.io_service_bytes_recursive blkio.throttle.io_service_bytes blkio.throttle.write_iops_device
blkio.bfq.io_serviced blkio.throttle.io_service_bytes_recursive cgroup.clone_children
blkio.bfq.io_serviced_recursive blkio.throttle.io_serviced cgroup.procs
blkio.bfq.weight blkio.throttle.io_serviced_recursive notify_on_release
blkio.bfq.weight_device blkio.throttle.read_bps_device tasks
blkio.diskstats blkio.throttle.read_iops_device
In the case above, the process with PID 1 belongs to these cgroups:
- the root directory / of the freezer subsystem
- the /init.scope directory of the blkio subsystem
- ...
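The mapping from a /proc/<pid>/cgroup line to a directory can be sketched as follows. This is a hedged sketch: `parse_pid_cgroup` and `cgroup_dir` are hypothetical helper names, and `cgroup_dir` assumes the mount layout shown earlier in these notes, where each hierarchy is mounted at /sys/fs/cgroup/<controllers> and the named systemd hierarchy at /sys/fs/cgroup/systemd. Other systems may mount hierarchies elsewhere.

```python
import posixpath
from typing import List, NamedTuple

# Sample copied from the `cat /proc/1/cgroup` output shown in these notes.
SAMPLE = """\
11:freezer:/
10:blkio:/init.scope
9:perf_event:/
8:memory:/init.scope
7:pids:/init.scope
6:devices:/init.scope
5:cpu,cpuacct:/init.scope
4:cpuset:/
3:hugetlb:/
2:net_cls,net_prio:/
1:name=systemd:/init.scope
"""


class CgroupEntry(NamedTuple):
    hierarchy_id: int
    controllers: List[str]
    path: str


def parse_pid_cgroup(text):
    """Parse 'hierarchy-ID:controller-list:cgroup-path' lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append(CgroupEntry(int(hierarchy_id), controllers.split(","), path))
    return entries


def cgroup_dir(entry, root="/sys/fs/cgroup"):
    """Directory holding the control files, under the assumed mount layout."""
    mount_name = ",".join(entry.controllers)
    if mount_name.startswith("name="):
        mount_name = mount_name[len("name="):]  # name=systemd is mounted as systemd
    return posixpath.normpath(posixpath.join(root, mount_name, entry.path.lstrip("/")))


by_id = {e.hierarchy_id: e for e in parse_pid_cgroup(SAMPLE)}
```

With this sample, `cgroup_dir(by_id[10])` points at the same /sys/fs/cgroup/blkio/init.scope directory that is listed with ls above.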
ulimit
What it is
ulimit is a shell built-in command (a bash built-in).
ulimit provides control over the resources available to the shell and to subprocesses started by it, on systems that allow such control.
It sets the limits for the current shell, not for the user globally or for future shells, and these limits are inherited by processes started in that shell. That is, ulimit applies equally to the shell and its subprocesses.
The ulimit shell built-in is a wrapper around the setrlimit system call (a function provided by the kernel), and the underlying data structure that holds the limit information is called rlimit (short for resource limit).
The getrlimit() and setrlimit() system calls get and set resource limits respectively, where the associated soft and hard limits are defined by the rlimit structure.
A hard limit cannot be increased by a non-root user once it is set.
A soft limit may be increased up to the value of the hard limit.
When setting a limit with ulimit, if neither -H nor -S is specified, both the soft and hard limits are set.
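The soft/hard distinction can be tried out directly from Python, whose standard `resource` module wraps getrlimit() and setrlimit(). The sketch below is Unix-only and assumes the process may change its own RLIMIT_NOFILE; the helper name `set_soft_limit` and the choice of RLIMIT_NOFILE are just for illustration.

```python
import resource


def set_soft_limit(which, new_soft):
    """Change only the soft limit and return the resulting (soft, hard) pair.

    The hard limit is passed back unchanged, which an unprivileged
    process is always allowed to do.
    """
    _, hard = resource.getrlimit(which)
    resource.setrlimit(which, (new_soft, hard))
    return resource.getrlimit(which)


original_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Lower the soft limit (roughly what `ulimit -S -n N` does in a shell).
lowered = original_soft // 2 if original_soft > 0 else 64
after_lowering = set_soft_limit(resource.RLIMIT_NOFILE, lowered)

# Raising the soft limit again is fine as long as it stays within the hard limit.
after_restoring = set_soft_limit(resource.RLIMIT_NOFILE, original_soft)
```

Asking for a soft limit above the hard limit is rejected for an unprivileged process (Python reports it as a ValueError), which is the rule stated above.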
Changes made with the ulimit command apply only to the current shell process and the processes it starts afterwards.
For the limits to persist across reboots, set them in the configuration file /etc/security/limits.conf. Settings in /etc/security/limits.conf take the following form:
#[domain] [type] [item] [value]
* - core [value]
* - data [value]
[user] soft nproc [value]
For more, see man limits.conf.
After editing /etc/security/limits.conf, the change takes effect after a reboot or after logging in again.
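The four-column format of limits.conf is simple enough to read programmatically. The following is a minimal sketch of a parser, not the logic pam_limits itself uses; the names `LimitEntry`, `parse_limits_conf` and `applies_to` are hypothetical, and the user name and values in the sample are made up for illustration.

```python
from typing import NamedTuple


class LimitEntry(NamedTuple):
    domain: str      # user name, @group, or * for everyone
    limit_type: str  # soft, hard, or - for both
    item: str        # e.g. core, data, nproc
    value: str


# Hypothetical sample; the values are illustrative, not recommendations.
SAMPLE = """\
#[domain] [type] [item] [value]
*      -    core   0
*      -    data   unlimited
alice  soft nproc  4096
"""


def parse_limits_conf(text):
    """Return the well-formed entries, ignoring comments and blank lines."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4:
            continue  # skip malformed lines in this sketch
        entries.append(LimitEntry(*fields))
    return entries


def applies_to(entry, kind):
    """True if the entry sets the given kind ('soft' or 'hard') of limit."""
    return entry.limit_type == "-" or entry.limit_type == kind


entries = parse_limits_conf(SAMPLE)
```

In this sample the two `-` entries set both the soft and the hard limit, while the nproc entry only sets the soft limit for the hypothetical user alice.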
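Question 1
Why organize cgroups in a tree-shaped hierarchy?
Because a tree makes it easy to describe and implement the inheritance relationships between cgroups.
Question 2
What is the difference between ulimit and cgroup?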
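The biggest difference is probably scope: ulimit only affects the shell and its child processes, while a cgroup lets any process be attached flexibly, which gives per-process settings.
Second, the time at which they take effect differs. Every shell is subject to its limits from the moment it starts. Limits configured in /etc/security/limits.conf do not touch shells that are already open and only apply to sessions opened afterwards, while the ulimit command changes the current shell and the processes it starts from then on. A cgroup, once created, takes effect on an already running process as soon as its PID is written to the cgroup's tasks file. This difference in timing is really a consequence of the difference in scope.
Third, ulimit manages only a few kinds of resources; cgroups cover a richer set of resources at a finer granularity.
ulimit has nothing to do with throttling rates. That is what the cgroups kernel feature is capable of and used for. Among its other features, it also allows hierarchies of settings to be defined.
In short, ulimit is simple and blunt, while cgroups manage resources in a finer and more flexible way and allow per-process settings.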