Let's start from the process side and examine how the cgroups-related data structures relate to one another.
In Linux, the data structure that describes a process is task_struct. Its cgroups-related fields are:
#ifdef CONFIG_CGROUPS
        /* Control Group info protected by css_set_lock */
        struct css_set __rcu *cgroups;
        /* cg_list protected by css_set_lock and tsk->alloc_lock */
        struct list_head cg_list;
#endif
struct css_set {
        /* Reference count */
        atomic_t refcount;

        /*
         * List running through all cgroup groups in the same hash
         * slot. Protected by css_set_lock
         */
        struct hlist_node hlist;

        /*
         * List running through all tasks using this cgroup
         * group. Protected by css_set_lock
         */
        struct list_head tasks;

        /*
         * List of cg_cgroup_link objects on link chains from
         * cgroups referenced from this css_set. Protected by
         * css_set_lock
         */
        struct list_head cg_links;

        /*
         * Set of subsystem states, one for each subsystem. This array
         * is immutable after creation apart from the init_css_set
         * during subsystem registration (at boot time) and modular subsystem
         * loading/unloading.
         */
        struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];

        /* For RCU-protected deletion */
        struct rcu_head rcu_head;
};
hlist is an embedded hlist_node, used to organize all css_sets into a hash table so that the kernel can quickly look up a particular css_set.
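To see why the hash table matters: when a task's cgroup membership changes, the kernel first checks whether a css_set with exactly the same subsystem-state combination already exists and reuses it instead of allocating a new one. The following is only a rough sketch of that lookup (the real code is find_existing_css_set() in kernel/cgroup.c, run under css_set_lock; bucket selection and locking are omitted here):

/* Sketch: walk one hash bucket of the global css_set table and look for
 * a css_set whose subsys[] array matches the desired combination. */
static struct css_set *find_css_set_sketch(struct hlist_head *bucket,
                struct cgroup_subsys_state *template[CGROUP_SUBSYS_COUNT])
{
        struct hlist_node *node;

        hlist_for_each(node, bucket) {
                struct css_set *cg = hlist_entry(node, struct css_set, hlist);

                if (!memcmp(cg->subsys, template, sizeof(cg->subsys)))
                        return cg;      /* reuse the existing css_set */
        }
        return NULL;                    /* caller must allocate a new one */
}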
tasks points to the list formed by all processes attached to this css_set.
cg_links points to a list made up of struct cg_cgroup_link objects; this structure is covered below.
subsys is an array of pointers to cgroup_subsys_state structures. A cgroup_subsys_state holds the information that ties a process to one particular subsystem. Through this pointer array, a process can reach the control information of each of its cgroups.
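As a concrete illustration, the path from a task to the state of one subsystem is just two pointer hops. The sketch below is simplified (the __rcu annotations and lock dependency checks are ignored) and mirrors what the kernel's task_subsys_state() helper does:

/* Sketch: reach the per-subsystem state of a task (caller must hold
 * rcu_read_lock(); tsk->cgroups is the shared css_set, and subsys[]
 * is indexed by the subsystem's id). */
static struct cgroup_subsys_state *task_css_sketch(struct task_struct *tsk,
                                                   int subsys_id)
{
        return rcu_dereference(tsk->cgroups)->subsys[subsys_id];
}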
Let's look at the definition of cgroup_subsys_state:
struct cgroup_subsys_state {
        /*
         * The cgroup that this subsystem is attached to. Useful
         * for subsystems that want to know about the cgroup
         * hierarchy structure
         */
        struct cgroup *cgroup;

        /*
         * State maintained by the cgroup system to allow subsystems
         * to be "busy". Should be accessed via css_get(),
         * css_tryget() and css_put().
         */
        atomic_t refcnt;

        unsigned long flags;

        /* ID for this css, if possible */
        struct css_id __rcu *id;
};
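The comment on refcnt is worth expanding: subsystem code never touches the counter directly, it goes through css_get()/css_tryget()/css_put(). A typical, simplified usage pattern looks like this:

/* Sketch: bracket any use of a css with a reference, so the cgroup it
 * belongs to cannot be torn down underneath us. */
static void use_css_sketch(struct cgroup_subsys_state *css)
{
        if (!css_tryget(css))           /* fails if the css is going away */
                return;

        /* ... safe to use css and css->cgroup here ... */

        css_put(css);                   /* drop the reference taken above */
}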
Next, the definition of cgroup:
struct cgroup {
        unsigned long flags;            /* "unsigned long" so bitops work */

        /*
         * count users of this cgroup. >0 means busy, but doesn't
         * necessarily indicate the number of tasks in the cgroup
         */
        atomic_t count;

        /*
         * We link our 'sibling' struct into our parent's 'children'.
         * Our children link their 'sibling' into our 'children'.
         */
        struct list_head sibling;       /* my parent's children */
        struct list_head children;      /* my children */

        struct cgroup *parent;          /* my parent */
        struct dentry __rcu *dentry;    /* cgroup fs entry, RCU protected */

        /* Private pointers for each registered subsystem */
        struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];

        struct cgroupfs_root *root;
        struct cgroup *top_cgroup;

        /*
         * List of cg_cgroup_links pointing at css_sets with
         * tasks in this cgroup. Protected by css_set_lock
         */
        struct list_head css_sets;

        /*
         * Linked list running through all cgroups that can
         * potentially be reaped by the release agent. Protected by
         * release_list_lock
         */
        struct list_head release_list;

        /*
         * list of pidlists, up to two for each namespace (one for procs, one
         * for tasks); created on demand.
         */
        struct list_head pidlists;
        struct mutex pidlist_mutex;

        /* For RCU-protected deletion */
        struct rcu_head rcu_head;

        /* List of events which userspace want to receive */
        struct list_head event_list;
        spinlock_t event_list_lock;
};
subsys is again an array of pointers to cgroup_subsys_state; these point to the per-subsystem information of this cgroup, in the same way as in css_set.
root points to a cgroupfs_root structure, which represents the hierarchy this cgroup belongs to. With that, the cgroups concepts introduced earlier are all tied together.
top_cgroup points to the root cgroup of the hierarchy, i.e., the cgroup that was created automatically when the hierarchy was created.
css_sets points to a list made up of struct cg_cgroup_link objects, just like cg_links in css_set.
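Besides these fields, the parent/children/sibling members form the cgroup tree of a hierarchy. A minimal walk over that tree might look like the sketch below (purely illustrative; cgroup_mutex assumed held):

/* Sketch: count all descendants of a cgroup by following children
 * lists; each child is linked into its parent's list via 'sibling'. */
static int count_descendants_sketch(struct cgroup *cgrp)
{
        struct cgroup *child;
        int n = 0;

        list_for_each_entry(child, &cgrp->children, sibling)
                n += 1 + count_descendants_sketch(child);
        return n;
}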
Now let's analyze the relationship between css_set and cgroup. First, the definition of cg_cgroup_link:
struct cg_cgroup_link {
        /*
         * List running through cg_cgroup_links associated with a
         * cgroup, anchored on cgroup->css_sets
         */
        struct list_head cgrp_link_list;
        struct cgroup *cgrp;

        /*
         * List running through cg_cgroup_links pointing at a
         * single css_set object, anchored on css_set->cg_links
         */
        struct list_head cg_link_list;
        struct css_set *cg;
};
cgrp_link_list links into the list anchored at cgroup->css_sets, and cgrp points to the cgroup this link belongs to; cg_link_list links into the list anchored at css_set->cg_links, and cg points to the css_set this link belongs to.
Design rationale:
Because cgroup and css_set are in a many-to-many relationship, an intermediate structure is needed to connect the two, exactly as in relational database schema design. In cg_cgroup_link, cgrp and cg together act as the composite primary key of this structure, while cgrp_link_list and cg_link_list link into the corresponding lists of the cgroup and the css_set, so the association can be traversed starting from either side.
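To make the wiring concrete, here is a stripped-down sketch of how one cg_cgroup_link connects a cgroup and a css_set, modeled on the kernel's link_css_set() (allocation, refcounting and css_set_lock are left out):

/* Sketch: hang one pre-allocated link on both sides of the association. */
static void link_css_set_sketch(struct cg_cgroup_link *link,
                                struct css_set *cg, struct cgroup *cgrp)
{
        link->cg = cg;          /* "foreign key" to the css_set side */
        link->cgrp = cgrp;      /* "foreign key" to the cgroup side  */

        /* enqueue on the cgroup's list of links (cgroup->css_sets) */
        list_add_tail(&link->cgrp_link_list, &cgrp->css_sets);
        /* enqueue on the css_set's list of links (css_set->cg_links) */
        list_add_tail(&link->cg_link_list, &cg->cg_links);
}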
Each process corresponds to one css_set, and a css_set stores the per-subsystem information for a group of processes; but this information does not necessarily all come from a single cgroup, because a process can belong to several cgroups at the same time, as long as those cgroups are not in the same hierarchy. For example: we create a hierarchy A with the cpu and memory subsystems attached, and process B belongs to A's root cgroup; we then create another hierarchy C with the ns and blkio subsystems attached, and process B also belongs to C's root cgroup. Process B's cpu and memory information then comes from A's root cgroup, while its ns and blkio information comes from C's root cgroup. Therefore the cgroup_subsys_state pointers stored in one css_set can correspond to multiple cgroups. On the other hand, a cgroup also stores a set of cgroup_subsys_state pointers, which it obtains from the subsystems attached to its hierarchy. A cgroup can contain many processes, and those processes' css_sets are not necessarily the same, since some of them may also have joined other cgroups; but all processes in a cgroup are governed by the cgroup_subsys_state associated with that cgroup, so one cgroup can also correspond to multiple css_sets.
From the analysis so far, going from a task to its cgroups is easy; but to get all the tasks of a given cgroup, we have to go through this intermediate structure. Each process points to a css_set, and all processes associated with that css_set are linked into the css_set->tasks list. A cgroup, in turn, uses the cg_cgroup_link structures to find all css_sets associated with it, and from those it can reach every process associated with the cgroup.
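Putting the two lists together, enumerating every task in a cgroup boils down to a nested walk. The kernel wraps this in cgroup_iter_start()/cgroup_iter_next()/cgroup_iter_end(); a simplified version (css_set_lock assumed held) looks like this:

/* Sketch: cgroup->css_sets gives the cg_cgroup_links; each link's cg is
 * a css_set whose tasks list holds task_structs linked by cg_list. */
static void for_each_task_in_cgroup_sketch(struct cgroup *cgrp)
{
        struct cg_cgroup_link *link;
        struct task_struct *tsk;

        list_for_each_entry(link, &cgrp->css_sets, cgrp_link_list) {
                list_for_each_entry(tsk, &link->cg->tasks, cg_list)
                        printk(KERN_INFO "pid %d belongs to this cgroup\n",
                               tsk->pid);
        }
}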
Finally, let's look at the structures for hierarchies and subsystems. A hierarchy corresponds to a cgroupfs_root:
struct cgroupfs_root {
        struct super_block *sb;

        /*
         * The bitmask of subsystems intended to be attached to this
         * hierarchy
         */
        unsigned long subsys_bits;

        /* Unique id for this hierarchy. */
        int hierarchy_id;

        /* The bitmask of subsystems currently attached to this hierarchy */
        unsigned long actual_subsys_bits;

        /* A list running through the attached subsystems */
        struct list_head subsys_list;

        /* The root cgroup for this hierarchy */
        struct cgroup top_cgroup;

        /* Tracks how many cgroups are currently defined in hierarchy.*/
        int number_of_cgroups;

        /* A list running through the active hierarchies */
        struct list_head root_list;

        /* Hierarchy-specific flags */
        unsigned long flags;

        /* The path to use for release notifications. */
        char release_agent_path[PATH_MAX];

        /* The name for this hierarchy - may be empty */
        char name[MAX_CGROUP_ROOT_NAMELEN];
};
subsys_bits and actual_subsys_bits are bitmasks recording, respectively, the subsystems intended to be attached to this hierarchy and the subsystems actually attached to it; they are used while subsystems are being attached to the hierarchy.
hierarchy_id is the unique id of this hierarchy.
top_cgroup is the root cgroup of this hierarchy.
number_of_cgroups records how many cgroups are currently defined in this hierarchy.
root_list is an embedded list_head, used to link all the hierarchies in the system into a list.
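A short sketch of how these fields get used together: the kernel keeps the active cgroupfs_root structures on a global list (roots in kernel/cgroup.c) linked through root_list, and visiting every hierarchy is just a list walk. The parameter below stands in for that global list head; cgroup_mutex assumed held:

/* Sketch: print a one-line summary of every active hierarchy. */
static void dump_hierarchies_sketch(struct list_head *roots)
{
        struct cgroupfs_root *root;

        list_for_each_entry(root, roots, root_list)
                printk(KERN_INFO "hierarchy %d: %d cgroups, subsys mask %#lx\n",
                       root->hierarchy_id, root->number_of_cgroups,
                       root->actual_subsys_bits);
}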
A subsystem corresponds to a cgroup_subsys:
struct cgroup_subsys {
        struct cgroup_subsys_state *(*create)(struct cgroup *cgrp);
        int (*pre_destroy)(struct cgroup *cgrp);
        void (*destroy)(struct cgroup *cgrp);
        int (*can_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
        void (*cancel_attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
        void (*attach)(struct cgroup *cgrp, struct cgroup_taskset *tset);
        void (*fork)(struct task_struct *task);
        void (*exit)(struct cgroup *cgrp, struct cgroup *old_cgrp,
                     struct task_struct *task);
        int (*populate)(struct cgroup_subsys *ss, struct cgroup *cgrp);
        void (*post_clone)(struct cgroup *cgrp);
        void (*bind)(struct cgroup *root);

        int subsys_id;
        int active;
        int disabled;
        int early_init;
        /*
         * True if this subsys uses ID. ID is not available before cgroup_init()
         * (not available in early_init time.)
         */
        bool use_id;
#define MAX_CGROUP_TYPE_NAMELEN 32
        const char *name;

        /*
         * Protects sibling/children links of cgroups in this
         * hierarchy, plus protects which hierarchy (or none) the
         * subsystem is a part of (i.e. root/sibling). To avoid
         * potential deadlocks, the following operations should not be
         * undertaken while holding any hierarchy_mutex:
         *
         * - allocating memory
         * - initiating hotplug events
         */
        struct mutex hierarchy_mutex;
        struct lock_class_key subsys_key;

        /*
         * Link to parent, and list entry in parent's children.
         * Protected by this->hierarchy_mutex and cgroup_lock()
         */
        struct cgroupfs_root *root;
        struct list_head sibling;

        /* used when use_id == true */
        struct idr idr;
        spinlock_t id_lock;

        /* should be defined only by modular subsystems */
        struct module *module;
};
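To tie the callbacks to something concrete, below is a sketch of what a minimal subsystem definition could look like against these 3.x-era prototypes. The "demo" name and demo_subsys_id are hypothetical; a real built-in subsystem is also listed in include/linux/cgroup_subsys.h, which is what generates its subsys_id, and the core then calls create()/destroy() as cgroups are created and removed:

/* Sketch of a hypothetical "demo" subsystem (demo_subsys_id would come
 * from the cgroup_subsys.h enumeration; error handling kept minimal). */
static struct cgroup_subsys_state *demo_create(struct cgroup *cgrp)
{
        struct cgroup_subsys_state *css;

        css = kzalloc(sizeof(*css), GFP_KERNEL);
        if (!css)
                return ERR_PTR(-ENOMEM);
        return css;     /* the core links css->cgroup and the subsys[] slot */
}

static void demo_destroy(struct cgroup *cgrp)
{
        kfree(cgrp->subsys[demo_subsys_id]);
}

struct cgroup_subsys demo_subsys = {
        .name      = "demo",
        .create    = demo_create,
        .destroy   = demo_destroy,
        .subsys_id = demo_subsys_id,
};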