General Presentation

CGroups stands for Control Groups. Google engineers started working on them in 2006, and they were merged into the Linux kernel in version 2.6.24 to limit and account for the resources used by groups of processes. Each kind of resource has its own resource controller, also called a CGroup subsystem.

Here is the list of the available resource controllers:

  • blkio: sets limits on input/output access to and from block devices (see BlockIOWeight);

  • cpu: uses the CPU scheduler to give CGroup tasks access to the CPU. It is mounted together with the cpuacct controller on the same mount (see CPUShares);

  • cpuacct: creates automatic reports on CPU resources used by tasks in a CGroup. It is mounted together with the cpu controller on the same mount (see CPUShares);

  • cpuset: assigns individual CPUs (on a multicore system) and memory nodes to tasks in a CGroup;

  • devices: allows or denies access to devices for tasks in a CGroup;

  • freezer: suspends or resumes tasks in a CGroup;

  • memory: sets limits on memory use by tasks in a CGroup, and generates automatic reports on memory resources used by those tasks (see MemoryLimit);

  • net_cls: tags network packets with a class identifier (classid) that allows the Linux traffic controller (the tc command) to identify packets originating from a particular CGroup task;

  • perf_event: enables monitoring CGroups with the perf tool;

  • hugetlb: allows the use of large virtual memory pages, and enforces resource limits on these pages.

CGroups were already available in RHEL 6. However, with the arrival of Systemd in RHEL 7, many things have changed.

Systemd Contribution

Systemd organizes processes in control groups. For example, all the processes started by an Apache web server end up in the same control group, CGI scripts included. This makes stopping an Apache web server much easier. It also moves resource management from the process level to the application level, by binding the CGroup hierarchies to the Systemd unit tree.

The Systemd unit tree is made up of several parts:

  • at the top, there is the root slice called -.slice,

  • below, there are the system.slice (the default place for all system services), the user.slice (the default place for all user sessions) and the machine.slice (the default place for all virtual machines and Linux containers),

  • further below, there are scopes (groups of externally created processes started via fork) and services (groups of processes created through a unit file).

Note: With the default settings, the top-level slices share resources equally, but only when they compete for them. If only system services run on a server, they can use 100% of the available resources. If users connect to the server, they can use whatever the system services leave unused. If both users and system services request 100% of the resources, each gets 50%. If system services, users and virtual machines all request 100% of the resources, each gets 33%.
Any slice can get 100% of the resources if nobody else wants them. The limits only apply when demand exceeds what is available.
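The figures in the note above follow from an even split between the slices that are actually busy. A minimal sketch of that arithmetic, assuming every slice keeps the same default weight (equal_split is a hypothetical helper name, not a Systemd tool):

```shell
#!/bin/sh
# Sketch: CPU percentage each top-level slice gets when all of them are busy.
# Assumption: every slice keeps the same (default) weight, so the split is even.
equal_split() {
    # $1 = number of slices competing for the CPU (integer division)
    echo $((100 / $1))
}

for n in 1 2 3; do
    echo "$n busy slice(s): $(equal_split "$n")% each"
done
```

With unequal weights, the split is proportional to each weight instead of even.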

To display the full hierarchy of control groups, type:

# systemd-cgls

├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
├─user.slice
│ └─user-0.slice
│   ├─session-56.scope
│   │ ├─19679 sshd: root@pts/1
│   │ ├─19683 -bash
│   │ ├─19714 systemd-cgls
│   │ └─19715 less
│   └─session-40.scope
│     ├─19370 sshd: root@pts/0
│     └─19374 -bash
└─system.slice
  ├─httpd.service
  │ ├─2577 /usr/sbin/httpd -DFOREGROUND
  │ ├─2578 /usr/sbin/httpd -DFOREGROUND
  │ └─2579 /usr/sbin/httpd -DFOREGROUND
  ├─polkit.service
  │ └─730 /usr/lib/polkit-1/polkitd --no-debug
  ├─systemd-udevd.service
  │ └─455 /usr/lib/systemd/systemd-udevd
  ├─lvm2-lvmetad.service
  │ └─450 /usr/sbin/lvmetad -f
  ├─systemd-journald.service
  │ └─449 /usr/lib/systemd/systemd-journald
  ├─dbus.service
  │ └─611 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --sy
  ├─systemd-logind.service
  │ └─604 /usr/lib/systemd/systemd-logind
  ├─chronyd.service
  │ └─613 /usr/sbin/chronyd -u chrony
  ├─crond.service
  │ └─621 /usr/sbin/crond -n
  ├─postfix.service
  │ ├─ 1349 /usr/libexec/postfix/master -w
  │ ├─ 1358 qmgr -l -t unix -u
  │ └─19596 pickup -l -t unix -u
  ├─rsyslog.service
  │ └─589 /usr/sbin/rsyslogd -n
  ├─sshd.service
  │ └─1068 /usr/sbin/sshd -D
  ├─tuned.service
  │ └─583 /usr/bin/python -Es /usr/sbin/tuned -l -P
  ├─firewalld.service
  │ └─580 /usr/bin/python -Es /usr/sbin/firewalld --nofork --nopid
  ├─NetworkManager.service
  │ └─698 /usr/sbin/NetworkManager --no-daemon
  ├─system-getty.slice
  │ └─getty@tty1.service
  │   └─631 /sbin/agetty --noclear tty1
  └─system-serial\x2dgetty.slice
    └─serial-getty@ttyS0.service
      └─630 /sbin/agetty --keep-baud 115200 38400 9600 ttyS0

To kill all the processes associated with the Apache server (CGI scripts included), type:

# systemctl kill httpd

Note: Add the -s option to specify the signal to send (for example SIGTERM, SIGINT or SIGSTOP). SIGTERM is the default.

To get the list of control groups ordered by CPU, memory and disk I/O load, type:

# systemd-cgtop

Path                               Tasks %CPU Memory Input/s Output/s
/                                    213  3.9 829.7M       -        -
/system.slice                          1    -      -       -        -
/system.slice/sshd.service             1    -      -       -        -
/system.slice/ModemManager.service     1    -      -       -        -

Accounting Activation

If you want the systemd-cgtop command to display the amount of memory currently used by the sshd service, you need to activate accounting for that service. Start by creating the /etc/systemd/system/sshd.service.d directory:

# mkdir /etc/systemd/system/sshd.service.d

Then, you need to create the /etc/systemd/system/sshd.service.d/accounting.conf file and paste the following lines into it:

[Service]

MemoryAccounting=true

Note: Other options exist like CPUAccounting, BlockIOAccounting or TasksAccounting (since RHEL 7.4).

Finally, you need to type:

# systemctl daemon-reload; systemctl restart sshd
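The three steps above can be wrapped in a small helper. This is only a sketch: write_accounting_dropin is a hypothetical function name, and the optional second argument (an alternative root directory) is an assumption added so that the result can be inspected in a scratch directory before touching /etc.

```shell
#!/bin/sh
# Sketch: create an accounting drop-in for a unit.
# write_accounting_dropin is a hypothetical helper, not part of Systemd.
write_accounting_dropin() {
    unit=$1                          # e.g. sshd.service
    root=${2:-/etc/systemd/system}   # pass a scratch directory to experiment
    dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '%s\n' '[Service]' 'MemoryAccounting=true' 'CPUAccounting=true' \
        > "$dir/accounting.conf"
}

# Demonstration in a temporary directory (nothing under /etc is modified).
scratch=$(mktemp -d)
write_accounting_dropin sshd.service "$scratch"
cat "$scratch/sshd.service.d/accounting.conf"
```

On a real system, the helper would be called as root without the second argument, followed by the daemon-reload and restart commands shown above.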

Systemd Resource Controllers

Through Systemd, several resources can be restricted:

  • CPUShares: by default at 1024; this requires CPUAccounting=true to be set first,

  • MemoryLimit: by default without limit, value expressed in Megabytes or Gigabytes; this requires MemoryAccounting=true to be set first,

  • BlockIOWeight, BlockIODeviceWeight: by default without limit, value between 10 and 1000 (requires the CFQ I/O elevator); this requires BlockIOAccounting=true to be set first,

  • BlockIOReadBandwidth, BlockIOWriteBandwidth: by default without limit, value expressed in Megabytes or Gigabytes; this requires BlockIOAccounting=true to be set first.

The RHEL 7.2 release brings three new resource management options:

  • StartupCPUShares and StartupBlockIOWeight: they work like CPUShares and BlockIOWeight but only apply during system startup; this requires the matching accounting option (CPUAccounting=true or BlockIOAccounting=true) to be set first,

  • CPUQuota: it restricts CPU time to the specified percentage, even if the machine is otherwise idle; this requires CPUAccounting=true to be set first.

The RHEL 7.4 release brings one new resource management option:

  • TasksMax=10 defines the maximum number of tasks the unit can create (here 10); this requires TasksAccounting=true to be set first.

To put resource limits on a service (here 500 CPUShares), type:

# systemctl set-property httpd CPUShares=500

# systemctl daemon-reload

Note1: The change is persistent: it is written to a drop-in file for the service. Use the --runtime option to apply it only until the next reboot.
Note2: By default, each service owns 1024 CPUShares. Nothing prevents you from setting a smaller or bigger value.

To get the current CPUShares service value, type:

# systemctl show -p CPUShares httpd

CPUShares=500

Or:

# systemctl show httpd | grep CPUShares

CPUShares=500

Note: Each time a resource limit is set on a service, a directory of the same name with the .d suffix is created in /etc/systemd/system. For example, in the previous case, a directory named /etc/systemd/system/httpd.service.d is created with a file called 90-CPUShares.conf in it and the following content:

[Service]

CPUShares=500

Note: The newly created directory (here /etc/systemd/system/httpd.service.d) can also be used to customize the service configuration file.

Also, if you need to use RT (Real-Time) services, be ready to apply additional RT configurations.

The libcgroup-tools package gives access to some useful tools for manipulating control groups.

Some Other Examples

As previously seen, system processes get 1024 CPUShares, user processes 1024 CPUShares and virtual machines 1024 CPUShares, which means 33% of CPU each by default (you will only see it if each category executes some tasks).
To allocate 70% of CPU to the system processes, 20% of CPU to the user processes and 10% of CPU to the virtual machines, type:

# systemctl set-property system.slice CPUShares=7168

# systemctl set-property user.slice CPUShares=2048

# systemctl set-property machine.slice CPUShares=1024

Note: Rebooting might be necessary to see the changes.
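The CPUShares values above can be derived from the target percentages. A minimal sketch, assuming the convention that the smallest percentage is mapped to the default of 1024 (shares_for is a hypothetical helper name; any other common scale factor would give the same ratios):

```shell
#!/bin/sh
# Sketch: turn target CPU percentages into CPUShares values.
# Assumption: the smallest percentage is mapped to 1024 shares.
shares_for() {
    # $1 = target percentage, $2 = smallest percentage among all slices
    echo $(( $1 * 1024 / $2 ))
}

for spec in system.slice:70 user.slice:20 machine.slice:10; do
    unit=${spec%%:*}   # part before the colon
    pct=${spec##*:}    # part after the colon
    echo "$unit CPUShares=$(shares_for "$pct" 10)"
done
```

Only the ratio between the values matters: 7168, 2048 and 1024 add up to 10240, of which they represent 70%, 20% and 10%.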

To restrict the user with the UID 1000 to at most 20% of the CPU time of one CPU, type:

# systemctl set-property user-1000.slice CPUQuota=20%

To reduce the memory available for the same user to 1GB, type:

# systemctl set-property user-1000.slice MemoryLimit=1024M

To limit the mariadb service to writing at most 2MB/s to the /dev/vdb device, type:

# systemctl set-property mariadb.service BlockIOWriteBandwidth="/dev/vdb 2M"

Case Study

To better understand CGroups, let’s take an example. You want to run a website but you’ve got only one server.
You plan to use the classical LAMP stack (Linux, here CentOS 7, Apache, MariaDB and PHP).

Your server has 4 Gigabytes of memory and you want to allocate resources as follows:

  • Apache service (here httpd.service): 40% of CPU, 500M of memory,

  • PHP service (here php-fpm.service): 30% of CPU, 1G of memory,

  • MariaDB service (here mariadb.service): 30% of CPU, 1G of memory.

This leaves about 1.5G of memory (4G minus the 2.5G allocated above) for the other processes (system, etc).

Note1: The values given are only for the sake of the discussion.
Note2: If you don’t configure CGroups, everything will work like in RHEL 6: all the processes will share the server power and the memory as they need.

As all your LAMP services are started from a Systemd unit file, they are placed in the system.slice.

Here is the configuration to set up with the systemctl set-property command:

  • Apache service: CPUShares=4096 (4 x 1024); MemoryLimit=500M,

  • PHP service: CPUShares=3072 (3 x 1024); MemoryLimit=1G,

  • MariaDB service: CPUShares=3072 (3 x 1024); MemoryLimit=1G.
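The configuration listed above could be applied with a short script. This is a sketch only: the run wrapper and the APPLY variable are hypothetical conveniences introduced here so that the commands are printed by default and only executed when explicitly requested.

```shell
#!/bin/sh
# Sketch: apply the case-study limits with systemctl set-property.
# By default the commands are only printed; set APPLY=1 (a hypothetical
# switch for this sketch) and run as root to execute them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

apply_lamp_limits() {
    run systemctl set-property httpd.service CPUShares=4096 MemoryLimit=500M
    run systemctl set-property php-fpm.service CPUShares=3072 MemoryLimit=1G
    run systemctl set-property mariadb.service CPUShares=3072 MemoryLimit=1G
}

apply_lamp_limits
```

Printing first makes it easy to review the exact commands before they change the persistent configuration of the three services.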

Note1: When all three services are busy, the Apache service gets 4096/(4096+3072+3072) = 40% of the CPU time, the PHP service gets 3072/(4096+3072+3072) = 30%, etc.
Note2: There are some other services in the system.slice (crond, postfix, chronyd, etc). But, as they are not very hungry, they will not consume their default allocated CPU resources (1024) and will not change the situation much. However, even if the Apache, MariaDB and PHP services use all their CPU resources, there will still be some resources left for the other services, because shares are relative weights and every service keeps its own.
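The effect described in the two notes above can be checked with a few lines of arithmetic. The sketch below assumes that CPU time is split in proportion to the CPUShares of the services that are busy at the same moment; share_pct is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: percentage of CPU one service gets among the busy services.
# share_pct is a hypothetical helper. First argument: the service's own
# CPUShares. Remaining arguments: CPUShares of the other busy services.
share_pct() {
    own=$1
    total=0
    for w in "$@"; do          # "$@" includes the service's own shares
        total=$((total + w))
    done
    echo $((100 * own / total))   # integer division, rounds down
}

# Only the three LAMP services are busy:
echo "httpd vs PHP and MariaDB: $(share_pct 4096 3072 3072)%"
# Two other services with the default 1024 shares become busy too:
echo "httpd with two extra busy services: $(share_pct 4096 3072 3072 1024 1024)%"
```

The second case shows why the 40/30/30 split is only reached while the remaining services of the system.slice stay mostly idle.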

Caution: Once you set up a CPUShares CGroup restriction on one service in the system.slice, all the services there get the CPUShares CGroup activated: even if you don’t specify anything, every newly started service will be restricted to 1024 CPUShares by default. It is not possible to CPU-restrict some services and leave the others unrestricted. For a detailed explanation of the mechanism, see the All control groups belong to us! video below in the Additional Resources section.