Without Cgroups, there would be no Docker

What are Cgroups?

Cgroups is the abbreviation of control groups. It is a mechanism provided by the Linux kernel that can limit, record and isolate the physical resources (such as cpu,memory,IO, etc.) used by process groups. Originally proposed by google engineers, it was later integrated into the Linux kernel.

What can Cgroups do?

The initial goal of cgroups is to provide a unified framework for resource management, which not only integrates the required cpuset and other subsystems, but also provides interfaces for the development of new subsystems in the future. Cgroups are now applicable to a variety of application scenarios, from resource control of a single process to OS Level Virtualization.

cgroups subsystem

Its implementation principle is to set the parameters of various Linux subsystem s, and then bind the process with these subsystems.

Linux subsystem has the following types:

blkio
cpu
cpuacct counts the CPU usage of processes in cgroup
cpuset
devices
freezer user suspends and resumes processes from the group
memeory controls the memory usage of processes in cgroup
net_cls
net_prio
ns

By installing the cgroup tool

$ apt-get install cgroup-tools
$ lssubsys -a
cpuset
cpu,cpuacct
blkio
memory
devices
freezer
net_cls,net_prio
perf_event
hugetlb
pids
rdma

cgroups hierarchy

The function of hierarchy is to organize a group of cgroups into a tree structure so that cgroups can inherit

A cgroup1 limits the CPU utilization frequency of the processes (P1, P2, P3) under it. If you want to limit the memory of process P2, you can create cgroup2 under cgroup1 to inherit from cgroup1, which can limit the CPU utilization, and set the memory limit without affecting other processes.

The kernel uses cgroups structure to express the resource constraints on one or several cgroups subsystems. It is organized in the form of a tree, which is called hierarchy

cgroups and processes

Relationship among hierarchy, subsystem and cgroup process group
hierarchy only implements the inheritance relationship, and the real resource restriction still depends on the subsystem
By attaching the subsystem to the hierarchy,
Add the process group to the hierarchy (task) to realize resource restriction

As can be seen from this figure:

A subsystem can only be attached to a hierarchy
A hierarchy can attach multiple subsystem s
A process can be a member of multiple cgroups, but these cgroups must be in different hierarchies.
When a process fork s out a child process, the child process is in the same cgroup as the parent process, or it can be moved to other cgroups as needed.

cgroups file system

The underlying implementation of cgroups is hidden by the VFS (Virtual File System) of the Linux kernel, exposing a unified file system API excuse to the user state. Let's experience the use of this file system:

First, create and mount a hierarchy (cgroup tree)

$ mkdir cgroup-test
$ sudo mount -t cgroup -o none,name=cgroup-test cgrout-test ./cgroup-test
$ ls ./cgrpup-test
cgroup.clone_children  cgroup.sane_behavior  release_agent
cgroup.procs           notify_on_release     tasks

These files are the configuration items of the cgroup root node in the hierarchy

cgroup.clone_children will be read by the cpuset subsystem. If it is 1, the child CGroup will inherit the cpuset configuration of the parent CGroup.

notify_on_release and release_agent is used to manage some operations when the last process exits

tasks identifies the process ID under the cgroup and associates the process members of the cgroup with the hierarchy

2. Create two more child hierarchies. Create two child cgroups expanded from the cgroup root node of the newly created hierarchy

$ cd cgroup-test
$ sudo mkdir cgroup-1
$ sudo mkdir cgroup-2
$ tree
.
├── cgroup-1
│   ├── cgroup.clone_children
│   ├── cgroup.procs
│   ├── notify_on_release
│   └── tasks
├── cgroup-2
│   ├── cgroup.clone_children
│   ├── cgroup.procs
│   ├── notify_on_release
│   └── tasks
├── cgroup.clone_children
├── cgroup.procs
├── cgroup.sane_behavior
├── notify_on_release
├── release_agent
└── tasks

2 directories, 14 files

You can see that when creating a folder in a cgroup directory, the Kernel will mark the folder as the child cgroup of the cgroup, and they will inherit the properties of the parent cgroup.

Adding and moving processes to cgroup
In the hierarchy of a cgroup, a process can only exist on one cgroup node. All processes of the system exist on the root node by default. You can move the process to other cgroup nodes. You only need to write the process ID to the tasks file of the cgroup node.

# cgroup-test
$ ehco $$
3444
$ cat /proc/3444/cgroup 
13:name=cgroup-test:/
12:cpuset:/
11:rdma:/
10:devices:/user.slice
9:perf_event:/
8:net_cls,net_prio:/
7:pids:/user.slice/user-1000.slice/user@1000.service
6:memory:/user.slice/user-1000.slice/user@1000.service
...

You can see that the process of the current terminal is under the root cgroup. Now we move it to the child cgroup

$ cd cgroup-1
$ sudo sh -c "echo $$ >> tasks"
$ cat /proc/3444/cgroup
13:name=cgroup-test:/cgroup-1
12:cpuset:/
11:rdma:/
10:devices:/user.slice
9:perf_event:/
8:net_cls,net_prio:/
7:pids:/user.slice/user-1000.slice/user@1000.service
6:memory:/user.slice/user-1000.slice/user@1000.service
...

You can see that the CGroup to which the terminal process belongs has changed to cgroup-1. Look at the tasks of the parent CGroup. There is no ID of the terminal process

$ cd cgroup-test
$ cat tasks | grep "3444"
# Return empty

Restrict the resources of processes in cgroup through subsystem.

By default, the operating system has created a default hierarchy for each subsystem in the sys/fs/cgroup / directory

$ ls /sys/fs/cgroup
blkio    cpu,cpuacct  freezer  net_cls           perf_event  systemd
cpu      cpuset       hugetlb  net_cls,net_prio  pids        unified
cpuacct  devices      memory   net_prio          rdma

You can see that the hierarchy of the memory subsystem also creates a cgroup in it

$ cd /sys/fs/cgroup/memory
$ sudo mkdir test-limit-memory && cd test-limit-memorysudo
# Set the maximum memory usage to 100MB
$ sudo sh -c "echo "100m" > memory.limit_in_bytes"sudo sh -c "echo $$ > tasks"
sudo sh -c "echo $$ > tasks"
$ sudo sh -c "echo $$ > tasks"
# Running stress that occupies 200MB of memory often
$ stress --vm-bytes 200m --vm-keep -m 1

You can compare the remaining memory before and after running, which is only reduced by 100MB

# Before operation
$ top
top - 12:04:12 up  6:45,  1 user,  load average: 1.87, 1.29, 1.06
 task: 348 total,   1 running, 346 sleeping,   0 stopped,   1 zombie
%Cpu(s):  1.3 us,  0.9 sy,  0.0 ni, 97.7 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :   5973.4 total,    210.8 free,   2820.9 used,   2941.8 buff/cache
MiB Swap:    923.3 total,    921.9 free,      1.3 used.   2746.3 avail Mem 

# After operation
$ top
top - 12:04:57 up  6:45,  1 user,  load average: 2.25, 1.44, 1.12
 task: 351 total,   3 running, 347 sleeping,   0 stopped,   1 zombie
%Cpu(s): 34.3 us, 32.8 sy,  0.0 ni, 21.1 id,  4.9 wa,  0.0 hi,  6.9 si,  0.0 st
MiB Mem :   5973.4 total,    118.6 free,   2956.7 used,   2898.1 buff/cache
MiB Swap:    923.3 total,    817.7 free,    105.5 used.   2604.5 avail Mem

This indicates that the cgroup restriction is in effect

How to restrict cgroup in docker

First run a restricted memory container

$ sudo docker pull redis:4
$ sudo docker run -tid -m 100m redis:4
d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d

Check the cgroup bound to the original memory subsystem, and you will see that there are more cgroups and dockers in it

$ ls /sys/fs/cgroup/memory
... docker
...
$ ls /sys/fs/cgroup/memory/docker
cgroup.clone_children                                             memory.max_usage_in_bytes
cgroup.event_control                                              memory.memsw.failcnt
cgroup.procs                                                      memory.memsw.limit_in_bytes
d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d  memory.memsw.max_usage_in_bytes
memory.failcnt                                                    memory.memsw.usage_in_bytes
memory.force_empty                                                memory.move_charge_at_immigrate
memory.kmem.failcnt                                               memory.numa_stat
memory.kmem.limit_in_bytes                                        memory.oom_control
memory.kmem.max_usage_in_bytes                                    memory.pressure_level
memory.kmem.slabinfo                                              memory.soft_limit_in_bytes
memory.kmem.tcp.failcnt                                           memory.stat
memory.kmem.tcp.limit_in_bytes                                    memory.swappiness
memory.kmem.tcp.max_usage_in_bytes                                memory.usage_in_bytes
memory.kmem.tcp.usage_in_bytes                                    memory.use_hierarchy
memory.kmem.usage_in_bytes                                        notify_on_release
memory.limit_in_bytes                                             tasks

You can see that the d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8dcgroup in the docker CGroup is just us
The container ID just created. Let's take a look inside

$ cd /sys/fs/cgroup/memory/docker/d79f22eb11d22c56a90f88e0aeb3cfda7cbe9639e2ab0e8532003a695e375e8d
$ cat memory.limit_in_bytes
104857600cat
# Exactly 100MB

summary

This paper describes the principle of cgroups, which is organized and related through three concepts (cgroup, subsystem and hierarchy)
3-tier structure, associate the process in cgroup, then associate cgroup with hierarchy, and then associate subsystem with hierarchy, so as to achieve a certain goal on the basis of limiting process resources
Reuse capability.

This paper describes the specific implementation of docker. When using docker, you can also know how to limit the resources used by containers.

Posted by karq on Sat, 06 Nov 2021 07:58:30 -0700

Programmer Group