Why do I need namespaces to build containers?

Keywords: Docker

1. What is a Namespace?

Namespaces are a Linux kernel feature that partitions resources such as process IDs, hostnames, user IDs, mount points (the view of the file system), network stacks and inter-process communication objects on a single host. Docker uses namespaces to isolate the resources of each container, so that processes inside a container can see only the resources that belong to the container's own namespaces.

Let's start a container:

[root@master ~]# docker run -it busybox /bin/sh
/ # 

Running the following ps command inside the container reveals something interesting:

/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 /bin/sh
    9 root      0:00 ps

As you can see, the /bin/sh we started with docker run is process 1 inside the container (PID=1), and only two processes are running in the container. In other words, Docker has placed the /bin/sh we started, and the ps we just ran, in an isolated view that is separate from the host.

Normally, whenever we run a /bin/sh program on the host, the operating system assigns it a process ID, such as PID=1000. This number uniquely identifies the process, much like an employee badge number: roughly speaking, /bin/sh is the company's 1000th employee, and employee No. 1 is naturally the boss who oversees everything.

Now suppose we run /bin/sh in a container through Docker. Docker puts a blindfold on employee No. 1000 the moment he starts the job, so he never sees the 999 employees ahead of him and mistakenly believes he is employee No. 1.

This mechanism remaps the process ID space seen by the isolated application, so its processes see only renumbered PIDs, such as PID=1. In reality, they are still the original process 1000 in the host's operating system.
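The kernel exposes this double identity directly on kernels that provide it (the field was added in Linux 4.1). The `NSpid:` line in `/proc/<pid>/status` lists a process's PID in every PID namespace it belongs to. The first value is relative to the namespace of the /proc being read (the host, when read on the host), and the last value is relative to the innermost namespace. The minimal Python sketch below parses that line. The function names are hypothetical helpers, not part of any library.

```python
from typing import List, Optional


def parse_nspid(status_text: str) -> Optional[List[int]]:
    """Return the PIDs on the 'NSpid:' line of /proc/<pid>/status text.

    The first value is the PID in the namespace of the /proc being read;
    the last is the PID in the innermost namespace. Returns None when the
    line is absent (kernels older than 4.1).
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return None


def read_nspid(pid: str = "self") -> Optional[List[int]]:
    """Read /proc/<pid>/status on this machine and parse its NSpid line."""
    with open(f"/proc/{pid}/status") as f:
        return parse_nspid(f.read())


if __name__ == "__main__":
    # Hypothetical status text for the example process above:
    # PID 1000 on the host, PID 1 inside the container.
    print(parse_nspid("Name:\tsh\nNSpid:\t1000\t1\n"))  # [1000, 1]
```

On the host, `docker inspect --format '{{.State.Pid}}' <container>` prints the host PID of a container's main process. Passing that PID to `read_nspid()` (as root) should return a list whose last element is 1.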

This technology is the Namespace mechanism in Linux.

As of Linux 5.6, the kernel provides eight types of namespaces:

| Namespace | Isolates | Kernel version |
|---|---|---|
| Mount (mnt) | Mount points | 2.4.19 |
| Process ID (pid) | Process IDs | 2.6.24 |
| Network (net) | Network devices, IP addresses, port numbers, etc. | 2.6.24 |
| Interprocess Communication (ipc) | System V IPC objects and POSIX message queues | 2.6.19 |
| UTS (uts) | Hostname and NIS domain name | 2.6.19 |
| User (user) | User and group IDs | 3.8 |
| Control group (cgroup) | Cgroup root directory | 4.6 |
| Time (time) | CLOCK_MONOTONIC and CLOCK_BOOTTIME clocks | 5.6 |
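Each process's namespace memberships appear as symbolic links under `/proc/<pid>/ns`, one per entry. A link target such as `mnt:[4026531840]` names the namespace type and the inode number that identifies the namespace. The sketch below reads those links using only the standard library; `parse_ns_link` and `list_namespaces` are hypothetical helper names. Newer kernels also list `pid_for_children` and `time_for_children` entries, which this sketch reports as-is.

```python
import os
import re
from typing import Dict, Tuple

# Link targets look like "mnt:[4026531840]": namespace type, then inode number.
_NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def parse_ns_link(target: str) -> Tuple[str, int]:
    """Split a /proc/<pid>/ns/* link target into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid: str = "self") -> Dict[str, int]:
    """Map each entry of /proc/<pid>/ns (e.g. 'mnt', 'net') to its inode."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in sorted(os.listdir(ns_dir)):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result


if __name__ == "__main__":
    # Print this process's namespaces, one per line.
    for name, inode in list_namespaces().items():
        print(f"{name:20} {inode}")
```

Two processes are in the same namespace of a given type exactly when the inode numbers for that entry match. That is the property the later examples compare.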

2. What does each Namespace do?

(1) Mount Namespace

Mount Namespace gives processes in different namespaces different views of the mount points. With a Mount Namespace, a container sees only its own mounts, and mount operations inside the container do not affect the host's mounts.

The following command starts a bash process in a new Mount Namespace:

[root@master ~]# unshare --mount --fork /bin/bash
[root@master ~]# 

After running this command, a new Mount Namespace exists on the host, and the bash in the current terminal window is running inside it. The following example verifies that mounting a directory inside an independent Mount Namespace does not affect the host's mounts.
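Under the hood, `unshare --mount` calls the unshare(2) system call with the CLONE_NEWNS flag. Since util-linux 2.27, unshare also sets mount propagation in the new namespace to private by default. This matters because systemd marks mounts as shared at boot, and new mounts could otherwise propagate back to the host. The Python sketch below reproduces these steps in a forked child through ctypes. The constant values are copied from the Linux headers, `tmpfs_visible_in_private_mount_ns` is a hypothetical name, and the sketch needs root (CAP_SYS_ADMIN).

```python
import ctypes
import ctypes.util
import os

# Constants copied from <sched.h> and <sys/mount.h> on Linux.
CLONE_NEWNS = 0x00020000
MS_REC = 0x4000
MS_PRIVATE = 0x40000

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.unshare.argtypes = [ctypes.c_int]
libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                       ctypes.c_ulong, ctypes.c_char_p]


def _check(ret: int) -> None:
    """Turn a failed libc call (-1) into OSError, e.g. PermissionError."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def tmpfs_visible_in_private_mount_ns(target: str, size: str = "20m") -> bool:
    """Fork a child that enters a new mount namespace, makes every mount in it
    private (like `mount --make-rprivate /`), mounts a tmpfs on `target`, and
    reports whether the child sees that mount. Raises OSError on failure."""
    target = os.path.realpath(target)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child process
        os.close(r)
        try:
            _check(libc.unshare(CLONE_NEWNS))
            _check(libc.mount(None, b"/", None, MS_REC | MS_PRIVATE, None))
            _check(libc.mount(b"tmpfs", target.encode(), b"tmpfs", 0,
                              f"size={size}".encode()))
            with open("/proc/self/mounts") as f:
                seen = any(line.split()[1] == target for line in f)
            os.write(w, b"1" if seen else b"0")
        except OSError as e:
            os.write(w, f"E{e.errno}".encode())
        os._exit(0)
    os.close(w)
    reply = os.read(r, 32).decode()
    os.close(r)
    os.waitpid(pid, 0)
    if reply.startswith("E"):
        err = int(reply[1:])
        raise OSError(err, os.strerror(err))
    return reply == "1"
```

Calling `tmpfs_visible_in_private_mount_ns("/tmp/tmpfs")` as root should return True. Afterwards, the caller's own `/proc/self/mounts` should not list the directory, which mirrors the two df outputs in this section.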

First, create a directory under /tmp:

[root@master ~]# mkdir /tmp/tmpfs

Next, mount a tmpfs file system on that directory with the mount command:

[root@master ~]# mount -t tmpfs -o size=20m tmpfs /tmp/tmpfs/

Then use the df command to view the mounted directory information:

[root@master ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   26G  4.6G   22G  18% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs                    1.9G   20M  1.9G   2% /run
tmpfs                    378M     0  378M   0% /run/user/0
/dev/sda1               1014M  183M  832M  18% /boot
tmpfs                     20M     0   20M   0% /tmp/tmpfs

You can see that /tmp/tmpfs has been mounted correctly. To verify that it is not mounted on the host, open a new terminal window and run df there to view the host's mounts:

[root@master ~]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G   20M  1.9G   2% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/mapper/centos-root   26G  4.6G   22G  18% /
/dev/sda1               1014M  183M  832M  18% /boot
tmpfs                    378M     0  378M   0% /run/user/0

The output shows that /tmp/tmpfs is not mounted on the host, so the mount performed in our independent Mount Namespace does not affect the host.

To confirm this, check the namespaces of the current process in the first terminal window, the one running inside the new Mount Namespace:

[root@master ~]# ls -l /proc/self/ns/
total 0
lrwxrwxrwx 1 root root 0 Dec  2 18:39 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 mnt -> mnt:[4026532476]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 net -> net:[4026531956]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 uts -> uts:[4026531838]

Then, in the newly opened command line window, use the same command to view the Namespace information on the host:

[root@master ~]# ls -l /proc/self/ns/
total 0
lrwxrwxrwx 1 root root 0 Dec  2 18:39 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 net -> net:[4026531956]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 uts -> uts:[4026531838]

Comparing the two outputs shows that only the Mount Namespace IDs differ; all other namespace IDs are identical.
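This comparison can also be automated. The sketch below parses `ls -l /proc/<pid>/ns/` output and reports which entries point to different namespaces. The names `parse_ls_ns` and `differing_namespaces` are hypothetical. Given the two listings from this section, it reports only `mnt`.

```python
import re
from typing import Dict, Set

# Matches the end of an `ls -l` line such as "mnt -> mnt:[4026532476]".
_LS_NS_LINE = re.compile(r"(\S+) -> \w+:\[(\d+)\]\s*$")


def parse_ls_ns(listing: str) -> Dict[str, int]:
    """Turn `ls -l /proc/<pid>/ns/` output into {entry name: namespace inode}."""
    result = {}
    for line in listing.splitlines():
        match = _LS_NS_LINE.search(line)
        if match:
            result[match.group(1)] = int(match.group(2))
    return result


def differing_namespaces(a: Dict[str, int], b: Dict[str, int]) -> Set[str]:
    """Entries present in both listings whose namespace inodes differ."""
    return {name for name in a.keys() & b.keys() if a[name] != b[name]}


# The two listings above: inside the new Mount Namespace, then on the host.
inside = parse_ls_ns("""\
total 0
lrwxrwxrwx 1 root root 0 Dec  2 18:39 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 mnt -> mnt:[4026532476]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 net -> net:[4026531956]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 uts -> uts:[4026531838]
""")
host = parse_ls_ns("""\
total 0
lrwxrwxrwx 1 root root 0 Dec  2 18:39 ipc -> ipc:[4026531839]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 mnt -> mnt:[4026531840]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 net -> net:[4026531956]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 pid -> pid:[4026531836]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 user -> user:[4026531837]
lrwxrwxrwx 1 root root 0 Dec  2 18:39 uts -> uts:[4026531838]
""")
print(sorted(differing_namespaces(inside, host)))  # ['mnt']
```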

From these results we can conclude that the unshare command can create a new Mount Namespace, and that mounts made inside it are isolated from the host.

(2) PID Namespace

PID Namespace isolates process ID number spaces, so processes in different PID namespaces can have the same PID. This lets the main process of each container be process 1 inside the container, while the container's processes have different PIDs on the host. For example, a process with PID 122 on the host can appear as PID 1 inside the container.

The following command starts a bash process in a new PID Namespace:

[root@master ~]# unshare --pid --fork --mount-proc /bin/bash
[root@master ~]# 

After running this command, a new PID Namespace exists on the host, and the bash in the current terminal window is running inside it. The --mount-proc option also mounts a fresh /proc (which implies a new Mount Namespace), so that ps reports the processes of the new PID Namespace rather than the host's. In this window, use ps aux to view the process information:

[root@master ~]# ps aux
USER        PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root          1  0.0  0.0 115684  2144 pts/0    S    18:47   0:00 /bin/bash
root         13  0.0  0.0 155452  1848 pts/0    R+   18:49   0:00 ps aux

The output shows that bash is process 1 in the current Namespace, and none of the host's other processes are visible.
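The --fork option is required for this to work. unshare(2) with CLONE_NEWPID does not move the calling process into the new PID namespace. Only children created afterwards are placed there, and the first such child becomes PID 1. The Python sketch below demonstrates this. It is a hypothetical helper: the CLONE_NEWPID value is copied from `<sched.h>`, and it needs root. A throwaway helper process calls unshare and then forks once, so the caller is left untouched.

```python
import ctypes
import ctypes.util
import os
from typing import Tuple

CLONE_NEWPID = 0x20000000  # copied from <sched.h>

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.unshare.argtypes = [ctypes.c_int]


def pids_in_new_pid_namespace() -> Tuple[int, int]:
    """Return (PID the first child sees for itself, PID of that child as seen
    from outside). A helper process does the unshare, so the caller's own
    PID namespace settings stay unchanged."""
    r, w = os.pipe()
    helper = os.fork()
    if helper == 0:
        os.close(r)
        try:
            if libc.unshare(CLONE_NEWPID) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            # unshare() did not move the helper itself; its next child is
            # the first process in the new namespace and becomes PID 1.
            child = os.fork()
            if child == 0:
                os.write(w, f"{os.getpid()} ".encode())
                os._exit(0)
            os.waitpid(child, 0)
            os.write(w, str(child).encode())
        except OSError as e:
            os.write(w, f"E{e.errno}".encode())
        os._exit(0)
    os.close(w)
    chunks = []
    while True:
        data = os.read(r, 64)
        if not data:  # EOF: helper and child have both exited
            break
        chunks.append(data)
    os.close(r)
    os.waitpid(helper, 0)
    reply = b"".join(chunks).decode()
    if reply.startswith("E"):
        err = int(reply[1:])
        raise OSError(err, os.strerror(err))
    inner, outer = (int(x) for x in reply.split())
    return inner, outer
```

Run as root, the first value should be 1 and the second the child's ordinary PID on the host. This is the same double identity described at the start of the article.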

(3) UTS Namespace

UTS Namespace isolates the hostname (and the NIS domain name), giving each UTS Namespace its own independent hostname.

For example, if the host's hostname is master, a container can use UTS Namespace to have the hostname docker or any other name we choose.

Again, let's verify UTS Namespace with an example. First, use the unshare command to create a UTS Namespace:

[root@master ~]# hostname        # view the current hostname
master
[root@master ~]# unshare --uts --fork /bin/bash

After the UTS Namespace is created, the current terminal window is inside an independent UTS Namespace. Now use the hostname command, which can both view and set the hostname, to set a new hostname:

[root@master ~]# hostname -b docker
[root@master ~]# hostname
docker

Then we open a new command line window and use the same command to view the host's hostname:

[root@master ~]# hostname
master

The host's hostname is still master and has not changed, which verifies that UTS Namespace isolates hostnames.
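The same experiment can be reproduced from a program. The Python sketch below forks a child, moves only the child into a new UTS Namespace with unshare(2), and renames it with the standard library's `socket.sethostname`. The parent's hostname should stay unchanged. This is a hypothetical helper: the CLONE_NEWUTS value is copied from `<sched.h>`, and it needs root.

```python
import ctypes
import ctypes.util
import os
import socket
from typing import Tuple

CLONE_NEWUTS = 0x04000000  # copied from <sched.h>

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.unshare.argtypes = [ctypes.c_int]


def hostname_in_new_uts_namespace(new_name: str) -> Tuple[str, str]:
    """Return (hostname a child sees after renaming itself inside a new UTS
    namespace, hostname the caller sees afterwards)."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(r)
        try:
            if libc.unshare(CLONE_NEWUTS) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            socket.sethostname(new_name)  # affects only this UTS namespace
            os.write(w, ("OK " + socket.gethostname()).encode())
        except OSError as e:
            os.write(w, f"E {e.errno}".encode())
        os._exit(0)
    os.close(w)
    reply = os.read(r, 256).decode()
    os.close(r)
    os.waitpid(pid, 0)
    status, value = reply.split(" ", 1)
    if status == "E":
        raise OSError(int(value), os.strerror(int(value)))
    return value, socket.gethostname()
```

Calling `hostname_in_new_uts_namespace("docker")` on the machine above should return ("docker", "master"), matching the two terminal windows.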

(4) IPC Namespace

IPC Namespace isolates inter-process communication resources, namely System V IPC objects and POSIX message queues. For example, when a container uses both PID Namespace and IPC Namespace, processes in the same IPC Namespace can communicate with each other through these objects, but processes in different IPC namespaces cannot.

We use the unshare command to create an IPC Namespace:

[root@master ~]# unshare --ipc --fork /bin/bash
[root@docker ~]# 

Two commands are used to verify IPC Namespace:

  • ipcs -q: lists System V message queues
  • ipcmk -Q: creates a System V message queue

First, use ipcs -q to list the message queues in the current IPC Namespace:

[root@docker ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

The output shows that there are currently no message queues. Next, use ipcmk -Q to create one:

[root@docker ~]# ipcmk -Q
Message queue id: 0

Run ipcs -q again to list the message queues in the current IPC Namespace:

[root@docker ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    
0x4a19cc47 0          root       644        0            0       

The message queue was created successfully. Now open a new terminal window and use ipcs -q to list the host's message queues:

[root@master ~]# ipcs -q

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

This experiment shows that a message queue created in a separate IPC Namespace is not visible on the host; that is, IPC Namespace isolates message queues.
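To script this check instead of comparing the listings by eye, a small parser for `ipcs -q` output is enough. The sketch below (`message_queue_ids` is a hypothetical name) extracts the msqid column. Applied to the two listings above, it finds one queue inside the new IPC Namespace and none on the host.

```python
from typing import List


def message_queue_ids(ipcs_output: str) -> List[int]:
    """Return the msqid column of `ipcs -q` output, one entry per queue."""
    ids = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hexadecimal key such as 0x4a19cc47.
        if len(fields) >= 2 and fields[0].startswith("0x"):
            ids.append(int(fields[1]))
    return ids


inside = """
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
0x4a19cc47 0          root       644        0            0
"""
host = """
------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages
"""
print(message_queue_ids(inside))  # [0]
print(message_queue_ids(host))    # []
```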

(5) User Namespace

User Namespace isolates users and user groups. A typical use is mapping a process that runs as a non-root user on the host to the root user inside a separate User Namespace. With User Namespace, a process can have root privileges in the container while remaining an ordinary user on the host.

A User Namespace can be created without root privileges. Let's create one as an ordinary user:

[root@docker ~]# su - test
Last login: Thu Dec  2 19:11:29 CST 2021 on pts/0
[test@docker ~]$ unshare --user -r /bin/bash
[root@docker ~]# 

On CentOS 7, the number of User Namespaces the system allows to be created defaults to 0. If the command above fails (unshare reports unshare: unshare failed: Invalid argument), raise the limit with the following command and then try creating the User Namespace again:

echo 65535 > /proc/sys/user/max_user_namespaces

Then execute the id command to view the current user information:

[root@docker ~]# id
uid=0(root) gid=0(root) groups=0(root)
[root@docker ~]# 
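This is how `unshare --user -r` (`-r` is short for `--map-root-user`) produces this result. It creates the User Namespace, then writes a one-line mapping into /proc/self/uid_map and /proc/self/gid_map so that ID 0 inside corresponds to the caller's real IDs outside. Since Linux 3.19, an unprivileged process must write "deny" to /proc/self/setgroups before writing gid_map. The minimal Python sketch below performs the same steps in a forked child. The helper name is hypothetical and CLONE_NEWUSER is copied from `<sched.h>`. Root is not required, but the sketch fails if the kernel or distribution disables unprivileged user namespaces, as described above for CentOS 7.

```python
import ctypes
import ctypes.util
import os
from typing import Tuple

CLONE_NEWUSER = 0x10000000  # copied from <sched.h>

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.unshare.argtypes = [ctypes.c_int]


def _write(path: str, text: str) -> None:
    with open(path, "w") as f:
        f.write(text)


def ids_after_map_root_user() -> Tuple[int, int, str]:
    """Mimic `unshare --user -r` in a forked child. Returns the child's
    (uid, gid) after the mapping and its /proc/self/uid_map contents."""
    outer_uid, outer_gid = os.getuid(), os.getgid()
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(r)
        try:
            if libc.unshare(CLONE_NEWUSER) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            _write("/proc/self/setgroups", "deny")  # must precede gid_map
            # Map ID 0 inside the namespace to our real IDs outside it.
            _write("/proc/self/uid_map", f"0 {outer_uid} 1\n")
            _write("/proc/self/gid_map", f"0 {outer_gid} 1\n")
            with open("/proc/self/uid_map") as f:
                uid_map = " ".join(f.read().split())
            os.write(w, f"OK {os.getuid()} {os.getgid()} {uid_map}".encode())
        except OSError as e:
            os.write(w, f"E {e.errno}".encode())
        os._exit(0)
    os.close(w)
    reply = os.read(r, 256).decode()
    os.close(r)
    os.waitpid(pid, 0)
    if reply.startswith("E "):
        err = int(reply[2:])
        raise OSError(err, os.strerror(err))
    _, uid, gid, uid_map = reply.split(" ", 3)
    return int(uid), int(gid), uid_map
```

Run as the test user, this should report UID and GID 0 and a uid_map of the form `0 <test's UID> 1`. This is why `id` prints root, even though the kernel still treats the process as test for host resources.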

The output shows that we are already the root user in the new User Namespace. Next, run reboot in the current terminal window; on the host, only the root user is allowed to run this command:

[root@docker ~]# reboot
Failed to open /dev/initctl: Permission denied
Failed to talk to init daemon.

Although we are the root user in the newly created User Namespace, we are not permitted to run reboot. This shows that an isolated User Namespace does not grant the host's root privileges; that is, User Namespace isolates users and user groups.

(6) Net Namespace

Net Namespace isolates network devices, IP addresses, ports and related information, giving each Net Namespace its own IP addresses, ports and network interfaces. For example, if the host's IP address is 192.168.209.1, the container can be given its own address, such as 172.16.4.1.

First, use ip a to view the host's network configuration:

[root@docker ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens32: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:30:6d:8c brd ff:ff:ff:ff:ff:ff
    inet 192.168.209.148/24 brd 192.168.209.255 scope global noprefixroute dynamic ens32
       valid_lft 1760sec preferred_lft 1760sec
    inet6 fe80::8081:c385:2b72:fe59/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:d7:5e:07:6e brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:d7ff:fe5e:76e/64 scope link 
       valid_lft forever preferred_lft forever

We create a Net Namespace using the following command:

[root@docker ~]# unshare --net --fork /bin/bash
[root@docker ~]# ip a
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
[root@docker ~]# 

The host has the network devices lo, ens32 and docker0, while the new Net Namespace contains only a loopback device (lo), which is down. Its network devices are completely separate from the host's.
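The standard library's `socket.if_nameindex()` returns the network interfaces visible to the calling process. Calling it in a child that has just entered a new Net Namespace gives a quick programmatic check of the same fact. The result should contain lo, and can also include per-namespace tunnel devices such as sit0 or tunl0 if the corresponding kernel modules are loaded. The helper names below are hypothetical, CLONE_NEWNET is copied from `<sched.h>`, and the sketch needs root.

```python
import ctypes
import ctypes.util
import os
import socket
from typing import List

CLONE_NEWNET = 0x40000000  # copied from <sched.h>

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.unshare.argtypes = [ctypes.c_int]


def interface_names() -> List[str]:
    """Names of the network interfaces visible to the calling process."""
    return [name for _, name in socket.if_nameindex()]


def interfaces_in_new_net_namespace() -> List[str]:
    """Fork a child, move it into a new network namespace, and return the
    interface names it can see."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:  # child
        os.close(r)
        try:
            if libc.unshare(CLONE_NEWNET) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
            os.write(w, ("OK " + ",".join(interface_names())).encode())
        except OSError as e:
            os.write(w, f"E {e.errno}".encode())
        os._exit(0)
    os.close(w)
    reply = os.read(r, 4096).decode()
    os.close(r)
    os.waitpid(pid, 0)
    status, payload = reply.split(" ", 1)
    if status == "E":
        raise OSError(int(payload), os.strerror(int(payload)))
    return payload.split(",") if payload else []
```

On the host above, `interface_names()` would list lo, ens32 and docker0, while `interfaces_in_new_net_namespace()` should list only lo, as the two ip a outputs show.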

3. Why does Docker need namespaces?

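When Docker creates a new container, it creates new namespaces for it and adds the container's processes to them, so that those processes can see only the system resources in their own namespaces. By default these are the mount, PID, network, IPC and UTS namespaces; Docker adds a User Namespace only when the daemon is configured with user-namespace remapping (userns-remap).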
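To check this on a running system, compare the namespace links of a container's main process with those of a host process such as PID 1. The sketch below (`namespace_links` and `unshared_namespaces` are hypothetical names) does that. The container's host PID can be obtained with `docker inspect --format '{{.State.Pid}}' <container>`. Reading another process's /proc/<pid>/ns entries requires root. For a container started with default settings, one would expect the result to include ipc, mnt, net, pid and uts, although the exact list depends on the Docker version and host configuration.

```python
import os
import sys
from typing import Dict, List


def namespace_links(pid: str) -> Dict[str, str]:
    """Map each entry of /proc/<pid>/ns to its link target, e.g. 'net:[4026531956]'."""
    ns_dir = f"/proc/{pid}/ns"
    return {entry: os.readlink(os.path.join(ns_dir, entry))
            for entry in os.listdir(ns_dir)}


def unshared_namespaces(pid_a: str, pid_b: str) -> List[str]:
    """Entries for which the two processes are in different namespaces."""
    a, b = namespace_links(pid_a), namespace_links(pid_b)
    return sorted(name for name in a.keys() & b.keys() if a[name] != b[name])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as root): python3 <script> <container host PID> [reference PID, default 1]
    reference = sys.argv[2] if len(sys.argv) > 2 else "1"
    print(unshared_namespaces(sys.argv[1], reference))
```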

It is precisely because Docker uses these Linux Namespace features that container isolation is possible. One could say that without namespaces, there would be no Docker containers.

Posted by Trey395 on Fri, 03 Dec 2021 01:58:54 -0800