Docker Learning: Image's Local Storage Architecture

Keywords: Linux Docker JSON curl MySQL

Preface

docker pull, docker build, and docker commit are the commands most commonly used to manipulate images in Docker (although after one painful lesson I gave up on the last of these a long time ago). It is well known that an image in Docker is made up of a number of layers, but how is an image stored locally? What changes do these operations make to local storage? In the spirit of learning, I started from a fresh Docker installation and studied, step by step, the structure and meaning of the Docker image directory.
I am just a Docker beginner. I am writing this article in the hope that I will not stay at the stage of merely using Docker, but will also summarize as I learn. On the one hand this deepens my own understanding; on the other, I hope to exchange ideas with others who are learning Docker. If there are any mistakes, corrections are welcome. Thank you.

Background: Problems arising from an image size that would not shrink

In the past, I used a Dockerfile to build images on local servers. Disk space was generally sufficient, docker save was not needed, and the application scenarios did not involve frequently starting containers, so I never deliberately reduced the image size for the sake of either space or efficiency. Recently, however, I needed to build on a VPS where the available space is severely limited, so I decided to rewrite the Dockerfile to reduce the image size. I thought it would be very simple, but I was too naive. Starting directly from the Dockerfile:

FROM alpine
........
RUN apk -U upgrade && \
    apk -v add --no-cache bash curl && \
    apk -v add --no-cache --virtual .build-deps gcc make && \
    apk -v add --no-cache mysql-client libc-dev mariadb-dev && \
    rm -rf /var/cache/apk/*
COPY ./startService.sh /
........
RUN make clean && make && make install && \
    apk del .build-deps
    
CMD ["/bin/bash", "/startService.sh"]

Experiments showed that the image size was the same with or without apk del .build-deps; in other words, uninstalling the build environment did not reduce the size as expected. With the help of Google, the problem was solved. The explanations I found can be summarized as: "An image is made up of several layers, and a layer cannot modify a previous layer." Modifying the Dockerfile as follows solves the problem:

FROM alpine
........
RUN apk -U upgrade && \
    apk -v add --no-cache bash curl && \
    apk -v add --no-cache mysql-client libc-dev mariadb-dev && \
    rm -rf /var/cache/apk/*
COPY ./startService.sh /
........
RUN apk -v add --no-cache --virtual .build-deps gcc make && \
    make clean && make && make install && \
    apk del .build-deps
    
CMD ["/bin/bash", "/startService.sh"]

The problem was solved, and the takeaway might seem to be that in a Dockerfile it is better to write intermediate steps together in one instruction to reduce the number of layers. But why? Clearly, both my initial misunderstanding and my remaining confusion come from not understanding how images are implemented, so I decided to analyze this from the perspective of the image's local directory structure. If you just want the conclusion, jump to the end.
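To make the quoted explanation concrete, here is a minimal Python sketch of a toy model. The Layer class, the paths, and the file sizes are made up for illustration and are not Docker's real data structures. The only assumption carried over from the explanation is that an image's size is the sum of what each layer adds, while a deletion in a later layer merely hides files.

```python
from dataclasses import dataclass, field


# Toy model (an assumption for illustration, not Docker's real format):
# each layer records the files it adds and the paths it hides.
@dataclass
class Layer:
    added: dict = field(default_factory=dict)   # path -> size in bytes
    deleted: set = field(default_factory=set)   # paths hidden by this layer


def image_size(layers):
    """Every layer's added bytes count; deletions never subtract anything."""
    return sum(sum(layer.added.values()) for layer in layers)


def visible_files(layers):
    """Files a container would see after stacking the layers in order."""
    files = {}
    for layer in layers:
        for path in layer.deleted:
            files.pop(path, None)
        files.update(layer.added)
    return files


# First Dockerfile: build tools installed in one RUN, removed in a later RUN.
separate = [
    Layer(added={"/usr/bin/gcc": 100, "/usr/bin/make": 20}),
    Layer(added={"/usr/local/bin/app": 5},
          deleted={"/usr/bin/gcc", "/usr/bin/make"}),
]

# Second Dockerfile: install, build, and remove inside a single RUN,
# so the tools never show up in that layer's final diff.
combined = [Layer(added={"/usr/local/bin/app": 5})]

print(image_size(separate), sorted(visible_files(separate)))
print(image_size(combined), sorted(visible_files(combined)))
```

In this toy model both variants expose the same files to a container, but only the combined variant avoids keeping the build tools in an earlier layer.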

Environment

  • CentOS 7.4
  • Docker 18.09.0

Because the directory structure differs somewhat between Docker versions, all of the following operations apply to v18.09.0. The operating system also affects things such as the default storage driver; here we use CentOS 7.4.
Next, starting from a freshly installed Docker environment, we pull an alpine image and analyze the local directory structure and the meaning of each directory or file. Then, based on the alpine image, we build a simple test-image from a Dockerfile; after the build completes, we further analyze and verify the meaning of the directories and files, and examine how the relationship between images and layers is expressed in the local file system.

Understanding the local directory structure from the simplest docker pull alpine

With a default installation, all Docker-related files are stored under /var/lib/docker. You can use tree /var/lib/docker to view the directory structure. Two directories relate to images: image and overlay2. Note that overlay2 is the name of the storage driver; it may differ across operating systems and Docker versions, so when following along, go by what you see in your own environment:

/var/lib/docker
├── builder
│   └── fscache.db
├── buildkit
│   ├── cache.db
│   ├── content
│   │   └── ingest
│   ├── executor
│   ├── metadata.db
│   └── snapshots.db
├── containerd
│   └── daemon
│       ├── ........
│       └── tmpmounts
├── containers
├── image
│   └── overlay2
│       ├── distribution
│       ├── imagedb
│       │   ├── content
│       │   │   └── sha256
│       │   └── metadata
│       │       └── sha256
│       ├── layerdb
│       └── repositories.json
├── network
│   └── files
│       └── local-kv.db
├── overlay2
│   ├── backingFsBlockDev
│   └── l
├── plugins
│   ├── storage
│   │   └── blobs
│   │       └── tmp
│   └── tmp
├── runtimes
├── swarm
├── tmp
├── trust
└── volumes
    └── metadata.db

Because Docker has just been installed and no image has been pulled or built, the image directory contains only some default files and directories, none of which hold useful information yet. Now we use docker pull alpine to get the simplest image.

[root@docker-learn docker]# docker pull alpine
Using default tag: latest
latest: Pulling from library/alpine
cd784148e348: Pull complete 
Digest: sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1
Status: Downloaded newer image for alpine:latest

The pull above downloads only one layer. We can view the pulled image with the docker images --digests command; note the difference between the IMAGE ID and the DIGEST.

[root@docker-learn docker]# docker images --digests
REPOSITORY          TAG                 DIGEST                                                                    IMAGE ID            CREATED             SIZE
alpine              latest              sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1   3f53bb00af94        8 days ago          4.41MB

At this point, we can look at the changes in the file system. For convenience, we only show the image directory:

image/
└── overlay2
    ├── distribution
    │   ├── diffid-by-digest
    │   │   └── sha256
    │   │       └── cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3
    │   └── v2metadata-by-diffid
    │       └── sha256
    │           └── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
    ├── imagedb
    │   ├── content
    │   │   └── sha256
    │   │       └── 3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991
    │   └── metadata
    │       └── sha256
    ├── layerdb
    │   ├── sha256
    │   │   └── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
    │   │       ├── cache-id
    │   │       ├── diff
    │   │       ├── size
    │   │       └── tar-split.json.gz
    │   └── tmp
    └── repositories.json

repositories.json

This file stores the list of all local images. It currently contains two entries, "alpine:latest" and "alpine@sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1", which are actually the same image, as the docker images --digests output above shows: the former refers to it by tag and the latter by digest (docker inspect 3f53bb00af94 works equally well).

{
    "Repositories": {
        "alpine": {
            "alpine:latest": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991",
            "alpine@sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991"
        }
    }
}
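As a small illustration of the point that both entries name the same image, the following Python sketch groups the references in repositories.json by the image ID they point to. The JSON is copied from the listing above; the function name references_by_image is my own and not part of Docker.

```python
import json
from collections import defaultdict

# Content copied from the repositories.json shown above.
REPOSITORIES_JSON = """
{
    "Repositories": {
        "alpine": {
            "alpine:latest": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991",
            "alpine@sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991"
        }
    }
}
"""


def references_by_image(raw):
    """Group every tag or digest reference under the image ID it points to."""
    grouped = defaultdict(list)
    for refs in json.loads(raw)["Repositories"].values():
        for reference, image_id in refs.items():
            grouped[image_id].append(reference)
    return dict(grouped)


grouped = references_by_image(REPOSITORIES_JSON)
for image_id, refs in grouped.items():
    # Print the 12-character short ID, as docker images does.
    print(image_id[7:19], len(refs), "references")
```

To try this against a real installation, the same function could be given the contents of /var/lib/docker/image/overlay2/repositories.json, assuming the layout described in this article.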

imagedb directory

imagedb/
├── content
│   └── sha256
│       └── 3f53bb00af9......
└── metadata
    └── sha256

This directory stores information about images. Each image has its own entry, named after the image ID.
First, the metadata directory: it holds the parent image ID of each image. Because the alpine:latest image here has no parent image, the directory is empty. We will analyze it further later, after building an image with docker build.
Second, the content directory: it stores each image's description in JSON format:

{
    "architecture": "amd64",
    "config": {
        "ArgsEscaped": true,
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "/bin/sh"
        ],
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ],
        "Hostname": "",
        "Image": "sha256:49573004c44f9413c7db63cbab336356e7a8843139fca5e68f92d84a56f0e6df",
        "Labels": null,
        "OnBuild": null,
        "OpenStdin": false,
        "StdinOnce": false,
        "Tty": false,
        "User": "",
        "Volumes": null,
        "WorkingDir": ""
    },
    "container": "c44d11fa67899a984d66f5542092b474f11ca95cc9b03b1470546f16ec8ce74f",
    "container_config": {
        "ArgsEscaped": true,
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "/bin/sh",
            "-c",
            "#(nop) ",
            "CMD [\"/bin/sh\"]"
        ],
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ],
        "Hostname": "c44d11fa6789",
        "Image": "sha256:49573004c44f9413c7db63cbab336356e7a8843139fca5e68f92d84a56f0e6df",
        "Labels": {},
        "OnBuild": null,
        "OpenStdin": false,
        "StdinOnce": false,
        "Tty": false,
        "User": "",
        "Volumes": null,
        "WorkingDir": ""
    },
    "created": "2018-12-21T00:21:30.122610396Z",
    "docker_version": "18.06.1-ce",
    "history": [
        {
            "created": "2018-12-21T00:21:29.97055571Z",
            "created_by": "/bin/sh -c #(nop) ADD file:2ff00caea4e83dfade726ca47e3c795a1e9acb8ac24e392785c474ecf9a621f2 in / "
        },
        {
            "created": "2018-12-21T00:21:30.122610396Z",
            "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/sh\"]",
            "empty_layer": true
        }
    ],
    "os": "linux",
    "rootfs": {
        "diff_ids": [
            "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8"
        ],
        "type": "layers"
    }
}

The main parts are as follows:

  • config: when containers are later started from this image, the settings in config are the default parameters for running them.
  • container: this is a container ID. When we run docker build, we can see new containers being created and committed as new images one after another. The container ID here is the ID of the temporary container used when this image was generated, which we will verify later with docker build.
  • container_config: the configuration of the temporary container mentioned above. Comparing container_config with config shows that the two have exactly the same fields, which supports the role of config described above.
  • history: all the historical commands used to build the image.
  • rootfs: the diff IDs of the layers that the image contains.
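The comparison between config and container_config mentioned in the list above can be done mechanically. The sketch below uses abbreviated copies of the two sections from the alpine JSON (only five fields each, chosen by me to keep it short) and a hypothetical helper, differing_fields, to show that the field names match while a few values differ.

```python
# Abbreviated copies of the two sections from the alpine image JSON above;
# only a handful of fields are kept so the sketch stays short.
config = {
    "Cmd": ["/bin/sh"],
    "Entrypoint": None,
    "Hostname": "",
    "Labels": None,
    "Tty": False,
}
container_config = {
    "Cmd": ["/bin/sh", "-c", "#(nop) ", 'CMD ["/bin/sh"]'],
    "Entrypoint": None,
    "Hostname": "c44d11fa6789",
    "Labels": {},
    "Tty": False,
}


def differing_fields(a, b):
    """Return the sorted field names whose values differ between two dicts."""
    return sorted(key for key in a.keys() | b.keys() if a.get(key) != b.get(key))


same_field_names = config.keys() == container_config.keys()
print(same_field_names, differing_fields(config, container_config))
```

For this excerpt the field names are identical, and the values that differ are Cmd, Hostname, and Labels, which is consistent with container_config describing the temporary build container rather than the defaults for future containers.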

layerdb directory

As with the imagedb directory, the name suggests that this directory mainly stores Docker's layer information. With only the alpine:latest image present, the directory structure is as follows:

layerdb/
├── sha256
│   └── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
│       ├── cache-id
│       ├── diff
│       ├── size
│       └── tar-split.json.gz
└── tmp

When we ran docker pull alpine:latest, only one layer was pulled, and in the image information in imagedb/content above there is only one diff ID in rootfs, so it matches the single layer here. Note, however, that although the 7bff100f35 here is the same as the diff ID 7bff100f35 in rootfs, the meaning is different: the directory name here is the layer's chain ID. The two are identical because when there is only one layer with no parent, the diff ID and the chain ID are equal. We will see the difference later when we build test-image.
Under this layer's directory there are four files:

  • diff: the diff ID of the layer

    [root@docker-learn overlay2]# cat layerdb/sha256/7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8/diff
    sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8

    As mentioned above, the bottom layer has the same chain ID and diff ID.

  • size: the size of the layer, in bytes

    [root@docker-learn overlay2]# cat layerdb/sha256/7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8/size
    4413428

    In docker images, the size of the alpine image is shown as 4.41MB. Computing 4413428/(1024*1024) gives about 4.21, which does not match. My first reaction was that the image adds other information on top of the layer, but that does not seem to make sense in theory. Using docker inspect alpine to look at the image's details, we find "Size": 4413428, the same as the layer's size. So the 4.41MB must be computed as 4413428/1000000; we will verify this further with test-image.

  • tar-split.json.gz: the split file of the layer's tar archive

    This file is generated using tar-split; with it, the layer's tar archive can be restored.

  • cache-id: the content is a unique ID that points to where the layer is actually stored locally.

    [root@docker-learn layerdb]# cat sha256/7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8/cache-id 
    281c53a74496be2bfcf921ceee5ec2d95139c43fec165ab667a77899b3691d56

    So where is the layer's real local storage location? It is under the /var/lib/docker/overlay2 directory mentioned above:

    [root@docker-learn overlay2]# ls
    281c53a74496be2bfcf921ceee5ec2d95139c43fec165ab667a77899b3691d56  backingFsBlockDev  l

Note that in addition to the diff, size, cache-id, and tar-split.json.gz files, a layer's directory under layerdb can also contain a parent file, which stores the chain ID of the layer's parent. Because the current alpine image has only one layer, there is no parent file.
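The unit question raised for the size file above is easy to check with a few lines of Python. This is only arithmetic on the value read from the size file; the two helper names are mine.

```python
LAYER_SIZE_BYTES = 4413428  # value read from the layer's size file above


def to_decimal_mb(size_bytes):
    """Megabytes, taking 1 MB = 1000 * 1000 bytes."""
    return size_bytes / 1_000_000


def to_binary_mib(size_bytes):
    """Mebibytes, taking 1 MiB = 1024 * 1024 bytes."""
    return size_bytes / (1024 * 1024)


decimal_text = f"{to_decimal_mb(LAYER_SIZE_BYTES):.2f}"
binary_text = f"{to_binary_mib(LAYER_SIZE_BYTES):.2f}"
print(decimal_text, binary_text)  # 4.41 4.21
```

Only the decimal conversion reproduces the 4.41MB shown by docker images, which supports the conclusion that the displayed size uses powers of 1000.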

distribution directory

This directory contains the mapping between each layer's diff ID and its digest.

[root@docker-learn overlay2]# tree distribution/
distribution/
├── diffid-by-digest
│   └── sha256
│       └── cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3
└── v2metadata-by-diffid
    └── sha256
        └── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8

4 directories, 2 files

In the v2metadata-by-diffid directory, we can look up the corresponding digest by a layer's diff ID; the entry also records the source repository the digest came from.

[
    {
        "Digest": "sha256:cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3",
        "HMAC": "",
        "SourceRepository": "docker.io/library/alpine"
    }
]

The diffid-by-digest directory is the reverse of v2metadata-by-diffid:

[root@docker-learn overlay2]# cat distribution/diffid-by-digest/sha256/cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3 
sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
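The two lookups described above can be sketched in Python as follows. The helper names (diffid_for_digest, digests_for_diffid, make_demo_tree) are hypothetical, and the demo recreates the two files in a scratch directory rather than reading a real /var/lib/docker, so it only assumes the file layout and contents shown in this section.

```python
import json
import tempfile
from pathlib import Path

DIGEST = "sha256:cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3"
DIFF_ID = "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8"


def _entry(root, subdir, ref):
    """Map 'sha256:<hex>' to <root>/<subdir>/sha256/<hex>."""
    algorithm, hexdigest = ref.split(":", 1)
    return Path(root) / subdir / algorithm / hexdigest


def diffid_for_digest(root, digest):
    """diffid-by-digest: the file content is the diff ID itself."""
    return _entry(root, "diffid-by-digest", digest).read_text().strip()


def digests_for_diffid(root, diff_id):
    """v2metadata-by-diffid: the file is a JSON list of records with a Digest."""
    raw = _entry(root, "v2metadata-by-diffid", diff_id).read_text()
    return [record["Digest"] for record in json.loads(raw)]


def make_demo_tree(root):
    """Recreate the two files shown above inside a scratch directory."""
    forward = _entry(root, "diffid-by-digest", DIGEST)
    forward.parent.mkdir(parents=True)
    forward.write_text(DIFF_ID)
    reverse = _entry(root, "v2metadata-by-diffid", DIFF_ID)
    reverse.parent.mkdir(parents=True)
    reverse.write_text(json.dumps([
        {"Digest": DIGEST, "HMAC": "", "SourceRepository": "docker.io/library/alpine"}
    ]))


with tempfile.TemporaryDirectory() as scratch:
    make_demo_tree(scratch)
    found_diff_id = diffid_for_digest(scratch, DIGEST)
    round_trip = digests_for_diffid(scratch, found_diff_id)

print(found_diff_id == DIFF_ID, round_trip == [DIGEST])
```

Going from digest to diff ID and back returns the original digest, which is what "the reverse of" means for these two directories.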

So far, based on the simplest alpine image, we have seen the local directory structure of Image and the approximate role of each directory or file. However, because the image has only one layer, many relationships are not well reflected. Next, we use a slightly more complex image to go through the above process.

Further understanding of directory structure based on docker build test-image

A simple dockerfile to build test-image

FROM alpine

LABEL name="test-image"

RUN apk -v add --no-cache bash 
RUN apk -v add --no-cache curl
COPY ./startService.sh /

CMD ["/bin/bash", "/startService.sh"]

The output of the construction process is as follows:

[root@docker-learn docker]# docker build -t test-image .
Sending build context to Docker daemon  3.072kB
Step 1/6 : FROM alpine
 ---> 3f53bb00af94
Step 2/6 : LABEL name="test-image"
 ---> Running in 3bd6320fc291
Removing intermediate container 3bd6320fc291
 ---> bb97dd1fb1a1
Step 3/6 : RUN apk -v add --no-cache bash
 ---> Running in f9987ff57ad7
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/5) Installing ncurses-terminfo-base (6.1_p20180818-r1)
(2/5) Installing ncurses-terminfo (6.1_p20180818-r1)
(3/5) Installing ncurses-libs (6.1_p20180818-r1)
(4/5) Installing readline (7.0.003-r0)
(5/5) Installing bash (4.4.19-r1)
Executing bash-4.4.19-r1.post-install
Executing busybox-1.28.4-r2.trigger
OK: 18 packages, 136 dirs, 2877 files, 13 MiB
Removing intermediate container f9987ff57ad7
 ---> a5635f1b1d00
Step 4/6 : RUN apk -v add --no-cache curl
 ---> Running in c49fb2e4b311
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.8/community/x86_64/APKINDEX.tar.gz
(1/5) Installing ca-certificates (20171114-r3)
(2/5) Installing nghttp2-libs (1.32.0-r0)
(3/5) Installing libssh2 (1.8.0-r3)
(4/5) Installing libcurl (7.61.1-r1)
(5/5) Installing curl (7.61.1-r1)
Executing busybox-1.28.4-r2.trigger
Executing ca-certificates-20171114-r3.trigger
OK: 23 packages, 141 dirs, 3040 files, 15 MiB
Removing intermediate container c49fb2e4b311
 ---> 9156d1521a2f
Step 5/6 : COPY ./startService.sh /
 ---> 704626646baf
Step 6/6 : CMD ["/bin/bash", "/startService.sh"]
 ---> Running in 1c5e6e861264
Removing intermediate container 1c5e6e861264
 ---> 6cd0a66e83f1
Successfully built 6cd0a66e83f1
Successfully tagged test-image:latest
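The pairing between each step and the image ID printed after it can be pulled out of the output above with a short Python sketch. The text is a trimmed copy of that output (package installation lines omitted), and the two regular expressions only assume the line shapes visible here, not any documented output format.

```python
import re

# A trimmed copy of the build output above (package installation lines omitted).
BUILD_OUTPUT = """\
Step 1/6 : FROM alpine
 ---> 3f53bb00af94
Step 2/6 : LABEL name="test-image"
 ---> Running in 3bd6320fc291
Removing intermediate container 3bd6320fc291
 ---> bb97dd1fb1a1
Step 3/6 : RUN apk -v add --no-cache bash
 ---> Running in f9987ff57ad7
Removing intermediate container f9987ff57ad7
 ---> a5635f1b1d00
Step 4/6 : RUN apk -v add --no-cache curl
 ---> Running in c49fb2e4b311
Removing intermediate container c49fb2e4b311
 ---> 9156d1521a2f
Step 5/6 : COPY ./startService.sh /
 ---> 704626646baf
Step 6/6 : CMD ["/bin/bash", "/startService.sh"]
 ---> Running in 1c5e6e861264
Removing intermediate container 1c5e6e861264
 ---> 6cd0a66e83f1
"""

STEP = re.compile(r"^Step \d+/\d+ : (.+)$")
IMAGE = re.compile(r"^ ---> ([0-9a-f]{12})$")  # skips the "Running in" lines


def steps_to_images(output):
    """Pair each Dockerfile instruction with the image ID printed for it."""
    pairs, current = [], None
    for line in output.splitlines():
        step = STEP.match(line)
        image = IMAGE.match(line)
        if step:
            current = step.group(1)
        elif image and current is not None:
            pairs.append((current, image.group(1)))
            current = None
    return pairs


pairs = steps_to_images(BUILD_OUTPUT)
for instruction, image_id in pairs:
    print(image_id, instruction)
```

Each of the six steps ends with exactly one image ID, and those IDs are the ones that appear later under imagedb.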

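The image build process can be understood as starting a container from an image, executing one Dockerfile instruction in that container, and committing the result as a new image. Based on the output above, the construction of test-image can be expressed as a chain in which each step starts from the image produced by the previous step: 3f53bb00af94 (alpine), bb97dd1fb1a1 (LABEL), a5635f1b1d00 (RUN, bash), 9156d1521a2f (RUN, curl), 704626646baf (COPY), and finally 6cd0a66e83f1 (CMD).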

The final test-image has the image ID 6cd0a66e83f1. Starting from this image, we analyze the local directory again. First, look at the image's basic information:

[root@docker-learn docker]# docker images --digests test-image
REPOSITORY          TAG                 DIGEST              IMAGE ID            CREATED             SIZE
test-image          latest              <none>              6cd0a66e83f1        About an hour ago   9.88MB

As mentioned earlier, the digest is generated by the Docker registry; because the locally built image has not been pushed to a remote repository, the digest is <none>. At this point, the image directory has changed as follows:

image/
└── overlay2
    ├── distribution
    │   ├── diffid-by-digest
    │   │   └── sha256
    │   │       └── cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3
    │   └── v2metadata-by-diffid
    │       └── sha256
    │           └── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
    ├── imagedb
    │   ├── content
    │   │   └── sha256
    │   │       ├── 3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991
    │   │       ├── 6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924
    │   │       ├── 704626646baf8bdea82da237819cded076a0852eb97dba2fc731569dd85ae836
    │   │       ├── 9156d1521a2fd50d972e1e1abc30d37df7c8e8f7825ca5955170f3b5441b3341
    │   │       ├── a5635f1b1d0078cd926f21ef3ed77b357aa899ac0c8bf80cae51c37129167e3a
    │   │       └── bb97dd1fb1a10b717655594950efb4605ff0d3f2f631feafc4558836c2b34c3c
    │   └── metadata
    │       └── sha256
    │           ├── 6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924
    │           │   ├── lastUpdated
    │           │   └── parent
    │           ├── 704626646baf8bdea82da237819cded076a0852eb97dba2fc731569dd85ae836
    │           │   └── parent
    │           ├── 9156d1521a2fd50d972e1e1abc30d37df7c8e8f7825ca5955170f3b5441b3341
    │           │   └── parent
    │           ├── a5635f1b1d0078cd926f21ef3ed77b357aa899ac0c8bf80cae51c37129167e3a
    │           │   └── parent
    │           └── bb97dd1fb1a10b717655594950efb4605ff0d3f2f631feafc4558836c2b34c3c
    │               └── parent
    ├── layerdb
    │   ├── mounts
    │   ├── sha256
    │   │   ├── 0e88764cdf90e8a5d6597b2d8e65b8f70e7b62982b0aee934195b54600320d47
    │   │   │   ├── cache-id
    │   │   │   ├── diff
    │   │   │   ├── parent
    │   │   │   ├── size
    │   │   │   └── tar-split.json.gz
    │   │   ├── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
    │   │   │   ├── cache-id
    │   │   │   ├── diff
    │   │   │   ├── size
    │   │   │   └── tar-split.json.gz
    │   │   ├── 80fe1abae43103e3be54ac2813114d1dea6fc91454a3369104b8dd6e2b1363f5
    │   │   │   ├── cache-id
    │   │   │   ├── diff
    │   │   │   ├── parent
    │   │   │   ├── size
    │   │   │   └── tar-split.json.gz
    │   │   └── db7c15c2f03f63a658285a55edc0a0012ccd0033f4695d4b428b1b464637e655
    │   │       ├── cache-id
    │   │       ├── diff
    │   │       ├── parent
    │   │       ├── size
    │   │       └── tar-split.json.gz
    │   └── tmp
    └── repositories.json

Compared with having only the alpine image, the images generated during the build have been added under imagedb's content and metadata (the image IDs correspond one to one with the build output), and three layers have been added under layerdb. No relationship between them is apparent yet; we analyze this below.

repositories.json

[root@docker-learn docker]# cat image/overlay2/repositories.json | python -m json.tool
{
    "Repositories": {
        "alpine": {
            "alpine:latest": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991",
            "alpine@sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991"
        },
        "test-image": {
            "test-image:latest": "sha256:6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924"
        }
    }
}

As you can see, the item test-image has been added, including its tag and id.

imagedb directory

[root@docker-learn overlay2]# tree imagedb/
imagedb/
├── content
│   └── sha256
│       ├── 3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991
│       ├── 6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924
│       ├── 704626646baf8bdea82da237819cded076a0852eb97dba2fc731569dd85ae836
│       ├── 9156d1521a2fd50d972e1e1abc30d37df7c8e8f7825ca5955170f3b5441b3341
│       ├── a5635f1b1d0078cd926f21ef3ed77b357aa899ac0c8bf80cae51c37129167e3a
│       └── bb97dd1fb1a10b717655594950efb4605ff0d3f2f631feafc4558836c2b34c3c
└── metadata
    └── sha256
        ├── 6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924
        │   ├── lastUpdated
        │   └── parent
        ├── 704626646baf8bdea82da237819cded076a0852eb97dba2fc731569dd85ae836
        │   └── parent
        ├── 9156d1521a2fd50d972e1e1abc30d37df7c8e8f7825ca5955170f3b5441b3341
        │   └── parent
        ├── a5635f1b1d0078cd926f21ef3ed77b357aa899ac0c8bf80cae51c37129167e3a
        │   └── parent
        └── bb97dd1fb1a10b717655594950efb4605ff0d3f2f631feafc4558836c2b34c3c
            └── parent

9 directories, 12 files

Before building test-image, the metadata directory was empty because alpine had no parent. After the build, five new directories were added, corresponding to the five images generated by docker build. Each image's parent is the image produced by the previous step; that is, the parent file stores the image ID of the previous image.
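The parent relationship can be followed like a linked list. The sketch below uses a hypothetical in-memory dictionary in place of the parent files; the pairs are inferred from the order of steps in the build output (the parent files themselves are not shown in this article), and IDs are shortened to 12 characters.

```python
# Hypothetical stand-in for the parent files under imagedb/metadata:
# image ID -> parent image ID. The pairs are inferred from the order of
# steps in the build output, and IDs are shortened to 12 characters.
PARENTS = {
    "6cd0a66e83f1": "704626646baf",
    "704626646baf": "9156d1521a2f",
    "9156d1521a2f": "a5635f1b1d00",
    "a5635f1b1d00": "bb97dd1fb1a1",
    "bb97dd1fb1a1": "3f53bb00af94",
}


def ancestry(image_id, parents):
    """Follow parent links until reaching an image with no parent entry."""
    chain = [image_id]
    while chain[-1] in parents:
        chain.append(parents[chain[-1]])
    return chain


lineage = ancestry("6cd0a66e83f1", PARENTS)
print(" -> ".join(lineage))
```

The walk stops at 3f53bb00af94, the alpine image, which has no directory under metadata and therefore no parent.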
In the content directory, there are five more images than before building test-image. Take test-image (currently the last image in the chain) as an example and look at its description:

{
    "architecture": "amd64",
    "config": {
        "ArgsEscaped": true,
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "/bin/bash",
            "/startService.sh"
        ],
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ],
        "Hostname": "",
        "Image": "sha256:704626646baf8bdea82da237819cded076a0852eb97dba2fc731569dd85ae836",
        "Labels": {
            "name": "test-image"
        },
        "OnBuild": null,
        "OpenStdin": false,
        "StdinOnce": false,
        "Tty": false,
        "User": "",
        "Volumes": null,
        "WorkingDir": ""
    },
    "container": "1c5e6e861264654f79a190eba5157dd4dedce59ab3de098a3625fb4e5b6f1d98",
    "container_config": {
        "ArgsEscaped": true,
        "AttachStderr": false,
        "AttachStdin": false,
        "AttachStdout": false,
        "Cmd": [
            "/bin/sh",
            "-c",
            "#(nop) ",
            "CMD [\"/bin/bash\" \"/startService.sh\"]"
        ],
        "Domainname": "",
        "Entrypoint": null,
        "Env": [
            "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
        ],
        "Hostname": "1c5e6e861264",
        "Image": "sha256:704626646baf8bdea82da237819cded076a0852eb97dba2fc731569dd85ae836",
        "Labels": {
            "name": "test-image"
        },
        "OnBuild": null,
        "OpenStdin": false,
        "StdinOnce": false,
        "Tty": false,
        "User": "",
        "Volumes": null,
        "WorkingDir": ""
    },
    "created": "2019-01-01T02:29:19.701494089Z",
    "docker_version": "18.09.0",
    "history": [
        {
            "created": "2018-12-21T00:21:29.97055571Z",
            "created_by": "/bin/sh -c #(nop) ADD file:2ff00caea4e83dfade726ca47e3c795a1e9acb8ac24e392785c474ecf9a621f2 in / "
        },
        {
            "created": "2018-12-21T00:21:30.122610396Z",
            "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/sh\"]",
            "empty_layer": true
        },
        {
            "created": "2019-01-01T02:29:06.530296297Z",
            "created_by": "/bin/sh -c #(nop)  LABEL name=test-image",
            "empty_layer": true
        },
        {
            "created": "2019-01-01T02:29:14.182236016Z",
            "created_by": "/bin/sh -c apk -v add --no-cache bash"
        },
        {
            "created": "2019-01-01T02:29:19.327280058Z",
            "created_by": "/bin/sh -c apk -v add --no-cache curl"
        },
        {
            "created": "2019-01-01T02:29:19.549474383Z",
            "created_by": "/bin/sh -c #(nop) COPY file:fff66db7f2d773b25215edcc9d5697d84813835e3b731e5a6afe9a9b9647ecec in / "
        },
        {
            "created": "2019-01-01T02:29:19.701494089Z",
            "created_by": "/bin/sh -c #(nop)  CMD [\"/bin/bash\" \"/startService.sh\"]",
            "empty_layer": true
        }
    ],
    "os": "linux",
    "rootfs": {
        "diff_ids": [
            "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8",
            "sha256:b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36",
            "sha256:9edc93f4dcf640f272ed73f933863dbefae6719745093d09c6c6908f402b1c34",
            "sha256:a6c8828ba4b58628284f783d3c918ac379ae2aba0830f4c926a330842361ffb6"
        ],
        "type": "layers"
    }
}

Note the following fields:

  • container: the container ID here corresponds to the temporary container that generated the image in the build process (1c5e6e861264 in the build output above).
  • container_config: the container's configuration, reflecting the state after the Dockerfile commands were executed.
  • rootfs: the diff IDs of the layers included in the image, showing that test-image contains four layers. When I first analyzed this I was a little confused: I had imagined that every instruction in the Dockerfile corresponds to a layer, that is, to a diff ID, yet the Dockerfile has six instructions and there are only four layers. I further sorted out and analyzed the rootfs of each image, as shown in Figure 2. No new layer is generated by the two lines LABEL name="test-image" and CMD ["/bin/bash", "/startService.sh"]. In fact, if we think of an image as a packaged static OS, then a layer describes changes to that OS's file system, that is, changes to files or directories. The two instructions above obviously cause no file system change; they are instead written into the image's config and read when a container is created, so naturally they have no diff ID.
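The observation about rootfs can be checked directly from the JSON: the history entries that are not marked empty_layer should be exactly as many as the diff IDs. In this sketch the created_by strings are shortened by me for readability, and pairing the remaining entries with the diff IDs in order is my reading of the data rather than something stated in the file.

```python
# Abbreviated from the test-image JSON above: created_by strings are
# shortened, and only the empty_layer flag is kept from each entry.
history = [
    {"created_by": "ADD file in /"},
    {"created_by": 'CMD ["/bin/sh"]', "empty_layer": True},
    {"created_by": "LABEL name=test-image", "empty_layer": True},
    {"created_by": "apk -v add --no-cache bash"},
    {"created_by": "apk -v add --no-cache curl"},
    {"created_by": "COPY file in /"},
    {"created_by": 'CMD ["/bin/bash" "/startService.sh"]', "empty_layer": True},
]
diff_ids = [
    "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8",
    "sha256:b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36",
    "sha256:9edc93f4dcf640f272ed73f933863dbefae6719745093d09c6c6908f402b1c34",
    "sha256:a6c8828ba4b58628284f783d3c918ac379ae2aba0830f4c926a330842361ffb6",
]


def layer_steps(entries):
    """Keep only the history entries that changed the file system."""
    return [e["created_by"] for e in entries if not e.get("empty_layer", False)]


steps = layer_steps(history)
for step, diff_id in zip(steps, diff_ids):
    print(diff_id[7:19], step)
```

Three of the seven history entries are flagged as empty, leaving four steps for the four diff IDs: the base ADD, the two apk installs, and the COPY.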

So far we have covered the directories related to images. In summary, the configuration of each individual image is stored in the content directory, with the image ID as the file name, and the relationships between images are stored in the parent files under metadata. When we create a container from an image, a file system is generated, yet none of the information above contains file system data, because the real file system data is stored in layers. As mentioned earlier, layer information is stored in the layerdb directory, so we turn to it next.

layerdb directory

[root@docker-learn overlay2]# tree layerdb/
layerdb/
├── mounts
├── sha256
│   ├── 0e88764cdf90e8a5d6597b2d8e65b8f70e7b62982b0aee934195b54600320d47
│   │   ├── cache-id
│   │   ├── diff
│   │   ├── parent
│   │   ├── size
│   │   └── tar-split.json.gz
│   ├── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
│   │   ├── cache-id
│   │   ├── diff
│   │   ├── size
│   │   └── tar-split.json.gz
│   ├── 80fe1abae43103e3be54ac2813114d1dea6fc91454a3369104b8dd6e2b1363f5
│   │   ├── cache-id
│   │   ├── diff
│   │   ├── parent
│   │   ├── size
│   │   └── tar-split.json.gz
│   └── db7c15c2f03f63a658285a55edc0a0012ccd0033f4695d4b428b1b464637e655
│       ├── cache-id
│       ├── diff
│       ├── parent
│       ├── size
│       └── tar-split.json.gz
└── tmp

Compared with having only the alpine image, the layerdb directory first of all has a new mounts directory. Put simply, when a container is created from an image, two layers are generated under this directory, one read-only and one writable: the read-only layer is generated from the image, and the writable layer is where the container's later modifications are stored. Because this article only discusses images, this directory is not analyzed further.

Second, the number of layers has increased to four, which matches the four diff IDs in the rootfs section of test-image's configuration seen in the previous section. However, apart from the first layer's "7bff100f35cb", the other three names are completely different from the diff IDs. Further study shows that the directory name here is actually the layer's chain ID, not its diff ID. As for the difference: the diff ID describes a single change set, while the chain ID identifies a whole series of changes applied in order. The formula relating diff IDs and chain IDs can be found in the image-spec:

ChainID(A) = DiffID(A)
ChainID(A|B) = Digest(ChainID(A) + " " + DiffID(B))
ChainID(A|B|C) = Digest(ChainID(A|B) + " " + DiffID(C))

Here, the rootfs of test-image is used to verify the formula and analyze how the IDs are related.

"rootfs": {
    "diff_ids": [
        "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8",
        "sha256:b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36",
        "sha256:9edc93f4dcf640f272ed73f933863dbefae6719745093d09c6c6908f402b1c34",
        "sha256:a6c8828ba4b58628284f783d3c918ac379ae2aba0830f4c926a330842361ffb6"
    ],
    "type": "layers"
}
ChainID(A) = DiffID(A) = sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8

ChainID(A|B) = Digest(ChainID(A) + " " + DiffID(B))
ChainID(A) = sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
DiffID(B) = sha256:b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36
//Calculation:
[root@docker-learn overlay2]# echo -n "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8 sha256:b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36" | sha256sum -
db7c15c2f03f63a658285a55edc0a0012ccd0033f4695d4b428b1b464637e655  -

//Result:
ChainID(A|B) = sha256:db7c15c2f03f63a658285a55edc0a0012ccd0033f4695d4b428b1b464637e655
chainID(A|B|C) = sha256:0e88764cdf90e8a5d6597b2d8e65b8f70e7b62982b0aee934195b54600320d47
chainID(A|B|C|D) = sha256:80fe1abae43103e3be54ac2813114d1dea6fc91454a3369104b8dd6e2b1363f5
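
The same calculation can be written as a short Python sketch. This is only my own illustration of the image-spec formula, not code taken from Docker; the function name chain_ids is made up for this example, and the diff IDs are the ones from the rootfs of test-image above.

```python
import hashlib


def chain_ids(diff_ids):
    """Return the chain ID of every layer, bottom layer first."""
    chain = []
    for diff_id in diff_ids:
        if not chain:
            # ChainID(A) = DiffID(A)
            chain.append(diff_id)
        else:
            # ChainID(A|B) = Digest(ChainID(A) + " " + DiffID(B))
            payload = (chain[-1] + " " + diff_id).encode("ascii")
            chain.append("sha256:" + hashlib.sha256(payload).hexdigest())
    return chain


# diff_ids copied from the rootfs section of test-image
DIFF_IDS = [
    "sha256:7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8",
    "sha256:b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36",
    "sha256:9edc93f4dcf640f272ed73f933863dbefae6719745093d09c6c6908f402b1c34",
    "sha256:a6c8828ba4b58628284f783d3c918ac379ae2aba0830f4c926a330842361ffb6",
]

if __name__ == "__main__":
    for chain_id in chain_ids(DIFF_IDS):
        print(chain_id)
```

The printed values should correspond to the directory names under layerdb/sha256, since each step performs the same hashing as the sha256sum command above.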


Therefore, the directory corresponding to the second layer of test-image is layerdb/sha256/db7c15c2f03f63a658285a55edc0a0012ccd0033f4695d4b428b1b464637e655. Check this layer's information:

[root@docker-learn sha256]# ls db7c15c2f03f63a658285a55edc0a0012ccd0033f4695d4b428b1b464637e655/
cache-id  diff  parent  size  tar-split.json.gz

Compared with the previous section there is one more file, parent, which contains the chain ID of the previous layer.
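
Because every non-base layer records the chain ID of the layer beneath it, the whole stack can be recovered by following the parent files. The sketch below is a rough illustration under two assumptions of mine: that a parent file holds a single "sha256:<hex>" line, and that the base layer has no parent file. The helper make_demo_layerdb and its layer names are hypothetical and exist only so the example runs without Docker.

```python
import os
import tempfile


def layer_ancestry(layerdb_sha256_dir, chain_hex):
    """Follow parent files from one layer down to the base layer."""
    ancestry = []
    current = chain_hex
    while current:
        ancestry.append(current)
        parent_file = os.path.join(layerdb_sha256_dir, current, "parent")
        if not os.path.isfile(parent_file):
            break  # base layer: no parent file
        with open(parent_file) as fh:
            # Assumed content: "sha256:<chain id of the parent layer>"
            current = fh.read().strip().split(":", 1)[-1]
    return ancestry


def make_demo_layerdb(root):
    """Create three fake layers (hypothetical names) for a dry run."""
    parents = {"base": None, "middle": "base", "top": "middle"}
    for name, parent in parents.items():
        os.makedirs(os.path.join(root, name))
        if parent is not None:
            with open(os.path.join(root, name, "parent"), "w") as fh:
                fh.write("sha256:" + parent)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(layer_ancestry(make_demo_layerdb(tmp), "top"))
```

On a real host the first argument would be the layerdb/sha256 directory and the second the chain ID (hex part only) of the top layer.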

distribution directory

[root@docker-learn overlay2]# tree distribution/
distribution/
├── diffid-by-digest
│   └── sha256
│       └── cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3
└── v2metadata-by-diffid
    └── sha256
        └── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8

4 directories, 2 files

At present the distribution directory is no different from the alpine-only case. A digest belongs to the registry side, so a locally built image naturally has no digest until it is pushed to a registry. Push test-image to Docker Hub with the docker push command:

[root@docker-learn distribution]# docker push backbp/test-image:lasted
The push refers to repository [docker.io/backbp/test-image]
a6c8828ba4b5: Pushed 
9edc93f4dcf6: Pushed 
b1ddbff02257: Pushed 
7bff100f35cb: Mounted from library/alpine 
lasted: digest: sha256:3dc66a43c28ea3e994e4abf6a2d04c7027a9330e8eeab5c609e4971a8c58f0b0 size: 1156

From the output we can see that although test-image contains four layers, the bottom layer 7bff100f35cb originally came from docker pull alpine, so it does not need to be uploaded again and is simply mounted from library/alpine. Only three layers are actually pushed. After the push, the distribution directory contains entries for the new layers as well, as the listing below shows:

distribution/
├── diffid-by-digest
│   └── sha256
│       ├── 2826782ee82560ec5f90a8a9da80880d48dd4036763f5250024fab5b3ef8e8cf
│       ├── 8e905c02e6908fbb0e591cea285470208920d32408735bd6a8fcaf85ffba9089
│       ├── a5bec9983f6902f4901b38735db9c427190ffcb3734c84ee233ea391da81081b
│       └── cd784148e3483c2c86c50a48e535302ab0288bebd587accf40b714fffd0646b3
└── v2metadata-by-diffid
    └── sha256
        ├── 7bff100f35cb359a368537bb07829b055fe8e0b1cb01085a3a628ae9c187c7b8
        ├── 9edc93f4dcf640f272ed73f933863dbefae6719745093d09c6c6908f402b1c34
        ├── a6c8828ba4b58628284f783d3c918ac379ae2aba0830f4c926a330842361ffb6
        └── b1ddbff022577cd249a074285a1a7eb76d7c9139132ba5aa4272fc115dfa9e36
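
To inspect the mapping without opening the files one by one, a small Python sketch can collect it into a dictionary. It assumes that each file under diffid-by-digest/sha256 is named after a layer digest and contains the matching diff ID as plain text; that is my understanding of the layout, not something verified in this article. The helper make_demo_distribution and its placeholder values are hypothetical.

```python
import os
import tempfile


def diffid_by_digest(distribution_dir):
    """Map registry layer digest -> local diff ID."""
    base = os.path.join(distribution_dir, "diffid-by-digest", "sha256")
    mapping = {}
    for digest_hex in sorted(os.listdir(base)):
        with open(os.path.join(base, digest_hex)) as fh:
            # Assumed content: the diff ID, e.g. "sha256:<hex>"
            mapping["sha256:" + digest_hex] = fh.read().strip()
    return mapping


def make_demo_distribution(root):
    """Create a one-entry fake distribution tree (placeholder values)."""
    base = os.path.join(root, "diffid-by-digest", "sha256")
    os.makedirs(base)
    with open(os.path.join(base, "digest1"), "w") as fh:
        fh.write("sha256:diffid1")
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(diffid_by_digest(make_demo_distribution(tmp)))
```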

Further thoughts

What does the docker tag command do?

We create a new tag, test-image-tag, based on test-image:

[root@docker-learn overlay2]# docker tag test-image test-image-tag
[root@docker-learn overlay2]# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
test-image-tag      latest              6cd0a66e83f1        4 hours ago         9.88MB
test-image          latest              6cd0a66e83f1        4 hours ago         9.88MB
alpine              latest              3f53bb00af94        11 days ago         4.41MB

Looking at the repositories.json file shows that both tags point to the same image ID, so this command amounts to nothing more than a modification of repositories.json.

[root@docker-learn overlay2]# cat repositories.json | python -m json.tool
{
    "Repositories": {
        "alpine": {
            "alpine:latest": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991",
            "alpine@sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991"
        },
        "test-image": {
            "test-image:latest": "sha256:6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924"
        },
        "test-image-tag": {
            "test-image-tag:latest": "sha256:6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924"
        }
    }
}
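
To make the "two tags, one image" relationship explicit, the JSON above can be grouped by image ID. This is just an illustrative Python sketch; the function name tags_by_image is my own, and the JSON is pasted from the output above rather than read from disk.

```python
import json
from collections import defaultdict

REPOSITORIES_JSON = """
{
    "Repositories": {
        "alpine": {
            "alpine:latest": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991",
            "alpine@sha256:46e71df1e5191ab8b8034c5189e325258ec44ea739bba1e5645cff83c9048ff1": "sha256:3f53bb00af943dfdf815650be70c0fa7b426e56a66f5e3362b47a129d57d5991"
        },
        "test-image": {
            "test-image:latest": "sha256:6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924"
        },
        "test-image-tag": {
            "test-image-tag:latest": "sha256:6cd0a66e83f133a2bad37103ed03f6480330fa3c469368eb5871320996d3b924"
        }
    }
}
"""


def tags_by_image(repositories):
    """Group every reference (tag or digest) under the image ID it points to."""
    grouped = defaultdict(list)
    for refs in repositories["Repositories"].values():
        for ref, image_id in refs.items():
            grouped[image_id].append(ref)
    return {image_id: sorted(refs) for image_id, refs in grouped.items()}


if __name__ == "__main__":
    for image_id, refs in tags_by_image(json.loads(REPOSITORIES_JSON)).items():
        print(image_id[:19], refs)
```

Both test-image:latest and test-image-tag:latest end up under the single ID 6cd0a66e83f1..., which is why docker images shows the same IMAGE ID for them.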

How is image size calculated?

The alpine and test-image images are 4.41MB and 9.88MB respectively. For a more precise analysis, docker inspect shows the exact sizes in bytes: 4413428 and 9876099.
Now look at the size of each layer (layerdb/sha256/<chain id>/size):

  • Layer1: 4413428
  • Layer2: 3815516
  • Layer3: 1647117
  • Layer4: 38

alpine has only one layer, so its image size equals that layer's size; test-image has four layers, so image size = Layer1 + Layer2 + Layer3 + Layer4 = 9876099.
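
The arithmetic is easy to check with a few lines of Python, using the four layer sizes listed above (the function name image_size is only for this example):

```python
# Layer sizes in bytes, bottom layer first, as read from the size files
LAYER_SIZES = [4413428, 3815516, 1647117, 38]


def image_size(layer_sizes):
    """Image size is simply the sum of the sizes of its layers."""
    return sum(layer_sizes)


if __name__ == "__main__":
    print(image_size(LAYER_SIZES[:1]))  # alpine: 4413428
    print(image_size(LAYER_SIZES))      # test-image: 9876099
```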

Back to the original question

After the above analysis, let's return to the original problem. When apk del .build-deps runs in its own RUN instruction, the layer it produces only records the deletion of those packages relative to the previous layer. When the image size is calculated (or the image is assembled from its layers), the changes made by installing .build-deps still exist in the layer created by apk add, so the size does not decrease. If add and del are placed in the same RUN instruction, the resulting layer records only the net change relative to the previous layer; .build-deps was installed and removed within that step, so it never appears in any layer.
Ultimately this follows from how OCI images work. The most important thing to remember is that each layer records changes relative to the previous layer.
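
A toy model may make this more concrete. The Python sketch below is a simplification of my own, not Docker's actual implementation: each layer is a dict of added files plus a list of deleted paths, and all file names and sizes are invented for illustration.

```python
def image_size(layers):
    """Sum the bytes recorded by every layer; deletions never subtract."""
    return sum(sum(layer["added"].values()) for layer in layers)


def visible_files(layers):
    """Merge layers bottom-up to get the files a container would see."""
    merged = {}
    for layer in layers:
        for path in layer["deleted"]:
            merged.pop(path, None)  # hide a file from a lower layer
        merged.update(layer["added"])
    return merged


# Hypothetical sizes in bytes
BASE = {"added": {"base": 4_000_000}, "deleted": []}

# apk add and apk del in separate RUN instructions
SEPARATE = [
    BASE,
    {"added": {"build-deps": 3_000_000}, "deleted": []},
    {"added": {"app": 500_000}, "deleted": ["build-deps"]},
]

# apk add, make and apk del in a single RUN instruction
COMBINED = [
    BASE,
    {"added": {"app": 500_000}, "deleted": []},
]

if __name__ == "__main__":
    print(image_size(SEPARATE), sorted(visible_files(SEPARATE)))
    print(image_size(COMBINED), sorted(visible_files(COMBINED)))
```

In this model both variants expose the same files to a container, but the first image is 3,000,000 bytes larger because the build-deps bytes are still stored in the middle layer.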

References

image-spec

Posted by amalosoul on Wed, 08 May 2019 13:00:40 -0700