Best practices for building container images with a Dockerfile

1. Background overview

Building a container image is the first step of containerizing an application. This section summarizes why image optimization matters.

As more applications move to container-based deployment and release cycles get shorter, the main goals of optimizing Docker images are as follows:

  • Reduce image download time during deployment
  • Improve security by reducing the attack surface
  • Reduce recovery time
  • Save storage overhead

2. Why is the image so large

Here we briefly look at several typical repositories and summarize why their Docker images are so large.

2.1 The base image is too large

For example, the image built from repository A is 9.67GB.

Base image used: 8.72GB.

Working backwards through the base image to find out why it is so large, the result goes without saying 0.0

2.2 The base image is too large and cannot be found

For example, repository B produces an image of 22.7GB.

Base image used: 404 not found. Yes, it cannot be found 0.0

2.3 .git directory (unnecessary directories)

For more information on this issue, please refer to my previous article Why is the Git directory so large

Example: repository C, code size 795MB.

The .git directory is 225MB, and the instruction in the Dockerfile is as follows (everything is added to the image):

ADD . /app/startapp/

The repository also contains the d directory, which is about 300MB. It is unclear whether it is needed, but from a quick look it is not: it only holds test data.

├── [ 503]  test_421.json
├── [ 483]  test_havalB9.json
├── [ 484]  test_144.json
├── [ 104]  .gitmodules
├── [ 122]  .idea
├── [   0]
├── [ 11M]
├── [108M]  test_180753.csv
├── [ 68M]  test_180753.txt
└── [ 335]

In fact, none of the above needs to go into the image.
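One way to keep such files out of the image is a .dockerignore file next to the Dockerfile. The following is a minimal sketch that writes one from the shell. The directory ctx_dir is a hypothetical stand-in for the repository root, and the patterns are assumptions derived from the listing above, so they need to be adapted to the real repository.

```shell
# Assumption: ctx_dir stands in for the repository root that holds the Dockerfile.
ctx_dir="$(mktemp -d)"

# Illustrative patterns: version-control data, IDE settings and test data.
cat > "$ctx_dir/.dockerignore" <<'EOF'
.git
.gitmodules
.idea
test_*.json
test_*.csv
test_*.txt
EOF

# Show what the build will ignore.
cat "$ctx_dir/.dockerignore"
```

Patterns are matched relative to the root of the build context, so test data that lives in a subdirectory needs the directory name or a `**/` prefix in front of the pattern.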

2.4 Other problems in the Dockerfile

It goes without saying that Dockerfiles written by non-specialists usually leave room for optimization; their authors simply do not pay attention to these details at first.

For example, when the developers of each repo write their own Dockerfiles without any shared standard, it may not matter early on, but problems slowly emerge later.

It's the so-called "it works, so just use it"~

3. How to optimize Dockerfile

3.1 Where to start

Optimizing a Docker image should start with the concept of image layering

3.1.1 An example

A practical example

nginx:alpine image, 23.2MB

# docker history nginx:alpine
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
b46db85084b8   9 days ago    /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon...   0B        
<missing>      9 days ago    /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT           0B        
<missing>      9 days ago    /bin/sh -c #(nop)  EXPOSE 80                    0B        
<missing>      9 days ago    /bin/sh -c #(nop)  ENTRYPOINT ["/docker-entr...   0B        
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:09a214a3e07c919a...   4.61kB    
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7...   1.04kB    
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:0b866ff3fc1ef5b0...   1.96kB    
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:65504f71f5855ca0...   1.2kB     
<missing>      9 days ago    /bin/sh -c set -x     && addgroup -g 101 -S ...   17.6MB    
<missing>      9 days ago    /bin/sh -c #(nop)  ENV PKG_RELEASE=1            0B        
<missing>      9 days ago    /bin/sh -c #(nop)  ENV NJS_VERSION=0.7.0        0B        
<missing>      9 days ago    /bin/sh -c #(nop)  ENV NGINX_VERSION=1.21.4     0B        
<missing>      9 days ago    /bin/sh -c #(nop)  LABEL maintainer=NGINX Do...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      10 days ago   /bin/sh -c #(nop) ADD file:762c899ec0505d1a3...   5.61MB

python:alpine image 45.5MB

# docker history python:alpine
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
382a63bb2f25   10 days ago   /bin/sh -c #(nop)  CMD ["python3"]              0B        
<missing>      10 days ago   /bin/sh -c set -ex;   wget -O "$P...   8.31MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_SHA256...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_URL=ht...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_SETUPTOOLS_VER...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_PIP_VERSION=21...   0B        
<missing>      10 days ago   /bin/sh -c cd /usr/local/bin  && ln -s idle3...   32B       
<missing>      10 days ago   /bin/sh -c set -ex  && apk add --no-cache --...   29.8MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_VERSION=3.10.0    0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV GPG_KEY=A035C8C19219B...   0B        
<missing>      10 days ago   /bin/sh -c set -eux;  apk add --no-cache   c...   1.82MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PATH=/usr/local/bin:/...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      10 days ago   /bin/sh -c #(nop) ADD file:762c899ec0505d1a3...   5.61MB

Actual storage

# docker inspect nginx:alpine | jq '.[0]|{GraphDriver}'
{
  "GraphDriver": {
    "Data": {
      "LowerDir": "/data/docker-overlay2/overlay2/3d.../diff:/data/docker-overlay2/overlay2/ae.../diff:/data/docker-overlay2/overlay2/ea.../diff:/data/docker-overlay2/overlay2/29.../diff:/data/docker-overlay2/overlay2/5e.../diff",
      "MergedDir": "/data/docker-overlay2/overlay2/b7.../merged",
      "UpperDir": "/data/docker-overlay2/overlay2/b7.../diff",
      "WorkDir": "/data/docker-overlay2/overlay2/b7.../work"
    },
    "Name": "overlay2"
  }
}

Description of the layering concept

An image solves the problem of packaging an application together with its runtime environment. In practice, many applications are packaged and iterated on top of the same rootfs, but that rootfs is not stored once per image. Docker achieves this layering through storage drivers such as AUFS, devicemapper, overlay and overlay2.

For example, the docker inspect output above shows these directories

  • LowerDir: the read-only image layers
  • MergedDir: the unified view that combines the lower layers with the upper read-write layer
  • UpperDir: the read-write layer
  • WorkDir: an intermediate working directory. Data written to the upper layer is first prepared in WorkDir and then moved to UpperDir

3.1.2 Copy on write

When Docker starts a container, the initial read-write layer is empty. Any change to the file system is applied to this layer. For example, to modify a file, the file is first copied from the read-only layer below into the read-write layer. The read-only version still exists in the read-only layer, but it is hidden by the copy in the read-write layer. This mechanism is called copy-on-write.
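The copy-up behavior can be imitated with two plain directories. This is only a toy sketch, not how the overlay2 driver is implemented: lower and upper are hypothetical stand-ins for the read-only image layer and the read-write container layer, and cow_read and cow_write are made-up helper names.

```shell
work="$(mktemp -d)"
lower="$work/lower"   # stands in for the read-only image layer
upper="$work/upper"   # stands in for the container's read-write layer
mkdir -p "$lower" "$upper"
echo "from image" > "$lower/app.conf"

# Read through the "merged" view: the upper layer wins when it has the file.
cow_read() {
    if [ -e "$upper/$1" ]; then cat "$upper/$1"; else cat "$lower/$1"; fi
}

# Write through the "merged" view: copy the file up first, then modify the copy.
cow_write() {
    [ -e "$upper/$1" ] || cp "$lower/$1" "$upper/$1"
    echo "$2" >> "$upper/$1"
}

cow_read app.conf                        # prints: from image
cow_write app.conf "edited in container"
cow_read app.conf                        # prints both lines, served from upper
cat "$lower/app.conf"                    # still prints only: from image
```

The last command shows the point of the mechanism: the lower layer is never modified, so other containers that share the same image layer keep seeing the original file.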

3.1.3 UnionFS

UnionFS mounts the contents of multiple directories (also known as branches) together under a single directory, while the directories themselves remain physically separate.

For an intuitive effect, pull the nginx:1.15 image first and then pull nginx:1.16: the second pull is much faster, because layers that both images share are already stored locally.

3.2 Solutions

Once the main contributors to image size are understood, it is clear where to cut.

3.2.1 Reduce the number of image layers

In a Dockerfile, the RUN, COPY and ADD instructions each add a layer, and in practice most of the growth comes from RUN instructions. Merging RUN instructions can therefore greatly reduce the number of image layers.

For example:

Before merging, three layers

RUN apk add tzdata
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo "Asia/Shanghai" > /etc/timezone

After merging, one layer

RUN apk add tzdata \
    && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo "Asia/Shanghai" > /etc/timezone
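A rough way to compare the two variants without building them is to count the layer-creating instructions in each file. The sketch below is an assumption-laden helper, not a Docker feature: count_layer_instructions is a made-up name, the FROM alpine line is added only to make the files complete, and the count ignores anything contributed by the base image.

```shell
# Count the instructions that create a filesystem layer (RUN, COPY, ADD).
# Continuation lines of a multi-line RUN are not counted because they do not
# start with an instruction keyword.
count_layer_instructions() {
    grep -cE '^[[:space:]]*(RUN|COPY|ADD)[[:space:]]' "$1"
}

demo_dir="$(mktemp -d)"

cat > "$demo_dir/Dockerfile.before" <<'EOF'
FROM alpine
RUN apk add tzdata
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo "Asia/Shanghai" > /etc/timezone
EOF

cat > "$demo_dir/Dockerfile.after" <<'EOF'
FROM alpine
RUN apk add tzdata \
    && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo "Asia/Shanghai" > /etc/timezone
EOF

echo "before: $(count_layer_instructions "$demo_dir/Dockerfile.before")"   # before: 3
echo "after:  $(count_layer_instructions "$demo_dir/Dockerfile.after")"    # after:  1
```

On a built image, docker history gives the authoritative per-layer view, as in the nginx:alpine output earlier.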

3.2.2 Reduce the size of each layer

Select a smaller base image

  • scratch: an empty image, the ancestor of all images. Every image needs a base image, which raises a chicken-and-egg question: what is the base of the base image, and can you build without one? The answer is yes: choose scratch. For details, refer to baseimages; the pause image is an example built on scratch
  • busybox: compared with scratch, it adds many commonly used Linux tools
  • alpine: additionally provides the apk package manager, among other things

Multi-stage build

Multi-stage builds are very well suited to compiled languages. In short, they allow multiple FROM instructions in one Dockerfile. Only the stage started by the last FROM instruction becomes the final image; the other stages are just intermediate steps.

The key instructions are FROM ... AS ... and COPY --from

For example, a Java image built in a single stage is 812MB

FROM centos AS jdk
COPY jdk-8u231-linux-x64.tar.gz /usr/local/src
RUN cd /usr/local/src && \
    tar -xzvf jdk-8u231-linux-x64.tar.gz -C /usr/local

With a multi-stage build, the image size is 618MB

FROM centos AS jdk
COPY jdk-8u231-linux-x64.tar.gz /usr/local/src
RUN cd /usr/local/src && \
    tar -xzvf jdk-8u231-linux-x64.tar.gz -C /usr/local

FROM centos
COPY --from=jdk /usr/local/jdk1.8.0_231 /usr/local

Ignore files

The build context is the set of files and directories that a build can see.

By default it is the directory passed to docker build, usually the current working directory. Whether or not the files and directories in it are actually used during the build, all of them are sent to the Docker daemon as the build context.

When docker build starts, the console prints Sending build context to Docker daemon xxxMB, which shows how much of the working directory is being sent as the build context.
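Before running a build, the size of the context can be estimated with standard tools. The sketch below is an approximation under stated assumptions: largest_entries is a made-up helper, demo_ctx is a throwaway directory created only for the demonstration, and du reports disk usage, which may differ slightly from the number docker build prints.

```shell
# List the largest top-level entries of a directory, sizes in KB, biggest first.
largest_entries() {
    find "$1" -mindepth 1 -maxdepth 1 -exec du -sk {} + | sort -rn | head -n "${2:-5}"
}

# Demo on a throwaway directory; point the function at a real build context instead.
demo_ctx="$(mktemp -d)"
mkdir -p "$demo_ctx/.git" "$demo_ctx/src"
dd if=/dev/zero of="$demo_ctx/.git/pack" bs=1024 count=512 2>/dev/null
echo 'package main' > "$demo_ctx/src/main.go"

du -sk "$demo_ctx"            # total size of the directory, in KB
largest_entries "$demo_ctx"   # per-entry breakdown, including hidden entries such as .git
```

The optional second argument of largest_entries limits the number of lines, for example largest_entries . 10 for the ten largest entries of the current directory.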

As seen in the history output earlier (apk add --no-cache), a package manager inside a RUN instruction can be told with --no-cache not to keep its package cache. Similarly, docker build accepts --no-cache, which makes the build ignore the layer cache.

Within the build context, a .dockerignore file keeps local modules, debug logs and similar files from being copied into the image during the build. It works much like .gitignore under Git version control.

Remote download

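Downloading an archive inside a RUN instruction instead of bringing it in with ADD can reduce the image size, because the archive is unpacked on the fly and never stored in a layer.

RUN curl -s | tar -xC /opt/

Split COPY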

For example, a single COPY instruction copies directory A with its four subdirectories AA, BB, CC and DD, but only BB changes often.

In that case, splitting the COPY so that each subdirectory gets its own instruction speeds up rebuilds: the layers for the unchanged subdirectories come from the cache, and the frequently changed BB is copied last.

COPY A/DD /app/A/DD
COPY A/BB /app/A/BB

Mount during build (extended feature)

Configuration

  • Modify the Docker daemon startup parameters and add --experimental
  • Add # syntax=docker/dockerfile:1.1.1-experimental to the top of the Dockerfile

  • Mount local golang cache
# syntax = docker/dockerfile:experimental
FROM golang
RUN --mount=type=cache,target=/root/.cache/go-build go build ...
  • Mount cache directory
# syntax = docker/dockerfile:experimental
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
  apt update && apt install -y gcc
  • Mount some credentials
# syntax = docker/dockerfile:experimental
FROM python:3
RUN pip install awscli
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials aws s3 cp s3://... ...

And so on.

Post-build cleanup

  • Delete downloaded archives
  • Clean up the installation cache
    • --no-cache
    • rm -rf /var/lib/apt/lists/*
    • rm -rf /var/cache/yum/*

Image compression

Combining export and import flattens the image into a single layer (the size reduction is small).

The disadvantage of this method is that some image information, such as the build history and metadata, is lost.

# docker run -d --name nginx nginx:alpine
# docker export nginx |docker import - nginx:alpine2
# docker history nginx:alpine
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
b46db85084b8   10 days ago   /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT           0B        
<missing>      10 days ago   /bin/sh -c #(nop)  EXPOSE 80                    0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENTRYPOINT ["/docker-entr...   0B        
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:09a214a3e07c919a...   4.61kB    
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7...   1.04kB    
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:0b866ff3fc1ef5b0...   1.96kB    
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:65504f71f5855ca0...   1.2kB     
<missing>      10 days ago   /bin/sh -c set -x     && addgroup -g 101 -S ...   17.6MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PKG_RELEASE=1            0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV NJS_VERSION=0.7.0        0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV NGINX_VERSION=1.21.4     0B        
<missing>      10 days ago   /bin/sh -c #(nop)  LABEL maintainer=NGINX Do...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      10 days ago   /bin/sh -c #(nop) ADD file:762c899ec0505d1a3...   5.61MB    
# docker history nginx:alpine2
dd6a3cf822ac   40 seconds ago                23MB      Imported from -
# docker images | grep nginx
nginx   alpine2   dd6a3cf822ac   54 seconds ago   23MB
nginx   alpine    b46db85084b8   10 days ago      23.2MB

3.3 Examples

3.3.1 Go example

Example 1

For a k8s cluster installed with kubeadm, the kube-apiserver image is built with the Bazel build tool

bazel build ...
LABEL maintainers=Kubernetes Authors
LABEL description=go based runner for distroless scenarios
COPY /workspace/go-runner . # buildkit
ENTRYPOINT ["/go-runner"]
COPY file:2e904ea733ba0ded2a99947847de31414a19d83f8495dd8c1fbed3c70bf67a22 in /usr/local/bin/kube-apiserver

Code directory 28M (including the .git directory, 20.5M)

Image size 122MB

Example 2

Dockerfile of the open source orchestration engine Cadence


# Can be used in case a proxy is necessary

# Build tcheck binary
FROM golang:1.17-alpine3.13 AS tcheck

WORKDIR /go/src/

COPY go.* ./
RUN go build -mod=readonly -o /go/bin/tcheck

# Build Cadence binaries
FROM golang:1.17-alpine3.13 AS builder


RUN apk add --update --no-cache ca-certificates make git curl mercurial unzip

WORKDIR /cadence

# Making sure that dependency is not touched
ENV GOFLAGS="-mod=readonly"

# Copy go mod dependencies and build cache
COPY go.* ./
RUN go mod download

COPY . .
RUN rm -fr .bin .build


# bypass codegen, use committed files.  must be run separately, before building things.
RUN make .fake-codegen
RUN CGO_ENABLED=0 make copyright cadence-cassandra-tool cadence-sql-tool cadence cadence-server cadence-bench cadence-canary

# Download dockerize
FROM alpine:3.11 AS dockerize

RUN apk add --no-cache openssl

RUN wget$DOCKERIZE_VERSION/dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && tar -C /usr/local/bin -xzvf dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && rm dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && echo "**** fix for host id mapping error ****" \
    && chown root:root /usr/local/bin/dockerize

# Alpine base image
FROM alpine:3.11 AS alpine

RUN apk add --update --no-cache ca-certificates tzdata bash curl

# set up nsswitch.conf for Go's "netgo" implementation
RUN test ! -e /etc/nsswitch.conf && echo 'hosts: files dns' > /etc/nsswitch.conf

SHELL ["/bin/bash", "-c"]

# Cadence server
FROM alpine AS cadence-server

ENV CADENCE_HOME /etc/cadence
RUN mkdir -p /etc/cadence

COPY --from=tcheck /go/bin/tcheck /usr/local/bin
COPY --from=dockerize /usr/local/bin/dockerize /usr/local/bin
COPY --from=builder /cadence/cadence-cassandra-tool /usr/local/bin
COPY --from=builder /cadence/cadence-sql-tool /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin
COPY --from=builder /cadence/cadence-server /usr/local/bin
COPY --from=builder /cadence/schema /etc/cadence/schema

COPY docker/ /
COPY config/dynamicconfig /etc/cadence/config/dynamicconfig
COPY config/credentials /etc/cadence/config/credentials
COPY docker/config_template.yaml /etc/cadence/config
COPY docker/ /

WORKDIR /etc/cadence

ENV SERVICES="history,matching,frontend,worker"

EXPOSE 7933 7934 7935 7939

# All-in-one Cadence server
FROM cadence-server AS cadence-auto-setup

RUN apk add --update --no-cache ca-certificates py-pip mysql-client
RUN pip install cqlsh

COPY docker/ /


# Cadence CLI
FROM alpine AS cadence-cli

COPY --from=tcheck /go/bin/tcheck /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin

ENTRYPOINT ["cadence"]

# Cadence Canary
FROM alpine AS cadence-canary

COPY --from=builder /cadence/cadence-canary /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin

CMD ["/usr/local/bin/cadence-canary", "--root", "/etc/cadence-canary", "start"]

# Cadence Bench
FROM alpine AS cadence-bench

COPY --from=builder /cadence/cadence-bench /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin

CMD ["/usr/local/bin/cadence-bench", "--root", "/etc/cadence-bench", "start"]

# Final image
FROM cadence-${TARGET}

Code directory 85.4M (including the .git directory, 57.7M)

Image size 135.69MB

3.3.2 Python example

FROM python:3.4

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        postgresql-client \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .

CMD ["python", "", "runserver", ""]

Code directory 275M (including the .git directory, 222M)

Image size 436MB

4. What else can you do besides these optimizations

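4.1 Setting the character set

Set a common character set in the Dockerfile, for example C.UTF-8, the value that the python:alpine image above sets through ENV LANG

# Set lang (example value, taken from the python:alpine history above)
ENV LANG=C.UTF-8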

4.2 Time zone correction

For more information on this issue, please refer to my previous article Multiple postures for handling container time in k8s environment

Set a common time zone in the Dockerfile

# Set timezone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo "Asia/Shanghai" > /etc/timezone

4.3 Process management

When a Docker container runs, the ENTRYPOINT or CMD from the Dockerfile becomes the main process with PID 1 by default. This process is what keeps the container alive: once it exits, the container exits.

The main process also plays an important role in managing zombie processes.

By the usual definition, a zombie process is a process that has finished executing (through an exit system call, a fatal error or a termination signal at run time) but still has its process control block in the operating system's process table and is in the terminated state.

The main ideas for cleaning up zombie processes are

  • Set the handler of the SIGCHLD signal in the parent process to SIG_IGN (ignore the signal);
  • fork twice and let the first-level child exit, so that the second-level child becomes an orphan that is adopted and cleaned up by init

Open source solutions that are available today

  • Tini: tini (container init) is a minimal init system that runs inside the container. It starts a child process, waits for it to exit, reaps zombies and forwards signals. Advantages:
    • Tini prevents applications from leaving zombie processes behind
    • Tini handles signals for the program running in the Docker container. With Tini, SIGTERM terminates the process without the program having to install an explicit signal handler

# Add Tini
ADD${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]

# Run your program under Tini
CMD ["/your/program", "-and", "-its", "arguments"]
# or docker run your-image /your/program ...
  • dumb-init: dumb-init forwards signals to the process group of its child process, which matters because a shell such as bash does not pass the signals it receives on to its children. Setting the environment variable DUMB_INIT_SETSID=0 makes dumb-init send signals only to its direct child. In addition, dumb-init adopts processes that have lost their parent so that they can exit normally. Example:
FROM alpine:3.11.5
RUN sed -i "s/" /etc/apk/repositories \
    && apk add --no-cache dumb-init

# Runs "/usr/bin/dumb-init -- /my/script --with --args"
ENTRYPOINT ["dumb-init", "--"]

# or if you use --rewrite or other cli flags
# ENTRYPOINT ["dumb-init", "--rewrite", "2:3", "--"]

CMD ["/my/script", "--with", "--args"]
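What both tools do for their direct child can be sketched in a few lines of shell. This is an illustration of the idea only, not the implementation of tini or dumb-init: run_and_forward is a made-up helper, it handles only TERM and INT, and it does not adopt or reap orphaned grandchildren the way a real init does.

```shell
# Start a command as a child, forward TERM/INT to it and return its exit status.
run_and_forward() {
    "$@" &
    child=$!
    # Forward termination signals to the child instead of leaving it running.
    trap 'kill -TERM "$child" 2>/dev/null' TERM INT
    # wait also reaps the child, so it does not linger as a zombie.
    wait "$child"
    status=$?
    trap - TERM INT
    return "$status"
}

run_and_forward sh -c 'exit 3' || echo "child exited with status $?"
# prints: child exited with status 3
```

One known gap in this sketch: when a trapped signal arrives, wait returns early with a status above 128 while the child may still be shutting down. A production init loops until the child has really exited, which is one reason to prefer tini or dumb-init over a hand-written wrapper.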

4.4 Starting with reduced privileges

In many cases the processes in a container should be started with reduced privileges for security, just as an nginx service on a VM is best run by a dedicated, unprivileged user.

For example, the tomcat image

USER tomcat
WORKDIR /usr/local/tomcat
ENTRYPOINT ["","run"]

If root-like privileges are needed at some point, avoid installing or using sudo in the image: its TTY and signal-forwarding behavior is unpredictable and can cause problems. If it is really necessary, for example to initialize a daemon as root but run it as a non-root user, gosu is recommended.

For example, the official Postgres image uses the following script as its ENTRYPOINT

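#!/bin/bash
set -e

if [ "$1" = 'postgres' ]; then
    # Make sure the data directory belongs to the unprivileged user
    chown -R postgres "$PGDATA"

    # Initialize the database only when the data directory is empty
    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    # Drop from root to the postgres user and replace the shell process
    exec gosu postgres "$@"
fi

# Any other command runs unchanged
exec "$@"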

4.5 Underlying library dependencies

Services often depend on underlying libraries. Here we take building a Java image on the alpine base image as an example.

To stay small, alpine ships with very little preinstalled software. Using a JDK/JRE on it requires glibc, and installing glibc in turn requires the ca-certificates package first (a prerequisite for installing glibc).

When a jdk8 image built on alpine is run, the JDK fails to execute. The reason is that this Java build is linked against the GNU C Library (glibc), whereas alpine is based on musl libc (a minimal libc), so glibc has to be installed on alpine.
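When a binary refuses to start inside a container, a quick first check is which C library the image provides. The sketch below is a heuristic under assumptions: detect_libc is a made-up helper, it relies on ldd being installed, and it only inspects the banner that ldd --version prints.

```shell
# Report which C library the userland uses, based on the banner of ldd.
# Assumption: ldd is installed; otherwise the answer is "unknown".
detect_libc() {
    if ! command -v ldd >/dev/null 2>&1; then
        echo unknown
        return 0
    fi
    # musl prints its banner to stderr and exits non-zero, so capture both streams.
    banner="$(ldd --version 2>&1 || true)"
    case "$banner" in
        *musl*) echo musl ;;
        *GLIBC*|*"GNU libc"*) echo glibc ;;
        *) echo unknown ;;
    esac
}

detect_libc
```

Run inside an alpine container, the function is expected to report musl, which matches the failure described above for a JDK built against glibc.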

5. Summary

This article briefly analyzes the main reasons why Docker images become so large and, based on production experience, lists measures for reducing image size along with common practices in other areas. There are many more scattered technical details, so I won't go through them one by one~

See you ~


Posted by Rayne on Wed, 24 Nov 2021 20:28:06 -0800