Best practices for building container images with a Dockerfile

1. Background overview

Building container images is the first step in moving applications onto containers. Below are the main reasons for optimizing images.

As applications migrate to containerized deployment at scale and release iterations accelerate, optimizing Docker images serves the following main purposes for the infrastructure:

  • Reduce image download time at deployment

  • Enhance security by reducing the attack surface

  • Reduce recovery time

  • Save storage overhead

2. Why is the image so large?

This section briefly analyzes several typical repositories and summarizes why existing Docker images are so large.

2.1 Base image is too large

Example: Repository A, the resulting image size is 9.67GB

Base image used: 8.72GB

Which raises the question: why is the base image itself so large? The investigation turned up nothing further.

2.2 Base image is too large and can no longer be found

Example: Repository B, the resulting image size is 22.7GB

Base image used: 404 not found. Yes, it really could not be found.

2.3 .git directory (unnecessary directories)

More on this issue can be found in my previous article, Why is the Git directory so large

Example: Repository C, code size 795MB

The .git directory alone is 225MB, and the instruction in the Dockerfile is as follows (everything is added to the image)

ADD . /app/startapp/

It also contains the d directory, which is about 300MB in size. Whether it is needed is unknown, but judging by its contents it is not: it holds only test data.

├── [ 503]  test_421.json
├── [ 483]  test_havalB9.json
├── [ 484]  test_144.json
├── [ 104]  .gitmodules
├── [ 122]  .idea
├── [   0]
├── [ 11M]
├── [108M]  test_180753.csv
├── [ 68M]  test_180753.txt
└── [ 335]

None of the above actually needs to go into the image
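One way to keep such files out of the image is a .dockerignore file in the root of the build context. The following is a minimal sketch: the patterns are assumptions derived from the listing above and match entries at the context root, so adjust them to the real repository layout.

```shell
# Write a .dockerignore next to the Dockerfile (patterns are illustrative).
cat > .dockerignore <<'EOF'
# Version-control metadata and IDE settings
.git
.gitmodules
.idea

# Test data that the application does not need at runtime
test_*.json
test_*.csv
test_*.txt
EOF
```

With this file in place, an instruction such as ADD . /app/startapp/ no longer picks up the ignored paths, because they are never sent to the Docker daemon as part of the build context.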

2.4 Dockerfile itself has other problems

It goes without saying that Dockerfiles written by non-specialists usually leave room for optimization; those details are simply not being looked at for the moment

For example, if the developers of each repository write their own Dockerfile without a common standard, it may not matter early on, but problems gradually emerge later

The so-called "as long as it works" attitude~

3. How to optimize Dockerfile

3.1 Where to Start

Optimizing a Docker image should start with the concept of image layers

3.1.1 An example

A practical example

nginx:alpine image, 23.2MB

# docker history nginx:alpine
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
b46db85084b8   9 days ago    /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon...   0B        
<missing>      9 days ago    /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT           0B        
<missing>      9 days ago    /bin/sh -c #(nop)  EXPOSE 80                    0B        
<missing>      9 days ago    /bin/sh -c #(nop)  ENTRYPOINT ["/docker-entr...   0B        
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:09a214a3e07c919a...   4.61kB    
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7...   1.04kB    
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:0b866ff3fc1ef5b0...   1.96kB    
<missing>      9 days ago    /bin/sh -c #(nop) COPY file:65504f71f5855ca0...   1.2kB     
<missing>      9 days ago    /bin/sh -c set -x     && addgroup -g 101 -S ...   17.6MB    
<missing>      9 days ago    /bin/sh -c #(nop)  ENV PKG_RELEASE=1            0B        
<missing>      9 days ago    /bin/sh -c #(nop)  ENV NJS_VERSION=0.7.0        0B        
<missing>      9 days ago    /bin/sh -c #(nop)  ENV NGINX_VERSION=1.21.4     0B        
<missing>      9 days ago    /bin/sh -c #(nop)  LABEL maintainer=NGINX Do...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      10 days ago   /bin/sh -c #(nop) ADD file:762c899ec0505d1a3...   5.61MB

python:alpine image, 45.5MB

# docker history python:alpine
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
382a63bb2f25   10 days ago   /bin/sh -c #(nop)  CMD ["python3"]              0B        
<missing>      10 days ago   /bin/sh -c set -ex;   wget -O "$P...   8.31MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_SHA256...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_GET_PIP_URL=ht...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_SETUPTOOLS_VER...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_PIP_VERSION=21...   0B        
<missing>      10 days ago   /bin/sh -c cd /usr/local/bin  && ln -s idle3...   32B       
<missing>      10 days ago   /bin/sh -c set -ex  && apk add --no-cache --...   29.8MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PYTHON_VERSION=3.10.0    0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV GPG_KEY=A035C8C19219B...   0B        
<missing>      10 days ago   /bin/sh -c set -eux;  apk add --no-cache   c...   1.82MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV LANG=C.UTF-8             0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PATH=/usr/local/bin:/...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      10 days ago   /bin/sh -c #(nop) ADD file:762c899ec0505d1a3...   5.61MB

Actual Storage

# docker inspect nginx:alpine| jq '.[0]|{GraphDriver}'             
  "GraphDriver": {
    "Data": {
      "LowerDir": "/data/docker-overlay2/overlay2/3d.../diff:/data/docker-overlay2/overlay2/ae.../diff:/data/docker-overlay2/overlay2/ea.../diff:/data/docker-overlay2/overlay2/29.../diff:/data/docker-overlay2/overlay2/5e.../diff",
      "MergedDir": "/data/docker-overlay2/overlay2/b7.../merged",
      "UpperDir": "/data/docker-overlay2/overlay2/b7.../diff",
      "WorkDir": "/data/docker-overlay2/overlay2/b7.../work"
    "Name": "overlay2"

The concept of layers

Images solve the problem of packaging an application together with its environment. In practice, applications are packaged and iterated on top of the same rootfs, but a full copy of that rootfs is not made for each of them. Instead, Docker implements layering through storage drivers such as AUFS, devicemapper, overlay, and overlay2

For example, inspecting the Docker image above shows these layers

  • LowerDir: the read-only image layers

  • MergedDir: a unified view that merges the lower and upper layers

  • UpperDir: the read-write layer

  • WorkDir: an intermediate working directory; data written to the upper layer goes to WorkDir first and is then moved to UpperDir

3.1.2 Copy on write

When Docker first starts a container, the initial read-write layer is empty, and any file system changes are applied to this layer. For example, to modify a file, Docker first copies it from the read-only layer beneath into the read-write layer. The read-only version of the file still exists in the read-only layer but is hidden by the copy in the read-write layer. This is called copy-on-write.

3.1.3 UnionFS

A union file system mounts the contents of multiple directories (also called branches) into a single directory, while their physical locations stay separate

An intuitive effect: after pulling nginx:1.15 first, pulling nginx:1.16 second is much faster, because any layers the two images share are already present locally

3.2 Approaches

Once you understand what makes up the size of an image, it is easy to see where to start reducing it

3.2.1 Reduce the number of image layers

The number of image layers grows mainly with the number of RUN instructions in the Dockerfile, so merging RUN instructions can significantly reduce the layer count

For example:

Before merging, three layers

RUN apk add tzdata
RUN cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN echo "Asia/Shanghai" > /etc/timezone

After merging, one layer

RUN apk add tzdata \
    && cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo "Asia/Shanghai" > /etc/timezone

3.2.2 Reduce the size of each image layer

Choose a smaller base image

  • scratch: an empty image, the ancestor of all images. Every image needs a base image, so, as with the chicken and the egg, what is the base image's own ancestor? Can you build without any base image? Yes: you can use scratch. This is not expanded on here; for reference, see baseimages. One example of an image built on scratch is pause
  • busybox: compared to scratch, adds the commonly used Linux tools
  • alpine: additionally provides a package manager, apk

Multi-stage builds

Multi-stage builds are well suited to compiled languages. Put simply, they allow multiple FROM instructions in one Dockerfile. Only the base image named in the last FROM instruction becomes the base of the final image; the other stages are intermediate steps only.

This combines FROM ... AS ... with COPY --from

For example, a Java image with a size of 812MB

FROM centos AS jdk
COPY jdk-8u231-linux-x64.tar.gz /usr/local/src
RUN cd /usr/local/src && \
    tar -xzvf jdk-8u231-linux-x64.tar.gz -C /usr/local

With a multi-stage build, the image size is 618MB

FROM centos AS jdk
COPY jdk-8u231-linux-x64.tar.gz /usr/local/src
RUN cd /usr/local/src && \
    tar -xzvf jdk-8u231-linux-x64.tar.gz -C /usr/local

FROM centos
COPY --from=jdk /usr/local/jdk1.8.0_231 /usr/local

Ignore files

The build context is the set of files associated with the build you are running

It is the directory passed to docker build, typically the current working directory. By default, all files and directories in this context are sent to the Docker daemon as the build context, regardless of whether the build actually uses them.

When docker build starts executing, the console prints Sending build context to Docker daemon xxxMB, which means that the files and directories in the current working directory are being used as the build context

As seen earlier, --no-cache can be passed to the package manager inside a RUN instruction (for example apk add --no-cache) so that no package cache is left in the layer. Separately, docker build --no-cache makes the build ignore the layer cache

Within a build context, a .dockerignore file prevents local modules and debug logs from being copied into the Docker image at build time, much like .gitignore does for Git version control.

Remote download

Use a remote download instead of ADD to reduce image size

RUN curl -s | tar -xC /opt/

Split COPY

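For example, a COPY instruction copies directory A, which has four subdirectories AA/BB/CC/DD, but only BB changes constantly

In that case, splitting the COPY into one instruction per subdirectory makes rebuilds faster, because the layers for the unchanged subdirectories can be reused from the build cache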
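The split can be sketched as follows. The base image and the /app target path are assumptions for illustration; the directory names follow the example above.

```dockerfile
FROM alpine

# Rarely changing subdirectories first: their layers stay cached across builds
COPY A/AA /app/A/AA
COPY A/CC /app/A/CC
COPY A/DD /app/A/DD

# Frequently changing subdirectory last: only this layer, and any instruction
# after it, is rebuilt when BB changes
COPY A/BB /app/A/BB
```

The trade-off is three extra layers in exchange for better cache reuse, which usually pays off when the stable subdirectories are large.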

Mount at build time

Mounting at build time is an extended (experimental) feature

To enable it

  • Modify the Docker daemon startup parameters and add --experimental
  • Add # syntax=docker/dockerfile:1.1.1-experimental as the first line of the Dockerfile


  • Mount the local Go build cache

# syntax = docker/dockerfile:experimental
FROM golang
RUN --mount=type=cache,target=/root/.cache/go-build go build ...

  • Mount the package cache directories

# syntax = docker/dockerfile:experimental
FROM ubuntu
RUN rm -f /etc/apt/apt.conf.d/docker-clean; echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt --mount=type=cache,target=/var/lib/apt \
  apt update && apt install -y gcc

  • Mount credentials

# syntax = docker/dockerfile:experimental
FROM python:3
RUN pip install awscli
RUN --mount=type=secret,id=aws,target=/root/.aws/credentials aws s3 cp s3://... ...

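And so on.

Post-build cleanup

Clean up in the same RUN instruction that created the files; otherwise they remain in an earlier layer and the image does not shrink.

  • Delete archives once they have been extracted
  • Clean up installation caches
    • --no-cache
    • rm -rf /var/lib/apt/lists/*
    • rm -rf /var/cache/yum/*

Image compression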

docker export and docker import can be combined to flatten and compress an image (the size reduction is modest)

The disadvantage of this method is that some image metadata is lost, for example the layer history, as the output below shows

# docker run -d --name nginx nginx:alpine
# docker export nginx |docker import - nginx:alpine2
# docker history nginx:alpine
IMAGE          CREATED       CREATED BY                                      SIZE      COMMENT
b46db85084b8   10 days ago   /bin/sh -c #(nop)  CMD ["nginx" "-g" "daemon...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  STOPSIGNAL SIGQUIT           0B        
<missing>      10 days ago   /bin/sh -c #(nop)  EXPOSE 80                    0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENTRYPOINT ["/docker-entr...   0B        
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:09a214a3e07c919a...   4.61kB    
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:0fd5fca330dcd6a7...   1.04kB    
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:0b866ff3fc1ef5b0...   1.96kB    
<missing>      10 days ago   /bin/sh -c #(nop) COPY file:65504f71f5855ca0...   1.2kB     
<missing>      10 days ago   /bin/sh -c set -x     && addgroup -g 101 -S ...   17.6MB    
<missing>      10 days ago   /bin/sh -c #(nop)  ENV PKG_RELEASE=1            0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV NJS_VERSION=0.7.0        0B        
<missing>      10 days ago   /bin/sh -c #(nop)  ENV NGINX_VERSION=1.21.4     0B        
<missing>      10 days ago   /bin/sh -c #(nop)  LABEL maintainer=NGINX Do...   0B        
<missing>      10 days ago   /bin/sh -c #(nop)  CMD ["/bin/sh"]              0B        
<missing>      10 days ago   /bin/sh -c #(nop) ADD file:762c899ec0505d1a3...   5.61MB    
# docker history nginx:alpine2
dd6a3cf822ac   40 seconds ago                23MB      Imported from -
# docker images|grep nginx
nginx   alpine2   dd6a3cf822ac   54 seconds ago   23MB
nginx   alpine    b46db85084b8   10 days ago      23.2MB

3.3 Samples

3.3.1 Go sample

Example 1

For a k8s cluster installed with kubeadm, the kube-apiserver image is built with the Bazel build tool

bazel build ...
LABEL maintainers=Kubernetes Authors
LABEL description=go based runner for distroless scenarios
COPY /workspace/go-runner . # buildkit
ENTRYPOINT ["/go-runner"]
COPY file:2e904ea733ba0ded2a99947847de31414a19d83f8495dd8c1fbed3c70bf67a22 in /usr/local/bin/kube-apiserver

Code directory: 28M (including a .git directory of 20.5M)

Image size: 122MB

Example 2

Dockerfile for Cadence, an open source workflow orchestration engine


# Can be used in case a proxy is necessary

# Build tcheck binary
FROM golang:1.17-alpine3.13 AS tcheck

WORKDIR /go/src/

COPY go.* ./
RUN go build -mod=readonly -o /go/bin/tcheck

# Build Cadence binaries
FROM golang:1.17-alpine3.13 AS builder


RUN apk add --update --no-cache ca-certificates make git curl mercurial unzip

WORKDIR /cadence

# Making sure that dependency is not touched
ENV GOFLAGS="-mod=readonly"

# Copy go mod dependencies and build cache
COPY go.* ./
RUN go mod download

COPY . .
RUN rm -fr .bin .build


# bypass codegen, use committed files.  must be run separately, before building things.
RUN make .fake-codegen
RUN CGO_ENABLED=0 make copyright cadence-cassandra-tool cadence-sql-tool cadence cadence-server cadence-bench cadence-canary

# Download dockerize
FROM alpine:3.11 AS dockerize

RUN apk add --no-cache openssl

RUN wget$DOCKERIZE_VERSION/dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && tar -C /usr/local/bin -xzvf dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && rm dockerize-alpine-linux-amd64-$DOCKERIZE_VERSION.tar.gz \
    && echo "**** fix for host id mapping error ****" \
    && chown root:root /usr/local/bin/dockerize

# Alpine base image
FROM alpine:3.11 AS alpine

RUN apk add --update --no-cache ca-certificates tzdata bash curl

# set up nsswitch.conf for Go's "netgo" implementation
RUN test ! -e /etc/nsswitch.conf && echo 'hosts: files dns' > /etc/nsswitch.conf

SHELL ["/bin/bash", "-c"]

# Cadence server
FROM alpine AS cadence-server

ENV CADENCE_HOME /etc/cadence
RUN mkdir -p /etc/cadence

COPY --from=tcheck /go/bin/tcheck /usr/local/bin
COPY --from=dockerize /usr/local/bin/dockerize /usr/local/bin
COPY --from=builder /cadence/cadence-cassandra-tool /usr/local/bin
COPY --from=builder /cadence/cadence-sql-tool /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin
COPY --from=builder /cadence/cadence-server /usr/local/bin
COPY --from=builder /cadence/schema /etc/cadence/schema

COPY docker/ /
COPY config/dynamicconfig /etc/cadence/config/dynamicconfig
COPY config/credentials /etc/cadence/config/credentials
COPY docker/config_template.yaml /etc/cadence/config
COPY docker/ /

WORKDIR /etc/cadence

ENV SERVICES="history,matching,frontend,worker"

EXPOSE 7933 7934 7935 7939

# All-in-one Cadence server
FROM cadence-server AS cadence-auto-setup

RUN apk add --update --no-cache ca-certificates py-pip mysql-client
RUN pip install cqlsh

COPY docker/ /


# Cadence CLI
FROM alpine AS cadence-cli

COPY --from=tcheck /go/bin/tcheck /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin

ENTRYPOINT ["cadence"]

# Cadence Canary
FROM alpine AS cadence-canary

COPY --from=builder /cadence/cadence-canary /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin

CMD ["/usr/local/bin/cadence-canary", "--root", "/etc/cadence-canary", "start"]

# Cadence Bench
FROM alpine AS cadence-bench

COPY --from=builder /cadence/cadence-bench /usr/local/bin
COPY --from=builder /cadence/cadence /usr/local/bin

CMD ["/usr/local/bin/cadence-bench", "--root", "/etc/cadence-bench", "start"]

# Final image
FROM cadence-${TARGET}

Code directory: 85.4M (including a .git directory of 57.7M)

Image size: 135.69MB

3.3.2 Python sample

FROM python:3.4

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        postgresql-client \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /usr/src/app
COPY requirements.txt ./
RUN pip install -r requirements.txt
COPY . .

CMD ["python", "", "runserver", ""]

Code directory: 275M (including a .git directory of 222M)

Image size: 436MB

4. What else to do besides these optimizations

4.1 Set the character set

Set a common character set in the Dockerfile

# Set lang
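The body of this snippet is not shown. A minimal sketch, assuming the C.UTF-8 locale that the python:alpine history earlier in this article also sets through ENV LANG, could look like this:

```dockerfile
# Assumption: C.UTF-8 is available in the base image; pick the locale your
# application actually needs if it differs.
ENV LANG=C.UTF-8
```

Setting the variable in a shared base image keeps every derived image consistent, so individual services do not each have to remember it.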

4.2 Time Zone Correction

More on this issue can be found in my previous article, Multiple Approaches to the Container Time Problem in a k8s Environment

Set a common time zone in the Dockerfile

# Set timezone
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime \
    && echo "Asia/Shanghai" > /etc/timezone

4.3 Process Management

When a Docker container runs, the ENTRYPOINT or CMD from the Dockerfile becomes the main process, with PID 1. This process is what keeps the container alive: once it exits, the container exits.

In addition, an important role of this main process is to manage "zombie processes"

More formally, a zombie process is a process that has completed execution (through the exit system call, a fatal runtime error, or a termination signal) but still has its process control block in the operating system's process table, in the "terminated" state.

The main ways to clean up zombie processes are

  • Set the SIGCHLD handler in the parent process to SIG_IGN (ignore the signal), so that exited children are reaped automatically;
  • Fork twice and let the first-level child exit, so that the second-level child becomes an orphan that is "adopted" and cleaned up by init
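The adoption step of the second idea can be observed from a shell. This is a minimal, Linux-only sketch that assumes /proc is mounted; the variable names are illustrative, and it shows only the re-parenting of the orphan, not the reaping that init performs afterwards.

```shell
# The command substitution runs in a subshell (the first-level child), which
# starts sleep in the background (the second-level child), prints its PID, and exits.
# Redirecting sleep's output lets the substitution return without waiting for sleep.
orphan_pid=$(sleep 30 >/dev/null 2>&1 & echo "$!")

# The first-level child is gone, so the sleep has been re-parented to init
# (or to the nearest subreaper). Read its current parent PID from /proc.
orphan_ppid=$(sed -n 's/^PPid:[[:space:]]*//p' "/proc/$orphan_pid/status")
echo "shell pid=$$ orphan pid=$orphan_pid orphan ppid=$orphan_ppid"

# Clean up the demo process.
kill "$orphan_pid"
```

The printed parent PID differs from the shell's own PID. Inside a container it is the container's PID 1, which is why that process must be able to reap children it never started.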

Open source solutions currently available

  • Tini
    Tini is a minimal init system that runs inside the container. It starts a child process and, while waiting for that process to exit, reaps zombies and forwards signals


    • Tini protects against applications that create zombie processes

    • Tini ensures that default signal handlers work for the program running in the container: with Tini, SIGTERM properly terminates the process even if you did not explicitly install a signal handler


# Add Tini
ADD${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "--"]

# Run your program under Tini
CMD ["/your/program", "-and", "-its", "arguments"]
# or docker run your-image /your/program ...

  • dumb-init

    dumb-init forwards the signals it receives to the process group of its child. This matters because a shell such as bash does not, by default, forward the signals it receives to its own children

    Setting the environment variable DUMB_INIT_SETSID=0 makes dumb-init signal only its direct child

    In addition, dumb-init adopts processes that have lost their parent and reaps them so that they exit cleanly


FROM alpine:3.11.5
RUN sed -i "s/" /etc/apk/repositories \
    && apk add --no-cache dumb-init

# Runs "/usr/bin/dumb-init -- /my/script --with --args"
ENTRYPOINT ["dumb-init", "--"]

# or if you use --rewrite or other cli flags
# ENTRYPOINT ["dumb-init", "--rewrite", "2:3", "--"]

CMD ["/my/script", "--with", "--args"]

4.4 Starting with reduced privileges

In many cases, processes in containers should be started with reduced privileges for security. This is no different from running an nginx service on a VM, which is best run as a dedicated, less-privileged user

Example: a Tomcat image

USER tomcat
WORKDIR /usr/local/tomcat
ENTRYPOINT ["","run"]

If root-like privileges are needed in some cases, avoid installing or using sudo in Docker images, because it has unpredictable TTY and signal-forwarding behavior that can cause problems. If you must, gosu is recommended, for example to initialize a daemon as root but run it as a non-root user

For example, the official Postgres image uses the following script as its ENTRYPOINT

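#!/bin/bash
set -e

if [ "$1" = 'postgres' ]; then
    # Make sure the data directory is owned by the postgres user
    chown -R postgres "$PGDATA"

    # Initialize the database only if the data directory is empty
    if [ -z "$(ls -A "$PGDATA")" ]; then
        gosu postgres initdb
    fi

    # Drop privileges and replace this shell with the server process
    exec gosu postgres "$@"
fi

# Any other command runs unchanged
exec "$@"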

4.5 Underlying library dependencies

Services often depend on underlying libraries. Take building a Java image on an Alpine base image as an example

To stay small, Alpine omits much commonly used software. Using a JDK/JRE requires glibc, and installing glibc in turn requires ca-certificates first (a prerequisite for installing glibc)

Running a JDK 8 image on Alpine shows that the JDK cannot be executed. The reason is that these JDK binaries are built against the GNU C Library (glibc), while Alpine is based on musl libc (a minimal libc), so the glibc libraries have to be installed on Alpine

5. Summary

This article briefly analyzed the main reasons why Docker images grow so large, listed measures for reducing image size based on production experience, and covered some other common practices. The topics are rather scattered, and not everything could be covered~


See you ~


Posted by pskZero7 on Wed, 24 Nov 2021 13:14:58 -0800