Strategies for speeding up Docker container builds

Keywords: pip Docker jupyter Anaconda

Building containers means downloading a lot of software, which takes a long time. hub.docker.com is slow to begin with, and it gets worse when an image depends on components hosted on gcr.io, AWS, and similar services. Installing Python modules with pip is also slow, and conda downloads crawl along like a snail.

There are several ways to speed up downloads during a container build:

1. Build the image on an external server, then transfer it to a mirror registry such as Aliyun. Pulling it from there is much faster.

2. Add a proxy and pip and conda mirrors. The following is a single-user image built for JupyterHub.

# Copyright (c) Jupyter Development Team.
# Distributed under the terms of the Modified BSD License.
FROM jupyter/all-spark-notebook:5811dcb711ba

LABEL maintainer="Databook Project,https://github.com/databooks<openthings@163.com>"

USER root

# ====================================================================
# Add proxy; build with --build-arg "HTTP_PROXY=http://192.168.199.99:9999"

# Declare the build argument so that ${HTTP_PROXY} can be expanded below.
ARG HTTP_PROXY
ENV HTTP_PROXY=${HTTP_PROXY}
ENV HTTPS_PROXY=${HTTP_PROXY}
ENV http_proxy=${HTTP_PROXY}
ENV https_proxy=${HTTP_PROXY}

# Add conda install mirrors:

RUN echo $http_proxy && \
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ && \
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/ && \
    conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ && \
    conda config --set show_channel_urls yes

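# Add pip install mirror:
# printf writes each setting on its own line; an echo with backslash
# continuations would collapse the whole file into a single line.

RUN printf '%s\n' \
    '[global]' \
    'index-url = https://pypi.tuna.tsinghua.edu.cn/simple' \
    'trusted-host = pypi.tuna.tsinghua.edu.cn' \
    'timeout = 120' \
    > /etc/pip.conf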
# ====================================================================

# ====================================================================
USER $NB_UID

RUN pip install --upgrade pip
# Airflow is published on PyPI as apache-airflow.
RUN pip install bs4 lxml ipyleaflet py4j pyspark mlflow apache-airflow tushare

# -y answers the confirmation prompt, which cannot be answered during a build.
RUN conda update -y -n base conda
RUN conda install -y -c conda-forge nodejs=8.10.0 && \
    conda install -y -c conda-forge tensorflow=1.8.0 && \
    jupyter labextension install jupyter-leaflet

# ====================================================================
# Clear the proxy so it is not active in containers started from this image.
ENV HTTP_PROXY=""
ENV HTTPS_PROXY=""
ENV http_proxy=""
ENV https_proxy=""
# ====================================================================
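
The Dockerfile above expects the proxy to be supplied at build time, and approach 1 relays the finished image through a faster registry. A minimal sketch of both steps follows. The image name and the registry host are hypothetical placeholders, not taken from the original setup; only the proxy address comes from the Dockerfile comment. The script is a dry run by default and only prints the commands.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DOCKER=docker in the environment to run them for real.
DOCKER="${DOCKER:-echo docker}"

PROXY="${PROXY:-http://192.168.199.99:9999}"     # proxy from the Dockerfile comment
IMAGE="${IMAGE:-databook/singleuser:latest}"     # hypothetical image name
MIRROR="${MIRROR:-registry.example.com}"         # hypothetical mirror registry

build_image() {
    # The proxy is passed only as a build argument; the Dockerfile
    # resets the proxy variables at the end of the build.
    $DOCKER build --build-arg "HTTP_PROXY=$PROXY" -t "$IMAGE" .
}

relay_image() {
    # Re-tag the image for the mirror registry and push it there,
    # so other machines can pull from the faster location.
    $DOCKER tag "$IMAGE" "$MIRROR/$IMAGE"
    $DOCKER push "$MIRROR/$IMAGE"
}

build_image
relay_image
```

After a real build, running `docker run --rm <image> env` should list the four proxy variables with empty values.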

Notes:

  • The proxy must be set inside the Docker build. Setting it on the host has no effect, because each build step runs in its own container.
  • When building, pass the proxy with docker build --build-arg "HTTP_PROXY=http://192.168.199.99:9999", and clear HTTP_PROXY and the related variables at the end of the Dockerfile. This keeps the proxy setting out of the environment of the final image.
  • This example uses the Tsinghua University pip and conda mirrors; other mirrors can be substituted.
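
Swapping in a different pip mirror only means changing the index URL and the matching trusted host. The helper below is a hypothetical sketch (the function name `write_pip_conf` is not from the original post): it derives the host from the URL and writes the same four-line config as the Dockerfile. The example writes to a scratch file rather than /etc/pip.conf.

```shell
#!/bin/sh
# write_pip_conf INDEX_URL [FILE]
# Writes a pip config pointing at INDEX_URL; FILE defaults to /etc/pip.conf.
write_pip_conf() {
    url="$1"
    file="${2:-/etc/pip.conf}"
    host="${url#*://}"    # drop the scheme, e.g. "https://"
    host="${host%%/*}"    # drop everything after the host name
    printf '[global]\nindex-url = %s\ntrusted-host = %s\ntimeout = 120\n' \
        "$url" "$host" > "$file"
}

# Example: generate the config for the Tsinghua mirror in a temp file.
tmp="$(mktemp)"
write_pip_conf https://pypi.tuna.tsinghua.edu.cn/simple "$tmp"
cat "$tmp"
rm -f "$tmp"
```

Inside a Dockerfile the same printf line can be used directly in a RUN instruction; conda mirrors are switched separately with `conda config --add channels`.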

More references: Databook

Posted by safrica on Thu, 30 Jan 2020 07:19:13 -0800