Building containers means downloading a lot of software, and that takes time. hub.docker.com is slow to begin with, and it gets worse when a build pulls images hosted on gcr.io, AWS, and similar registries. Installing Python modules with pip is slow too, and conda downloads crawl at a snail's pace.
There are several ways to speed up downloads during a container build:
1. Build on an "external server" that has fast access to the upstream registries, then push the result to a mirror registry such as Aliyun. Pulling from there is much faster.
- For the steps, see: Create kubernetes-1.11.0 image service (high speed) in Alibaba cloud
- If the system disk runs out of space, see: How to add data disk to Docker of container service
2. Add a proxy plus pip and conda mirrors. The following Dockerfile builds a single-user image for JupyterHub.
    # Copyright (c) Jupyter Development Team.
    # Distributed under the terms of the Modified BSD License.
    FROM jupyter/all-spark-notebook:5811dcb711ba

    LABEL maintainer="Databook Project,https://github.com/databooks<openthings@163.com>"

    USER root

    # ====================================================================
    # Add proxy, using --build-arg "HTTP_PROXY=http://192.168.199.99:9999"
    # Declaring the argument makes sure ${HTTP_PROXY} expands in the ENV lines.
    ARG HTTP_PROXY
    ENV HTTP_PROXY=${HTTP_PROXY}
    ENV HTTPS_PROXY=${HTTP_PROXY}
    ENV http_proxy=${HTTP_PROXY}
    ENV https_proxy=${HTTP_PROXY}

    # Add conda install mirror:
    RUN echo $http_proxy && \
        conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/ && \
        conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/ && \
        conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge/ && \
        conda config --set show_channel_urls yes

    # Add pip install mirror.
    # printf writes one setting per line; an echo with backslash
    # continuations would put the whole file on a single line.
    RUN printf '%s\n' \
        '[global]' \
        'index-url = https://pypi.tuna.tsinghua.edu.cn/simple' \
        'trusted-host = pypi.tuna.tsinghua.edu.cn' \
        'timeout = 120' \
        > /etc/pip.conf
    # ====================================================================

    # ====================================================================
    USER $NB_UID

    RUN pip install --upgrade pip

    RUN pip install bs4 && \
        pip install lxml && \
        pip install ipyleaflet && \
        pip install py4j && \
        pip install pyspark && \
        pip install mlflow && \
        pip install airflow && \
        pip install tushare

    RUN conda update -n base conda

    RUN conda install -y -c conda-forge nodejs=8.10.0 && \
        conda install -y -c conda-forge tensorflow=1.8.0 && \
        jupyter labextension install jupyter-leaflet

    # ====================================================================
    # Clear the proxy so it is not left in the final image's environment.
    ENV HTTP_PROXY=""
    ENV HTTPS_PROXY=""
    ENV http_proxy=""
    ENV https_proxy=""
    # ====================================================================
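The build for the Dockerfile above can be sketched as a small shell script. This is only an illustration under some assumptions: the image tag `databook/singleuser:latest`, the `PROXY_URL`, `IMAGE_TAG`, and `DRY_RUN` variables, and the `run` helper are hypothetical names introduced here, and the script only prints the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Hypothetical values; adjust the proxy address and image tag to your setup.
PROXY_URL="${PROXY_URL:-http://192.168.199.99:9999}"
IMAGE_TAG="${IMAGE_TAG:-databook/singleuser:latest}"

# Print each command by default; set DRY_RUN=0 to really execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Pass the proxy in at build time only.
run docker build --build-arg "HTTP_PROXY=${PROXY_URL}" -t "${IMAGE_TAG}" .

# Show the environment recorded in the final image, to check that the
# proxy variables were cleared by the last ENV lines of the Dockerfile.
run docker inspect --format '{{json .Config.Env}}' "${IMAGE_TAG}"
```

Running it with `DRY_RUN=0 sh build.sh` from the directory that holds the Dockerfile would perform the real build, assuming the script is saved as `build.sh`.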
Notes:
- The proxy has to be set inside the Docker build. Setting it on the host has no effect, because docker build runs its steps in a separate container.
- When building, pass the proxy in with docker build --build-arg "HTTP_PROXY=http://192.168.199.99:9999", and clear HTTP_PROXY and the related variables at the end of the Dockerfile. This keeps the proxy setting out of the final image's environment.
- This example uses the pip and conda mirrors of Tsinghua University; any other mirror can be substituted.
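Swapping in a different pip mirror comes down to changing one hostname. A minimal sketch, assuming the hypothetical variables `PIP_MIRROR_HOST` and `PIP_CONF` (neither is part of the original Dockerfile); it writes to a scratch file under the temporary directory instead of /etc/pip.conf so it can be tried without root:

```shell
#!/bin/sh
# Hypothetical variables: the mirror hostname and the output path.
# The default host is the Tsinghua University PyPI mirror.
PIP_MIRROR_HOST="${PIP_MIRROR_HOST:-pypi.tuna.tsinghua.edu.cn}"
PIP_CONF="${PIP_CONF:-${TMPDIR:-/tmp}/pip.conf.example}"

# Write one setting per line; pip expects an INI-style file.
printf '%s\n' \
    '[global]' \
    "index-url = https://${PIP_MIRROR_HOST}/simple" \
    "trusted-host = ${PIP_MIRROR_HOST}" \
    'timeout = 120' \
    > "${PIP_CONF}"

# Show the result so it can be checked before copying it into an image.
cat "${PIP_CONF}"
```

Setting `PIP_MIRROR_HOST` to another mirror's hostname before running the script produces the equivalent file for that mirror, provided the mirror serves its index under `/simple`.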
More references: Databook (data book)