Using Docker to install Hadoop and Spark

Keywords: Big Data, Spark, Hadoop, Docker, curl

This post describes how to install Hadoop and Spark using Docker.

Install the Hadoop and Spark images separately

Install the Hadoop image

The image selected here is uhopper/hadoop on Docker Hub. The Hadoop version it provides is relatively new, and JDK 8 is installed, so it can support installing the latest version of Spark.

docker pull uhopper/hadoop:2.8.1
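
To confirm the bundled JDK, a quick check (a sketch; it assumes java is on the image's PATH):

docker run --rm --entrypoint java uhopper/hadoop:2.8.1 -version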

Install the Spark image

If you do not need a specific Spark version, you can pull an existing image directly. If a newer version is required, you need to build your own image from a Dockerfile.

Environment preparation

  1. Download the source for building the sequenceiq/spark image:

    git clone https://github.com/sequenceiq/docker-spark
    
  2. Download the Spark 2.3.2 binary package (spark-2.3.2-bin-hadoop2.7.tgz) from the Spark official website

  3. Place the downloaded package in the docker-spark directory

  4. Check the local images to make sure the Hadoop image is installed

  5. Enter the docker-spark directory and confirm that all the files needed to build the image are ready, as sketched below
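
A minimal sketch of the five steps above (the download URL and file name assume the Hadoop 2.7 build of Spark 2.3.2 from the Apache archive):

    git clone https://github.com/sequenceiq/docker-spark
    cd docker-spark
    curl -LO https://archive.apache.org/dist/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz
    docker images | grep hadoop   # the Hadoop image should be listed
    ls                            # expect Dockerfile, bootstrap.sh and the Spark package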

Modify the configuration files

  • Modify the Dockerfile to the following:

    • FROM sequenceiq/hadoop-docker:2.7.0
      MAINTAINER scottdyt
      
      #support for Hadoop 2.7.0
      #RUN curl -s http://d3kbcqa49mib13.cloudfront.net/spark-1.6.1-bin-hadoop2.6.tgz | tar -xz -C /usr/local/
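      # ADD auto-extracts a local tar archive, unpacking Spark into /usr/local/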
      ADD spark-2.3.2-bin-hadoop2.7.tgz /usr/local/
      RUN cd /usr/local && ln -s spark-2.3.2-bin-hadoop2.7 spark
      ENV SPARK_HOME /usr/local/spark
      RUN mkdir $SPARK_HOME/yarn-remote-client
      ADD yarn-remote-client $SPARK_HOME/yarn-remote-client
      
      # leave HDFS safe mode and upload the Spark jars to HDFS so YARN applications can find them;
      # $SPARK_HOME-2.3.2-bin-hadoop2.7 would expand to /usr/local/spark-2.3.2-bin-hadoop2.7, so the
      # explicit path is used here; the example jars go to a separate /spark-examples directory
      RUN $BOOTSTRAP && $HADOOP_PREFIX/bin/hdfs dfsadmin -safemode leave \
       && $HADOOP_PREFIX/bin/hdfs dfs -put /usr/local/spark-2.3.2-bin-hadoop2.7/jars /spark \
       && $HADOOP_PREFIX/bin/hdfs dfs -put /usr/local/spark-2.3.2-bin-hadoop2.7/examples/jars /spark-examples
      
      
      ENV YARN_CONF_DIR $HADOOP_PREFIX/etc/hadoop
      ENV PATH $PATH:$SPARK_HOME/bin:$HADOOP_PREFIX/bin
      # update boot script
      COPY bootstrap.sh /etc/bootstrap.sh
      RUN chown root:root /etc/bootstrap.sh
      RUN chmod 700 /etc/bootstrap.sh
      
      #install R 
      RUN rpm -ivh http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
      RUN yum -y install R
      
      ENTRYPOINT ["/etc/bootstrap.sh"]
      
  • Modify bootstrap.sh to the following:

    #!/bin/bash
    
    : ${HADOOP_PREFIX:=/usr/local/hadoop}
    
    # source the Hadoop environment settings (JAVA_HOME and friends)
    . $HADOOP_PREFIX/etc/hadoop/hadoop-env.sh
    
    rm /tmp/*.pid
    
    # installing libraries if any - (resource urls added comma separated to the ACP system variable)
    cd $HADOOP_PREFIX/share/hadoop/common ; for cp in ${ACP//,/ }; do  echo == $cp; curl -LO $cp ; done; cd -
    
    # altering the core-site configuration
    sed s/HOSTNAME/$HOSTNAME/ /usr/local/hadoop/etc/hadoop/core-site.xml.template > /usr/local/hadoop/etc/hadoop/core-site.xml
    
    # setting spark defaults
    # Spark 2.x reads spark.yarn.jars (plural); spark.yarn.jar was the Spark 1.x property
    echo spark.yarn.jars hdfs:///spark/* > $SPARK_HOME/conf/spark-defaults.conf
    cp $SPARK_HOME/conf/metrics.properties.template $SPARK_HOME/conf/metrics.properties
    
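    # the Hadoop start scripts launch the daemons over SSH, so sshd must be running first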
    service sshd start
    $HADOOP_PREFIX/sbin/start-dfs.sh
    $HADOOP_PREFIX/sbin/start-yarn.sh
    
    
    
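    # with "-d", keep the container alive in the foreground on sshd; otherwise execute the passed command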
    CMD=${1:-"exit 0"}
    if [[ "$CMD" == "-d" ]];
    then
    	service sshd stop
    	/usr/sbin/sshd -D -d
    else
    	/bin/bash -c "$*"
    fi
    
    

  • Build the image:

    docker build --rm -t scottdyt/spark:2.3.2 .
    

  • View the image to confirm the build succeeded; for example:
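
    docker images scottdyt/spark   # the 2.3.2 tag should be listed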

Start a Spark 2.3.2 container

docker run -it -p 8088:8088 -p 8042:8042 -p 4040:4040 -h sandbox scottdyt/spark:2.3.2 bash

If the startup succeeds, you are dropped into a shell inside the container, with ports 8088 (YARN ResourceManager UI), 8042 (NodeManager UI) and 4040 (Spark application UI) mapped to the host.
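
To verify that Spark can run on YARN, a quick smoke test from inside the container (a sketch; the memory and core settings are illustrative):

spark-shell --master yarn --deploy-mode client --driver-memory 1g --executor-memory 1g --executor-cores 1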

Install a combined Spark and Hadoop image

If you want to save the effort, you can install an image that bundles Spark and Hadoop directly; the one used here is uhopper/hadoop-spark on Docker Hub.

Or pull it directly from the terminal (the 2.1.2_2.8.1 tag corresponds to Spark 2.1.2 on Hadoop 2.8.1):

docker pull uhopper/hadoop-spark:2.1.2_2.8.1

Once the pull finishes, the installation is complete.

