In fact, building a fully distributed cluster is largely the same as the pseudo-distributed setup; only a few files need to be changed. Let's get started!
Step 1: configure the IP address, the host name, and the mapping between host names and IP addresses. This was covered in my earlier pseudo-distributed tutorial, so it is not repeated here.
Note: a fully distributed Hadoop cluster has master and slave nodes. Here there is one master node, whose host name I set to master, and two slave nodes, named slave1 and slave2.
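For reference, the host-to-IP mapping on all three machines ends up looking roughly like the sketch below (the IP addresses are placeholders; substitute your own):

# /etc/hosts on every node (example IPs only; replace with your own)
192.168.1.101   master
192.168.1.102   slave1
192.168.1.103   slave2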
Step 2: turn off the firewall (remember to do this on all three virtual machines!), and configure the IP address, host name, and host-to-IP mapping on each of them. PS: you can configure one machine first and then copy the files to the other two with the scp command.
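A minimal sketch of turning the firewall off, assuming a CentOS 7 style system with firewalld (on CentOS 6 use the equivalent iptables service commands instead):

# Run on master, slave1 and slave2
systemctl stop firewalld       # stop the firewall immediately
systemctl disable firewalld    # keep it off after a reboot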
Step 3: extract the Hadoop and JDK archives to the target directory, the same operation as in the pseudo-distributed setup (done on the master node):
# Extract to the specified directory
tar -zxvf /root/hadoop-2.6.0.tar.gz -C /opt
tar -zxvf /root/jdk1.8.0_144.tar.gz -C /opt
# Check the result
ll /opt
Step 4: set up passwordless SSH login (done on the master node!!!):
# 1. Generate a key pair
ssh-keygen -t rsa
# 2. Copy the public key to the local machine (master). Mind the path of the public key file; you will be asked for the root password
ssh-copy-id -i /root/.ssh/id_rsa.pub master
# 3. Copy the public key to the two slave nodes. Mind the path of the public key file; you will be asked for the root password
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2
# 4. Verify the connections with ssh (no password should be required)
ssh master
ssh slave1
# After logging in to slave1, exit first, then verify slave2
ssh slave2
Step 5: edit the environment variable file (/etc/profile) and verify that the configuration works (done on the master node):
# Add at the end of the file
export JAVA_HOME=/opt/jdk1.8.0_144      # JDK path
export HADOOP_HOME=/opt/hadoop-2.6.0    # Hadoop path
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# Save and exit, then reload the file
source /etc/profile

# Check that hadoop and java take effect. If a version number is printed, the configuration succeeded
# (if the command is not found, re-check /etc/profile!!!)
hadoop version
java -version
Step 6 (different from the pseudo-distributed setup): copy the /etc/profile file to the slave1 and slave2 nodes and make it take effect (remember to source the profile file after copying it!!!):
scp /etc/profile root@slave1:/etc/   # You may need to enter the password for slave1
scp /etc/profile root@slave2:/etc/
# Remember to source the profile file on each slave after copying!!!
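If passwordless SSH is already working, one quick way to check that the copied profile is valid on the slaves (a sketch, assuming bash as the remote shell and the host names used above) is:

# Sketch: source the copied profile remotely and confirm the variables are set
ssh root@slave1 'source /etc/profile && echo $JAVA_HOME $HADOOP_HOME'
ssh root@slave2 'source /etc/profile && echo $JAVA_HOME $HADOOP_HOME'
# New login sessions on the slaves pick up /etc/profile automatically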
Step 7: modify the Hadoop configuration files (one more file than in the pseudo-distributed setup). All of the following is done on the master node, inside the Hadoop configuration directory shown in the sketch below!!!:
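Assuming the extraction path from Step 3, all of the files edited below live in one directory:

# Assumption: Hadoop was extracted to /opt/hadoop-2.6.0 as in Step 3
cd /opt/hadoop-2.6.0/etc/hadoop
ls   # core-site.xml, hadoop-env.sh, hdfs-site.xml, mapred-site.xml.template, yarn-site.xml, slaves, ...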
(1) core-site.xml (insert inside the <configuration></configuration> tag):
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>            <!-- master: host name -->
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.6.0/tmp</value>         <!-- /opt/hadoop-2.6.0: Hadoop extraction path -->
</property>
(2) hadoop-env.sh:
export JAVA_HOME=/opt/jdk1.8.0_144   # JDK extraction path
(3) hdfs-site.xml (insert inside the <configuration></configuration> tag):
<property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop-2.6.0/dfs/name</value>             <!-- /opt/hadoop-2.6.0: Hadoop extraction path -->
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop-2.6.0/dfs/data</value>
</property>
<property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>/opt/hadoop-2.6.0/dfs/namesecondary</value>
</property>
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>                           <!-- master: host name -->
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>                                      <!-- number of HDFS block replicas; 1 is used here -->
</property>
(4) mapred-site.xml (this file does not exist by default; copy it from the template first) (insert the properties inside the <configuration></configuration> tag):
# Copy and rename the template
cp mapred-site.xml.template mapred-site.xml

# Configuration
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>                  <!-- master: host name -->
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>                  <!-- master: host name -->
</property>
(5) yarn-site.xml: set the host name of the YARN ResourceManager and the NodeManager auxiliary service (insert inside the <configuration></configuration> tag):
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>                        <!-- master: host name -->
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
(6) slaves file (different from the pseudo-distributed setup):
# Add the host names of the two slave nodes to the slaves file
slave1
slave2
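One way to write the file in a single step (a sketch, assuming the current directory is the Hadoop configuration directory from Step 7):

# Sketch: overwrite the slaves file with the two slave host names
cat > slaves <<EOF
slave1
slave2
EOF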
Step 8: copy the Hadoop and JDK directories to the slave1 and slave2 nodes:
scp -r /opt/hadoop-2.6.0 root@slave1:/opt/   # /opt/: the Hadoop extraction path
scp -r /opt/hadoop-2.6.0 root@slave2:/opt/
scp -r /opt/jdk1.8.0_144 root@slave1:/opt/   # /opt/: the JDK extraction path
scp -r /opt/jdk1.8.0_144 root@slave2:/opt/
Step 9: format the NameNode (note: formatting must not be repeated; if it really has to be redone, clear the contents of the /opt/hadoop-2.6.0/dfs directory before formatting again):
hadoop namenode -format
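If the format does have to be redone, one way to clear the old HDFS data first (a sketch, assuming the dfs and tmp directories configured above and passwordless SSH to the slaves):

# Sketch: remove old HDFS data on all three nodes before re-formatting
rm -rf /opt/hadoop-2.6.0/dfs/* /opt/hadoop-2.6.0/tmp/*
ssh root@slave1 'rm -rf /opt/hadoop-2.6.0/dfs/* /opt/hadoop-2.6.0/tmp/*'
ssh root@slave2 'rm -rf /opt/hadoop-2.6.0/dfs/* /opt/hadoop-2.6.0/tmp/*'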
Step 10: start Hadoop and check the processes:
Master node:
# Start hadoop
start-all.sh

# View the processes with jps
jps

# The following processes indicate that the cluster was set up successfully
1875 SecondaryNameNode
2724 Jps
2167 ResourceManager
1544 NameNode
# (the process IDs may differ, but none of these processes may be missing)
slave1:
# (the process IDs may differ, but none of these processes may be missing)
1908 Jps
1763 NodeManager
1669 DataNode
slave2:
# (the process IDs may differ, but none of these processes may be missing)
1888 Jps
1737 NodeManager
1623 DataNode
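As an extra check beyond jps (a sketch, not part of the original steps), you can confirm from the master node that both DataNodes have registered with the NameNode:

# Sketch: verify the cluster from the master node
hdfs dfsadmin -report     # should report 2 live datanodes (slave1 and slave2)
# The NameNode web UI is also reachable at http://master:50070 (Hadoop 2.x default port)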