Hadoop fully distributed cluster setup

Keywords: Hadoop, distributed, HDFS

In fact, building a fully distributed cluster is almost the same as building a pseudo-distributed one; only a few files need to change. Let's start!

The first step is to configure the IP address, the host name, and the mapping between host names and IP addresses. The procedure is covered in my earlier pseudo-distributed tutorial, so it is not repeated in detail here.

Note: a fully distributed Hadoop cluster has master and slave nodes. Here there is one master node, whose host name I set to master, and two slave nodes, named slave1 and slave2. A minimal sketch of the mapping step is shown below.
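A minimal sketch, assuming CentOS 7 and example IP addresses (replace them with your own):

# Set the host name (run on each node with its own name: master, slave1, slave2)
hostnamectl set-hostname master
 
# Append the host-to-IP mapping to /etc/hosts on all three machines
# (the addresses below are examples only)
cat >> /etc/hosts <<EOF
192.168.1.101 master
192.168.1.102 slave1
192.168.1.103 slave2
EOF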

The second step is to turn off the firewall (remember to do this on all three virtual machines!). The IP address, host name, and host-to-IP mapping likewise have to be in place on all three. PS: you can configure one machine first and then copy the relevant files to the other two with the scp command. A sketch of the firewall step follows.
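A minimal sketch, assuming CentOS 7 with firewalld (adjust the commands for your distribution):

# Stop the firewall now and keep it off after a reboot (run on all three machines)
systemctl stop firewalld
systemctl disable firewalld
 
# Confirm that it is off
systemctl status firewalld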

Step 3: extract the Hadoop and JDK archives to the target directory, just as in the pseudo-distributed setup (done on the master machine).

# Extract to the specified directory
tar -zxvf /root/hadoop-2.7.3.tar.gz -C /opt 
tar -zxvf /root/jdk1.8.0_144.tar.gz -C /opt
 
# Check the result
ll /opt

Step 4: set up passwordless SSH login (done on the master machine!):

# 1. Generate a key pair
ssh-keygen -t rsa
 
# 2. Copy the public key to the master itself. Mind the path of the
#    public key file; you will be asked for the root password
ssh-copy-id -i /root/.ssh/id_rsa.pub master

# 3. Copy the public key to the two slave nodes (again entering the root password)
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2
 
# 4. Verify that ssh now connects without a password
ssh master

ssh slave1 
# After logging in to slave1, exit first, then verify slave2
ssh slave2

Step 5: edit the environment variable file (/etc/profile) and verify that the configuration works (done on the master node):

# Add at the end of the file
export JAVA_HOME=/opt/jdk1.8.0_144      # JDK installation path
export HADOOP_HOME=/opt/hadoop-2.7.3    # Hadoop installation path
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# Save and exit
 
 
# Reload the file
source /etc/profile
 
# Check that hadoop and java are available. If a version number is printed,
# the configuration succeeded (if the command is not found, re-check /etc/profile!!!)
 
hadoop version 
java -version
 
 

Step 6 (different from pseudo-distributed): copy the /etc/profile file to the slave1 and slave2 nodes and make it take effect (remember to source the file on each slave after copying!):

scp /etc/profile root@slave1:/etc/   # no password should be needed if Step 4 succeeded
scp /etc/profile root@slave2:/etc/

# Remember to source /etc/profile on each slave after copying, as shown below!
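A sketch of making it take effect (these are interactive steps, run one at a time):

# Log in to slave1, reload the variables, then come back
ssh slave1
source /etc/profile
exit
 
# Same for slave2
ssh slave2
source /etc/profile
exit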

Step 7: modify the Hadoop configuration files (one file more than in the pseudo-distributed setup). The following operations are all done on the master machine:
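In Hadoop 2.x all of these files live under $HADOOP_HOME/etc/hadoop, so switch into that directory first:

cd /opt/hadoop-2.7.3/etc/hadoop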

        (1) core-site.xml (insert inside the <configuration></configuration> tag):

<property>
        <name>fs.defaultFS</name>
        <!-- master is the host name of the master node -->
        <value>hdfs://master:9000</value>
</property>
<property>
        <name>hadoop.tmp.dir</name>
        <!-- /opt/hadoop-2.7.3 is the Hadoop installation path -->
        <value>/opt/hadoop-2.7.3/tmp</value>
</property>

        (2) hadoop-env.sh:

export JAVA_HOME=/opt/jdk1.8.0_144  # JDK installation path
 

        (3) hdfs-site.xml (insert inside the <configuration></configuration> tag):

<property>
        <name>dfs.namenode.name.dir</name>
        <!-- /opt/hadoop-2.7.3 is the Hadoop installation path -->
        <value>/opt/hadoop-2.7.3/dfs/name</value>
</property>
<property>
        <name>dfs.datanode.data.dir</name>
        <value>/opt/hadoop-2.7.3/dfs/data</value>
</property>
<property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/opt/hadoop-2.7.3/dfs/namesecondary</value>
</property>
<property>
        <name>dfs.namenode.secondary.http-address</name>
        <!-- master is the host name of the master node -->
        <value>master:50090</value>
</property>
<property>
        <name>dfs.replication</name>
        <!-- number of block replicas; with two DataNodes this could also be set to 2 -->
        <value>1</value>
</property>

         (4) mapred-site.xml (this file does not exist by default; copy it from the template first, then insert inside the <configuration></configuration> tag):

# Copy the template and rename it (in /opt/hadoop-2.7.3/etc/hadoop)
cp mapred-site.xml.template mapred-site.xml
 
# Configuration to insert
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
<property>
    <name>mapreduce.jobhistory.address</name>
    <!-- master is the host name of the master node -->
    <value>master:10020</value>
</property>
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
</property>

         (5) yarn-site.xml: set the host name of the YARN ResourceManager and the NodeManager auxiliary service (insert inside the <configuration></configuration> tag):

<property>
     <name>yarn.resourcemanager.hostname</name>
     <!-- master is the host name of the master node -->
     <value>master</value>
</property>
<property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
</property>

        (6) slaves file (different from pseudo-distributed):

# Put the host names of the two slave nodes in the slaves file
# (remove the default localhost entry)
slave1
slave2

Step 8: copy the Hadoop and JDK directories to the slave1 and slave2 nodes:

scp -r /opt/hadoop-2.7.3 root@slave1:/opt/  # /opt is the Hadoop installation path
scp -r /opt/hadoop-2.7.3 root@slave2:/opt/

scp -r /opt/jdk1.8.0_144 root@slave1:/opt/  # /opt is the JDK installation path
scp -r /opt/jdk1.8.0_144 root@slave2:/opt/

Step 9: format the NameNode. (Note: formatting must not be repeated. If you do need to re-format, the contents of the /opt/hadoop-2.7.3/dfs directory must be cleared first; see the sketch after the command.)

hdfs namenode -format   # "hadoop namenode -format" also works but is deprecated in 2.x
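If a re-format really is needed, a sketch of clearing the old data first (run on every node that has these directories; the paths match the configuration above):

# Remove the old NameNode/DataNode data and the temp directory, then format again
rm -rf /opt/hadoop-2.7.3/dfs/* /opt/hadoop-2.7.3/tmp/*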

Step 10: start Hadoop and check the processes:

          Master node:

# Start hadoop (start-all.sh is deprecated in 2.x; start-dfs.sh followed by start-yarn.sh also works)
start-all.sh
 
# Check the processes with jps:
jps
 
# If the following processes appear, the cluster was built successfully

1875 SecondaryNameNode
2724 Jps
2167 ResourceManager
1544 NameNode
# (the process IDs may differ, but none of the processes should be missing)

        slave1:

# (the process IDs may differ, but none of the processes should be missing)
1908 Jps
1763 NodeManager
1669 DataNode

        slave2:

# (the process IDs may differ, but none of the processes should be missing)
1888 Jps
1737 NodeManager
1623 DataNode
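
Beyond jps, you can confirm that the DataNodes registered with the NameNode (the ports below are the Hadoop 2.x defaults):

# Report the live DataNodes; slave1 and slave2 should both be listed
hdfs dfsadmin -report
 
# Web UIs (open in a browser):
#   NameNode:        http://master:50070
#   ResourceManager: http://master:8088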
