Fully Distributed Hadoop Cluster

Keywords: Java Hadoop ssh xml vim

Cluster environment:

  1. CentOS 6.8: hadoop102, hadoop103, hadoop104
  2. JDK version: jdk1.8.0_144
  3. Hadoop version: Hadoop 2.7.2

First, prepare three machines (hadoop102, hadoop103, hadoop104): disable the firewall on each, assign each a static IP, and configure the hostname-to-IP mapping.
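
A rough sketch of this preparation on CentOS 6 is shown below. The 192.168.1.x addresses are placeholders for illustration only; use your own static IPs, and add the mapping to /etc/hosts on all three machines.

```shell
# Turn the firewall off now and keep it off across reboots (CentOS 6 uses the iptables service)
sudo service iptables stop
sudo chkconfig iptables off

# Hostname-to-IP mapping, appended to /etc/hosts on every machine
# (the addresses below are examples; substitute your own static IPs)
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.1.102 hadoop102
192.168.1.103 hadoop103
192.168.1.104 hadoop104
EOF
```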

Configuring the Cluster

Writing Cluster Distribution Scripts

  1. Create a remote synchronization script named xsync, place it in a new bin directory under the current user's home directory, and add that directory to PATH so the script can be run from any directory (a PATH sketch follows the commands below)
  2. Script implementation
[kocdaniel@hadoop102 ~]$ mkdir bin
[kocdaniel@hadoop102 ~]$ cd bin/
[kocdaniel@hadoop102 bin]$ vim xsync

Write the following script code in the file

#!/bin/bash
#1 Get the number of arguments; exit immediately if none were given
pcount=$#
if((pcount==0)); then
echo no args;
exit;
fi

#2 Get the file name
p1=$1
fname=`basename $p1`
echo fname=$fname

#3 Get the absolute path of the parent directory (-P resolves soft links to the actual physical path)
pdir=`cd -P $(dirname $p1); pwd`
echo pdir=$pdir

#4 Get the current user name
user=`whoami`

#5 Loop over the target hosts and rsync the file/directory to each
for((host=103; host<105; host++)); do
        echo ------------------- hadoop$host --------------
        rsync -rvl $pdir/$fname $user@hadoop$host:$pdir
done
  3. Give the xsync script execute permission, then call it to copy the script to the hadoop103 and hadoop104 nodes
[kocdaniel@hadoop102 bin]$ chmod 777 xsync
[kocdaniel@hadoop102 bin]$ xsync /home/kocdaniel/bin
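
Step 1 also calls for putting this bin directory on PATH. A minimal sketch, assuming bash is the login shell (some distributions already add ~/bin to PATH automatically):

```shell
# Append ~/bin to PATH for future shells and reload the current one
echo 'export PATH=$PATH:$HOME/bin' >> ~/.bashrc
source ~/.bashrc
which xsync   # should now resolve to ~/bin/xsync
```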

Cluster configuration

  1. Cluster deployment planning
|      | hadoop102          | hadoop103                    | hadoop104                   |
| ---- | ------------------ | ---------------------------- | --------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |

Due to limited computer resources, only three virtual machines are used here. In a real working environment, the cluster should be planned according to actual needs.

  2. Configure the cluster

Switch to the etc/hadoop/ directory under the Hadoop installation directory.

  • Configure core-site.xml
[kocdaniel@hadoop102 hadoop]$ vim core-site.xml
# Write the following in the file
<!-- Specify the address of the NameNode in HDFS -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop102:9000</value>
</property>

<!-- Specify the storage directory for files generated by Hadoop at runtime -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>
  • HDFS configuration file

    • Configure hadoop-env.sh
    [kocdaniel@hadoop102 hadoop]$ vim hadoop-env.sh
    export JAVA_HOME=/opt/module/jdk1.8.0_144

Note: We have already configured JAVA_HOME in /etc/profile. Why does it need to be configured here as well?

Answer: Because Hadoop runs as a daemon (a daemon is a process that runs in the background and is not controlled by any terminal; from Baidu Encyclopedia). Since it runs in the background with no controlling terminal, it does not read the environment variables set in our login shell, so JAVA_HOME has to be configured separately here.
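
A quick way to observe this, as a rough sketch: the non-interactive ssh sessions that the start scripts use to launch daemons on remote nodes do not source /etc/profile, so JAVA_HOME may come back empty there even though it is set in a login shell:

```shell
# In an interactive login shell /etc/profile has been sourced, so this prints the JDK path
echo $JAVA_HOME

# A non-interactive ssh command (the way remote daemons are launched) does not
# source /etc/profile, so the variable may come back empty here
ssh hadoop103 'echo JAVA_HOME=$JAVA_HOME'
```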

    • Configure hdfs-site.xml
    [kocdaniel@hadoop102 hadoop]$ vim hdfs-site.xml
    # Write the following configuration
    <!-- The replication factor is 3; since the default is already 3, this property could be omitted -->
    <property>
            <name>dfs.replication</name>
            <value>3</value>
    </property>
    
    <!-- Specify the host for the SecondaryNameNode -->
    <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>hadoop104:50090</value>
    </property>
    • YARN configuration files

      • Configure yarn-env.sh
      [kocdaniel@hadoop102 hadoop]$ vim yarn-env.sh
      export JAVA_HOME=/opt/module/jdk1.8.0_144
      • Configure yarn-site.xml
      [kocdaniel@hadoop102 hadoop]$ vi yarn-site.xml
      # Add the following configuration
      <!-- How the Reducer obtains data -->
      <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
      </property>
      
      <!-- Specify the address of the YARN ResourceManager -->
      <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>hadoop103</value>
      </property>
    • MapReduce configuration file

      • Configure mapred-env.sh
      [kocdaniel@hadoop102 hadoop]$ vim mapred-env.sh
      export JAVA_HOME=/opt/module/jdk1.8.0_144
      
      • Configure mapred-site.xml
      # For the first-time configuration, copy mapred-site.xml.template to mapred-site.xml
      [kocdaniel@hadoop102 hadoop]$ cp mapred-site.xml.template mapred-site.xml
      [kocdaniel@hadoop102 hadoop]$ vim mapred-site.xml
      # Add the following configuration to the file
      <!-- Specify that MapReduce runs on YARN -->
      <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
      </property>
      
    3. Synchronize the configured files to the hadoop103 and hadoop104 nodes using the cluster distribution script
    [kocdaniel@hadoop102 hadoop]$ xsync /opt/module/hadoop-2.7.2/
    
    • After the synchronization completes, it is best to check the results on the other nodes to avoid errors, for example:
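
    One possible spot-check is to read a synced file back from the other nodes (a sketch; the paths follow this tutorial's layout):

    ```shell
    ssh hadoop103 "cat /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml"
    ssh hadoop104 "cat /opt/module/hadoop-2.7.2/etc/hadoop/yarn-site.xml"
    ```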

    Single-Point Startup

    1. If this is the first startup, the NameNode needs to be formatted; otherwise skip this step
    [kocdaniel@hadoop102 hadoop-2.7.2]$ hadoop namenode -format
    
    • Notes on formatting:

      • Only the first startup requires formatting; do not format again afterwards. Otherwise the cluster IDs of the NameNode and DataNodes become inconsistent, and the DataNodes will fail to start.
      • The correct way to reformat:

        • The first format creates a data folder in the Hadoop installation directory, which holds the NameNode's metadata.
        • After the NameNode and DataNodes have been started, a logs folder is also created in the same directory.
        • So before reformatting, delete these two folders first, then format, and finally start the NameNode and DataNodes (see the sketch below).
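
    A rough sketch of that reformatting sequence, assuming the installation directory used in this tutorial and that all daemons have been stopped first:

    ```shell
    cd /opt/module/hadoop-2.7.2
    rm -rf data/ logs/                 # remove old metadata and logs (also on hadoop103/104)
    hadoop namenode -format            # reformat the NameNode
    hadoop-daemon.sh start namenode    # then start the NameNode again
    hadoop-daemon.sh start datanode    # and the DataNode (repeat on every node)
    ```
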
    2. Start the NameNode on hadoop102
    [kocdaniel@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start namenode
    [kocdaniel@hadoop102 hadoop-2.7.2]$ jps
    3461 NameNode
    
    3. Start the DataNode on hadoop102, hadoop103, and hadoop104, respectively
    [kocdaniel@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
    [kocdaniel@hadoop102 hadoop-2.7.2]$ jps
    3461 NameNode
    3608 Jps
    3561 DataNode
    [kocdaniel@hadoop103 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
    [kocdaniel@hadoop103 hadoop-2.7.2]$ jps
    3190 DataNode
    3279 Jps
    [kocdaniel@hadoop104 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
    [kocdaniel@hadoop104 hadoop-2.7.2]$ jps
    3237 Jps
    3163 DataNode
    
    4. Visit hadoop102:50070 to view the results
    • However, single-point startup has a problem:

      • Every daemon has to be started on each node one by one; what happens if the cluster grows to 1,000 nodes?

    Configure SSH Passwordless Login

    1. Configure ssh

      • ssh <ip of another node> switches you to that machine, but you have to enter a password each time
    2. Passwordless ssh configuration

      • Principle of passwordless login: the public key generated on the client is appended to ~/.ssh/authorized_keys on the target machine, and at login the server checks that the client holds the matching private key, so no password is required.

    • Generate a private/public key pair on hadoop102, the host where the NameNode is configured

      • Switch to the /home/<username>/.ssh/ directory

        [kocdaniel@hadoop102 .ssh]$ ssh-keygen -t rsa
        
      • Then press Enter three times; two files will be generated: id_rsa (the private key) and id_rsa.pub (the public key)
      • Copy the public key to each target machine that should allow passwordless login
    
     ```shell
     [kocdaniel@hadoop102 .ssh]$ ssh-copy-id hadoop103
     [kocdaniel@hadoop102 .ssh]$ ssh-copy-id hadoop104
     # Note: ssh to hadoop102 itself also asks for a password, so we copy the public key to hadoop102 as well.
     [kocdaniel@hadoop102 .ssh]$ ssh-copy-id hadoop102
     
     ```
    
    • Similarly, perform the same steps on hadoop103, the host where the ResourceManager is configured, before starting the whole cluster (a sketch follows).
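
    A minimal sketch of those same steps on hadoop103:

    ```shell
    [kocdaniel@hadoop103 .ssh]$ ssh-keygen -t rsa
    [kocdaniel@hadoop103 .ssh]$ ssh-copy-id hadoop102
    [kocdaniel@hadoop103 .ssh]$ ssh-copy-id hadoop103
    [kocdaniel@hadoop103 .ssh]$ ssh-copy-id hadoop104
    # Quick check: this should print the remote hostname without prompting for a password
    [kocdaniel@hadoop103 .ssh]$ ssh hadoop104 hostname
    ```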

    Starting the Whole Cluster

    1. Configure slaves

      • Switch to the etc/hadoop/ directory under the Hadoop installation directory
      • Add the following to the slaves file in the directory
      [kocdaniel@hadoop102 hadoop]$ vim slaves
      # Note: there must be no trailing spaces at the end of lines and no blank lines in this file.
      hadoop102
      hadoop103
      hadoop104
      
      • Synchronize the configuration file to all nodes
      [kocdaniel@hadoop102 hadoop]$ xsync slaves
      
    2. Start the cluster

      • Again, if this is the first startup, the NameNode needs to be formatted first
      • Start HDFS
      [kocdaniel@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh
      
      # Check that the running daemons match the cluster plan (and the configuration files)
      [kocdaniel@hadoop102 hadoop-2.7.2]$ jps
      4166 NameNode
      4482 Jps
      4263 DataNode
      
      [kocdaniel@hadoop103 hadoop-2.7.2]$ jps
      3218 DataNode
      3288 Jps
      
      [kocdaniel@hadoop104 hadoop-2.7.2]$ jps
      3221 DataNode
      3283 SecondaryNameNode
      3364 Jps
      
      • Start YARN
      # Note: If the NameNode and the ResourceManager are not on the same machine, YARN cannot be started on the NameNode host; start YARN on the machine where the ResourceManager is configured.
      [kocdaniel@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh
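
      Per the cluster plan above, jps on hadoop103 should now additionally list ResourceManager and NodeManager, while hadoop102 and hadoop104 each gain a NodeManager; for example:

      ```shell
      [kocdaniel@hadoop103 hadoop-2.7.2]$ jps   # expect ResourceManager and NodeManager in the output
      [kocdaniel@hadoop102 hadoop-2.7.2]$ jps   # expect a NodeManager in addition to the HDFS daemons
      ```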
      
    3. View the cluster status in the web UI (the HDFS NameNode at hadoop102:50070; by default the YARN ResourceManager UI is served on hadoop103 at port 8088)
