Configuring and starting a Hadoop pseudo-distributed environment

Keywords: Big Data, Hadoop, XML, Linux, NodeManager

1. Environment preparation

On a Linux machine, install the Hadoop runtime environment. For installation instructions, see: Setting up the Hadoop runtime environment

2. Start HDFS and run MapReduce

2.1 Configure the cluster

1. Configuration: hadoop-env.sh

Get the JDK installation path on the Linux system:

[root@hadoop101 ~]# echo $JAVA_HOME
/opt/module/jdk1.8.0_144

Edit hadoop-env.sh and set the JAVA_HOME path in it:

export JAVA_HOME=/opt/module/jdk1.8.0_144

2. Configuration: core-site.xml

By default, files are accessed on the local filesystem. To use files on HDFS instead, you need to configure the address of the HDFS NameNode, so add the following configuration to core-site.xml:

<!-- Specify the address of the HDFS NameNode -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop101:9000</value>
</property>
<!-- Specify the directory where Hadoop stores files generated at run time -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-2.7.2/data/tmp</value>
</property>

Note: hadoop101 in the address needs to be changed to the host name or IP address of your own Linux machine. hadoop.tmp.dir can be left unconfigured, in which case HDFS files are stored under the /tmp directory by default.

3. Configuration: hdfs-site.xml

Because a pseudo-distributed setup has only one node, you need to change the number of HDFS replicas to 1 (the default is 3). Add the following configuration to hdfs-site.xml:

<!-- Specify the number of HDFS replicas -->
<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
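
As a quick sanity check, hdfs getconf can read the value back and confirm what the daemons will actually use (run from the Hadoop home directory):

[root@hadoop101 hadoop-2.7.2]# bin/hdfs getconf -confKey dfs.replication
1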

2.2 Start the cluster

1. Format the NameNode (only before the first startup; do not format it again on subsequent startups)

[root@hadoop101 hadoop-2.7.2]# bin/hdfs namenode -format
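
If you ever do need to reformat later (for example, after the NameNode and DataNode cluster IDs get out of sync), stop the daemons and clear the data and log directories first; otherwise the old DataNode data will not match the newly formatted NameNode. A minimal sketch, assuming the hadoop.tmp.dir configured above and the default logs directory:

[root@hadoop101 hadoop-2.7.2]# sbin/hadoop-daemon.sh stop datanode
[root@hadoop101 hadoop-2.7.2]# sbin/hadoop-daemon.sh stop namenode
[root@hadoop101 hadoop-2.7.2]# rm -rf data/tmp/* logs/*
[root@hadoop101 hadoop-2.7.2]# bin/hdfs namenode -format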

2. Start NameNode

[root@hadoop101 hadoop-2.7.2]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-namenode-hadoop101.out
[root@hadoop101 hadoop-2.7.2]# 

3. Start DataNode

[root@hadoop101 hadoop-2.7.2]# sbin/hadoop-daemon.sh start datanode
starting datanode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-root-datanode-hadoop101.out
[root@hadoop101 hadoop-2.7.2]# 

2.3 Validate the cluster

1. Check whether the startup is successful

Execute the jps command. If the output looks like the following, both the NameNode and the DataNode have started successfully:

[root@hadoop101 hadoop-2.7.2]# jps
5584 DataNode
5671 Jps
5496 NameNode
[root@hadoop101 hadoop-2.7.2]# 

2. View the NameNode interface on the webpage

Enter the address http://hadoop101:50070/ in a browser to open the NameNode web interface, where you can view the running status and files of the HDFS cluster. Note: change the domain name to your own Linux machine's host name or IP address. If you cannot access the page, the firewall may not be off.
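
One common cause of an unreachable page is the firewall. On CentOS 7 (an assumption; adjust for your distribution), firewalld can be checked and stopped like this:

[root@hadoop101 hadoop-2.7.2]# systemctl status firewalld
[root@hadoop101 hadoop-2.7.2]# systemctl stop firewalld
[root@hadoop101 hadoop-2.7.2]# systemctl disable firewalld

On CentOS 6, the equivalents are service iptables stop and chkconfig iptables off.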

2.4 Operate the cluster

Once the cluster has started successfully, you can operate on HDFS: create directories, upload files, delete files, and run MR tasks. Let's test the effect of an MR task on HDFS.

1. Create the directory /user/lancer/input on HDFS

[root@hadoop101 hadoop-2.7.2]# bin/hdfs dfs -mkdir -p /user/lancer/input

2. Upload a test file to the cluster. Here you can directly use Hadoop's own README.txt file:

[root@hadoop101 hadoop-2.7.2]# bin/hdfs dfs -put README.txt /user/lancer/input/

3. Run MapReduce program

[root@hadoop101 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/lancer/input/ /user/lancer/output

You can download the output file in the browser to view the word count results.
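
You can also read the result directly from the command line. With the default single reducer, the output lands in a file named part-r-00000:

[root@hadoop101 hadoop-2.7.2]# bin/hdfs dfs -ls /user/lancer/output
[root@hadoop101 hadoop-2.7.2]# bin/hdfs dfs -cat /user/lancer/output/part-r-00000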

3. Start YARN and run a MapReduce program

3.1 Configure the cluster

1. Configure yarn-env.sh and mapred-env.sh

In both files, set JAVA_HOME to the JDK 8 path. In these files the export JAVA_HOME line may be commented out and needs to be uncommented:

export JAVA_HOME=/opt/module/jdk1.8.0_144
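
A quick way to confirm that the variable is now set (and uncommented) in both files, assuming the standard etc/hadoop configuration directory:

[root@hadoop101 hadoop-2.7.2]# grep "^export JAVA_HOME" etc/hadoop/yarn-env.sh etc/hadoop/mapred-env.sh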

2. Configure yarn-site.xml

Edit yarn-site.xml and add the following configuration items:

<!-- Specify how the Reducer obtains data -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Specify the address of the YARN ResourceManager -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop101</value>
</property>

3. Configure mapred-site.xml

First, rename mapred-site.xml.template to mapred-site.xml:

[root@hadoop101 hadoop]# mv mapred-site.xml.template mapred-site.xml

Then add the following configuration items to mapred-site.xml:

<!-- Specify that MR runs on YARN -->
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

3.2 Start YARN

1. Before starting YARN, make sure the NameNode and DataNode are already running

2. Start ResourceManager

[root@hadoop101 hadoop-2.7.2]# sbin/yarn-daemon.sh start resourcemanager

3. Start NodeManager

[root@hadoop101 hadoop-2.7.2]# sbin/yarn-daemon.sh start nodemanager

3.3 Validate the cluster

1. Verify startup success

Execute jps. If the output looks like the following, the startup was successful:

[root@hadoop101 hadoop-2.7.2]# jps
5584 DataNode
6913 Jps
6071 ResourceManager
5496 NameNode
6318 NodeManager
[root@hadoop101 hadoop-2.7.2]# 

2. View the YARN web page

Enter the address http://hadoop101:8088/ in a browser to open the YARN interface, where you can view the running status of tasks. Note: change the domain name to your own Linux machine's host name or IP address. If you cannot access the page, the firewall may not be off (see the firewall commands in section 2.3).
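
If you prefer the command line, the ResourceManager also exposes a REST API in Hadoop 2.x; a quick check that it is up:

[root@hadoop101 hadoop-2.7.2]# curl http://hadoop101:8088/ws/v1/cluster/info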

3.4 Operate the cluster

1. Execute the MapReduce program (same method as above, writing to a new output directory):

[root@hadoop101 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/lancer/input/ /user/lancer/output2

The MR program runs as before, but this time it is scheduled by YARN. You can watch the task's execution progress and details on the YARN interface.
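
You can also query the same information from the shell; the yarn command lists applications with their state and progress (use -appStates ALL to include finished jobs):

[root@hadoop101 hadoop-2.7.2]# bin/yarn application -list -appStates ALL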

3.5 Configure the history server

Although you can see the task's execution status, clicking the History link does not open a page yet, because the history information viewing service has not been started.

1. Configure mapred-site.xml (the mapreduce.jobhistory.* properties belong in mapred-site.xml, not yarn-site.xml)

Add the following configuration items:

<!-- History server address -->
<property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop101:10020</value>
</property>
<!-- History server web UI address -->
<property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop101:19888</value>
</property>

2. Start the history server

[root@hadoop101 hadoop-2.7.2]# sbin/mr-jobhistory-daemon.sh start historyserver

3. Check whether the startup is successful

Execute jps. If JobHistoryServer appears in the output, the startup was successful:

[root@hadoop101 hadoop-2.7.2]# jps
5584 DataNode
7107 JobHistoryServer
6071 ResourceManager
5496 NameNode
7145 Jps
6318 NodeManager
[root@hadoop101 hadoop-2.7.2]# 

4. View MR task details

Click the History link of a completed job on the YARN page (http://hadoop101:8088/), or open http://hadoop101:19888/ directly, to view the details of finished MR tasks.
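
If you prefer the shell, the history server also exposes a REST API in Hadoop 2.x; a quick check that it is serving:

[root@hadoop101 hadoop-2.7.2]# curl http://hadoop101:19888/ws/v1/history/info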
