Configuring the Hadoop environment

Keywords: Big Data, Hadoop, XML, JDK, JVM

1. Edit profile file

export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL

With the above configuration, the system can find the installation paths of the JDK and Hadoop.
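
For the new variables to take effect in the current shell, re-read the profile (assuming the exports above went into /etc/profile; adjust the path if you edited ~/.bashrc or ~/.profile instead):

source /etc/profile
echo $HADOOP_INSTALL    # should print /usr/local/hadoop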

 

Then enter the directory where Hadoop is located:

cd /usr/local/hadoop/etc/hadoop

2. Edit hadoop-env.sh file

vim hadoop-env.sh

Add the following:

export JAVA_HOME=/usr/lib/jvm/jdk/
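
As a quick sanity check (assuming the PATH changes from step 1 are active in your shell), the hadoop command should now resolve and report its version:

hadoop version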

3. Configure the core-site.xml file

vim core-site.xml

Add the following:

<configuration>
<!-- The value here refers to the default HDFS path -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://Master:9000</value>
</property>
<!-- Buffer size: io.file.buffer.size defaults to 4KB -->
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<!-- Temporary folder path -->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/home//tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>hadoop.proxyuser.hduser.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hduser.groups</name>
<value>*</value>
</property>
</configuration>
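
A quick way to confirm that a value is being read from core-site.xml is hdfs getconf, which only parses the configuration and does not need any daemons to be running:

hdfs getconf -confKey fs.defaultFS    # should print hdfs://Master:9000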

4. Configure yarn-site.xml file

Add the following:

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<!-- resourcemanager address -->
<property>
<name>yarn.resourcemanager.address</name>
<value>Master:8032</value>
</property>
<!-- Scheduler port -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>Master:8030</value>
</property>
<!-- resource-tracker port -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>Master:8031</value>
</property>
<!-- resourcemanager admin port -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>Master:8033</value>
</property>
<!-- resourcemanager web UI port, used to monitor job resource scheduling -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>Master:8088</value>
</property>
</configuration>
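
Once YARN is running (step 11), the ResourceManager web UI configured above should answer on port 8088 of the Master host; a simple check could be:

curl http://Master:8088/cluster    # ResourceManager web UI overview page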

5. Configure mapred-site.xml.template file

Add the following:

<configuration>
<!-- Hadoop has three implementations of the MapReduce framework; the property "mapreduce.framework.name" can be set to "classic", "yarn" or "local". -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- MapReduce JobHistory server address -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>Master:10020</value>
</property>
<!-- MapReduce JobHistory server web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>Master:19888</value>
</property>
</configuration>
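
Note that Hadoop reads mapred-site.xml rather than the .template file, so it is common to copy the template first and put the settings above into the copy (assuming the Hadoop 2.x layout used in this guide):

cp mapred-site.xml.template mapred-site.xml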

6. Create the namenode and datanode directories and configure their corresponding paths

Note: create these as root

mkdir -p /hdfs/namenode
mkdir -p /hdfs/datanode
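
Because the directories are created as root, the account that actually runs the Hadoop daemons usually needs to own them; assuming that account is called hadoop (adjust to your own user), ownership can be handed over with:

chown -R hadoop:hadoop /hdfs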

7. Return to the directory /usr/local/hadoop/etc/hadoop, configure the hdfs-site.xml file, and add the following to the file:

<configuration>
<!-- Secondary NameNode host name and port number -->
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:9001</value>
</property>
<!-- NameNode metadata storage directory -->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/hdfs/namenode</value>
</property>
<!-- DataNode data storage directory -->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/hdfs/datanode</value>
</property>
<!-- Number of replicas -->
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<!-- Set dfs.webhdfs.enabled to true; otherwise the WebHDFS LISTSTATUS and GETFILESTATUS operations cannot be used to list the status of files and folders, because this information is kept by the NameNode. -->
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
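
With dfs.webhdfs.enabled set to true, the NameNode exposes the WebHDFS REST API. After HDFS is running, a LISTSTATUS request against the NameNode web port (50070 by default in Hadoop 2.x) would look roughly like this:

curl "http://Master:50070/webhdfs/v1/?op=LISTSTATUS"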

8. Configure Master and Slave files

1) The Master file holds the host name of the primary node. For example, if the primary node is named Master, add the following line to the Master file.

Master

2) The Slaves file holds the host names of the slave nodes, so that the master node can find the slave nodes and communicate with them through this file. For example, with Slave1 to Slave5 as the slave host names, add the following lines to the Slaves file.

Slave1

Slave2

Slave3

Slave4

Slave5

9. Distribute all hadoop files to each node through pssh

Execute the following command (pssh itself only runs remote commands; within the pssh suite, recursive file copying is done with pscp):

./pscp -h hosts.txt -r /usr/local/hadoop /usr/local/
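
Here hosts.txt is assumed to be a plain list of the target nodes, one host name per line, matching the slave names used above:

Slave1
Slave2
Slave3
Slave4
Slave5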

10. Format namenode (in the Hadoop root directory)

./bin/hadoop namenode -format

11. Start hadoop

./sbin/start-all.sh
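
After start-all.sh finishes, the running daemons can be checked with the JDK's jps tool. On the Master you would expect to see processes such as NameNode, SecondaryNameNode and ResourceManager, and on each slave DataNode and NodeManager:

jps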

Posted by activomate on Tue, 29 Jan 2019 20:42:14 -0800