Deploying a Hadoop cluster on CentOS 7

Keywords: Java Hadoop xml ssh

Hadoop Pitfall Notes (1)


Environment

Machine 1 (hadoop1-ali): Alibaba Cloud, CentOS 7.3, 120.26.173.104

Machine 2 (hadoop2-hw): Huawei Cloud, CentOS 7.4, 114.116.233.156

The first server serves as the NameNode and the second as a DataNode.

Modify the hostname and hosts file

Execute the corresponding command on each machine:

hostname hadoop1-ali
hostname hadoop2-hw
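Note that the hostname command only changes the name for the current session. On CentOS 7 you can persist it across reboots with hostnamectl (a suggested alternative, not part of the original steps):

hostnamectl set-hostname hadoop1-ali   # run on machine 1
hostnamectl set-hostname hadoop2-hw    # run on machine 2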

Modify the /etc/hosts file (hosts, with an s) on both machines and add the following:

120.26.173.104 hadoop1-ali
114.116.233.156 hadoop2-hw

After modification, verify that each hostname resolves:

ping hadoop1-ali
ping hadoop2-hw

Generate SSH key files on the two machines

Run the following command on both machines to generate SSH keys:

ssh-keygen -t rsa -P ''

Create a new file named authorized_keys for the keys.

Copy the contents of /root/.ssh/id_rsa.pub from each machine into this file, one line per key.

Then upload the authorized_keys file to the /root/.ssh/ directory of each machine.
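If you prefer to do this entirely from the command line, a minimal sketch (assuming password-based root SSH login is still possible between the machines at this point) is:

# On hadoop1-ali: collect both public keys into one authorized_keys file
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@hadoop2-hw 'cat /root/.ssh/id_rsa.pub' >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys

# Copy the combined file to the other machine
scp /root/.ssh/authorized_keys root@hadoop2-hw:/root/.ssh/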

After this succeeds, test passwordless SSH login; for example, on hadoop1-ali, log in to the other machine:

ssh hadoop2-hw

After typing yes once, you should see the other server's welcome message; check that each machine can log in to the other in this way.

It is important to note that all of the above operations are performed as the root user, which avoids tedious permission configuration but introduces potential security risks. Configuring directly as root is not recommended in production environments.

Install OpenJDK 1.8

The Huawei Cloud server already has OpenJDK 1.8 installed by default, so this only needs to be done on the Alibaba Cloud server:

yum install java-1.8.0-openjdk -y
yum install java-1.8.0-openjdk-devel -y

Then configure the environment variables: edit /etc/profile and add the following.

#Java
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64
export CLASSPATH=$JAVA_HOME/lib/*.*
export PATH=$PATH:$JAVA_HOME/bin

Save the file and make the changes take effect:

source /etc/profile
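As a quick sanity check, you can then verify that the JDK and the variable are picked up:

java -version      # should report an openjdk 1.8.0 version
echo $JAVA_HOME    # should print the JDK path configured above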

Install Hadoop

Download the tarball from https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz and upload it to the server's /opt/hadoop/ directory.

Extract it:

tar -xvf hadoop-2.8.5.tar.gz

Create several new directories under /root:

mkdir  /root/hadoop
mkdir  /root/hadoop/tmp
mkdir  /root/hadoop/var
mkdir  /root/hadoop/dfs
mkdir  /root/hadoop/dfs/name
mkdir  /root/hadoop/dfs/data

Modify a series of configuration files in etc/hadoop

Enter the directory /opt/hadoop/hadoop-2.8.5/etc/hadoop.

Modify core-site.xml and add the configuration inside the <configuration> node:

<configuration>
   <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
   </property>
   <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop1-ali:9000</value>
   </property>
</configuration>

Modify hadoop-env.sh: change ${JAVA_HOME} to your own JDK path.

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64
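If you are unsure of the exact JDK directory on a machine, one way to locate it (a suggestion, not part of the original article) is:

# Resolve the real path of the java binary; strip the trailing /jre/bin/java to get JAVA_HOME
readlink -f "$(which java)"
# Or simply list the installed JVMs
ls /usr/lib/jvm/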

Modify hdfs-site.xml and add the configuration:

<property>
   <name>dfs.name.dir</name>
   <value>/root/hadoop/dfs/name</value>
   <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
   <name>dfs.data.dir</name>
   <value>/root/hadoop/dfs/data</value>
   <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
   <name>dfs.replication</name>
   <value>2</value>
</property>
<property>
   <name>dfs.permissions</name>
   <value>false</value>
   <description>need not permissions</description>
</property>

Create and modify mapred-site.xml

Copy the template file in the directory and rename it

cp mapred-site.xml.template mapred-site.xml

Add the configuration:

<property>
   <name>mapred.job.tracker</name>
   <value>hadoop1-ali:49001</value>
</property>
<property>
   <name>mapred.local.dir</name>
   <value>/root/hadoop/var</value>
</property>
<property>
   <name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>

Modify the slaves file: delete localhost from it and add your own DataNode(s):

hadoop2-hw

Modify yarn-site.xml and add the configuration:

<property>
   <name>yarn.resourcemanager.hostname</name>
   <value>hadoop1-ali</value>
</property>
<property>
   <name>yarn.nodemanager.aux-services</name>
   <value>mapreduce_shuffle</value>
</property>

Repeat this configuration on the other machine; the files can be copied over directly. Six files are involved, and they are identical on both servers:

core-site.xml, mapred-site.xml, yarn-site.xml, slaves, hadoop-env.sh, hdfs-site.xml

The only exception is the JDK path in hadoop-env.sh, which on the Huawei Cloud machine needs to be changed to java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.aarch64.
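Assuming the same /opt/hadoop/hadoop-2.8.5 layout exists on both servers, one way to copy the six files is the sketch below (remember to adjust JAVA_HOME in hadoop-env.sh on the target machine afterwards):

cd /opt/hadoop/hadoop-2.8.5/etc/hadoop
scp core-site.xml mapred-site.xml yarn-site.xml slaves hadoop-env.sh hdfs-site.xml \
    root@hadoop2-hw:/opt/hadoop/hadoop-2.8.5/etc/hadoop/
# Then edit hadoop-env.sh on hadoop2-hw and point JAVA_HOME at its own JDK path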

Start Hadoop

Perform initialization on the NameNode (hadoop1-ali):

cd /opt/hadoop/hadoop-2.8.5/bin
./hadoop namenode -format

Execute the startup command on the NameNode (hadoop1-ali):

cd /opt/hadoop/hadoop-2.8.5/sbin
./start-all.sh

Type yes twice at the prompts and the startup should complete successfully.
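To confirm the daemons are actually running, you can run jps on both machines (roughly what to expect with this configuration; exact process lists may vary):

jps
# On hadoop1-ali you should see roughly: NameNode, SecondaryNameNode, ResourceManager
# On hadoop2-hw you should see roughly: DataNode, NodeManager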

Open the required ports in the security group settings of the Alibaba Cloud and Huawei Cloud consoles respectively.

Then visit http://120.26.173.104:50070/ in a browser to see the result.

Supplementary Notes

The hostnames, IPs, JDK paths, and so on used in this article will vary from machine to machine.

Subsequent testing showed that, in the hosts file, the entry for the node's own hostname should use its intranet (private) IP.
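For example, on hadoop1-ali the /etc/hosts file might then look like this (the 172.16.0.10 address below is a hypothetical intranet IP; substitute your own):

172.16.0.10      hadoop1-ali    # this node's own intranet IP (hypothetical)
114.116.233.156  hadoop2-hw     # the other node's public IP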

Run the WordCount example

Configure the Hadoop environment variables first; they are used directly in the commands below (see the sketch that follows).
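A minimal sketch of those variables, assuming the /opt/hadoop/hadoop-2.8.5 install path used above (append to /etc/profile and run source /etc/profile again):

#Hadoop
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin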

Create an input directory in the HDFS file system:

hdfs dfs -mkdir /input

Create a new file named example and write some arbitrary text into it.
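For instance (the file name and its contents are arbitrary):

echo "hello hadoop hello world" > example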

Copy the local example file to the /input directory of the HDFS file system:

hdfs dfs -copyFromLocal example /input

Run the following in the hadoop-2.8.5 directory:

hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output

WordCount is a MapReduce example program that counts the occurrences of each word in all files under the /input directory and stores the results in the /output directory.

After it finishes, view the word counts:

hdfs dfs -cat /output/*
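Each output line is a word followed by its count; for the sample contents used above, the result would look something like this (illustrative only):

hadoop  1
hello   2
world   1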

Reference resources

Main reference for the overall process:

https://blog.csdn.net/pucao_c...

The node's own IP in the hosts file should be the intranet IP; otherwise the NameNode will fail to start. See:

https://blog.csdn.net/dongdon...

Hadoop environment variable configuration:

https://blog.csdn.net/fantasy...

The original text is from Chen 11's Blog
