Building a Hadoop HA (High Availability) Cluster

Keywords: Hadoop Zookeeper vim xml

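Step 1: Cluster Planning

The role layout below is taken from the jps output in Step 14:

Host    Roles
cdh-1   ZooKeeper (QuorumPeerMain), JournalNode, DataNode, NodeManager
cdh-2   ZooKeeper (QuorumPeerMain), JournalNode, DataNode, NodeManager
cdh-3   ZooKeeper (QuorumPeerMain), JournalNode, DataNode, NodeManager
cdh-4   NameNode, DFSZKFailoverController, ResourceManager
cdh-5   NameNode, DFSZKFailoverController, ResourceManager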

Step 2: set hosts
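
The host entries themselves are not shown in the original. A minimal sketch, assuming the five nodes cdh-1 to cdh-5 used in the later steps; the IP addresses are placeholders (the real ones are not given) and the helper name add_cluster_hosts is hypothetical:

```shell
# Append name-resolution entries for the five cluster nodes to a hosts file.
# The IP addresses are placeholders (assumption); replace them with real ones.
add_cluster_hosts() {
    hosts_file="$1"
    cat >> "$hosts_file" <<'EOF'
192.168.1.11 cdh-1
192.168.1.12 cdh-2
192.168.1.13 cdh-3
192.168.1.14 cdh-4
192.168.1.15 cdh-5
EOF
}

# On each node, as root:
# add_cluster_hosts /etc/hosts
```

Taking the file path as an argument makes it possible to try the function on a scratch file before touching /etc/hosts.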

Step 3: turn off the firewall

Step 4: turn off SELinux
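
The commands for this step are missing from the original. A sketch of the usual approach, assuming a CentOS-style system: setenforce 0 switches SELinux to permissive mode until the next reboot, and setting SELINUX=disabled in /etc/selinux/config makes the change permanent. The helper name disable_selinux_config is hypothetical.

```shell
# Rewrite the SELINUX= line of an SELinux config file to "disabled".
# The path is an argument so the edit can be tried on a copy first.
# SELINUXTYPE= is left untouched because the pattern requires "SELINUX=".
disable_selinux_config() {
    config_file="$1"
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config_file"
}

# On each node, as root:
# setenforce 0
# disable_selinux_config /etc/selinux/config
```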

Step 5: passwordless (key-based) SSH login

Step 6: install jdk


tar -xvf jdk-8u131-linux-x64.tar.gz

mv jdk1.8.0_131 /usr/local/jdk1.8

# Set environment variables
vim /etc/profile
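
The lines added to /etc/profile are not shown in the original. A sketch of what they would typically look like, assuming the JDK location /usr/local/jdk1.8 from the mv command above; the helper name add_java_env is hypothetical:

```shell
# Append JAVA_HOME and PATH exports to a profile file.
# /usr/local/jdk1.8 is where the JDK was moved in this step.
add_java_env() {
    profile_file="$1"
    cat >> "$profile_file" <<'EOF'
export JAVA_HOME=/usr/local/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH
EOF
}

# As root, then reload the profile and check the JDK:
# add_java_env /etc/profile
# source /etc/profile
# java -version
```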


Step 7: install zookeeper

[root@sl-opencron src]# wget

[root@sl-opencron src]# tar -xvf zookeeper-3.4.10.tar.gz

# Move the extracted directory to /usr/local/
[root@sl-opencron src]# mv zookeeper-3.4.10 /usr/local/zookeeper

Step 7.1: configure zookeeper

cd /usr/local/zookeeper/conf/

# Rename zoo_sample.cfg (the template configuration file) to zoo.cfg
mv zoo_sample.cfg  zoo.cfg

# Edit the configuration file
[root@sl-opencron conf]# vim zoo.cfg 


# The data directory path is customizable (this guide uses /data/zookeeper)
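
The edited zoo.cfg is not reproduced in the original. A sketch of a three-node ensemble configuration, assuming the data directory /data/zookeeper used in Step 7.2, the hosts cdh-1 to cdh-3, and ZooKeeper's conventional ports (2181 for clients, 2888 for quorum traffic, 3888 for leader election). The tickTime, initLimit, and syncLimit values are the ones shipped in zoo_sample.cfg; the helper name write_zoo_cfg is hypothetical.

```shell
# Write a minimal three-node zoo.cfg to the given path.
# The N in each server.N line must match the myid file on that host.
write_zoo_cfg() {
    cfg_file="$1"
    cat > "$cfg_file" <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper
clientPort=2181
server.1=cdh-1:2888:3888
server.2=cdh-2:2888:3888
server.3=cdh-3:2888:3888
EOF
}

# On cdh-1, cdh-2 and cdh-3:
# write_zoo_cfg /usr/local/zookeeper/conf/zoo.cfg
```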


Step 7.2: generate myid file

mkdir -p /data/zookeeper

cd /data/zookeeper

echo "1" > myid

Note: the myid is 1 on cdh-1, 2 on cdh-2, and 3 on cdh-3.
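
Because the id follows the number in the hostname, the myid file can be derived from the hostname instead of being typed by hand on each node. A small sketch, assuming hostnames of the form cdh-N; myid_for_host is a hypothetical helper:

```shell
# Print the ZooKeeper id for a host named like "cdh-2".
# ${1##*-} strips everything up to and including the last "-".
myid_for_host() {
    echo "${1##*-}"
}

# On cdh-1, cdh-2 and cdh-3:
# mkdir -p /data/zookeeper
# myid_for_host "$(hostname -s)" > /data/zookeeper/myid
```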

Step 7.3: start the zookeeper cluster

Note: run this on cdh-1, cdh-2, and cdh-3 respectively

cd /usr/local/zookeeper/bin

./zkServer.sh start

Step 8: install hadoop


tar -xvf hadoop-2.7.6.tar.gz

# Move the extracted directory to /usr/local/
mv hadoop-2.7.6 /usr/local/hadoop

# Enter the hadoop directory
cd /usr/local/hadoop

# Create the working directories
[root@hadooop-master hadoop]# mkdir tmp dfs dfs/data dfs/name

Step 8.1: configure hadoop

cd /usr/local/hadoop/etc/hadoop

vim core-site.xml

<!-- Set the HDFS nameservice to ns1 -->

<!-- Set the Hadoop temporary directory -->

<!-- Set the ZooKeeper address -->
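
Only the comments of core-site.xml survive in the original; the property elements are missing. A sketch of a file that matches those three comments, assuming the tmp directory created in Step 8 (/usr/local/hadoop/tmp) and a ZooKeeper ensemble on cdh-1 to cdh-3 listening on the default client port 2181. The helper name write_core_site is hypothetical.

```shell
# Write a core-site.xml matching the three comments above to the given path.
write_core_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <!-- Set the HDFS nameservice to ns1 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ns1</value>
  </property>
  <!-- Set the Hadoop temporary directory -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <!-- Set the ZooKeeper address (port 2181 is an assumption) -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>cdh-1:2181,cdh-2:2181,cdh-3:2181</value>
  </property>
</configuration>
EOF
}

# write_core_site /usr/local/hadoop/etc/hadoop/core-site.xml
```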


vim hdfs-site.xml 

<!-- Set the HDFS nameservice to ns1; it must match core-site.xml -->
	<!-- ns1 has two NameNodes: nn1 and nn2 -->
	<!-- RPC address of nn1 -->
	<!-- HTTP address of nn1 -->
	<!-- RPC address of nn2 -->
	<!-- HTTP address of nn2 -->
	<!-- Where the NameNode metadata is stored on the JournalNodes -->
	<!-- Where the JournalNode stores its data on the local disk -->
	<!-- Enable automatic NameNode failover -->
	<!-- Implementation class used for automatic failover -->
	<!-- Fencing methods; multiple methods are separated by line breaks, one method per line -->
	<!-- The sshfence fencing method requires passwordless SSH login -->
	<!-- Timeout for the sshfence fencing method -->
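
The property elements that belong to these comments are missing from the original. A sketch of an hdfs-site.xml that follows the comments one by one, using the standard Hadoop 2.x HA property names. Everything host- or path-specific is an assumption: nn1 on cdh-4 and nn2 on cdh-5, RPC port 9000, HTTP port 50070, JournalNodes on cdh-1 to cdh-3 at port 8485, the journal directory /usr/local/hadoop/journaldata, the key file /root/.ssh/id_rsa, and a 30000 ms timeout. The helper name write_hdfs_site is hypothetical.

```shell
# Write an HA hdfs-site.xml to the given path (hosts, ports and paths are assumptions).
write_hdfs_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ns1</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ns1</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn1</name>
    <value>cdh-4:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn1</name>
    <value>cdh-4:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns1.nn2</name>
    <value>cdh-5:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ns1.nn2</name>
    <value>cdh-5:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://cdh-1:8485;cdh-2:8485;cdh-3:8485/ns1</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/usr/local/hadoop/journaldata</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ns1</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.connect-timeout</name>
    <value>30000</value>
  </property>
</configuration>
EOF
}

# write_hdfs_site /usr/local/hadoop/etc/hadoop/hdfs-site.xml
```

The nameservice name ns1 appears inside several property names, so it has to be changed consistently if a different name is chosen in core-site.xml.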



vim yarn-site.xml

<!-- Site specific YARN configuration properties -->
<!-- Enable ResourceManager high availability -->
	<!-- Set the ResourceManager cluster id -->
	<!-- Set the ResourceManager names -->
	<!-- Set the address of each ResourceManager -->
	<!-- Set the ZooKeeper cluster address -->
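
As with the other files, only the comments of the YARN configuration survive. A sketch of a yarn-site.xml that matches them, using the standard ResourceManager HA property names. The cluster id yrc, the ids rm1 and rm2, their mapping to cdh-4 and cdh-5, and the ZooKeeper port 2181 are assumptions. The mapreduce_shuffle auxiliary service is not mentioned in the comments; it is added here because MapReduce jobs on YARN normally need it. The helper name write_yarn_site is hypothetical.

```shell
# Write a ResourceManager-HA yarn-site.xml to the given path.
write_yarn_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>yrc</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>cdh-4</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>cdh-5</value>
  </property>
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>cdh-1:2181,cdh-2:2181,cdh-3:2181</value>
  </property>
  <!-- Not in the original comments: shuffle service for MapReduce on YARN -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
}

# write_yarn_site /usr/local/hadoop/etc/hadoop/yarn-site.xml
```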


vim mapred-site.xml

<!-- Set the MapReduce framework to YARN -->
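
The property behind this comment is mapreduce.framework.name. Hadoop 2.7 ships only mapred-site.xml.template, so the file has to be created (or copied from the template) first. A minimal sketch; the helper name write_mapred_site is hypothetical:

```shell
# Write a mapred-site.xml that runs MapReduce jobs on YARN.
write_mapred_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
}

# write_mapred_site /usr/local/hadoop/etc/hadoop/mapred-site.xml
```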


vim slaves 
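
The contents of the slaves file are not shown. Since Step 14 lists DataNode and NodeManager processes on cdh-1, cdh-2, and cdh-3, the file presumably names those three hosts, one per line. A sketch under that assumption; write_slaves is a hypothetical helper:

```shell
# Write the worker host list (one hostname per line) to the given path.
write_slaves() {
    printf '%s\n' cdh-1 cdh-2 cdh-3 > "$1"
}

# write_slaves /usr/local/hadoop/etc/hadoop/slaves
```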


Step 8.2: synchronize to each server

rsync -av /usr/local/hadoop/etc/ cdh-1:/usr/local/hadoop/etc/

rsync -av /usr/local/hadoop/etc/ cdh-2:/usr/local/hadoop/etc/

rsync -av /usr/local/hadoop/etc/ cdh-3:/usr/local/hadoop/etc/

rsync -av /usr/local/hadoop/etc/ cdh-4:/usr/local/hadoop/etc/

rsync -av /usr/local/hadoop/etc/ cdh-5:/usr/local/hadoop/etc/

Step 9: start the journal node

Note: start it on cdh-1, cdh-2, and cdh-3 respectively

cd /usr/local/hadoop/sbin/

./hadoop-daemon.sh start journalnode

Step 10: format HDFS

Note: run this on cdh-4

cd /usr/local/hadoop/bin/

./hdfs namenode -format

# Note: after formatting, copy the tmp folder to cdh-5; otherwise the NameNode on cdh-5 will not work

cd /usr/local/hadoop/

scp -r tmp/ cdh-5:/usr/local/hadoop/
VERSION                                                                                             100%  207   222.7KB/s   00:00    
fsimage_0000000000000000000.md5                                                                     100%   62    11.3KB/s   00:00    
fsimage_0000000000000000000                                                                         100%  321   327.3KB/s   00:00    
seen_txid                                                                                           100%    2     1.4KB/s   00:00    

Step 11: format ZKFC

Note: run this on cdh-4

cd /usr/local/hadoop/bin/

./hdfs zkfc -formatZK


Step 12: start yarn

Note: run this on cdh-4

cd /usr/local/hadoop/sbin/
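
The start command itself is missing from the original. Given that Step 14 shows NameNode, DataNode, DFSZKFailoverController, and NodeManager processes running, a plausible sequence is to start HDFS and then YARN with the stock scripts in sbin. This is an assumption, not the author's confirmed command; the HADOOP_HOME default and the helper name start_cluster are also assumptions.

```shell
# Start HDFS first, then YARN, using the scripts shipped in $HADOOP_HOME/sbin.
start_cluster() {
    hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"
    # start-dfs.sh starts the NameNodes, DataNodes, JournalNodes and ZKFCs
    "$hadoop_home/sbin/start-dfs.sh" || return 1
    # start-yarn.sh starts the local ResourceManager and the NodeManagers
    "$hadoop_home/sbin/start-yarn.sh"
}

# On cdh-4:
# start_cluster
```

Because start-yarn.sh only starts the ResourceManager on the host where it runs, the second ResourceManager still has to be started by hand.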



Step 13: start the ResourceManager on cdh-5 manually

cd /usr/local/hadoop/sbin/

./yarn-daemon.sh start resourcemanager

Step 14: view the cluster process

[root@cdh-1 ~]# jps
26754 QuorumPeerMain
22387 JournalNode
5286 Jps
4824 NodeManager
25752 DataNode

[root@cdh-2 ~]# jps
4640 JournalNode
29520 QuorumPeerMain
5799 Jps
4839 DataNode
5642 NodeManager

[root@cdh-3 ~]# jps
28738 JournalNode
28898 DataNode
29363 NodeManager
20836 QuorumPeerMain
29515 Jps

[root@cdh-4 ~]# jps
21491 Jps
21334 NameNode
20167 DFSZKFailoverController
21033 ResourceManager

[root@cdh-5 ~]# jps
20403 ResourceManager
20280 NameNode
20523 Jps
19693 DFSZKFailoverController

Step 15: test the high-availability cluster

Note: as shown in the figure, the status of cdh-5 is active, and the status of cdh-4 is standby
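
The same state can also be read from the command line with hdfs haadmin -getServiceState. A sketch, assuming the NameNode ids nn1 and nn2 named in the hdfs-site.xml comments; the HDFS_BIN variable and the function name print_namenode_states are hypothetical:

```shell
# Print the HA state of both NameNodes, one per line.
print_namenode_states() {
    hdfs_bin="${HDFS_BIN:-/usr/local/hadoop/bin/hdfs}"
    for nn in nn1 nn2; do
        # haadmin -getServiceState prints "active" or "standby"
        echo "$nn: $("$hdfs_bin" haadmin -getServiceState "$nn")"
    done
}

# On any node with the cluster configuration:
# print_namenode_states
```

Running it before and after stopping a NameNode shows the failover without relying on the web page.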

Step 15.1: stop cdh-5 namenode

[root@cdh-5 hadoop]# cd /usr/local/hadoop/sbin/

[root@cdh-5 sbin]# ./hadoop-daemon.sh stop namenode
stopping namenode

[root@cdh-5 sbin]# ./hadoop-daemon.sh start namenode

Note: after the cdh-5 NameNode is stopped and the cdh-4 page is refreshed, the status of cdh-4 is active; once the cdh-5 NameNode is started again, its status is standby



Posted by scriptkiddie on Fri, 03 Jan 2020 09:30:56 -0800