Hadoop HA dual-NameNode cluster setup

Keywords: Big Data Hadoop Zookeeper ssh xml

Machine distribution

hadoop1 192.168.56.121

hadoop2 192.168.56.122

hadoop3 192.168.56.123

Preparing the installation package

jdk-7u71-linux-x64.tar.gz

zookeeper-3.4.9.tar.gz

hadoop-2.9.2.tar.gz

Upload the installation packages to the /usr/local directory on all three machines and extract them there.
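
A minimal sketch of the copy-and-extract step, assuming the packages sit in the current directory and that you can SSH as root to the three hosts (enter passwords when prompted, or run the commands locally on each machine):

#Run from the machine that holds the downloaded packages
for host in hadoop1 hadoop2 hadoop3; do
    scp jdk-7u71-linux-x64.tar.gz zookeeper-3.4.9.tar.gz hadoop-2.9.2.tar.gz root@${host}:/usr/local/
    ssh root@${host} 'cd /usr/local && tar -zxf jdk-7u71-linux-x64.tar.gz && tar -zxf zookeeper-3.4.9.tar.gz && tar -zxf hadoop-2.9.2.tar.gz'
done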

Configure /etc/hosts on all three machines

echo "192.168.56.121 hadoop1" >> /etc/hosts
echo "192.168.56.122 hadoop2" >> /etc/hosts
echo "192.168.56.123 hadoop3" >> /etc/hosts

Configure environment variables

/etc/profile

export HADOOP_PREFIX=/usr/local/hadoop-2.9.2
export JAVA_HOME=/usr/local/jdk1.7.0_71
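
After editing /etc/profile on each machine, reload it so the variables take effect in the current shell (a minimal sketch):

#Apply the new variables without re-logging in
source /etc/profile
#Quick check
echo $HADOOP_PREFIX $JAVA_HOME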

Deploy zookeeper

Create a zoo user

useradd zoo
passwd zoo

Change the owner of the zookeeper directory to zoo

chown zoo:zoo -R /usr/local/zookeeper-3.4.9

Modify the zookeeper configuration file

Go to the /usr/local/zookeeper-3.4.9/conf directory

cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper-3.4.9
clientPort=2181
server.1=hadoop1:2888:3888
server.2=hadoop2:2888:3888
server.3=hadoop3:2888:3888

Create a myid file in the /usr/local/zookeeper-3.4.9 directory (the dataDir). The file holds only a number from 1 to 255, which must match the ID in the corresponding server.N line of zoo.cfg; see the sketch after this list.

myid is 1 on hadoop1

myid is 2 on hadoop2

myid is 3 on hadoop3
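
A minimal sketch of creating the myid files, run as the zoo user on each host (the values match the server.N entries in zoo.cfg above):

#On hadoop1
echo 1 > /usr/local/zookeeper-3.4.9/myid
#On hadoop2
echo 2 > /usr/local/zookeeper-3.4.9/myid
#On hadoop3
echo 3 > /usr/local/zookeeper-3.4.9/myid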

Start the zookeeper service on all three machines

[zoo@hadoop1 zookeeper-3.4.9]$ bin/zkServer.sh start

Verify zookeeper on each node; one node should report Mode: leader and the other two Mode: follower

[zoo@hadoop1 zookeeper-3.4.9]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper-3.4.9/bin/../conf/zoo.cfg
Mode: follower
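
Optionally, confirm client connectivity with the bundled CLI (a sketch; any of the three servers will do):

[zoo@hadoop1 zookeeper-3.4.9]$ bin/zkCli.sh -server hadoop1:2181
#Inside the shell, list the root znode and quit
ls /
quit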

Configure Hadoop

Create user

useradd hadoop
passwd hadoop

Change the owner of the hadoop directory to hadoop

chown hadoop:hadoop -R /usr/local/hadoop-2.9.2

Create data directories (on all three machines)

mkdir /hadoop1 /hadoop2 /hadoop3
chown hadoop:hadoop /hadoop1
chown hadoop:hadoop /hadoop2
chown hadoop:hadoop /hadoop3
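
These directories are later used in hdfs-site.xml as the JournalNode, NameNode and DataNode storage paths, so they must exist on every node. A sketch, assuming you can SSH as root to each host (otherwise simply run the commands locally on each machine):

#Run as root, once per host
for host in hadoop1 hadoop2 hadoop3; do
    ssh root@${host} 'mkdir -p /hadoop1 /hadoop2 /hadoop3 && chown hadoop:hadoop /hadoop1 /hadoop2 /hadoop3'
done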

Configure SSH mutual trust (passwordless login) for the hadoop user. Run ssh-keygen and the ssh-copy-id commands below as the hadoop user on each of the three nodes.

ssh-keygen
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop1
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop2
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hadoop3
#Use the following commands to test mutual trust
ssh hadoop1 date
ssh hadoop2 date
ssh hadoop3 date

Configure environment variables

/home/hadoop/.bash_profile

export PATH=$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin:$PATH

Configuration parameters

etc/hadoop/hadoop-env.sh 

export JAVA_HOME=/usr/local/jdk1.7.0_71

etc/hadoop/core-site.xml

<!-- Specify the HDFS nameservice ID as ns -->
 <property>
      <name>fs.defaultFS</name>
      <value>hdfs://ns</value>
 </property>
 <!-- Specify the Hadoop temporary data directory -->
 <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/hadoop-2.9.2/temp</value>
 </property>
 <property>
      <name>io.file.buffer.size</name>
      <value>4096</value>
 </property>
 <!-- Specify the ZooKeeper quorum addresses -->
 <property>
      <name>ha.zookeeper.quorum</name>
      <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
 </property>

 

etc/hadoop/hdfs-site.xml

<!-- Specify the HDFS nameservice ID as ns; must match core-site.xml -->
  <property>
      <name>dfs.nameservices</name>
      <value>ns</value>
  </property>
  <!-- The nameservice ns has two NameNodes: nn1 and nn2 -->
  <property>
     <name>dfs.ha.namenodes.ns</name>
     <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
     <name>dfs.namenode.rpc-address.ns.nn1</name>
     <value>hadoop1:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
      <name>dfs.namenode.http-address.ns.nn1</name>
      <value>hadoop1:50070</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
      <name>dfs.namenode.rpc-address.ns.nn2</name>
      <value>hadoop2:9000</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
      <name>dfs.namenode.http-address.ns.nn2</name>
      <value>hadoop2:50070</value>
  </property>
  <!-- Specify where the NameNode shared edit log (metadata) is stored on the JournalNodes -->
  <property>
       <name>dfs.namenode.shared.edits.dir</name>
       <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/ns</value>
  </property>
  <!-- Specify where the JournalNode stores its data on the local disk -->
  <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/hadoop1/hdfs/journal</value>
  </property>
  <!-- Enable automatic failover when the active NameNode fails -->
  <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
  </property>
  <!-- Configure the failover proxy provider used by clients to locate the active NameNode -->
  <property>
          <name>dfs.client.failover.proxy.provider.ns</name>
          <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Configure the fencing method; if SSH runs on the default port 22, the value sshfence is enough -->
  <property>
           <name>dfs.ha.fencing.methods</name>
           <value>sshfence</value>
  </property>
  <!-- Private key for passwordless SSH when the sshfence fencing method is used -->
  <property>
          <name>dfs.ha.fencing.ssh.private-key-files</name>
          <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
  <property>
      <name>dfs.namenode.name.dir</name>
      <value>file:/hadoop1/hdfs/name,file:/hadoop2/hdfs/name</value>
  </property>
  <property>
      <name>dfs.datanode.data.dir</name>
      <value>file:/hadoop1/hdfs/data,file:/hadoop2/hdfs/data,file:/hadoop3/hdfs/data</value>
  </property>
  <property>
     <name>dfs.replication</name>
     <value>2</value>
  </property>
  <!-- Enable WebHDFS (REST API) on the NameNodes and DataNodes; optional -->
  <property>
     <name>dfs.webhdfs.enabled</name>
     <value>true</value>
  </property>
  <!-- File listing DataNodes to be excluded (decommissioned) from the cluster -->
  <property>
      <name>dfs.hosts.exclude</name>
      <value>/usr/local/hadoop-2.9.2/etc/hadoop/excludes</value>
  </property>
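
The dfs.hosts.exclude path should point to a file that actually exists (an empty file is fine), otherwise the NameNode may fail to start; a minimal sketch:

#Create an empty excludes file on every node (as the hadoop user)
touch /usr/local/hadoop-2.9.2/etc/hadoop/excludes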


etc/hadoop/mapred-site.xml

  <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
  </property>

etc/hadoop/yarn-site.xml

  <!-- Specify that the NodeManager loads the mapreduce_shuffle auxiliary service on startup -->
  <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
   </property>
   <property>
          <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
   </property>
   <!-- Specify the ResourceManager address -->
   <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>hadoop1</value>
    </property>

etc/hadoop/slaves

hadoop1
hadoop2
hadoop3
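
All of the files above live under /usr/local/hadoop-2.9.2/etc/hadoop and must be identical on the three nodes. A sketch of syncing them from hadoop1, assuming the passwordless SSH set up earlier for the hadoop user:

#Run on hadoop1 as the hadoop user
scp -r /usr/local/hadoop-2.9.2/etc/hadoop/* hadoop@hadoop2:/usr/local/hadoop-2.9.2/etc/hadoop/
scp -r /usr/local/hadoop-2.9.2/etc/hadoop/* hadoop@hadoop3:/usr/local/hadoop-2.9.2/etc/hadoop/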

First-time startup commands

1. Start ZooKeeper; run the following on every node:
bin/zkServer.sh start
2. On one of the NameNode nodes, run the following command to create the HA namespace in ZooKeeper:
hdfs zkfc -formatZK
3. On every JournalNode node, start the journalnode process:
sbin/hadoop-daemon.sh start journalnode
4. On the primary NameNode node, format the NameNode and JournalNode directories:
hdfs namenode -format ns
5. On the primary NameNode node, start the namenode process:
sbin/hadoop-daemon.sh start namenode
6. On the standby NameNode node, run the first command below. It formats the standby NameNode's directories and copies the metadata over from the primary NameNode, without formatting the JournalNode directory again. Then start the standby namenode process with the second command:
hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
7. On both NameNode nodes, start the ZKFC (failover controller) process:
sbin/hadoop-daemon.sh start zkfc
8. On all DataNode nodes, start the datanode process:
sbin/hadoop-daemon.sh start datanode
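
Once everything is up, a quick sanity check of the HA state (nn1 and nn2 are the NameNode IDs defined in hdfs-site.xml):

#Query which NameNode is active and which is standby
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2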

Daily start/stop commands

#Start script: starts the HDFS services on all nodes
sbin/start-dfs.sh
#Stop script: stops the HDFS services on all nodes
sbin/stop-dfs.sh
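
Since mapred-site.xml and yarn-site.xml are configured above, YARN can be started and stopped in the same way (a sketch; run on hadoop1, the ResourceManager host):

#Start the ResourceManager and NodeManagers
sbin/start-yarn.sh
#Stop them
sbin/stop-yarn.sh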

Verification

Check the processes on each node with jps (run it as the user that owns each process). hadoop1 and hadoop2 should be running NameNode, DFSZKFailoverController, JournalNode and DataNode; hadoop3 runs JournalNode and DataNode; every node also runs the ZooKeeper QuorumPeerMain under the zoo user.


Open the two NameNode web UIs; one should report itself as active and the other as standby:

http://192.168.56.122:50070

http://192.168.56.121:50070

Upload and download test files

#Create directory
[hadoop@hadoop1 ~]$ hadoop fs -mkdir /test
#Verification
[hadoop@hadoop1 ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2019-04-12 12:16 /test    
#Upload files
[hadoop@hadoop1 ~]$ hadoop fs -put /usr/local/hadoop-2.9.2/LICENSE.txt /test
#Verification
[hadoop@hadoop1 ~]$ hadoop fs -ls /test                                     
Found 1 items
-rw-r--r--   2 hadoop supergroup     106210 2019-04-12 12:17 /test/LICENSE.txt
#Download the file to /tmp
[hadoop@hadoop1 ~]$ hadoop fs -get /test/LICENSE.txt /tmp
#Verification
[hadoop@hadoop1 ~]$ ls -l /tmp/LICENSE.txt 
-rw-r--r--. 1 hadoop hadoop 106210 Apr 12 12:19 /tmp/LICENSE.txt
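
As a final, optional HA check, you can stop the active NameNode and confirm that the standby takes over; a sketch, assuming hadoop1 currently holds the active role:

#On hadoop1, stop the active namenode
sbin/hadoop-daemon.sh stop namenode
#From any node, check that nn2 is now active
hdfs haadmin -getServiceState nn2
#Restart the stopped namenode; it rejoins as standby
sbin/hadoop-daemon.sh start namenode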


Reference resources: https://blog.csdn.net/Trigl/article/details/55101826
