Hadoop Cluster Build-04: Install and Configure HDFS

Keywords: Big Data Hadoop Zookeeper Apache ssh


HDFS is the distributed file system used with Hadoop. In this cluster its nodes are divided into two roles:

namenode: nn1.hadoop nn2.hadoop

datanode: s1.hadoop s2.hadoop s3.hadoop

(If the five virtual machines are unfamiliar, see the first article, Preparations for Hadoop Cluster Building-01.)

Unpack the configuration files

[hadoop@nn1 hadoop_base_op]$ ./ssh_all.sh mv /usr/local/hadoop/etc/hadoop /usr/local/hadoop/etc/hadoop_back

[hadoop@nn1 hadoop_base_op]$ ./scp_all.sh ../up/hadoop.tar.gz /tmp/

[hadoop@nn1 hadoop_base_op]$ #Batch-extract the custom configuration archive into /usr/local/hadoop/etc/ on every machine
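
The extraction command itself is not shown here; a minimal sketch, assuming the archive unpacks into a hadoop/ directory (adjust the tar flags to match how the package was actually built):

[hadoop@nn1 hadoop_base_op]$ ./ssh_all.sh tar -zxf /tmp/hadoop.tar.gz -C /usr/local/hadoop/etc/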

#Batch Check Configuration for Correct Decompression
[hadoop@nn1 hadoop_base_op]$ ./ssh_all.sh head /usr/local/hadoop/etc/hadoop/hadoop-env.sh
[hadoop@nn1 hadoop_base_op]$ ./ssh_root.sh chown -R hadoop:hadoop /usr/local/hadoop/etc/hadoop
[hadoop@nn1 hadoop_base_op]$ ./ssh_root.sh chmod -R 770 /usr/local/hadoop/etc/hadoop

Initialize HDFS

Workflow:

  1. Start ZooKeeper
  2. Start the journalnodes
  3. Start the ZooKeeper client and initialize the HA information in ZooKeeper
  4. Format the namenode on nn1
  5. Start the namenode on nn1
  6. Synchronize the namenode state onto nn2
  7. Start the namenode on nn2
  8. Start ZKFC
  9. Start the datanodes

1. View zookeeper status

[hadoop@nn1 zk_op]$ ./zk_ssh_all.sh /usr/local/zookeeper/bin/zkServer.sh status
ssh hadoop@"nn1.hadoop" "/usr/local/zookeeper/bin/zkServer.sh status"
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
OK!
ssh hadoop@"nn2.hadoop" "/usr/local/zookeeper/bin/zkServer.sh status"
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
OK!
ssh hadoop@"s1.hadoop" "/usr/local/zookeeper/bin/zkServer.sh status"
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
OK!

Two followers and one leader means ZooKeeper is working properly. If it is not running, start it with the following command:

[hadoop@nn1 zk_op]$ ./zk_ssh_all.sh /usr/local/zookeeper/bin/zkServer.sh start

2. Start journalnode

The journalnodes hold the shared edit log through which the two namenodes stay synchronized.

#Start journalnode on nn1
[hadoop@nn1 zk_op]$ hadoop-daemon.sh start journalnode 

#Start journalnode on nn2
[hadoop@nn2 ~]$ hadoop-daemon.sh start journalnode

#You can open the log separately to see the startup status
[hadoop@nn1 zk_op]$ tail /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-journalnode-nn1.hadoop.log
2019-07-22 17:15:54,164 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8485
2019-07-22 17:15:54,190 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2019-07-22 17:15:54,191 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8485: starting
#The log shows that IPC communication is up and the journalnode is listening on port 8485.
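
Besides tailing the log, a quick cross-machine check is to list the running Java daemons with jps through the same ssh_all.sh helper (this assumes jps is on the PATH of a non-interactive shell):

[hadoop@nn1 hadoop_base_op]$ ./ssh_all.sh jps
#nn1 and nn2 should each report a JournalNode process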

3. Initialize the HA information (first run only; not needed afterwards)

[hadoop@nn1 zk_op]$ hdfs zkfc -formatZK
[hadoop@nn1 zk_op]$ /usr/local/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper, hadoop-ha]
[zk: localhost:2181(CONNECTED) 1] quit
Quitting...
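
The znode layout matches the HA configuration: /hadoop-ha holds one child per nameservice, ns1 here (the same nameservice ID that appears in the bootstrap output of step 6). In the zkCli session you can check it before quitting:

[zk: localhost:2181(CONNECTED) 0] ls /hadoop-ha
[ns1]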

4. Format the namenode on nn1 (first run only; not needed afterwards)

[hadoop@nn1 zk_op]$ hadoop namenode -format
#A successful format prints a line like the following
#19/07/22 17:23:09 INFO common.Storage: Storage directory /data/dfsname has been successfully formatted.
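
As an aside, the hadoop namenode form is deprecated in Hadoop 2.x; the current equivalent is:

[hadoop@nn1 zk_op]$ hdfs namenode -format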

5. Start nn1's namenode

[hadoop@nn1 zk_op]$ hadoop-daemon.sh start namenode
[hadoop@nn1 zk_op]$ tail /usr/local/hadoop/logs/hadoop-hadoop-namenode-nn1.hadoop.log
#
#2019-07-22 17:24:57,321 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
#2019-07-22 17:24:57,322 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: starting
#2019-07-22 17:24:57,385 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: NameNode RPC up at: nn1.hadoop/192.168.10.6:9000
#2019-07-22 17:24:57,385 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for standby state
#2019-07-22 17:24:57,388 INFO org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Will roll logs on active node at nn2.hadoop/192.168.10.7:9000 every 120 seconds.
#2019-07-22 17:24:57,394 INFO org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Starting standby checkpoint thread...
#Checkpointing active NN at http://nn2.hadoop:50070
#Serving checkpoints at http://nn1.hadoop:50070

6. Synchronize the namenode state of nn1 onto the nn2 machine (first run only; not needed afterwards)

Let's go to the console of nn2!

########### Be sure to run this on the nn2 machine!!! ###########
[hadoop@nn2 ~]$ hadoop namenode -bootstrapStandby
=====================================================
About to bootstrap Standby ID nn2 from:
           Nameservice ID: ns1
        Other Namenode ID: nn1
  Other NN's HTTP address: http://nn1.hadoop:50070
  Other NN's IPC  address: nn1.hadoop/192.168.10.6:9000
             Namespace ID: 1728347664
            Block pool ID: BP-581543280-192.168.10.6-1563787389190
               Cluster ID: CID-42d2124d-9f54-4902-aa31-948fb0233943
           Layout version: -63
       isUpgradeFinalized: true
=====================================================
19/07/22 17:30:24 INFO common.Storage: Storage directory /data/dfsname has been successfully formatted.

7. Start nn2's namenode

Again, run this in the nn2 console!!

[hadoop@nn2 ~]$ hadoop-daemon.sh start namenode
#Check the log to see if it started successfully.
[hadoop@nn2 ~]$ tail /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-namenode-nn2.hadoop.log

8. Start ZKFC

Now start ZKFC on nn1 and nn2. Once both are running, the two namenodes settle into one active and one standby. ZKFC is what gives HA its automatic failover!

############# Run on nn1 #############
[hadoop@nn1 zk_op]$ hadoop-daemon.sh start zkfc
############# Run on nn2 #############
[hadoop@nn2 zk_op]$ hadoop-daemon.sh start zkfc
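
You can also query the HA state from the command line; nn1 and nn2 below are the namenode IDs that appear in the bootstrap output of step 6:

[hadoop@nn1 zk_op]$ hdfs haadmin -getServiceState nn1
[hadoop@nn1 zk_op]$ hdfs haadmin -getServiceState nn2
#one prints active and the other standby, depending on which ZKFC won the election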

Now open the Hadoop web UI of both machines in a browser:

http://192.168.10.6:50070/dfshealth.html#tab-overview

http://192.168.10.7:50070/dfshealth.html#tab-overview

One of the two shows active and the other standby.
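
To see the automatic failover in action, here is a sketch of a simple test (do this only on a fresh test cluster like this one; it assumes nn1 is currently the active node):

#stop the active namenode; the standby should take over within seconds
[hadoop@nn1 ~]$ hadoop-daemon.sh stop namenode
[hadoop@nn1 ~]$ hdfs haadmin -getServiceState nn2
active
#bring nn1 back; it rejoins as the standby
[hadoop@nn1 ~]$ hadoop-daemon.sh start namenode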

9. Start the datanodes (the last three machines)

######## First confirm which hosts are datanodes, as configured in the slaves file
[hadoop@nn1 hadoop]$ cat slaves
s1.hadoop
s2.hadoop
s3.hadoop
########### Run this on the machine currently shown as active ###########
[hadoop@nn1 zk_op]$ hadoop-daemons.sh start datanode
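
Once the datanodes are up, you can confirm from the command line that all three registered with the namenode; the report should include a line like this:

[hadoop@nn1 zk_op]$ hdfs dfsadmin -report | grep "Live datanodes"
Live datanodes (3):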

10. View Hard Disk Capacity

Open the Hadoop web page from the previous step and check whether the HDFS capacity looks right.

HDFS reserves 2 GB of each physical machine's disk by default (this can be changed in the configuration file hdfs-site.xml); each machine then contributes 15 GB to HDFS, so the three datanodes provide 45 GB in total.
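
The reservation mentioned above corresponds to the dfs.datanode.du.reserved property in hdfs-site.xml (a per-volume value in bytes). Assuming the 2 GB setting described here, you can check what is actually in effect:

[hadoop@nn1 ~]$ hdfs getconf -confKey dfs.datanode.du.reserved
2147483648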

If those numbers appear as expected, HDFS has been configured successfully.

The previous articles are here:

Hadoop cluster build-03 compile and install Hadoop

Hadoop Cluster Build-02 Installation Configuration Zookeeper

Preparations for Hadoop Cluster Building-01

Posted by Chevy on Mon, 22 Jul 2019 03:49:20 -0700