Preface
- Time: August 13, 2021
- Contents:
  - Composition of HDFS components
  - Composition of YARN components
  - MapReduce architecture
  - HDFS operation commands
  - Fully distributed installation
1 Composition of HDFS components
![](pic/210813 big data lesson 5/image-20210927113231716.png)
    [hyidol@centos7 hadoop-3.3.1]$ sbin/start-dfs.sh
    Starting namenodes on [centos7]
    Starting datanodes
    Starting secondary namenodes [centos7]
    [hyidol@centos7 hadoop-3.3.1]$ jps
    1442 NameNode
    1554 DataNode
    1892 Jps
    1740 SecondaryNameNode
- Architecture: master-slave; the master manages the slaves.
  - NameNode and DataNode have a one-to-many relationship.
  - 1 NameNode -> 1000 DataNodes
- Processes:
  - NameNode: stores metadata (file description information) such as file size, creation time, creator, and last modification time.
  - SecondaryNameNode: copies and backs up NameNode data.
  - DataNode: stores the actual data (file contents).
- File blocks:
  - dfs.replication = 1 (replication factor); the default file block size is 128 MB.
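If you want to double-check these values on a running cluster, a quick sketch (assuming the commands are run from $HADOOP_HOME on a node that has the client configuration):

```bash
# Print the effective configuration values the HDFS client sees.
bin/hdfs getconf -confKey dfs.replication   # 1 in this setup (Hadoop's default is 3)
bin/hdfs getconf -confKey dfs.blocksize     # 134217728 bytes = 128 MB by default
```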
2 Composition of YARN components
![](pic/210813 big data lesson 5/image-20210927113853522.png)
- Architecture: master-slave.
- Processes:
  - ResourceManager: the master node; it manages all the NodeManagers.
  - NodeManager: manages the local resources of its own node (containers, task management).
- Others (I don't fully understand this part yet):
  - ApplicationMaster: an abstraction that encapsulates each job submitted to MR.
  - Container: every task that runs on YARN is allocated resources; a Container is the abstraction that encapsulates the resources the current task needs.
  - Container: like Docker, it starts a process/thread, i.e. an operating-system level process.
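To see what the ResourceManager actually knows about once the cluster is running, the yarn CLI can be used; a small sketch, assuming it is run from $HADOOP_HOME:

```bash
# List the NodeManagers registered with the ResourceManager and their state.
bin/yarn node -list
# List the applications YARN is currently tracking.
bin/yarn application -list
```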
3 MapReduce architecture
![](pic/210813 big data lesson 5/image-20210927114137116.png)
- Map stage: split the data to be processed, according to the requirements, into multiple MapTask tasks that run in parallel. (split)
- Reduce stage: copy the results produced in the Map stage and aggregate them according to the requirements. (merge)
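As a rough shell analogy of the two stages (not how Hadoop actually runs it, just an illustration; input.txt is a hypothetical local text file):

```bash
# "Map": split each line into words, one word per output record.
# "Reduce": group identical words and sum the count for each group.
tr -s ' ' '\n' < input.txt | sort | uniq -c
```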
4 HDFS operation commands
- Syntax: hdfs <subcommand> -<command> <parameters> (for example, hdfs dfs -ls /)
- Operation: run bin/hdfs dfs from $HADOOP_HOME/…
![](pic/210813 big data lesson 5/image-20210927120217488.png)
- Upload a file to the /input directory under the HDFS root directory.

      bin/hdfs dfs -mkdir /input          // create the directory /input in HDFS

  ![](pic/210813 big data lesson 5/image-20210927115403655.png)

      bin/hdfs dfs -put /home/hadoop/user /input/    // upload the user file to /input/ in HDFS

  ![](pic/210813 big data lesson 5/image-20210927115503457.png)
- View subdirectories and the files under them, recursively.

      bin/hdfs dfs -ls -R /

  ![](pic/210813 big data lesson 5/image-20210927115534028.png)
- View the contents of a file.

      bin/hdfs dfs -cat /input/user
- Check which files and directories are in the root directory of the HDFS file system.

      bin/hdfs dfs -ls /
- YARN-based MapReduce calculation (a typical invocation is sketched below):

  ![](pic/210813 big data lesson 5/image-20210927115855906.png)
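A typical invocation of the bundled wordcount example (an assumption — the notes don't record the exact job used here), run from $HADOOP_HOME, with /input already populated and /output not yet existing:

```bash
# Run the built-in wordcount example on YARN, reading /input and writing /output.
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.1.jar \
    wordcount /input /output
```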
- After running, check the HDFS directory structure:

  ![](pic/210813 big data lesson 5/image-20210927115926377.png)
- View the contents of the results directory:

  ![](pic/210813 big data lesson 5/image-20210927115954733.png)
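To inspect the result from the command line as well, a small sketch, assuming the job wrote its output to /output:

```bash
bin/hdfs dfs -ls /output                  # should show _SUCCESS and part-r-* files
bin/hdfs dfs -cat /output/part-r-00000    # print the reducer output (file name may vary)
```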
5 ⭐ Fully distributed installation
5-1 Passwordless SSH between nodes + JDK
| IP | Hostname | Username | Software |
| --- | --- | --- | --- |
| 192.168.137.110 | master-yh | hyidol | JDK |
| 192.168.137.111 | slave1-yh | hyidol | JDK |
| 192.168.137.112 | slave2-yh | hyidol | JDK |
- Create three CentOS 7 virtual machines.
- Edit the /etc/sysconfig/network-scripts/ifcfg-enp0s3 file, restart the network, and make sure you can ping the external network.

      BOOTPROTO=static        # dhcp also seems to work
      ONBOOT=yes              # bring the interface up at boot
      IPADDR=192.168.137.110  # host address
      NETMASK=255.255.255.0   # subnet mask
      GATEWAY=192.168.137.1   # gateway
      DNS1=192.168.43.1       # DNS (external network address)
      [hyidol@master-yh ~]$ service network restart   # restart the network
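To verify the static address and external connectivity after the restart, something like the following can be used (the gateway and test host here are just examples):

```bash
ip addr show enp0s3        # the interface should now carry 192.168.137.110
ping -c 3 192.168.137.1    # the gateway should answer
ping -c 3 www.baidu.com    # external name resolution via DNS1 should work
```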
- Set up passwordless SSH login between the three machines. (This must be done as the ordinary user.)

      [hyidol@master-yh ~]$ ssh-keygen -t rsa
      [hyidol@master-yh ~]$ ssh-copy-id -i <hostname>
      [hyidol@master-yh ~]$ ssh <hostname>    // to test
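If typing this on every machine gets tedious, the same steps can be scripted; a sketch, assuming the three hostnames from the table above and the hyidol user:

```bash
# Run once per machine as the ordinary user.
ssh-keygen -t rsa                                   # accept the defaults
for h in master-yh slave1-yh slave2-yh; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub hyidol@"$h"    # push the public key to each host
done
ssh slave1-yh hostname                              # should print slave1-yh with no password prompt
```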
- Configure the IP-to-hostname mapping by adding the IP addresses and hostnames to the /etc/hosts file. (You need to switch to root for this; it also makes the remote copy more convenient, since you don't have to grant permissions one by one.)

      192.168.137.110 master-yh
      192.168.137.111 slave1-yh
      192.168.137.112 slave2-yh
      [root@master-yh ~]# scp /etc/hosts root@slave1-yh:/etc/
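A quick (hypothetical) one-liner to confirm the mapping took effect everywhere:

```bash
for h in master-yh slave1-yh slave2-yh; do ping -c 1 "$h"; done   # each hostname should resolve and reply
```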
- Configure the JDK (it will be transferred remotely to the other machines later).

      [hyidol@master-yh ~]$ tar -zxvf jdk-8vcccc.tar.gz
      [hyidol@master-yh ~]$ mv jdk1.8.0_301 jdk-1.8.0
      [hyidol@master-yh ~]$ sudo mv jdk-1.8.0/ /usr/java/
      [hyidol@master-yh ~]$ vi .bashrc
      export JAVA_HOME=/usr/java/jdk-1.8.0
      export PATH=$PATH:$JAVA_HOME/bin
      [hyidol@master-yh ~]$ source .bashrc
      [hyidol@master-yh ~]$ java
      [hyidol@master-yh ~]$ javac
- Stop the firewall and disable it from starting at boot.

      [hyidol@master-yh ~]$ su
      [root@master-yh ~]# service firewalld stop
      [root@master-yh ~]# systemctl disable firewalld
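To confirm the firewall really is stopped and will stay off after a reboot (standard systemd commands):

```bash
systemctl status firewalld       # should report "inactive (dead)"
systemctl is-enabled firewalld   # should print "disabled"
```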
- Note: if permission problems show up because the earlier steps were not done as root, granting extra permissions also works. [The detailed transfer process follows.]
- --------- Below: operations on the master host ---------
- Create the /usr/java/ directory.
- Grant permissions: chmod -R 777 java/
- Unzip the JDK and move it into the java directory.
- Configure the environment variables in .bashrc.
- Log in to the other hosts from the master via ssh and perform the following operations:
- --------- Below: operations on the other machines ---------
- Create the /usr/java/ directory (if you use sudo here, you can skip the permission granting later).
- Grant permissions: chmod -R 777 java (taking slave1 as an example)

      [hyidol@master-yh java]$ ssh slave1-yh
      [hyidol@slave1-yh ~]$ sudo mkdir /usr/java
      [hyidol@slave1-yh ~]$ exit
- --------- Below: remote transfer from the master to the other machines ---------
- On the master host, transfer the jdk-1.8.0 directory to the other hosts remotely (taking slave1 as an example):

      [hyidol@master-yh ~]$ cd /usr/java
      [hyidol@master-yh java]$ scp jdk-1.8.0/ @slave1-yh:/usr/java/
      jdk-1.8.0: not a regular file
      [hyidol@master-yh java]$ scp -r jdk-1.8.0/ @slave1-yh:/usr/java/
      scp: /usr/java//jdk-1.8.0: Permission denied
      [hyidol@master-yh java]$ sudo scp -r jdk-1.8.0/ @slave2-yh:/usr/java/
- After the environment variables are configured, transfer them to the other hosts as well:

      [hyidol@master-yh ~]$ cd
      [hyidol@master-yh ~]$ scp .bashrc hyidol@slave1-yh:~/
      [hyidol@master-yh ~]$ scp .bashrc hyidol@slave2-yh:~/
5-2 Installing Hadoop
- Upload and unzip Hadoop, and delete the share/doc files, which are not needed for now.
- Configure the environment variables. Note: run source .bashrc afterwards.

      export HADOOP_HOME=/usr/java/hadoop-3.3.1
      export HADOOP_MAPRED_HOME=/usr/java/hadoop-3.3.1
      export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
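After reloading .bashrc, a quick sanity check that the variables point at the right installation:

```bash
source ~/.bashrc
hadoop version      # should report Hadoop 3.3.1
which hdfs yarn     # both should resolve under /usr/java/hadoop-3.3.1
```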
- Other configuration files, located in $HADOOP_HOME/etc/hadoop/:

      hadoop-3.3.1/etc/hadoop/core-site.xml
      hadoop-3.3.1/etc/hadoop/hdfs-site.xml
      hadoop-3.3.1/etc/hadoop/mapred-site.xml
      hadoop-3.3.1/etc/hadoop/yarn-site.xml

  Reference: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html
- core-site.xml

      <configuration>
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://master-yh:9000</value>
          </property>
      </configuration>
- hdfs-site.xml

      <configuration>
          <property>
              <name>dfs.replication</name>
              <value>1</value>
          </property>
          <property>
              <name>dfs.namenode.name.dir</name>
              <value>/opt/hdfs/name</value>
          </property>
          <property>
              <name>dfs.datanode.data.dir</name>
              <value>/opt/hdfs/data</value>
          </property>
      </configuration>
- mapred-site.xml

      <configuration>
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
          <property>
              <name>mapreduce.application.classpath</name>
              <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
          </property>
      </configuration>
- yarn-site.xml

      <configuration>
          <!-- How the Reducer gets its data -->
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
          <property>
              <name>yarn.nodemanager.env-whitelist</name>
              <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
          </property>
          <!-- Hostname of the YARN ResourceManager -->
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>master-yh</value>
          </property>
      </configuration>
- The /opt/hdfs/name and /opt/hdfs/data directories required by the HDFS configuration need to be created manually and given permissions.

      [hyidol@master-yh hadoop]$ cd /opt/
      [hyidol@master-yh opt]$ sudo mkdir hdfs
      [sudo] password for hyidol:
      [hyidol@master-yh opt]$ ll
      total 0
      drwxr-xr-x. 2 root root 6 Sep 29 09:43 hdfs
      [hyidol@master-yh opt]$ su
      Password:
      [root@master-yh opt]# chmod -R 777 hdfs/
      [root@master-yh opt]# exit
      exit
      [hyidol@master-yh opt]$ cd hdfs
      [hyidol@master-yh hdfs]$ mkdir name
      [hyidol@master-yh hdfs]$ mkdir data
      [hyidol@master-yh hdfs]$ ll
      total 0
      drwxrwxr-x. 2 hyidol hyidol 6 Sep 29 09:45 data
      drwxrwxr-x. 2 hyidol hyidol 6 Sep 29 09:45 name
  Every host needs this!! For example, on slave1-yh:

      [hyidol@master-yh hdfs]$ ssh slave1-yh
      Last login: Wed Sep 29 04:04:39 2021 from master-yh
      [hyidol@slave1-yh ~]$ cd /opt/
      [hyidol@slave1-yh opt]$ sudo mkdir hdfs
      [sudo] password for hyidol:
      [hyidol@slave1-yh opt]$ su
      Password:
      [root@slave1-yh opt]# chmod -R 777 hdfs/
      [root@slave1-yh opt]# exit
      exit
      [hyidol@slave1-yh opt]$ cd hdfs
      [hyidol@slave1-yh hdfs]$ mkdir name
      [hyidol@slave1-yh hdfs]$ mkdir data
  Summarized below:

      cd /opt/
      sudo mkdir hdfs      # enter hyidol's password
      su                   # enter the root password
      chmod -R 777 hdfs/
      exit
      cd hdfs
      mkdir name
      mkdir data
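A roughly equivalent shortcut (an alternative, not what was done above): create both directories in one command and hand ownership to the hyidol user instead of opening permissions up to 777, which is a little tighter:

```bash
sudo mkdir -p /opt/hdfs/{name,data}        # create /opt/hdfs plus name and data in one go
sudo chown -R hyidol:hyidol /opt/hdfs      # let the hyidol user own the whole tree
```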
- Configure the workers file. Note that workers lists the DataNode nodes, so the master node does not need to be added.

      [hyidol@master-yh hadoop]$ vi /usr/java/hadoop-3.3.1/etc/hadoop/workers
      slave1-yh
      slave2-yh
- Transfer the hadoop-3.3.1 directory to the other hosts.

      [hyidol@master-yh java]$ scp -r /usr/java/hadoop-3.3.1/ hyidol@slave1-yh:/usr/java/
- (Solved) When the following permission problem occurs, it may not be that the source file lacks permissions, but that the destination directory is not writable:

      [root@master-yh java]# scp -r hadoop-3.3.1/ hyidol@slave1-yh:/usr/java/
      scp: /usr/java//hadoop-3.3.1: Permission denied
      [hyidol@master-yh java]$ ssh slave1-yh
      [hyidol@slave1-yh usr]$ su
      [root@slave1-yh usr]# chmod -R 777 java
      [root@slave1-yh usr]# exit
- The environment variable configuration (.bashrc) is transferred as well:

      [hyidol@master-yh java]$ cd
      [hyidol@master-yh ~]$ scp .bashrc hyidol@slave1-yh:~/
      [hyidol@master-yh ~]$ scp .bashrc hyidol@slave2-yh:~/
- Format HDFS.

      [hyidol@master-yh ~]$ cd /usr/java/hadoop-3.3.1/
      [hyidol@master-yh hadoop-3.3.1]$ bin/hdfs namenode -format
- Check this directory to see whether anything was created; if there is content, the format succeeded.

      [hyidol@master-yh /]$ cd /opt/hdfs/name/current/
      [hyidol@master-yh current]$ ls
      fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION
- Start all Hadoop services. Don't worry about the logs warning here: if you restart, you will see that the directory has already been created automatically; if it hasn't, just create it and grant permissions.

      [hyidol@master-yh ~]$ start-all.sh
      WARNING: Attempting to start all Apache Hadoop daemons as hyidol in 10 seconds.
      WARNING: This is not a recommended production deployment configuration.
      WARNING: Use CTRL-C to abort.
      Starting namenodes on [master-yh]
      Starting datanodes
      slave1-yh: WARNING: /usr/java/hadoop-3.3.1/logs does not exist. Creating.
      slave2-yh: WARNING: /usr/java/hadoop-3.3.1/logs does not exist. Creating.
      Starting secondary namenodes [master-yh]
      Starting resourcemanager
      Starting nodemanagers
- master

      [hyidol@master-yh hadoop-3.3.1]$ jps
      15249 SecondaryNameNode
      15060 NameNode
      15492 ResourceManager
      15789 Jps
- slave1

      [hyidol@slave1-yh hadoop-3.3.1]$ jps
      18497 DataNode
      18594 NodeManager
      18710 Jps
- slave2

      [hyidol@slave2-yh hadoop-3.3.1]$ jps
      3015 DataNode
      3112 NodeManager
      3228 Jps
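Beyond jps, the cluster can also report on itself; a sketch of two checks run from $HADOOP_HOME on the master:

```bash
bin/hdfs dfsadmin -report   # "Live datanodes" should list slave1-yh and slave2-yh
bin/yarn node -list         # should show two NodeManagers in RUNNING state
```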
5-3 Errors encountered
- At first, jps showed all the expected processes (4 on the master, 3 on each slave). The 9870 page also displayed the nodes, but the 8088 page did not (the active node count stayed at 0).
- A classmate said my cluster layout is not standard; it would be better to follow the one below, which is also easier to script. But looking back through my notes, the earlier setup method worked too (confusing; I'll learn one approach first~).

  ![](pic/210813 big data lesson 5/image-20210929174559421.png)
- Then I found a similar blog post, "hadoop starts datanode but live nodes is 0": https://www.cnblogs.com/Jomini/p/10705015.html. It suggested looking at the DataNode log and doing something about the ports, so I went back and opened ports 8031 and 9000 and their permissions. Unfortunately I forgot to exit those remote SSH sessions and ran start-all on the slaves as well... the result was, of course, a mess.
- Later a classmate said: "The DN reports whether it is alive; if the page doesn't show the DN as active, the report has a problem, so check the DN log. The specific cause may be a configuration problem that keeps the DN from coming up." Everyone knows this in principle, but it gave me an entry point. Before, I didn't know where in the logs to look; this time I saw the problem as soon as I opened them. (The same error had been reported all along, but I had subconsciously avoided it and pretended it was fine.)

      Retrying connect to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s);
- Searching Baidu turned up plenty of answers, and at that point the teacher told me, "Your resourcemanager.hostname configuration is different." It suddenly clicked; I went to look at the site file and, sure enough, it wasn't there (and what was there was wrong). I had skipped a small paragraph in my notes: the problem was caused by an incomplete yarn-site.xml.
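A quick way to confirm the missing piece is actually in place on every node (just a grep over the config shown in 5-2; the expected output is sketched in the comments):

```bash
grep -A 1 "yarn.resourcemanager.hostname" /usr/java/hadoop-3.3.1/etc/hadoop/yarn-site.xml
#   <name>yarn.resourcemanager.hostname</name>
#   <value>master-yh</value>
```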
- Here is a senior student's note, which is extremely detailed. Reading it cleared up several things I had only half understood before~
  https://blog.csdn.net/qq_41813208/article/details/102693026?utm_source=app&app_version=4.16.0&code=app_1562916241&uLinkId=usr1mkqgl919blen
- If a node is not running correctly, check its log under $HADOOP_HOME/logs/ (see the sketch after this list). For example, if an error is reported:
  - DataNode not started: check hadoop-hadoop-datanode-slave1-jm.log
  - NodeManager not started: check hadoop-hadoop-nodemanager-slave1-jm.log
  - NameNode not started: check hadoop-hadoop-namenode-master-jm.log
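For example, to look at the most recent DataNode errors on a slave (the log file names follow the hadoop-<user>-<daemon>-<host>.log pattern, so the wildcard is just a convenience):

```bash
tail -n 100 $HADOOP_HOME/logs/hadoop-*-datanode-*.log
```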
5-4 Success
![](pic/210813 big data lesson 5/image-20210929175942151.png)
![](pic/210813 big data lesson 5/image-20210929144702593.png)
![](pic/210813 big data lesson 5/image-20210929173940128.png)