Hadoop Pitfall Notes (1)
Deploying a Hadoop cluster on CentOS 7
Environment
Machine 1 (hadoop1-ali): Alibaba Cloud (CentOS 7.3), 120.26.173.104
Machine 2 (hadoop2-hw): Huawei Cloud (CentOS 7.4), 114.116.233.156
The first server serves as the namenode and the second as a datanode.
Modify the hostname and hosts file
Execute on the two machines respectively
hostname hadoop1-ali
hostname hadoop2-hw
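Note that the hostname command only changes the name for the current session; on CentOS 7 you can make the change permanent with hostnamectl if you prefer (an optional step not in the original workflow), e.g.
hostnamectl set-hostname hadoop1-ali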
Modify the /etc/hosts file (hosts, with an s) on both machines and add the following
120.26.173.104 hadoop1-ali
114.116.233.156 hadoop2-hw
After the modification, you can check whether each hostname resolves
ping hadoop1-ali
ping hadoop2-hw
Generate SSH key files on the two machines
Run the following command on both machines to generate SSH keys
ssh-keygen -t rsa -P ''
Create a new file named authorized_keys.
Copy the contents of each machine's /root/.ssh/id_rsa.pub into that file, one line per machine.
Then upload the authorized_keys file to the /root/.ssh/ directory of each machine.
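A minimal sketch of these steps, run from hadoop1-ali (this is only one possible way; at this point ssh and scp will still prompt for the other machine's root password):
# collect the public key of this machine and of hadoop2-hw
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
ssh root@hadoop2-hw cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
# distribute the combined file to the other machine
scp /root/.ssh/authorized_keys root@hadoop2-hw:/root/.ssh/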
After this succeeds, test passwordless SSH login, for example from hadoop1-ali to the other machine
ssh hadoop2-hw
After typing yes once, you should see the other server's welcome banner; verify that both machines can log in to each other this way.
It is important to note that all the operations above are performed as the root user, which avoids tedious steps such as permission configuration but introduces potential security risks. Configuring directly as root is not recommended in production environments.
Install OpenJDK 1.8
The Huawei Cloud server already has OpenJDK 1.8 installed by default, so this only needs to be done on the Alibaba Cloud server
yum install java-1.8.0-openjdk -y
yum install java-1.8.0-openjdk-devel -y
Then configure the environment variables: edit the file /etc/profile and add the following
#Java
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64
export CLASSPATH=$JAVA_HOME/lib/*.*
export PATH=$PATH:$JAVA_HOME/bin
Save and make it take effect
source /etc/profile
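You can optionally verify that the JDK is available with
java -version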
Install Hadoop
Download the tarball from https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.8.5/hadoop-2.8.5.tar.gz and upload it to the server's /opt/hadoop/ directory
Extract it
tar -xvf hadoop-2.8.5.tar.gz
Create several new directories under /root
mkdir /root/hadoop
mkdir /root/hadoop/tmp
mkdir /root/hadoop/var
mkdir /root/hadoop/dfs
mkdir /root/hadoop/dfs/name
mkdir /root/hadoop/dfs/data
Modify a series of configuration files in etc/hadoop
Enter the directory /opt/hadoop/hadoop-2.8.5/etc/hadoop
Modify core-site.xml and add the following configuration inside the <configuration> node
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/root/hadoop/tmp</value>
        <description>Abase for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://hadoop1-ali:9000</value>
    </property>
</configuration>
Modify hadoop-env.sh, changing ${JAVA_HOME} to your own JDK path
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64
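If you are unsure of the exact JDK directory on a machine, one way to find it (assuming the OpenJDK yum packages used above) is
ls /usr/lib/jvm/
readlink -f $(which java)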
Modify hdfs-site.xml and add the following configuration
<property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>need not permissions</description>
</property>
Create and modify mapred-site.xml
Copy the template file in the directory and rename it
cp mapred-site.xml.template mapred-site.xml
Add the following configuration
<property>
    <name>mapred.job.tracker</name>
    <value>hadoop1-ali:49001</value>
</property>
<property>
    <name>mapred.local.dir</name>
    <value>/root/hadoop/var</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
Modify the slaves file: delete localhost from it and add your own datanode
hadoop2-hw
Modify the yarn-site.xml file and add the following configuration
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1-ali</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
Repeat this on the other machine. The files can be copied over directly; six files are involved, and they are identical on both servers:
core-site.xml, mapred-site.xml, yarn-site.xml, slaves, hadoop-env.sh, hdfs-site.xml
The only exception is the JDK path in hadoop-env.sh, which on the Huawei Cloud server needs to be changed to the directory for java-1.8.0-openjdk-1.8.0.232.b09-0.el7_7.aarch64
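One way to copy the configuration files over is scp, a minimal sketch assuming the same /opt/hadoop/hadoop-2.8.5 layout on both machines (remember to adjust the JDK path in hadoop-env.sh afterwards):
cd /opt/hadoop/hadoop-2.8.5/etc/hadoop
scp core-site.xml mapred-site.xml yarn-site.xml slaves hadoop-env.sh hdfs-site.xml root@hadoop2-hw:/opt/hadoop/hadoop-2.8.5/etc/hadoop/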
Start Hadoop
Perform initialization on the namenode (hadoop1-ali)
cd /opt/hadoop/hadoop-2.8.5/bin
./hadoop namenode -format
Execute the startup command on the namenode (hadoop1-ali)
cd /opt/hadoop/hadoop-2.8.5/sbin
./start-all.sh
Type yes twice when prompted and the startup should complete successfully.
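To check that the daemons came up, you can optionally run jps on each machine:
jps
On hadoop1-ali you should see processes such as NameNode, SecondaryNameNode and ResourceManager; on hadoop2-hw, DataNode and NodeManager.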
Open the required ports in the security group consoles of Alibaba Cloud and Huawei Cloud respectively
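At a minimum this means the ports used in this article, e.g. 9000 (fs.default.name), 50070 (the NameNode web UI) and 49001; the YARN ResourceManager web UI listens on 8088 by default if you also want to view it.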
Visit http://120.26.173.104:50070/ in a browser to see the result
Supplementary Instructions
The hostnames, IPs, JDK paths, etc. in this article vary from machine to machine; adjust them for your own environment.
Later testing showed that, in the hosts file, the entry for the node itself should use its intranet (private) IP.
Run wordcount example
Configure the Hadoop environment variables first; they are used directly in the commands below
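A minimal sketch of that configuration, assuming the install path /opt/hadoop/hadoop-2.8.5 used above: append the following to /etc/profile, then run source /etc/profile again
#Hadoop
export HADOOP_HOME=/opt/hadoop/hadoop-2.8.5
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin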
Create an input directory in the HDFS file system
hdfs dfs -mkdir /input
Create a new file named example locally and write a few arbitrary sentences into it.
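For instance (the file name and contents here are only an illustration):
echo "hello hadoop hello world" > example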
Copy the local example file to the /input directory of the HDFS file system
hdfs dfs -copyFromLocal example /input
Run the following in the hadoop-2.8.5 directory
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.5.jar wordcount /input /output
wordcount is an example MapReduce program that counts the occurrences of each word in all files under the /input directory and stores the results in the /output directory.
After it finishes, view the word counts
hdfs dfs -cat /output/*
Reference resources
Main reference for the process:
https://blog.csdn.net/pucao_c...
The local IP in the hosts file should be the intranet IP, otherwise the namenode will fail to start; see:
https://blog.csdn.net/dongdon...
Hadoop environment variable configuration:
https://blog.csdn.net/fantasy...
The original text is from Chen 11's Blog