1. Edit profile file
export JAVA_HOME=/usr/lib/jvm/jdk/
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
With the above configuration, the system can find the installation path of JDK and Hadoop.
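For the new variables to take effect in the current shell, the profile file has to be reloaded. A minimal check, assuming the lines above were added to ~/.profile (adjust the file name to whichever profile file you edited):
source ~/.profile
echo $JAVA_HOME $HADOOP_INSTALL    # should print the two installation paths
hadoop version                     # should print the Hadoop version if PATH is set correctly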
Then enter the directory where Hadoop is located:
cd /usr/local/hadoop/etc/hadoop
2. Edit hadoop-env.sh file
vim hadoop-env.sh
Add the following:
export JAVA_HOME=/usr/lib/jvm/jdk/
3. Configure the core-site.xml file
vim core-site.xml
Add the following:
<configuration>
    <!-- The value here is the default HDFS path -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://Master:9000</value>
    </property>
    <!-- Buffer size: io.file.buffer.size defaults to 4KB -->
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <!-- Temporary folder path -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home//tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
    </property>
</configuration>
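After saving the file, the effective value of a property can be checked with the hdfs getconf tool. For example, a quick sanity check (not required for the installation) that fs.defaultFS points at the master node:
hdfs getconf -confKey fs.defaultFS    # should print hdfs://Master:9000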
4. Configure yarn-site.xml file
Add the following:
<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <!-- ResourceManager address -->
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>Master:8032</value>
    </property>
    <!-- Scheduler port -->
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>Master:8030</value>
    </property>
    <!-- resource-tracker port -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>Master:8031</value>
    </property>
    <!-- ResourceManager admin port -->
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>Master:8033</value>
    </property>
    <!-- ResourceManager web UI port, used to monitor job resource scheduling -->
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>Master:8088</value>
    </property>
</configuration>
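Because the *-site.xml files are easy to get wrong, and a single unclosed tag will prevent the daemons from starting, it can help to validate the syntax before moving on. A sketch using xmllint, assuming the libxml2 command-line tools are installed on the node:
xmllint --noout core-site.xml yarn-site.xml    # prints nothing if the XML is well formed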
5. Configure the mapred-site.xml file
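Hadoop ships only a template for this file, and the daemons read mapred-site.xml rather than the template, so first copy it in the same directory and open the copy for editing:
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml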
Add the following:
<configuration>
    <!-- Hadoop's MapReduce framework has three implementations; in mapred-site.xml the property
         "mapreduce.framework.name" can be set to "classic", "yarn" or "local". -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <!-- MapReduce JobHistory server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>Master:10020</value>
    </property>
    <!-- MapReduce JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>Master:19888</value>
    </property>
</configuration>
6. Create the namenode and datanode directories and configure their corresponding paths
Note: create these directories as root.
mkdir -p /hdfs/namenode /hdfs/datanode
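Since the directories are created as root, the account that runs the Hadoop daemons must be given ownership of them, otherwise the NameNode and DataNode cannot write there. A sketch assuming the daemons run as a user and group named hadoop (adjust to your actual setup):
chown -R hadoop:hadoop /hdfs    # hadoop:hadoop is an assumed user/group name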
7. Return to the /usr/local/hadoop/etc/hadoop directory, configure the hdfs-site.xml file, and add the following to it:
<configuration>
    <!-- Secondary NameNode address and port -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>Master:9001</value>
    </property>
    <!-- Storage directory for the NameNode metadata -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/hdfs/namenode</value>
    </property>
    <!-- Data storage directory for the DataNode -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/hdfs/datanode</value>
    </property>
    <!-- Number of replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- Set dfs.webhdfs.enabled to true; otherwise the WebHDFS operations LISTSTATUS and
         GETFILESTATUS cannot be used to list the status of files and folders, because this
         information is kept by the NameNode. -->
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
8. Configure Master and Slave files
1) The Master file holds the host name of the master node. For example, if the master node is named Master, add the following to the Master file.
# Master is the host name of the master node
Master
2) The Slaves file lists the host names of the slave nodes, so that the master node can find and communicate with them through this file. For example, with Slave1 to Slave5 as the slave host names, add the following to the Slaves file.
# Slave1 to Slave5 are the slave host names
Slave1
Slave2
Slave3
Slave4
Slave5
9. Distribute all hadoop files to each node through pssh
Execute the following command (pscp is the parallel copy tool shipped with pssh; the plain pssh command only runs remote commands and cannot copy files):
./pscp -h hosts.txt -r /usr/local/hadoop /usr/local/
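hosts.txt is the pssh host list, with one target host per line. For the cluster described above it would contain the five slave nodes:
Slave1
Slave2
Slave3
Slave4
Slave5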
10. Format namenode (in the Hadoop root directory)
./bin/hadoop namenode -format
11. Start hadoop
./sbin/start-all.sh
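To confirm that the cluster came up, run the jps command on each node. The process lists in the comments below are what is normally expected for this layout, not output printed by start-all.sh itself:
jps
# On the master node this should list NameNode, SecondaryNameNode and ResourceManager.
# On each slave node it should list DataNode and NodeManager.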