Article directory
- 1. Basic information
- 2. Installation process
- 1. Switch to the hadoop account and extract hadoop to the target installation directory with tar -zxvf:
- 2. Create tmpdir directory:
- 3. Configure hadoop-env.sh file:
- 4. Configure mapred-env.sh file:
- 5. Configure core-site.xml file:
- 6. Configure hdfs-site.xml file:
- 7. Configure mapred-site.xml file:
- 8. Configure yarn-site.xml file:
- 9. Configure the environment variables that hadoop runs with
- 10. Modify the slaves file:
- 11. Copy hadoop-2.7.3 from test to the hadoop@test2 and hadoop@test3 machines, modify the environment variables as in step 9, and perform the following actions:
- 12. Format the namenode (only the first startup needs formatting!), start hadoop, and start the job history service:
- 13. Check the services on each machine by running jps on test, test2 and test3:
- Q&A
- Core elements of hadoop
1. Basic information
- Version: 2.7.3
- Machines: three (test, test2, test3)
- Account: hadoop
- Source path: /opt/software/hadoop-2.7.3.tar.gz
- Target path: /opt/hadoop -> /opt/hadoop-2.7.3
- Dependency: zookeeper
2. Installation process
1. Switch to the hadoop account and extract hadoop to the target installation directory with tar -zxvf:
[root@test opt]# su hadoop
[hadoop@test opt]$ cd /opt/software
[hadoop@test software]$ tar -zxvf hadoop-${version}.tar.gz -C /opt
[hadoop@test software]$ cd /opt
[hadoop@test opt]$ ln -s /opt/hadoop-${version} /opt/hadoop
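A quick sanity check that the symlink resolves to the unpacked tree (and, once JAVA_HOME is configured in step 3, that the build reports 2.7.3); a minimal sketch:

[hadoop@test opt]$ ls -l /opt/hadoop
[hadoop@test opt]$ /opt/hadoop/bin/hadoop version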
2. Create tmpdir directory:
[hadoop@test opt]$ cd /opt/hadoop
[hadoop@test hadoop]$ mkdir -p tmpdir
3. Configure hadoop-env.sh file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ mkdir -p /opt/hadoop/pids
[hadoop@test hadoop]$ vim hadoop-env.sh
Add the following configuration to the hadoop-env.sh file:
export JAVA_HOME=/opt/java
export HADOOP_PID_DIR=/opt/hadoop/pids
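Moving HADOOP_PID_DIR out of the default /tmp keeps the PID files safe from periodic /tmp cleanup (see the Q&A at the end of this article). It is also worth confirming that /opt/java really points at a JDK; a minimal check, assuming that path:

[hadoop@test hadoop]$ /opt/java/bin/java -version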
4. Configure mapred-env.sh file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-env.sh
Add the following configuration to the mapred-env.sh file:
export JAVA_HOME=/opt/java
5. Configure core-site.xml file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim core-site.xml
Add the following configuration to the core-site.xml file:
<configuration>
  <!-- Temporary working directory of the namenode -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmpdir</value>
  </property>
  <!-- The entry point of hdfs: tells clients which machine the namenode is on and which port it listens on -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://test:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
</configuration>
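For reference: fs.trash.interval is measured in minutes, so 1440 keeps deleted files in the trash for 24 hours, and io.file.buffer.size of 131072 bytes is a 128 KB I/O buffer. Once the environment variables from step 9 are in place, the effective values can be read back with getconf; a minimal sketch:

[hadoop@test hadoop]$ hdfs getconf -confKey fs.defaultFS
[hadoop@test hadoop]$ hdfs getconf -confKey fs.trash.interval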
6. Configure hdfs-site.xml file:
If Ranger has not been installed yet, the following block must be commented out (or removed) in the file:
<property>
  <name>dfs.namenode.inode.attributes.provider.class</name>
  <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim hdfs-site.xml
Add the following configuration to the hdfs-site.xml file:
<configuration>
  <!-- The number of replicas; generally less than or equal to the number of datanodes -->
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/opt/hadoop/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/opt/hadoop/data/datanode</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>test:50090</value>
  </property>
</configuration>
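With dfs.replication set to 2 and two datanodes (test2 and test3), every block is stored on both of them. The value can be read back with getconf, and changed per path on a running cluster with setrep; a minimal sketch (/some/path is a placeholder):

[hadoop@test hadoop]$ hdfs getconf -confKey dfs.replication
[hadoop@test hadoop]$ hdfs dfs -setrep 1 /some/path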
7. Configure mapred-site.xml file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-site.xml
Add the following configuration to the mapred-site.xml file:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>test:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>test:19888</value>
  </property>
</configuration>
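Once the history server has been started in step 12, it should answer on the webapp address configured above; a minimal reachability check, assuming curl is installed:

[hadoop@test hadoop]$ curl -s -o /dev/null -w "%{http_code}\n" http://test:19888   # should print 200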
8. Configure yarn-site.xml file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim yarn-site.xml
Add the following configuration to the yarn-site.xml file:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>test:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>test:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>test:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>test:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>test:8088</value>
  </property>
  <!-- Site specific YARN configuration properties -->
</configuration>
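All five resourcemanager addresses point at test: 8088 is the web UI, while 8030, 8031, 8032 and 8033 are the scheduler, resource-tracker, client and admin RPC ports respectively. After YARN is started in step 12, the nodemanagers that registered via port 8031 can be listed; a minimal sketch:

[hadoop@test hadoop]$ yarn node -list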
9. Configure the environment variables that hadoop runs with
[hadoop@test hadoop]$ vim /etc/profile
Append the following lines to the file:
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
After saving the file, execute source /etc/profile to make the configuration take effect:
[hadoop@test hadoop]$ source /etc/profile
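A quick check that the variables are active and that the shell now resolves the hadoop binary from $HADOOP_HOME/bin; a minimal sketch:

[hadoop@test hadoop]$ echo $HADOOP_HOME
[hadoop@test hadoop]$ which hadoop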
10. Modify the slaves file:
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop
[hadoop@test hadoop]$ vim slaves
Add the datanode hostnames to the slaves file, one per line:
test2
test3
11. Copy hadoop-2.7.3 from test to the hadoop@test2 and hadoop@test3 machines, modify the environment variables as in step 9, and perform the following actions:
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test2:/opt/
[hadoop@test hadoop]$ ssh hadoop@test2 "ln -s /opt/hadoop-${version} /opt/hadoop"
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test3:/opt/
[hadoop@test hadoop]$ ssh hadoop@test3 "ln -s /opt/hadoop-${version} /opt/hadoop"
Note that the symlink must be created on test2 and test3 themselves, hence the ssh wrapper.
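start-all.sh (used in the next step) starts the remote daemons over SSH, so the hadoop user on test must be able to reach test, test2 and test3 without a password. If that is not configured yet, a minimal sketch:

[hadoop@test ~]$ ssh-keygen -t rsa
[hadoop@test ~]$ ssh-copy-id hadoop@test
[hadoop@test ~]$ ssh-copy-id hadoop@test2
[hadoop@test ~]$ ssh-copy-id hadoop@test3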
12. Format the namenode (only the first startup needs formatting!), start hadoop, and start the job history service:
# Format namenode, only the first start needs to be formatted!!
[hadoop@test hadoop]$ hadoop namenode -format
# Start-up
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/start-all.sh
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver
start-all.sh is just a wrapper around two scripts, start-dfs.sh and start-yarn.sh, so hdfs and yarn can also be started separately.
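Before inspecting each machine by hand, dfsadmin can confirm that both datanodes have registered with the namenode (the report lists one live datanode entry per node); a minimal check:

[hadoop@test hadoop]$ hdfs dfsadmin -report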
Note: if a datanode fails to start, check whether there is stale data in tmpdir; delete that directory on this machine and on the other two machines as well (a sketch follows).
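A sketch of that cleanup, assuming the layout used in this guide; note that re-formatting destroys any data already stored in hdfs:

[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/stop-all.sh
[hadoop@test hadoop]$ rm -rf /opt/hadoop/tmpdir/*
[hadoop@test hadoop]$ ssh hadoop@test2 "rm -rf /opt/hadoop/tmpdir/*"
[hadoop@test hadoop]$ ssh hadoop@test3 "rm -rf /opt/hadoop/tmpdir/*"
[hadoop@test hadoop]$ hadoop namenode -format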
13. Check the services on each machine by running jps on test, test2 and test3:
[hadoop@test ~]$ jps
24429 Jps
22898 ResourceManager
24383 JobHistoryServer
22722 SecondaryNameNode
22488 NameNode
[hadoop@test2 ~]$ jps
7650 DataNode
7788 NodeManager
8018 Jps
[hadoop@test3 ~]$ jps
28407 Jps
28038 DataNode
28178 NodeManager
If all three machines show the processes above, the hadoop cluster services are working normally.
Access the hadoop service page by entering the following address in a browser: http://172.24.5.173:8088
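The same cluster information is also exposed as JSON through the ResourceManager REST API, which is handy for scripted health checks; a minimal sketch, assuming curl is installed:

[hadoop@test ~]$ curl -s http://172.24.5.173:8088/ws/v1/cluster/info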
Run a simple MR program to verify that the cluster was installed successfully:
[hadoop@test mapreduce]$ cd /opt/hadoop/share/hadoop/mapreduce
[hadoop@test mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 4
Number of Maps  = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/04/06 09:36:47 INFO client.RMProxy: Connecting to ResourceManager at test/172.24.5.173:8032
17/04/06 09:36:47 INFO input.FileInputFormat: Total input paths to process : 2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: number of splits:2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491470782060_0001
17/04/06 09:36:48 INFO impl.YarnClientImpl: Submitted application application_1491470782060_0001
17/04/06 09:36:48 INFO mapreduce.Job: The url to track the job: http://test:8088/proxy/application_1491470782060_0001/
17/04/06 09:36:48 INFO mapreduce.Job: Running job: job_1491470782060_0001
17/04/06 09:36:56 INFO mapreduce.Job: Job job_1491470782060_0001 running in uber mode : false
17/04/06 09:36:56 INFO mapreduce.Job:  map 0% reduce 0%
17/04/06 09:37:00 INFO mapreduce.Job:  map 50% reduce 0%
17/04/06 09:37:02 INFO mapreduce.Job:  map 100% reduce 0%
17/04/06 09:37:08 INFO mapreduce.Job:  map 100% reduce 100%
17/04/06 09:37:08 INFO mapreduce.Job: Job job_1491470782060_0001 completed successfully
17/04/06 09:37:08 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=50
		FILE: Number of bytes written=357588
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=554
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=11
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=2
		Launched reduce tasks=1
		Data-local map tasks=2
		Total time spent by all maps in occupied slots (ms)=6118
		Total time spent by all reduces in occupied slots (ms)=4004
		Total time spent by all map tasks (ms)=6118
		Total time spent by all reduce tasks (ms)=4004
		Total vcore-milliseconds taken by all map tasks=6118
		Total vcore-milliseconds taken by all reduce tasks=4004
		Total megabyte-milliseconds taken by all map tasks=6264832
		Total megabyte-milliseconds taken by all reduce tasks=4100096
	Map-Reduce Framework
		Map input records=2
		Map output records=4
		Map output bytes=36
		Map output materialized bytes=56
		Input split bytes=318
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=56
		Reduce input records=4
		Reduce output records=0
		Spilled Records=8
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=213
		CPU time spent (ms)=2340
		Physical memory (bytes) snapshot=713646080
		Virtual memory (bytes) snapshot=6332133376
		Total committed heap usage (bytes)=546308096
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=236
	File Output Format Counters
		Bytes Written=97
Job Finished in 20.744 seconds
Estimated value of Pi is 3.50000000000000000000
Q&A
Q: Why does stop-all.sh sometimes fail to stop the hadoop cluster?
A: By default hadoop stores its daemon PID files in /tmp, and /tmp is emptied periodically; once the PID files are gone, the stop scripts can no longer find the processes. This is exactly why HADOOP_PID_DIR was moved to /opt/hadoop/pids in step 3.
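If the PID files are already gone, the leftover daemons have to be stopped by hand; a minimal sketch that looks the PIDs up with jps (process names as in step 13):

[hadoop@test ~]$ kill $(jps | awk '/NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager|JobHistoryServer/ {print $1}')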
Q: Unable to start the namenode?
A: The hostname in the fs.defaultFS value in core-site.xml must not contain underscores!!!
Core elements of hadoop
- node
  - namenode: stores the meta-information
- nodemanager
  1. Manages the computing resources of a single node
  2. Maintains communication with the ResourceManager and the ApplicationMaster
  3. Manages the life cycle of containers; monitors each container's resource usage (memory, CPU), tracks node health, and manages logs