Hadoop Fully Distributed Deployment
I. Overview
Concept:
Hadoop is reliable, scalable, distributed open-source software.
It is a framework for the distributed processing of large data sets across clusters of computers using a simple programming model (MapReduce).
It can scale from a single server to thousands of hosts, and each node provides computing ...
Posted by spyke01 on Fri, 01 Feb 2019 03:57:15 -0800
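The overview above attributes Hadoop's reach to a simple programming model, MapReduce. As a rough illustration of that model, and not of Hadoop's own API, the sketch below runs the map, shuffle and reduce phases inside one Python process. The function names and the "<host> <bytes>" log format are hypothetical, chosen only for the example.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process sketch of MapReduce: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        # Map phase: each input record yields zero or more (key, value) pairs
        for key, value in mapper(record):
            # Shuffle phase: collect every value emitted for the same key
            groups[key].append(value)
    # Reduce phase: one reducer call per key, keys visited in sorted order
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


# Hypothetical input: "<host> <bytes>" lines, e.g. from an access log
LOG = ["node1 200", "node2 50", "node1 300"]


def bytes_mapper(line):
    host, size = line.split()
    yield host, int(size)


def sum_reducer(host, sizes):
    return sum(sizes)


print(map_reduce(LOG, bytes_mapper, sum_reducer))  # {'node1': 500, 'node2': 50}
```

On a real cluster the map calls run in parallel on the nodes that hold the data, and the shuffle moves the (key, value) pairs across the network to the reducers. The in-memory dictionary here only stands in for that step.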
Building a Hadoop Cluster
Table of contents
1. Basic information
2. Installation process
1. Switch to the hadoop account and extract Hadoop into the target installation directory with the tar -zxvf command:
2. Create the tmpdir directory:
3. Configure the hadoop-env.sh file:
4. Configure the mapred-env.sh file:
5. Configure the core-site.xml file:
Configure ...
Posted by mohamdally on Thu, 31 Jan 2019 23:15:16 -0800
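The cluster-building post above creates a tmpdir directory and then edits Hadoop's XML configuration files. Those files share one layout: a <configuration> root that holds <property> entries, each with a <name> and a <value>. The sketch below renders such a file with Python's standard library and writes a core-site.xml. fs.defaultFS and hadoop.tmp.dir are standard Hadoop property names, but the host name hadoop101, the port 9000 and the directory paths are placeholders, not values taken from the original post.

```python
import os
import xml.etree.ElementTree as ET


def site_xml(props):
    """Render a Hadoop-style <configuration> document from a name -> value dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    ET.indent(root)  # pretty-print the tree (requires Python 3.9+)
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def write_core_site(conf_dir, namenode_host, tmp_dir, port=9000):
    """Create tmp_dir (the 'tmpdir' step) and write core-site.xml into conf_dir."""
    os.makedirs(tmp_dir, exist_ok=True)
    xml_text = site_xml({
        # NameNode address; the port is a placeholder, use your cluster's value
        "fs.defaultFS": f"hdfs://{namenode_host}:{port}",
        # Base directory for Hadoop's temporary and working data
        "hadoop.tmp.dir": tmp_dir,
    })
    path = os.path.join(conf_dir, "core-site.xml")
    with open(path, "w", encoding="utf-8") as f:
        f.write(xml_text)
    return path


# Hypothetical usage; replace the paths with your own installation directory:
# write_core_site("/opt/module/hadoop/etc/hadoop", "hadoop101", "/opt/module/hadoop/tmpdir")
```

Generating the files from one dictionary per node keeps the property names consistent across machines. Copying a hand-edited file with scp works just as well for a small cluster.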
Big Data Notes 06 - YARN Setup and Case Study
YARN
Setting up YARN
Cluster planning
Configuration
Test case
wordcount
Running the wordcount example provided with MapReduce
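The YARN post verifies the cluster with the wordcount example that ships with MapReduce, a Java program in Hadoop's examples jar. To show what that job computes, here is a sketch of the same logic written as a Hadoop Streaming style mapper and reducer in Python. It is not the bundled example's code. local_run is a hypothetical helper that imitates the framework by sorting the mapper output before reducing, because Hadoop's shuffle delivers keys to the reducer in sorted order.

```python
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated token, like a Streaming mapper."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; input must be sorted by key, as the shuffle guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def local_run(lines):
    """Stand-in for the framework: map, sort (the shuffle), then reduce."""
    return list(reducer(sorted(mapper(lines))))


print(local_run(["the cat", "the dog"]))  # ['cat\t1', 'dog\t1', 'the\t2']
```

Under Hadoop Streaming the mapper and reducer would be separate scripts passed through the -mapper and -reducer options, each reading lines from stdin and printing results to stdout. That wiring is omitted here.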
Setting up YARN
Cluster planning
Configuration
Modify the configuration file mapred-site.xml:
<property>
<name>mapreduce.framework.name</name>
<value> ...
Posted by marli on Wed, 30 Jan 2019 11:00:15 -0800
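The YARN excerpt above cuts off before the property's value. For MapReduce jobs to be submitted to YARN instead of running locally, mapreduce.framework.name is normally set to yarn. Hadoop's mapred-default.xml lists local, classic and yarn as its accepted values, with local as the default. The sketch below, using hypothetical helper names, parses the text of a mapred-site.xml and reports whether it selects YARN. The SAMPLE document assumes the value yarn and is not copied from the post.

```python
import xml.etree.ElementTree as ET

# Values documented for mapreduce.framework.name in Hadoop's mapred-default.xml
VALID_FRAMEWORKS = {"local", "classic", "yarn"}


def get_property(xml_text, name):
    """Return the <value> of the <property> whose <name> matches, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == name:
            return prop.findtext("value", "").strip()
    return None


def runs_on_yarn(mapred_site_xml):
    """True only if mapreduce.framework.name is explicitly set to 'yarn'."""
    value = get_property(mapred_site_xml, "mapreduce.framework.name")
    if value is not None and value not in VALID_FRAMEWORKS:
        raise ValueError(f"unexpected mapreduce.framework.name: {value!r}")
    # When the property is absent, Hadoop falls back to its default ("local")
    return value == "yarn"


# Assumed configuration, for illustration only
SAMPLE = """<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>"""

print(runs_on_yarn(SAMPLE))  # True
```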
Building a Distributed Crawler by Hand with multiprocessing
Crawling Zhihu users with multiprocessing
Screenshots of the crawled content
ControlNode (control node)
NodeManager - control scheduler
MemberManager - Zhihu user manager
Data Output - data storage
SpiderNode (crawler node)
SpiderWorker - crawler scheduler
Downloader - HTML downloader
Parser - HTML parser
Cr ...
Posted by mdmann on Wed, 30 Jan 2019 02:48:16 -0800
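The crawler post above splits the work into a control node (scheduler, user/URL manager, data output) and spider nodes (worker, downloader, parser). The sketch below mirrors that split under simplifying assumptions. It uses multiprocessing.dummy, which provides the multiprocessing Process and Queue API on top of threads, so it runs anywhere without pickling concerns. It also "downloads" pages from a hypothetical in-memory FAKE_WEB dict instead of the network. UrlManager, spider_worker and control_node are illustrative names, not the post's code.

```python
import re
from multiprocessing.dummy import Process, Queue  # thread-backed multiprocessing API

# Hypothetical in-memory "web" so the sketch runs offline
FAKE_WEB = {
    "/a": '<title>A</title><a href="/b">b</a><a href="/c">c</a>',
    "/b": '<title>B</title><a href="/a">a</a>',
    "/c": '<title>C</title><a href="/d">d</a>',
    "/d": "<title>D</title>",
}


class UrlManager:
    """Control-node manager: hands out each URL at most once."""

    def __init__(self):
        self.new_urls, self.old_urls = set(), set()

    def add(self, url):
        if url not in self.new_urls and url not in self.old_urls:
            self.new_urls.add(url)

    def has_new(self):
        return bool(self.new_urls)

    def pop(self):
        url = self.new_urls.pop()
        self.old_urls.add(url)
        return url


def download(url):
    return FAKE_WEB.get(url)  # None for unknown pages


def parse(html):
    title = re.search(r"<title>(.*?)</title>", html)
    return (title.group(1) if title else None), re.findall(r'href="([^"]+)"', html)


def spider_worker(task_q, result_q):
    """Spider node: download and parse URLs until it receives the None sentinel."""
    while True:
        url = task_q.get()
        if url is None:
            break
        html = download(url)
        title, links = parse(html) if html else (None, [])
        result_q.put((url, title, links))


def control_node(seed, n_workers=2):
    """Control node: schedule URLs, collect results, store url -> title."""
    task_q, result_q = Queue(), Queue()
    workers = [Process(target=spider_worker, args=(task_q, result_q)) for _ in range(n_workers)]
    for w in workers:
        w.start()
    urls, output, in_flight = UrlManager(), {}, 0
    urls.add(seed)
    while urls.has_new() or in_flight:
        while urls.has_new():  # dispatch every pending URL
            task_q.put(urls.pop())
            in_flight += 1
        url, title, links = result_q.get()  # wait for one finished page
        in_flight -= 1
        output[url] = title  # data output step
        for link in links:
            urls.add(link)
    for _ in workers:  # one stop sentinel per worker
        task_q.put(None)
    for w in workers:
        w.join()
    return output


print(control_node("/a"))
```

Changing the import to from multiprocessing import Process, Queue should give real worker processes, provided the entry point is guarded by if __name__ == "__main__":. Running spider nodes on other machines, as the post's title suggests, would additionally require exposing the queues over the network, for example with multiprocessing.managers.BaseManager.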