1. Environmental preparation
On a Linux machine, install the Hadoop runtime environment. For the installation method, see: Setting up the Hadoop runtime environment
2. Start HDFS and run MapReduce
2.1. Configure cluster
1. Configure hadoop-env.sh
Get the JDK installation path on the Linux system:
[root@hadoop101 ~]# echo $JAVA_HOME
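The path printed by `echo $JAVA_HOME` then goes into hadoop-env.sh. A minimal sketch; the JDK path below is an assumption, so substitute the output from your own machine:

```shell
# etc/hadoop/hadoop-env.sh — set JAVA_HOME explicitly so the Hadoop daemons
# do not depend on the login shell's environment.
# The path below is an assumption; replace it with the output of `echo $JAVA_HOME`.
export JAVA_HOME=/opt/module/jdk1.8.0_144
```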
Posted by spooke2k on Tue, 25 Feb 2020 19:31:27 -0800
1. Environmental preparation:
The Linux machine can be a virtual machine installed in local VMware or a real Linux machine.
If it is a locally installed virtual machine, the following points need to be pre-configured:
Configure a static IP for the machine (so the IP does not change on restart)
Set the hostname (to make configuration easier)
Turn off the ...
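The pre-configuration steps above can be sketched as follows. This is a hypothetical example for a CentOS 7 VM; the interface name, addresses, and hostname are assumptions to adapt to your own network:

```shell
# 1) Static IP: edit the interface config and disable DHCP
#    (file name and addresses are assumptions).
#    /etc/sysconfig/network-scripts/ifcfg-ens33
#      BOOTPROTO=static
#      IPADDR=192.168.1.101
#      GATEWAY=192.168.1.2
#      DNS1=192.168.1.2

# 2) Set the hostname:
hostnamectl set-hostname hadoop101

# 3) Map hostnames to IPs for the whole cluster in /etc/hosts, e.g.:
#      192.168.1.101 hadoop101
```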
Posted by crazylegseddie on Tue, 25 Feb 2020 19:02:32 -0800
Download and unzip
Edit the configuration file sqoop-env.sh
Configure environment variables
Copy the MySQL driver
Check the Sqoop version
Test the connection to MySQL
operating syste ...
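The steps above can be sketched as a sequence of commands. The archive name, version numbers, and install paths here are assumptions (the Hadoop 2.7.2 path matches the version used elsewhere on this page); adjust them to your own download:

```shell
# Download and unzip (archive name and target directory are assumptions)
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
cd /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha

# Create sqoop-env.sh from the template and point it at existing installations:
cp conf/sqoop-env-template.sh conf/sqoop-env.sh
#   export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
#   export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2

# Copy the MySQL JDBC driver into Sqoop's lib directory (jar name is an assumption):
cp mysql-connector-java-5.1.27-bin.jar lib/

# Check the Sqoop version:
bin/sqoop version
```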
Posted by possiblyB9 on Tue, 25 Feb 2020 07:36:16 -0800
1. Basic syntax
bin/hadoop fs <specific command> or bin/hdfs dfs <specific command>
2. Full command list
[andy@xiaoai01 hadoop-2.7.2]$ bin/hadoop fs
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R ...
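A few usage examples of the syntax above; the HDFS paths and file names are illustrative, not from the original:

```shell
# List the HDFS root directory
bin/hadoop fs -ls /

# Create a directory (with parents) and upload a local file into it
bin/hadoop fs -mkdir -p /user/andy/input
bin/hadoop fs -put wc.input /user/andy/input

# Print the contents of a file stored in HDFS
bin/hadoop fs -cat /user/andy/input/wc.input
```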
Posted by Abarak on Fri, 21 Feb 2020 03:07:18 -0800
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local')  # 'local' runs Spark locally with a single thread; 'local[4]' would use 4 cores
1. parallelize and collect
The parallelize function converts the list obj ...
Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800
Given a large file (1 TB? 10 TB?) in which each line stores a user's ID (IP? IQ?), and a computer with only 2 GB of memory, find the ten IDs that occur most frequently.
In recent years, the TopK problem has been one of the most frequently asked questions in technical interviews.
In fact, the answer is r ...
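A sketch of the standard answer to this kind of TopK problem: hash-partition the huge file into buckets small enough to count in memory (the same ID always lands in the same bucket, so per-bucket counts are globally correct), then merge the per-bucket candidates with a heap. The function name, bucket count, and sample data are illustrative:

```python
import heapq
from collections import Counter

def top_k_ids(lines, k=10, num_buckets=4):
    # 1) Partition: IDs with the same value always hash to the same bucket,
    #    so each bucket can be counted independently of the others.
    buckets = [[] for _ in range(num_buckets)]
    for line in lines:
        uid = line.strip()
        buckets[hash(uid) % num_buckets].append(uid)

    # 2) Count each bucket in memory and keep its local top-k as candidates.
    candidates = []
    for bucket in buckets:
        counts = Counter(bucket)
        candidates.extend(counts.most_common(k))

    # 3) A size-k selection over all candidates yields the global top-k.
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

data = ["a", "b", "a", "c", "a", "b"]
print(top_k_ids(data, k=2))  # [('a', 3), ('b', 2)]
```

In the real 1 TB setting each bucket would be written to its own small file on disk and counted one file at a time, which is what keeps the 2 GB memory bound.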
Posted by hughmill on Mon, 17 Feb 2020 20:07:30 -0800
Recently my company has been building a big data system, and the architect recommended using Flink to build it. So today I am investigating Flink in my own virtual machine environment (Ubuntu 16.04).
From Ververica I learned the fundamentals of Flink, because I worked on Python data pr ...
Posted by aperales10 on Sun, 16 Feb 2020 18:33:05 -0800
The following error occurred while running the WordCount program in Eclipse with the plug-in (instead of manually packaging it and uploading it to the server):
DEBUG - LocalFetcher 1 going to fetch: attempt_local938878567_0001_m_000000_0
WARN - job_local938878567_0001
java.lang.Exception: org.apache. ...
Posted by jeffshead on Wed, 12 Feb 2020 06:32:57 -0800
Build our Spark platform from scratch
1. Preparing the CentOS environment
To build a real cluster environment with a highly available architecture, we need at least three virtual machines as cluster nodes, so I bought three Alibaba Cloud servers to serve as our cluster nodes.
Posted by knelson on Tue, 04 Feb 2020 23:47:57 -0800
After the NameNode fails, data can be recovered in the following two ways:
1. Copy the data in the SecondaryNameNode to the directory where the NameNode stores its data
(1) Kill the NameNode process with kill -9
[test@hadoop151 ~]$ jps
[test@hadoop151 ~]$ kill -9 3654
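The copy step in method 1 can be sketched as follows. The data directories are assumptions based on a typical `hadoop.tmp.dir` layout under a Hadoop 2.7.2 install (the version used elsewhere on this page); substitute the paths from your own hdfs-site.xml / core-site.xml:

```shell
# Run on the NameNode host after killing the process (paths are assumptions).
NN_DATA=/opt/module/hadoop-2.7.2/data/tmp/dfs/name
SNN_DATA=/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary

rm -rf $NN_DATA/*                      # clear the damaged NameNode metadata
cp -r $SNN_DATA/* $NN_DATA/            # copy the SecondaryNameNode checkpoint over
sbin/hadoop-daemon.sh start namenode   # restart the NameNode
```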
Posted by webtechatlantic on Mon, 03 Feb 2020 08:51:44 -0800