Hadoop running environment building tutorial
1. Environmental preparation:
A Linux machine: either a virtual machine installed locally with VMware, or a physical Linux machine.
If it is a locally installed virtual machine, the following need to be configured in advance:
Configure a static IP for the machine (to prevent the IP from changing on restart)
Modify the hostname (to simplify configuration)
Turn off the ...
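On CentOS 7, for example, the static-IP and hostname steps above might look like the following sketch (the interface name ens33, the addresses, and the hostname hadoop101 are placeholders, not values from the article):

```shell
# /etc/sysconfig/network-scripts/ifcfg-ens33  (interface name is an example)
BOOTPROTO=static        # use a static address instead of DHCP
ONBOOT=yes              # bring the interface up at boot
IPADDR=192.168.1.101    # pick an address inside your VMware subnet
NETMASK=255.255.255.0
GATEWAY=192.168.1.2
DNS1=192.168.1.2

# set the hostname (takes effect immediately and persists across reboots)
hostnamectl set-hostname hadoop101
```

After editing the file, restart the network service (e.g. `systemctl restart network`) for the static IP to take effect.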
Posted by crazylegseddie on Tue, 25 Feb 2020 19:02:32 -0800
Install Sqoop on Linux (and test the MySQL connection)
Article directory
Environment description
Download and unzip
Configure sqoop-env.sh
After decompression
Modify sqoop-env.sh
Configure environment variables
Copy the MySQL driver
Start MySQL
View the Sqoop version
Test with MySQL
Environment description
Software
Version
operating syste ...
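The steps in the outline above can be sketched as shell commands (the versions and paths below are placeholders for illustration, not the article's actual values):

```shell
# unpack the Sqoop tarball (version and target directory are examples)
tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz -C /opt/module/

# in conf/sqoop-env.sh: point Sqoop at the local Hadoop/Hive installs
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.2
export HIVE_HOME=/opt/module/hive

# copy the MySQL JDBC driver into Sqoop's lib directory
cp mysql-connector-java-5.1.48.jar /opt/module/sqoop/lib/

# verify the installation
sqoop version
```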
Posted by possiblyB9 on Tue, 25 Feb 2020 07:36:16 -0800
Shell operation of HDFS
1. Basic syntax
bin/hadoop fs <specific command> or bin/hdfs dfs <specific command>
2. Full list of commands
[andy@xiaoai01 hadoop-2.7.2]$ bin/hadoop fs
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R ...
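A few of the commands listed above in typical use (the paths are examples; these require a running HDFS cluster):

```shell
bin/hadoop fs -mkdir -p /user/andy/input       # create a directory
bin/hadoop fs -put local.txt /user/andy/input  # upload a local file
bin/hadoop fs -ls /user/andy/input             # list directory contents
bin/hadoop fs -cat /user/andy/input/local.txt  # print a file's contents
bin/hadoop fs -chmod -R 755 /user/andy         # change permissions recursively
```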
Posted by Abarak on Fri, 21 Feb 2020 03:07:18 -0800
RDD common operations in pyspark
preparation:
import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local[4]')  # local[4] means run locally with 4 cores
sc = SparkContext.getOrCreate(conf)
1. parallelize and collect
The parallelize function converts the list obj ...
Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800
Solving the TopK problem in seconds (with code)
subject
Given a large file (1T? 10T?) in which each line stores a user ID (IP? IQ?), and a computer with only 2G of memory, find the ten IDs with the highest frequency.
introduce
In recent years, the TopK problem has been among the most frequently asked questions in technical interviews.
In fact, the answer is r ...
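The standard approach can be sketched in plain Python: hash-partition the IDs so every occurrence of an ID lands in the same bucket, count each bucket independently, and keep a size-k min-heap across buckets. (The bucket count and in-memory lists below are illustrative; on a real 1T file each bucket would be a temporary file on disk, not the article's exact code.)

```python
import heapq
from collections import Counter

def top_k_ids(lines, k=10, num_buckets=16):
    """Find the k most frequent IDs without counting everything at once.

    hash(id) % num_buckets sends every occurrence of an ID to the same
    bucket, so each bucket can be counted independently and fits in
    memory even when the whole file does not.
    """
    buckets = [[] for _ in range(num_buckets)]
    for line in lines:
        uid = line.strip()
        buckets[hash(uid) % num_buckets].append(uid)

    heap = []  # min-heap of (count, id), holding at most k entries
    for bucket in buckets:
        for uid, cnt in Counter(bucket).items():
            if len(heap) < k:
                heapq.heappush(heap, (cnt, uid))
            elif cnt > heap[0][0]:
                heapq.heapreplace(heap, (cnt, uid))  # evict the smallest
    return sorted(heap, reverse=True)  # (count, id), most frequent first

data = ["a"] * 5 + ["b"] * 3 + ["c"]
print(top_k_ids(data, k=2))  # [(5, 'a'), (3, 'b')]
```

Memory stays bounded by the largest single bucket's distinct IDs plus the k-entry heap, which is the whole point of the partitioning step.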
Posted by hughmill on Mon, 17 Feb 2020 20:07:30 -0800
Flink1.9“Error: A JNI error has occurred”
background
Recently, my company has been building a big data system, and the architect recommended building it with Flink. So today I am investigating Flink in my own virtual machine environment (Ubuntu 16.04).
I learned the fundamentals of Flink from Ververica's training materials, because I had worked on Python data pr ...
Posted by aperales10 on Sun, 16 Feb 2020 18:33:05 -0800
Bug0: resolve java.io.FileNotFoundException error encountered when Hadoop plug-in runs
Problem description
The following error occurred while running the WordCount program in Eclipse with the plug-in (instead of manually packaging and uploading it to the server):
DEBUG - LocalFetcher 1 going to fetch: attempt_local938878567_0001_m_000000_0
WARN - job_local938878567_0001
java.lang.Exception: org.apache. ...
Posted by jeffshead on Wed, 12 Feb 2020 06:32:57 -0800
How to quickly build a Spark distributed architecture for big data
Build our Spark platform from scratch
1. Preparing the CentOS environment
To build a real cluster environment and achieve a highly available architecture, we need at least three virtual machines as cluster nodes, so I bought three Alibaba Cloud servers to serve as our cluster nodes.
...
Posted by knelson on Tue, 04 Feb 2020 23:47:57 -0800
Troubleshooting NameNode failure in HDFS
After NameNode fails, you can use the following two methods to recover data:
1. Copy the data in SecondaryNameNode to the directory where NameNode stores its data
(1) kill -9 NameNode process
[test@hadoop151 ~]$ jps
3764 DataNode
4069 NodeManager
3654 NameNode
7738 Jps
[test@hadoop151 ~]$ kill -9 3654
[test@h ...
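Method 1 above amounts to something like the following (the directory paths are common Hadoop 2.7.2 defaults and may differ per installation):

```shell
# after killing the NameNode process, copy the checkpoint data
# from the SecondaryNameNode's directory into the NameNode's
cp -r /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* \
      /opt/module/hadoop-2.7.2/data/tmp/dfs/name/

# restart the NameNode
sbin/hadoop-daemon.sh start namenode
```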
Posted by webtechatlantic on Mon, 03 Feb 2020 08:51:44 -0800
centos7 hadoop+hive installation
Prepare four virtual machines
Virtual Machine Installation
1.Create a new virtual machine
2. Click on Typical Installation (Recommended)
3. Select Chinese, then choose custom partitioning
# Partition Configuration (JD Usage)
/boot 200M
swap 512M # used when physical memory runs short
/ # root directory
4.Configu ...
Posted by Derek on Fri, 31 Jan 2020 20:33:17 -0800