Hadoop Series: Setting Up a Hadoop Development Environment
I. Prerequisites
Hadoop runs on the JDK, so the JDK must be installed first. The installation steps are as follows:
Installing the JDK on Linux
Configuring Passwordless Login
Hadoop components communicate with each other over SSH.
2.1 Configuring the Mapping
Map IP addresses to host names:
vim /etc/hosts
# Increase at ...
Posted by wdallman on Mon, 16 Sep 2019 04:16:06 -0700
Building a Filebeat -> Logstash -> Kafka Data Collection Channel
Building a Filebeat -> Logstash -> Kafka Data Collection Platform
Introduction
Requirements
Solution
Filebeat
Logstash
Unfinished Work
Introduction
Because our company needs to collect data from various businesses, we need ...
Posted by msimonds on Sun, 15 Sep 2019 23:17:14 -0700
Big Data Series: Study Notes for a First Look at Spark
1. Introduction to Spark
In 2009, Spark was born at the AMPLab at the University of California, Berkeley. It began as an experimental project with very little code: a lightweight framework.
In 2010, the University of California, Berkeley officially open-sourced th ...
Posted by Pozor on Wed, 11 Sep 2019 19:29:10 -0700
Structured Streaming Simple Data Processing - Read CSV and extract column keywords
Preface
Recently I searched Baidu to learn Spark's newer Structured Streaming, but every result was the same monotonous wordcount example, which was frustrating. You have to figure out for yourself what the DataFrame's select and filter operations can do. Because I was using Python and Pandas, I tried to hand the processing over to Pandas, and readStream doe ...
Posted by bigphpn00b on Wed, 11 Sep 2019 16:56:03 -0700
HiveQL: Window Functions (Cumulative Statistics)
Contents
Preface
1. What is a window function
2. Window function syntax
3. Classification of window functions
4. Cumulative statistics window functions
4.1 Cumulative sum: sum(xx) over
4.2 Cumulative average: avg(xx) ...
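As a rough illustration of what the cumulative sum and cumulative average in sections 4.1 and 4.2 compute, here is a minimal Python sketch. It is not taken from the post: the function name, the column names (user, month, amount), and the sample rows are all hypothetical, and it assumes a row-by-row frame (the equivalent of rows between unbounded preceding and current row) with unique ordering keys inside each partition.

```python
from itertools import groupby
from operator import itemgetter


def cumulative_sum_and_avg(rows, partition_key, order_key, value_key):
    """Emulate sum(x) over (...) and avg(x) over (...) with a running frame.

    Hypothetical helper for illustration only; assumes the order key is
    unique within each partition, so each row extends the frame by one row.
    """
    result = []
    # Sort by partition, then by the ordering column, as the window would.
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        running_total = 0
        for count, row in enumerate(group, start=1):
            running_total += row[value_key]
            result.append(
                {**row, "cum_sum": running_total, "cum_avg": running_total / count}
            )
    return result


# Hypothetical sample data: monthly amounts per user.
sample_rows = [
    {"user": "a", "month": "2019-02", "amount": 30},
    {"user": "a", "month": "2019-01", "amount": 10},
    {"user": "b", "month": "2019-01", "amount": 5},
]

for out_row in cumulative_sum_and_avg(sample_rows, "user", "month", "amount"):
    print(out_row)
```

Each partition restarts the running total, which is the role the partition by clause plays in the window definition.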
Posted by AdamSnow on Mon, 09 Sep 2019 18:12:47 -0700
Implementation of MapReduce Programming in Windows
Counting the number of credit card defaulters at a bank
CSV download link
Default rule: PAY_1-PAY_6: PAY_1 is the repayment status in September 2005; PAY_2 is the repayment status in August 2005; ... PAY_6 is the repayment in ...
Posted by Kevmaster on Fri, 06 Sep 2019 07:02:24 -0700
Super Simple Hadoop 2.7.7 + Flume 1.8.0 Setup on CentOS 7 (with Examples)
Introduction to Flume: https://blog.csdn.net/qq_40343117/article/details/100119574
1 - Download the installation package
Download link: http://www.apache.org/dist/flume/
Choose the ...
Posted by map200uk on Wed, 28 Aug 2019 04:57:15 -0700
Giraph Source Analysis - Statistics of the Number of Vertices Participating in each SuperStep
Author | Bai Song
Objective: In research work, we need to know how many vertices take part in each iteration in order to optimize the system further. For example, the last line of SSP's compute() method calls voteToHalt on the current vertex, which puts it into the InActive state. So after each iteration, all vertices are in the InActive state. ...
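To make the idea of counting participating vertices per superstep concrete, here is a small Python simulation of a Pregel-style loop. It is only a sketch and not Giraph code: it assumes SSP means single-source shortest paths, the function name and the sample graph are hypothetical, and "participating" is taken to mean that compute() ran for the vertex in that superstep (all vertices in superstep 0, afterwards only vertices woken by a message).

```python
import math


def sssp_vertices_per_superstep(edges, source):
    """Simulate Pregel-style shortest paths and count computing vertices.

    edges maps a vertex to a list of (target, weight) pairs. Hypothetical
    helper for illustration; it is not part of the Giraph API.
    """
    vertices = set(edges) | {t for targets in edges.values() for t, _ in targets}
    distance = {v: math.inf for v in vertices}
    inbox = {v: [] for v in vertices}
    computed_per_superstep = []
    superstep = 0
    while True:
        # Superstep 0 runs every vertex; later ones run only vertices that
        # received a message, because all others have voted to halt.
        if superstep == 0:
            running = set(vertices)
        else:
            running = {v for v in vertices if inbox[v]}
        if not running:
            break
        computed_per_superstep.append(len(running))
        outbox = {v: [] for v in vertices}
        for v in running:
            candidate = min(inbox[v], default=math.inf)
            if superstep == 0 and v == source:
                candidate = 0
            if candidate < distance[v]:
                distance[v] = candidate
                for target, weight in edges.get(v, []):
                    outbox[target].append(candidate + weight)
            # The vertex votes to halt here, so it stays inactive unless a
            # message wakes it up in a later superstep.
        inbox = outbox
        superstep += 1
    return computed_per_superstep, distance


# Hypothetical three-vertex graph.
sample_edges = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
counts, distances = sssp_vertices_per_superstep(sample_edges, "a")
print(counts)
print(distances)
```

Because every vertex votes to halt at the end of compute(), the count for a superstep after the first equals the number of vertices that received at least one message in the previous superstep.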
Posted by sampledformat on Mon, 19 Aug 2019 20:53:20 -0700
Phase III API
One: What to focus on when reading the API docs
1. Look at the package.
2. Look at the explanation.
3. Look at the structure.
4. Look at the methods: read the method's description, check its modifiers, then its return value, and see ...
Posted by kam_uoc on Sun, 18 Aug 2019 07:02:45 -0700
Apache Spark Progressive Learning Tutorial: Spark Cluster Deployment and Running
Contents
I. Preface
1.1 Cluster Planning
1.2 Prerequisites
1.3 Downloading the Installation Package
II. Installation and Deployment
2.1 Unzip and modify the configuration files
2.2 Copy the files to the two other machines
III. Running and Testing
3.1 Start the cluster
3.2 Start spark-shell and connect to the cluster
3. ...
Posted by zuhalter223 on Fri, 02 Aug 2019 02:32:40 -0700