Hadoop Series: Building a Hadoop Development Environment

I. Prerequisites: Hadoop runs on the JDK, which must be installed first (see: Installing the JDK under Linux). Configuring Passwordless Login: communication between Hadoop components relies on SSH. 2.1 Configuring Host Mapping: map IP addresses to host names: vim /etc/hosts # Add at ...

Posted by wdallman on Mon, 16 Sep 2019 04:16:06 -0700

Building a Filebeat -> Logstash -> Kafka Data Collection Pipeline

Building a Filebeat -> Logstash -> Kafka Data Collection Platform. Contents: Introduction, Requirements, Solution, Filebeat, Logstash, Remaining Work. Introduction: Because our company needs to collect data from various business lines, we need ...

Posted by msimonds on Sun, 15 Sep 2019 23:17:14 -0700

Big Data Series: Learning Notes on Getting Started with Spark

1. Introduction to Spark: Spark was born in 2009 at the AMPLab of the University of California, Berkeley. It began as an experimental project with very little code and is a lightweight framework. In 2010, UC Berkeley officially opened up th ...

Posted by Pozor on Wed, 11 Sep 2019 19:29:10 -0700

Simple Data Processing with Structured Streaming: Reading a CSV and Extracting Column Keywords

Preface: Recently I wanted to learn Spark's newer Structured Streaming API, but everything I found by searching on Baidu was the same monotonous word-count example, which was frustrating. You have to work out for yourself what you can do with the DataFrame select and filter operations. Because I use Python and Pandas, I tried converting to Pandas for processing, but readStream doe ...

Posted by bigphpn00b on Wed, 11 Sep 2019 16:56:03 -0700

Hive QL: Window Functions (Cumulative Statistics)

Contents: Preface; 1. What is a window function; 2. Window function syntax; 3. Categories of window functions; 4. Window functions for cumulative statistics; 4.1 Cumulative sum: sum(xx) over; 4.2 Cumulative average: avg(xx) ...

Posted by AdamSnow on Mon, 09 Sep 2019 18:12:47 -0700
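The cumulative sum(xx) over and avg(xx) over items in the Hive window-function entry above compute running totals within each partition. As a rough illustration only, here is a plain-Python sketch of what a query such as `SUM(amount) OVER (PARTITION BY user_id ORDER BY month ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)` returns. The column names `user_id`, `month`, and `amount` and the function `cumulative_stats` are hypothetical, and the explicit ROWS frame is an assumption: with only ORDER BY, Hive defaults to a RANGE frame, which gives rows with tied ORDER BY values the same running result.

```python
from itertools import groupby


def cumulative_stats(rows):
    """Emulate per-partition running SUM and AVG (hypothetical helper).

    rows: iterable of (user_id, month, amount) tuples.
    Returns (user_id, month, amount, cum_sum, cum_avg) tuples, mimicking
    SUM/AVG(amount) OVER (PARTITION BY user_id ORDER BY month
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).
    """
    out = []
    # Sort by partition key, then by the ORDER BY column.
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for user, group in groupby(ordered, key=lambda r: r[0]):
        running = 0  # running total resets at each new partition
        for n, (_, month, amount) in enumerate(group, start=1):
            running += amount
            out.append((user, month, amount, running, running / n))
    return out


if __name__ == "__main__":
    sample = [
        ("u1", "2019-01", 10),
        ("u1", "2019-02", 20),
        ("u2", "2019-01", 5),
        ("u1", "2019-03", 30),
    ]
    for row in cumulative_stats(sample):
        print(row)
```

Each partition (here, each `user_id`) starts its running total from zero, which is the behavior the PARTITION BY clause provides in the SQL version.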

Implementing MapReduce Programs on Windows

Counting the number of credit card defaulters at a bank (CSV download address). Default rule: PAY_1-PAY_6: PAY_1 is the repayment status in September 2005; PAY_2 is the repayment status in August 2005; ... PAY_6 is the repayment in ...

Posted by Kevmaster on Fri, 06 Sep 2019 07:02:24 -0700

Super Simple Setup of Hadoop 2.7.7 + Flume 1.8.0 on CentOS 7 (with Examples)

Super simple setup of Hadoop 2.7.7 + Flume 1.8.0 on CentOS 7. Introduction to Flume: https://blog.csdn.net/qq_40343117/article/details/100119574 1 - Download the installation package. Download address: http://www.apache.org/dist/flume/ Choose the ...

Posted by map200uk on Wed, 28 Aug 2019 04:57:15 -0700

Giraph Source Analysis: Counting the Vertices That Participate in Each Superstep

Author | Bai Song. Objective: In research work, we need to analyze the number of vertices involved in each iteration in order to further optimize the system. For example, the last line of SSP's compute() method calls voteToHalt(), which switches the current vertex to the inactive state. So after each iteration, all vertices are inactive. ...

Posted by sampledformat on Mon, 19 Aug 2019 20:53:20 -0700

Phase 3: API

Phase 3: API. Part One: What to focus on when reading the API docs: 1. Look at the package. 2. Read the description. 3. Look at the structure. 4. Look at the methods: read each method's description, check its modifiers, then its return value, and see ...

Posted by kam_uoc on Sun, 18 Aug 2019 07:02:45 -0700

Apache Spark Step-by-Step Tutorial: Deploying and Running a Spark Cluster

Contents: I. Preface; 1.1 Cluster planning; 1.2 Prerequisites; 1.3 Downloading the installation package; II. Installation and deployment; 2.1 Unzip and modify the configuration files; 2.2 Copy the files to the two other machines; 3. Running and testing; 3.1 Start the cluster; 3.2 Connect to the cluster with spark-shell; 3. ...

Posted by zuhalter223 on Fri, 02 Aug 2019 02:32:40 -0700