Big data: Flume custom types
1. Customize the Interceptor
1.1 Case requirements
When using Flume to collect a server's local logs, different types of logs need to be sent to different analysis systems according to their log type.
1.2 Requirement analysis: Interceptor and Multiplexing ChannelSelector case
In actual development, there may be many types of ...
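The excerpt ends before the implementation, but the usual shape of such a solution is a custom Interceptor that stamps each event with a header, which a Multiplexing ChannelSelector then routes on. Below is a rough sketch in Scala (to match the other snippets on this page); the header key "type" and the content-based routing rule are illustrative assumptions, not the article's actual code:

import java.util
import org.apache.flume.{Context, Event}
import org.apache.flume.interceptor.Interceptor

// Hypothetical interceptor: tags each event with a "type" header so that a
// multiplexing channel selector can send it to the matching channel.
class TypeInterceptor extends Interceptor {
  override def initialize(): Unit = {}

  override def intercept(event: Event): Event = {
    val body = new String(event.getBody)
    // Illustrative routing rule; the real rule depends on the log format
    val logType = if (body.contains("error")) "error" else "other"
    event.getHeaders.put("type", logType)
    event
  }

  override def intercept(events: util.List[Event]): util.List[Event] = {
    val it = events.iterator()
    while (it.hasNext) intercept(it.next())
    events
  }

  override def close(): Unit = {}
}

object TypeInterceptor {
  // Flume instantiates interceptors through a Builder
  class Builder extends Interceptor.Builder {
    override def build(): Interceptor = new TypeInterceptor
    override def configure(context: Context): Unit = {}
  }
}

In the agent configuration, the multiplexing selector would then map values of the "type" header to different channels.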
Posted by nainil on Fri, 26 Nov 2021 08:12:12 -0800
Graph computing: processing hierarchical data with the Spark GraphX Pregel API
Today, distributed computing engines are the backbone of many analytics, batch, and streaming applications. Spark provides many advanced functions (pivot, analytic window functions, etc.) to transform data out of the box. Sometimes you need to process hierarchical data or perform hierarchical calculations. Many database vendors provide functions su ...
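The excerpt cuts off here; as a rough sketch of the kind of hierarchical calculation the title refers to, the Pregel API can propagate a level number from the root of a parent-child graph down to every node. The toy graph and the level-propagation rule below are illustrative assumptions, not the article's code:

import org.apache.spark.graphx._
import org.apache.spark.{SparkConf, SparkContext}

object HierarchyLevels {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("HierarchyLevels"))

    // Toy hierarchy: vertex 1 is the root (level 0), edges point from parent to child
    val vertices = sc.parallelize(Seq((1L, 0), (2L, -1), (3L, -1), (4L, -1)))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, ()), Edge(1L, 3L, ()), Edge(3L, 4L, ())))
    val graph = Graph(vertices, edges)

    // Pregel: everyone except the root starts at -1, then (level + 1) is pushed down the tree
    val levels = graph.pregel(initialMsg = -1)(
      vprog = (_, attr, msg) => math.max(attr, msg),   // keep the best level seen so far
      sendMsg = t =>
        if (t.srcAttr >= 0 && t.dstAttr < 0) Iterator((t.dstId, t.srcAttr + 1))
        else Iterator.empty,                           // only message children not yet reached
      mergeMsg = (a, b) => math.max(a, b)
    )

    levels.vertices.collect().foreach(println) // (1,0) (2,1) (3,1) (4,2)
    sc.stop()
  }
}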
Posted by newyear498 on Thu, 25 Nov 2021 15:14:49 -0800
SparkStreaming reads the Kafka data source and writes it to the Mysql database
1. Experimental environment
The tools used in this experiment are:
kafka_2.11-0.11.0.2; zookeeper-3.4.5; spark-2.4.8; IDEA; MySQL 5.7
What is ZooKeeper?
ZooKeeper mainly serves distributed systems and can be used for unified configuration managemen ...
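The article text is cut off, but a minimal sketch of the pipeline named in the title, using the spark-streaming-kafka-0-10 integration, looks roughly like this. The broker address, topic, table, and credentials are placeholders, not values from the article:

import java.sql.DriverManager
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._

object KafkaToMysql {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("KafkaToMysql")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Kafka consumer settings; broker, topic, and group id are illustrative
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "demo-group",
      "auto.offset.reset" -> "latest"
    )
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("demo_topic"), kafkaParams)
    )

    // Write each partition to MySQL over plain JDBC; table and column are illustrative
    stream.map(_.value()).foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        val conn = DriverManager.getConnection(
          "jdbc:mysql://localhost:3306/test?useSSL=false", "root", "password")
        val stmt = conn.prepareStatement("INSERT INTO messages(content) VALUES (?)")
        records.foreach { r => stmt.setString(1, r); stmt.executeUpdate() }
        stmt.close(); conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}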
Posted by Nuv on Wed, 24 Nov 2021 01:25:33 -0800
Spark: WordCount, the source of all things
N methods of implementing WordCount in Spark
Hello, everyone. I won't introduce myself here. Let's talk about WordCount, that is, word-frequency counting. You have probably heard from various sources that WordCount is the first thing you run into in data processing. Why? Because WordCount is simple. But it can describe data processing and data stati ...
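The excerpt stops before the actual implementations; a short sketch of three common WordCount variants (the input lines are illustrative, and the article may cover others):

import org.apache.spark.{SparkConf, SparkContext}

object WordCountDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("WordCountDemo"))
    // Illustrative input; the article's own data source is not shown in the excerpt
    val lines = sc.parallelize(Seq("hello spark", "hello world", "spark streaming"))
    val words = lines.flatMap(_.split(" "))

    // Variant 1: the classic reduceByKey
    words.map((_, 1)).reduceByKey(_ + _).collect().foreach(println)

    // Variant 2: groupBy, then count each group
    words.groupBy(identity).mapValues(_.size).collect().foreach(println)

    // Variant 3: countByValue, an action that returns the counts to the driver
    println(words.countByValue())

    sc.stop()
  }
}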
Posted by mash on Mon, 22 Nov 2021 11:21:45 -0800
Spark: reading CSV files, with the option parameters explained
import com.bean.Yyds1
import org.apache.spark.sql.SparkSession

object TestReadCSV {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CSV Reader")
      .master("local")
      .getOrCreate()
    /** Parameters can be either strings or specific types, such as boolean
      * delimiter Separat ...
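The snippet is cut off before the actual read call; continuing from the session built above, options are typically passed like this (the path and option values are illustrative, and only a few of the many CSV options are shown):

    val df = spark.read
      .option("header", "true")       // first row holds the column names
      .option("delimiter", ",")       // field separator ("sep" is an equivalent key)
      .option("inferSchema", "true")  // let Spark infer column types
      .option("nullValue", "")        // treat empty strings as null
      .csv("data/people.csv")         // illustrative path
    df.show()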
Posted by George Botley on Sun, 21 Nov 2021 12:20:33 -0800
GitLab CI/CD automated build and release practice
Process introduction
CI/CD is a method to frequently deliver applications to customers by introducing automation in the application development phase. The core concepts of CI/CD are continuous integration, continuous delivery and continuous deployment. In this article, I will introduce the practice of automated build and release based on GitLa ...
Posted by SwarleyAUS on Sun, 21 Nov 2021 11:12:41 -0800
E-commerce offline data warehouse project in practice
E-commerce offline data warehouse project in practice (Part 2)
E-commerce analysis - core transactions
1. Business requirements
Selected indicators: order quantity, commodity quantity, and payment amount, analyzed by sales region and commodity type (see the sketch below).
2. Business database table structure
1. Relationship between databas ...
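The aggregation described above could look roughly like the following Spark SQL sketch. The table and column names are hypothetical; the real ones come from the project's business database, which the excerpt does not show:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object TradeMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("TradeMetrics").master("local[*]").getOrCreate()

    // Hypothetical order-detail table with region, commodity type, quantity, and payment amount
    val orders = spark.table("dws_trade_detail")
    orders
      .groupBy("region", "commodity_type")
      .agg(
        countDistinct("order_id").as("order_cnt"),
        sum("quantity").as("commodity_cnt"),
        sum("pay_amount").as("pay_amount")
      )
      .show()

    spark.stop()
  }
}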
Posted by pengu on Sat, 20 Nov 2021 03:23:31 -0800
Big data development review: Spark
11. Spark
11.1 Introduction to Spark
Apache Spark is a unified analytics and computing engine for large-scale data processing.
Based on in-memory computing, Spark improves the real-time performance of data processing in big data environments, ensures high fault tolerance and high scalability, and allows users to deploy Spark on a large number o ...
Posted by Teddy B. on Thu, 18 Nov 2021 15:08:47 -0800
[Spark] 03 Spark Running Environment
1. Local mode
The so-called Local mode is an environment in which Spark code can be executed locally, without any additional node resources; it is commonly used for teaching, debugging, demonstrations, etc. It is different from the development environment in which the code was previously run inside IDEA.
1.1 Installation Configurati ...
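For context, this is all it takes to run a job in Local mode: the master URL local[*] means "run inside the current JVM, using all available cores", with no cluster required. The app name and the toy job are illustrative:

import org.apache.spark.{SparkConf, SparkContext}

object LocalModeDemo {
  def main(args: Array[String]): Unit = {
    // local[*]: run Spark inside this JVM with as many worker threads as CPU cores
    val conf = new SparkConf().setMaster("local[*]").setAppName("LocalModeDemo")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum()) // 5050.0
    sc.stop()
  }
}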
Posted by n8r0x on Wed, 17 Nov 2021 09:11:53 -0800
Spark common RDD operators for big data development
map
map takes in one record and returns one record. map applies a function to the elements of the RDD one by one and maps them into a new RDD; each data item in the RDD is transformed into a new element through the function passed to map. Input partition and o ...
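A minimal sketch of the map behaviour just described (the input values are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object MapDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("MapDemo"))
    val rdd = sc.parallelize(Seq(1, 2, 3, 4))
    // map applies the function to every element, one by one, yielding a new RDD
    val doubled = rdd.map(_ * 2)
    println(doubled.collect().mkString(", ")) // 2, 4, 6, 8
    sc.stop()
  }
}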
Posted by axo on Tue, 09 Nov 2021 10:56:02 -0800