Big data: Flume custom types

1. Customizing the Interceptor. 1.1 Case requirements: when Flume is used to collect a server's local logs, different types of logs need to be sent to different analysis systems according to their log type. 1.2 Requirements analysis: an Interceptor plus Multiplexing ChannelSelector case. In actual development, there may be many types of ...
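
To make the idea concrete, here is a minimal sketch of such an interceptor written against Flume's Interceptor interface; the class name, the "type" header key, and the routing rule are hypothetical, and the agent configuration would additionally map those header values to different channels through a Multiplexing ChannelSelector.

    import java.util
    import java.nio.charset.StandardCharsets
    import org.apache.flume.{Context, Event}
    import org.apache.flume.interceptor.Interceptor

    // Tags each event with a "type" header so a Multiplexing ChannelSelector
    // can route it to the matching channel / analysis system.
    class LogTypeInterceptor extends Interceptor {
      override def initialize(): Unit = {}

      override def intercept(event: Event): Event = {
        val body = new String(event.getBody, StandardCharsets.UTF_8)
        // illustrative rule only: error lines go one way, everything else another
        val logType = if (body.contains("ERROR")) "error" else "other"
        event.getHeaders.put("type", logType)
        event
      }

      override def intercept(events: util.List[Event]): util.List[Event] = {
        val it = events.iterator()
        while (it.hasNext) intercept(it.next())
        events
      }

      override def close(): Unit = {}
    }

    object LogTypeInterceptor {
      // Flume instantiates interceptors through a Builder named in the agent config
      class Builder extends Interceptor.Builder {
        override def build(): Interceptor = new LogTypeInterceptor
        override def configure(context: Context): Unit = {}
      }
    }

In the agent configuration, the multiplexing selector would then map the header value "error" to one channel and "other" to another.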

Posted by nainil on Fri, 26 Nov 2021 08:12:12 -0800

Graph computation: Processing hierarchical data using the Spark GraphX Pregel API

Today, distributed computing engines are the backbone of many analytics, batch, and streaming applications. Spark provides many advanced functions (pivot, analytic window functions, etc.) to transform data out of the box. Sometimes you need to process hierarchical data or perform hierarchical calculations on it. Many database vendors provide functions su ...
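
As one hedged illustration of the Pregel style the article refers to, the sketch below computes each node's level in a small hierarchy by propagating a depth message from the root; the vertex data and edges are made up for the example.

    import org.apache.spark.graphx.{Edge, EdgeDirection, Graph, VertexId}
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession

    object HierarchyLevels {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("PregelHierarchy").master("local[*]").getOrCreate()
        val sc = spark.sparkContext

        // hypothetical parent -> child hierarchy
        val vertices: RDD[(VertexId, String)] = sc.parallelize(Seq(
          (1L, "CEO"), (2L, "VP"), (3L, "Manager"), (4L, "Engineer")))
        val edges: RDD[Edge[Int]] = sc.parallelize(Seq(
          Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 4L, 1)))

        // the root (id 1) starts at level 0, everything else at -1 (unknown)
        val graph = Graph(vertices, edges).mapVertices((id, _) => if (id == 1L) 0 else -1)

        // Pregel: each known parent tells its children "my level + 1" until nothing changes
        val levels = graph.pregel(initialMsg = -1, activeDirection = EdgeDirection.Out)(
          (_, attr, msg) => math.max(attr, msg),              // vertex program: keep the best level seen
          triplet =>
            if (triplet.srcAttr >= 0 && triplet.dstAttr < 0)
              Iterator((triplet.dstId, triplet.srcAttr + 1))  // send the level down the edge
            else Iterator.empty,
          (a, b) => math.max(a, b))                           // merge messages

        levels.vertices.collect().foreach(println)            // (1,0), (2,1), (3,2), (4,3)
        spark.stop()
      }
    }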

Posted by newyear498 on Thu, 25 Nov 2021 15:14:49 -0800

Spark Streaming reads a Kafka data source and writes it to a MySQL database

Spark Streaming reads a Kafka data source and writes it to a MySQL database. 1. Experimental environment: the tools used in this experiment are kafka_2.11-0.11.0.2, zookeeper-3.4.5, spark-2.4.8, IDEA, and MySQL 5.7. What is ZooKeeper? ZooKeeper mainly serves distributed services and can be used for unified configuration management ...
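
A hedged sketch of the general pattern described (not the article's code): a direct stream from Kafka via spark-streaming-kafka-0-10, with each partition written to MySQL over JDBC. The broker address, topic, group id, table, and credentials are placeholders.

    import java.sql.DriverManager
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToMysql {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("KafkaToMysql").setMaster("local[2]")
        val ssc = new StreamingContext(conf, Seconds(5))

        // Kafka consumer settings; broker, topic and group id are hypothetical
        val kafkaParams = Map[String, Object](
          "bootstrap.servers" -> "localhost:9092",
          "key.deserializer" -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id" -> "demo-group",
          "auto.offset.reset" -> "latest")

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("demo_topic"), kafkaParams))

        // write each partition with one JDBC connection (table and credentials are placeholders)
        stream.map(_.value()).foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            val conn = DriverManager.getConnection(
              "jdbc:mysql://localhost:3306/test?useSSL=false", "root", "password")
            val stmt = conn.prepareStatement("INSERT INTO kafka_msg(msg) VALUES (?)")
            records.foreach { line => stmt.setString(1, line); stmt.executeUpdate() }
            stmt.close(); conn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }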

Posted by Nuv on Wed, 24 Nov 2021 01:25:33 -0800

Spark: WordCount, the source of all things

N ways of implementing WordCount in Spark. Hello, everyone. I won't introduce myself here. Let's talk about WordCount, that is, word frequency counting. You may have learned from various channels that WordCount is the first thing you run into in data processing. Why? Because WordCount is simple, yet it describes data processing and data stati ...
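
One of the many possible versions, shown here only as a minimal RDD sketch (the input lines are hard-coded for illustration):

    import org.apache.spark.sql.SparkSession

    object WordCount {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("WordCount").master("local[*]").getOrCreate()
        val sc = spark.sparkContext

        // a tiny in-memory dataset stands in for a text file
        val lines = sc.parallelize(Seq("hello spark", "hello world"))

        // the classic RDD pipeline: split, map to (word, 1), reduce by key
        val counts = lines.flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.collect().foreach(println)   // e.g. (hello,2), (spark,1), (world,1)
        spark.stop()
      }
    }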

Posted by mash on Mon, 22 Nov 2021 11:21:45 -0800

Spark CSV file reading, with option parameters explained

    import com.bean.Yyds1
    import org.apache.spark.sql.SparkSession

    object TestReadCSV {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("CSV Reader")
          .master("local")
          .getOrCreate()
        /**
         * Parameters can be either strings or specific types, such as boolean
         * delimiter Separat ...
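
The excerpt is cut off before the actual read call. As a rough sketch of how such options are typically passed to spark.read (the file path is hypothetical and only a few commonly used options are shown):

        // hypothetical continuation: reading a CSV file with common options
        val df = spark.read
          .option("delimiter", ",")       // field separator
          .option("header", "true")       // first line holds column names
          .option("inferSchema", "true")  // let Spark infer column types
          .csv("data/yyds1.csv")
        df.show()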

Posted by George Botley on Sun, 21 Nov 2021 12:20:33 -0800

GitLab CI/CD automated build and release practice

Process introduction: CI/CD is a method of frequently delivering applications to customers by introducing automation into the application development phases. The core concepts of CI/CD are continuous integration, continuous delivery, and continuous deployment. In this article, I will introduce the practice of automated build and release based on GitLab ...

Posted by SwarleyAUS on Sun, 21 Nov 2021 11:12:41 -0800

E-commerce offline data warehouse project in practice

E-commerce offline data warehouse project in practice (Part 2): e-commerce analysis, core transactions. 1. Business requirements: the selected indicators are order quantity, commodity quantity, and payment amount, analyzed by sales region and commodity type. 2. Business database table structure: 1. Relationship between databas ...
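
As a hedged sketch of how those three indicators could be computed per sales region and commodity type with Spark DataFrame aggregations, using an entirely made-up order-detail dataset (the table layout and column names are not from the article):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object TradeMetrics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("TradeMetrics").master("local[*]").getOrCreate()
        import spark.implicits._

        // hypothetical order-detail rows: (order_id, region, product_type, quantity, pay_amount)
        val orders = Seq(
          ("o1", "East", "Books", 2, 30.0),
          ("o1", "East", "Toys", 1, 15.0),
          ("o2", "West", "Books", 3, 45.0)
        ).toDF("order_id", "region", "product_type", "quantity", "pay_amount")

        // order quantity, commodity quantity and payment amount by region and commodity type
        orders.groupBy("region", "product_type")
          .agg(
            countDistinct("order_id").as("order_cnt"),
            sum("quantity").as("product_cnt"),
            sum("pay_amount").as("pay_amount_total"))
          .show()

        spark.stop()
      }
    }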

Posted by pengu on Sat, 20 Nov 2021 03:23:31 -0800

Big data development review: Spark

11. Spark. 11.1 Introduction to Spark. Apache Spark is a unified analytics and computing engine for large-scale data processing. Based on in-memory computing, Spark improves the real-time performance of data processing in big data environments, ensures high fault tolerance and high scalability, and allows users to deploy Spark on a large number o ...

Posted by Teddy B. on Thu, 18 Nov 2021 15:08:47 -0800

[Spark] 03 Spark Runtime Environment

1. Local mode. The so-called Local mode is an environment in which Spark code can be executed locally without any additional node resources. It is commonly used for teaching, debugging, demonstrations, etc. The environment in which the code was previously run in IDEA is called the development environment, which is a different thing. 1.1 Installation and configurati ...
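
A minimal sketch of running a job in Local mode, assuming nothing beyond a Spark dependency on the classpath; local[*] uses all local cores, while a specific thread count such as local[2] can be set instead:

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalModeDemo {
      def main(args: Array[String]): Unit = {
        // "local[*]" runs Spark in a single JVM using all local CPU cores
        val conf = new SparkConf().setAppName("LocalModeDemo").setMaster("local[*]")
        val sc = new SparkContext(conf)

        val result = sc.parallelize(1 to 10).map(_ * 2).sum()
        println(result)   // 110.0

        sc.stop()
      }
    }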

Posted by n8r0x on Wed, 17 Nov 2021 09:11:53 -0800

Spark common RDD operators for big data development

Spark common RDD operators for big data development. map: map takes in one piece of data and returns one piece of data. map applies a function to the elements of an RDD one by one and maps them into another RDD; each data item in the RDD is transformed into a new element through the function passed to map. Input partitions and o ...
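
A small sketch of map's one-in, one-out behavior and its unchanged partitioning (the data is made up):

    import org.apache.spark.{SparkConf, SparkContext}

    object MapOperatorDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("MapDemo").setMaster("local[*]"))

        // map: one element in, one element out, partition count unchanged
        val rdd = sc.parallelize(Seq(1, 2, 3, 4), numSlices = 2)
        val squared = rdd.map(x => x * x)

        println(squared.getNumPartitions)        // still 2 partitions
        println(squared.collect().mkString(",")) // 1,4,9,16

        sc.stop()
      }
    }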

Posted by axo on Tue, 09 Nov 2021 10:56:02 -0800