Spark Streaming Integrates with Flume (Poll and Push)

As a framework for real-time log collection, Flume can be connected to the Spark Streaming real-time processing framework: Flume collects and forwards data in real time, and Spark Streaming processes it in real time. Spark Streaming integrates with Flume NG in two ways: one is that Flume NG pushes messages to Spark Streaming (Push); the other is that S ...
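The poll-based variant can be sketched as follows. This is a minimal sketch, assuming the `spark-streaming-flume` integration artifact is on the classpath and that a Flume agent is running a Spark sink; the hostname and port are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

object FlumePollWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("FlumePollWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Poll mode: Spark Streaming pulls events from Flume's Spark sink.
    // "flume-host" and 8888 are placeholders for the agent's sink address.
    val flumeStream = FlumeUtils.createPollingStream(
      ssc, "flume-host", 8888, StorageLevel.MEMORY_AND_DISK)

    // Each Flume event body is a byte buffer; decode it before counting words
    val counts = flumeStream
      .map(e => new String(e.event.getBody.array()))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)

    counts.print()
    ssc.start()
    ssc.awaitTermination()
  }
}
```

In push mode the roles reverse: `FlumeUtils.createStream` starts a receiver that Flume's avro sink pushes into, which requires the receiver's address to be known to the Flume agent in advance.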

Posted by kobayashi_one on Wed, 30 Jan 2019 17:18:15 -0800

Spark Streaming sample program

Develop a Spark Streaming program that receives data from a server port in real time and performs a wordcount. Environment setup: IDEA + Maven; the pom file is as follows: <?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLoc ...
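The driver program this kind of exercise builds usually looks like the sketch below (host and port are placeholders; a test server can be started with something like `nc -lk 9999`):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SocketWordCount {
  def main(args: Array[String]): Unit = {
    // At least 2 local threads: one for the socket receiver, one for processing
    val conf = new SparkConf().setAppName("SocketWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Receive lines of text from the server port in real time
    val lines = ssc.socketTextStream("localhost", 9999)

    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```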

Posted by phpnew on Wed, 30 Jan 2019 12:21:15 -0800

Spark Exercise: Finding the Top N Teachers in Each Subject

[Note] This article is based on learning videos from Calf School. Spark Exercise: Finding the Top N Teachers in Each Subject. Data format: http://bigdata.edu360.cn/laozhang 1. Splitting the data: val func=(line:String)=>{   val index=line.lastIndexOf("/")   val teacher=line.substring(index+1)   val httpHost=line.substrin ...
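The truncated segmentation function can be completed along these lines; extracting the subject from the host via `java.net.URL` is my assumption about where the excerpt was heading:

```scala
import java.net.URL

// Parse one log line of the form http://<subject>.edu360.cn/<teacher>
// into a (subject, teacher) pair.
val parse: String => (String, String) = line => {
  val index = line.lastIndexOf("/")
  val teacher = line.substring(index + 1)   // part after the last "/"
  val httpHost = line.substring(0, index)   // e.g. http://bigdata.edu360.cn
  // Host is "<subject>.edu360.cn"; the first dotted component is the subject
  val subject = new URL(httpHost).getHost.split("\\.")(0)
  (subject, teacher)
}

// parse("http://bigdata.edu360.cn/laozhang") yields ("bigdata", "laozhang")
```

From here the exercise typically maps each line to `((subject, teacher), 1)`, aggregates with `reduceByKey`, and then takes the top N per subject.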

Posted by SureFire on Tue, 29 Jan 2019 23:33:14 -0800

SparkSQL: Viewing and Debugging the Generated Code

Websites and some books introduce Spark SQL (DataFrame) as generating the final executed statements from the corresponding operations. This article starts from a simple, low-level problem, then looks at the generated code to find the root cause of the problem, and briefly introduces how to debug SparkSQL. Sour ...
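One standard way to inspect the Java source that Spark's whole-stage codegen produces is the `debugCodegen` extension; this is a sketch of that mechanism, not necessarily the exact steps the article takes:

```scala
import org.apache.spark.sql.SparkSession
// Brings in the debugCodegen() extension method on Dataset/DataFrame
import org.apache.spark.sql.execution.debug._

object CodegenDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("codegen-demo").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "name").filter($"id" > 1)

    // Prints the generated Java source for each whole-stage-codegen subtree
    df.debugCodegen()

    spark.stop()
  }
}
```

The same output is reachable from SQL via `EXPLAIN CODEGEN SELECT ...`, which is convenient when debugging a query rather than a DataFrame pipeline.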

Posted by NikkiLoveGod on Tue, 29 Jan 2019 20:51:15 -0800

Akka-Cluster(0) - Some Ideas for Distributed Application Development

When I first came into contact with akka-cluster, I dreamed of making full use of actors' free distribution and independent operation to build a kind of distributed program whose computational tasks could be manually subdivided and then assigned to actors distributed across multiple servers. These servers are all in th ...

Posted by Micah D on Sat, 26 Jan 2019 23:45:14 -0800

Spark Learning Notes (12) - SparkSQL

1 SparkSQL Introduction Spark SQL is the module Spark uses to process structured data. It provides a programming abstraction called DataFrame and serves as a distributed SQL query engine. Recall Hive: it converts Hive SQL into MapReduce and submits it to the cluster for execution, which greatly simplifies the complexity of pr ...
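The DataFrame abstraction described above can be shown in a few lines; this is a minimal sketch with made-up data, illustrating that the DataFrame API and SQL are two front ends to the same engine:

```scala
import org.apache.spark.sql.SparkSession

object SparkSQLDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SparkSQLDemo").master("local[*]").getOrCreate()
    import spark.implicits._

    // Build a DataFrame from an in-memory collection (columns: name, age)
    val people = Seq(("Tom", 28), ("Jerry", 31)).toDF("name", "age")

    // DataFrame API style
    people.filter($"age" > 30).show()

    // SQL style over the same data
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 30").show()

    spark.stop()
  }
}
```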

Posted by Dorin85 on Sat, 26 Jan 2019 00:24:15 -0800

[Big Data] Scala Quick Learning Manual 2

Scala Quick Learning Manual 2. Chapter 1: Classes, Objects, Inheritance, and Traits. 1.1 Defining a Class //In Scala, classes do not need to be declared public. //A Scala source file can contain multiple classes, all of which have public visibility. class Person { //A variable declared with val is a read-only pr ...
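The truncated class definition can be filled out along these lines (the field names are illustrative, not taken from the article):

```scala
// In Scala, classes do not need to be declared public, and one source
// file may contain several classes, all publicly visible.
class Person {
  val id = "1001"        // val: read-only, only a getter is generated
  var name = "Tom"       // var: mutable, getter and setter are generated
  private var age = 18   // private: visible inside this class (and its companion)

  def birthday(): Int = { age += 1; age }
}
```

With `val p = new Person`, assigning `p.name = "Jerry"` compiles because `name` is a `var`, while `p.id = "1002"` is rejected because `id` is a `val`.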

Posted by blackcode on Fri, 25 Jan 2019 09:21:14 -0800

Writing a WordCount program in IDEA under Windows and submitting it to a Hadoop cluster as a jar package (foolproof version)

Typically, programs are developed in an IDE, packaged as jar files, and then submitted to the cluster. The most common approach is to create a Maven project and use Maven to manage the jar dependencies. 1. Generating the jar package for WordCount 1. Open IDEA: File > New > Project > Maven > Next, fill in GroupId and Artif ...

Posted by kcgame on Thu, 24 Jan 2019 19:45:14 -0800

Maven Project Packaging and Running WordCount on a Single Node (I)

The spark shell is only for testing and validating our programs. In a production environment, programs are usually developed in an IDE, packaged into jar files, and then submitted to the cluster. The most common approach is to create a Maven project and use Maven to manage the jar dependencies. First, edit the Maven project on ...
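The WordCount class that ends up in such a jar typically looks like the sketch below; input and output paths come from `args` so the same jar runs unchanged on the cluster (the paths themselves are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // The master is deliberately NOT hardcoded here; spark-submit supplies it
    val conf = new SparkConf().setAppName("WordCount")
    val sc = new SparkContext(conf)

    sc.textFile(args(0))          // input path, e.g. an HDFS directory
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile(args(1))    // output path; must not already exist

    sc.stop()
  }
}
```

After `mvn package`, the jar is submitted with something along the lines of `spark-submit --class WordCount wc.jar <input> <output>`.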

Posted by volka on Sat, 19 Jan 2019 13:45:12 -0800

Akka (24): Stream: Controlling a Live Stream from an External System

In real application scenarios for data flows, the need to connect with external systems often arises. These external systems may be Actor systems or systems of some other type. Connecting with them means that a data stream running in another thread can receive events pushed by the external system and respond with changes in behavio ...
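One common Akka Streams mechanism for this, sketched below, is `Source.queue`: materializing the stream yields a queue handle that code outside the stream can push events into. This assumes Akka 2.6+, where an implicit `ActorSystem` provides the materializer; the article may use a different mechanism:

```scala
import akka.actor.ActorSystem
import akka.stream.{OverflowStrategy, QueueOfferResult}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.ExecutionContext.Implicits.global

object ExternalControlDemo extends App {
  implicit val system: ActorSystem = ActorSystem("demo")

  // Keep.left keeps the queue handle as the materialized value
  val queue = Source
    .queue[String](bufferSize = 16, OverflowStrategy.backpressure)
    .toMat(Sink.foreach(msg => println(s"stream got: $msg")))(Keep.left)
    .run()

  // An external event handler would call offer(...) as events arrive;
  // offer returns a Future reflecting backpressure from the stream.
  queue.offer("event-1").foreach {
    case QueueOfferResult.Enqueued => ()  // accepted by the stream
    case other                     => println(s"not enqueued: $other")
  }
}
```

Because `offer` is asynchronous and backpressure-aware, the external system learns whether the stream accepted, dropped, or rejected each event, which is what makes this a controlled hand-off rather than a fire-and-forget push.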

Posted by KGodwin on Wed, 09 Jan 2019 14:36:10 -0800