Flink Learning Notes: Process Function

I expect to spend a week completing this article, studying every basic-entry topic covered by the Flink documentation (in preparation for Ali's Flink programming competition). Flink provides different ...

Posted by kulikedat on Fri, 16 Aug 2019 20:07:04 -0700

Spark Learning Examples (Python): Loading Data Sources

When we use Spark, it is mainly to process large volumes of data quickly. So what data sources do we encounter in actual development and production? I summarize them as follows: text, csv, json, parquet, jdbc, hive, kafka, elasticsearch. All of the tests below run in Spark local mode, be ...

Posted by Revlet on Thu, 08 Aug 2019 23:40:57 -0700

Chapter 2 RDD Programming (2.1-2.2)

Chapter 2 RDD Programming. 2.1 Programming Model. In Spark, RDDs are represented as objects, and transformations are expressed as method calls on those objects. After a series of transformations defines an RDD, an action can be invoked to trigger the computation, either returning results to the application (count, collect, etc.) or saving the data to stor ...

Posted by Chizzad on Sun, 04 Aug 2019 10:52:20 -0700

Apache Spark Progressive Learning Tutorial: Spark Cluster Deployment and Running

Contents
1. Preface
  1.1 Cluster Planning
  1.2 Prerequisites
  1.3 Installation Package Download
2. Installation and Deployment
  2.1 Unzip and Modify the Configuration Files
  2.2 Copy the Files to the Other Two Machines
3. Operation and Testing
  3.1 Start the Cluster
  3.2 Start spark-shell and Connect to the Cluster
  3. ...

Posted by zuhalter223 on Fri, 02 Aug 2019 02:32:40 -0700

Application of Scrapy and MongoDB

Link to the original text: http://www.cnblogs.com/JackQ/p/4843701.html Scrapy is a fast, high-level screen-scraping and web-crawling framework developed in Python. It is used to crawl web sites and extract structured data from their pages. ...

Posted by pollysal on Tue, 30 Jul 2019 18:00:21 -0700

Apache Spark Progressive Learning Tutorial: Spark Single Node Installation and Quick Start Demo

First, download Spark. The first step in using Spark is to download and decompress it. Let's start by downloading a precompiled version of Spark: visit http://spark.apache.org/downloads.html to download the Spark installation package. The version used in this article is spark-2.4.3-bin-hadoop2.7.tgz. Second, install Spark: cd ~; tar -xf spark ...

Posted by culprit on Mon, 29 Jul 2019 04:21:08 -0700

ROS One-Click Deployment of Spark Distributed Cluster

Apache Spark is a fast and versatile computing engine designed for large-scale data processing. It supports a wide range of workloads, including SQL queries, text processing, and machine learning. Before Spark appeared, we generally had to learn a variety of engines to handle these requirements separately. The main purpose of this a ...

Posted by scottb1 on Mon, 08 Jul 2019 09:48:51 -0700

Summary of Kafka Learning Points

I. server.properties in the Kafka configuration file:
# The broker's globally unique ID; must not be duplicated
broker.id=0
# The port used to listen for connections; producers and consumers connect to it
port=9092
# Number of threads that process network requests
num.network.threads=3
# Number of threads used to process disk I/O
nu ...

Posted by fellixombc on Thu, 27 Jun 2019 14:32:08 -0700

Spark Examples

Spark Streaming is a quasi-real-time stream processing framework: its latency when processing real-time data is on the order of seconds. Storm, by contrast, is a true real-time stream processing framework whose processing latency is measured in milliseconds. So the choice of streaming framework depends on the spec ...

Posted by Frederick on Fri, 21 Jun 2019 14:37:32 -0700

Spark SQL Learning Notes

Spark SQL is Spark's module for processing structured data. Unlike the underlying Spark RDD API, the Spark SQL interfaces give Spark more information about the structure of the data and of the computation being performed. Spark SQL currently has three different APIs: SQL statements, the DataFrame API, and the newer Dataset API. One use of Spark SQL is to ex ...

Posted by arunmj82 on Sun, 16 Jun 2019 17:27:38 -0700