Special symbols commonly used in Scala
1. => (anonymous function)
In Scala, a function is also an object that can be assigned to a variable.
Format of Scala's anonymous function definition:
(parameter list) => {function body}
Therefore, the role of => is to create an anonymous function instance.
For example: (x: Int) => x + 1
2. <- (collection traversal)
Loop trav ...
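A minimal sketch of both symbols (identifiers and values are illustrative, not from the article):

val addOne = (x: Int) => x + 1   // => builds an anonymous function instance
println(addOne(2))               // 3

for (n <- Seq(1, 2, 3)) {        // <- draws each element from the collection
  println(addOne(n))             // 2, 3, 4
}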
Posted by ctimmer on Thu, 12 Mar 2020 04:54:52 -0700
Spark -- Transformation operators
Contents
Transformation operators
Basic operators
1. map(func)
2. filter(func)
3. flatMap
4. Set operations (union, intersection, distinct)
5. Grouping (groupByKey, reduceByKey, cogroup)
6. Sorting (sortBy, sortByKey)
Advanced operators
1. mapPartitionsWithIndex(func)
2. aggregate
3. aggreg ...
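To make the basic operators listed above concrete, here is a minimal sketch (the local master, sample data, and names are illustrative, not from the article):

import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("ops").setMaster("local[2]"))
val nums = sc.parallelize(Seq(1, 2, 3, 4))
nums.map(_ * 2).collect()                                              // map(func): Array(2, 4, 6, 8)
nums.filter(_ % 2 == 0).collect()                                      // filter(func): Array(2, 4)
sc.parallelize(Seq("a b", "c")).flatMap(_.split(" ")).collect()        // flatMap: Array(a, b, c)
sc.parallelize(Seq(("k", 1), ("k", 2))).reduceByKey(_ + _).collect()   // reduceByKey: Array((k,3))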
Posted by brashquido on Thu, 12 Mar 2020 01:07:43 -0700
Source code study of Spring's BeanFactoryPostProcessor and BeanPostProcessor hooks
BeanFactoryPostProcessor and BeanPostProcessor are two hook interfaces that Spring exposes while initializing beans. They are similar to the Aware interfaces (PS: for Spring hooks in general, see the article "Detailed explanation of Spring hook method and hook interface"). This article mainly studies the details of these specific hooks, so that we can be efficient in the actual ...
Posted by webhamster on Sun, 08 Mar 2020 22:36:17 -0700
Common RDD operations in pyspark
Preparation:
import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local[4]')  # local[4] means run locally with 4 cores
sc = SparkContext.getOrCreate(conf)
1. parallelize and collect
The parallelize function converts the list obj ...
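Continuing the setup above, a minimal sketch of the two operations (the sample list is illustrative):

rdd = sc.parallelize([1, 2, 3, 4])   # parallelize distributes a local list into an RDD
print(rdd.collect())                 # collect gathers the elements back to the driver: [1, 2, 3, 4]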
Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800
MasterSlave cluster deployment of ActiveMQ high availability solution (HA)
In the previous documents, we demonstrated how to use shared files and a shared database to implement an ActiveMQ cluster; see also MasterSlave cluster deployment of ActiveMQ high availability solution (HA) (I).
In this section, we demonstrate how to implement clustering through LevelDB + ZooKeeper.
O ...
Posted by gjdunga on Fri, 14 Feb 2020 05:31:00 -0800
Scala learning day 1: Variables
Learning objectives
Syntax format
Define a variable in the interpreter
val and var variables
Use type inference to define variables
Lazy assignment
Syntax format
Java variable definition
int a = 0;
In Scala, you can use val or var to define variables. The syntax format is as follows: ...
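A minimal sketch of the variable forms covered above (identifiers are illustrative):

val a: Int = 0                              // immutable, explicit type
var b = 1                                   // mutable, type Int inferred
b += 1                                      // reassignment is allowed only for var
lazy val c = { println("evaluated"); 42 }   // lazy assignment: the body runs on first access
println(c)                                  // prints "evaluated", then 42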
Posted by thor erik on Sun, 09 Feb 2020 02:40:44 -0800
Maven common plugins
1. Maven compiler plugin
1. It is used to set the JDK version used when Maven packages the project. Maven is a Java framework, so this setting only covers the JDK; Scala needs to be configured separately.
2. Usage
2.1 Configuring the plugin
<plugin> ...
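The excerpt is cut off above; for orientation, a typical maven-compiler-plugin configuration looks roughly like this (the version and JDK level are illustrative, not from the article):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-compiler-plugin</artifactId>
    <version>3.8.1</version>
    <configuration>
        <!-- JDK version used for compilation; adjust to your project -->
        <source>1.8</source>
        <target>1.8</target>
    </configuration>
</plugin>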
Posted by Rohan Shenoy on Sun, 09 Feb 2020 00:46:26 -0800
Find the number of adjacent words in a large amount of data
This problem is similar to some of the search problems on LeetCode.
The problem to solve is: for each word, count how often every other word appears adjacent to it. If the words are w1, w2, w3, w4, w5, w6, then:
The final output is (word, neighbor, frequency).
We implement it in five ways:
MapReduce
Spark
Spark SQL
Scala
Spark SQL with Scala
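Before the individual implementations, a minimal Spark (Scala) sketch of the core computation (the word list is illustrative, and counting each adjacency in both directions is an assumption):

import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setAppName("neighbors").setMaster("local[2]"))
val words = Seq("w1", "w2", "w3", "w4", "w5", "w6")
// Pair each word with its right neighbor, then count both directions.
val neighbors = words.zip(words.tail).flatMap { case (a, b) => Seq((a, b), (b, a)) }
sc.parallelize(neighbors)
  .map { case (w, n) => ((w, n), 1) }
  .reduceByKey(_ + _)
  .map { case ((w, n), c) => (w, n, c) }   // the final (word, neighbor, frequency)
  .collect()
  .foreach(println)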
MapReduce ...
Posted by olechka on Sun, 02 Feb 2020 08:18:59 -0800
Spark SQL/DataFrame/Dataset operations: reading data
1. Reading data sources
(1) Reading JSON uses spark.read. Note: paths are resolved against HDFS by default; to read a local file, prefix the path with file://, as follows:
scala> val people = spark.read.format("json").load("file:///opt/software/data/people.json")
people: org.apache.spark.sql.DataFrame = [age: bigint, name: string]
scal ...
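For reference, spark.read.json is an equivalent shorthand for the format("json").load(...) call above (assuming the same sample file):

scala> val people = spark.read.json("file:///opt/software/data/people.json")
people: org.apache.spark.sql.DataFrame = [age: bigint, name: string]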
Posted by Pie on Sun, 02 Feb 2020 08:18:33 -0800
Big data learning: Flink
Contents
1: Introduction
2: Why Flink
3: Which industries need it
4: Features of Flink
5: Differences from Spark Streaming
6: Preliminary development
7: Flink configuration description
8: Environment
9: Running components
1: Introduction
Flink is a framework and distributed com ...
Posted by stodge on Fri, 17 Jan 2020 01:18:24 -0800