Special symbols commonly used in Scala

1. => (anonymous function). In Scala, a function is also an object that can be assigned to a variable. An anonymous function is defined in the form (parameter list) => { function body }, so the role of => is to create an anonymous function instance. For example: (x: Int) => x + 1. 2. <- (collection traversal). Loop trav ...
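A minimal runnable sketch of the two symbols described above (object and value names are illustrative):

```scala
object SymbolDemo extends App {
  // => creates an anonymous function instance that can be assigned to a variable
  val addOne: Int => Int = (x: Int) => x + 1
  println(addOne(41)) // prints 42

  // <- binds each element of a collection in turn inside a for comprehension
  for (n <- List(1, 2, 3)) println(addOne(n)) // prints 2, 3, 4
}
```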

Posted by ctimmer on Thu, 12 Mar 2020 04:54:52 -0700

Spark -- Transformation operators

Article directory: Transformation operators. Basic operators: 1. map(func) 2. filter(func) 3. flatMap 4. set operations (union, intersection, distinct) 5. grouping (groupByKey, reduceByKey, cogroup) 6. sorting (sortBy, sortByKey). Advanced operators: 1. mapPartitionsWithIndex(func) 2. aggregate 3. aggreg ...
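A hedged sketch of a few of the operators listed above, assuming an existing SparkContext named sc (the sample data is illustrative):

```scala
// Assumes a live SparkContext `sc`; the input data is illustrative.
val nums  = sc.parallelize(Seq(1, 2, 3, 4))
val byTwo = nums.map(_ * 2)             // map(func): apply a function to each element
val evens = nums.filter(_ % 2 == 0)     // filter(func): keep matching elements
val terms = sc.parallelize(Seq("a b", "c d")).flatMap(_.split(" ")) // flatMap
val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
val sums  = pairs.reduceByKey(_ + _)    // grouping: merge values per key
val asc   = nums.sortBy(identity)       // sorting
println(sums.collect().toSeq)           // e.g. (a,3), (b,3)
```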

Posted by brashquido on Thu, 12 Mar 2020 01:07:43 -0700

Source-code study of Spring's BeanFactoryPostProcessor and BeanPostProcessor hooks

BeanFactoryPostProcessor and BeanPostProcessor are two interfaces exposed while beans are being initialized; they are similar to Aware (PS: for Spring hooks in general, see "Detailed explanation of Spring hook methods and hook interfaces"). This article mainly studies the details of these specific hooks, so that we can be efficient in the actual ...
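A sketch of the BeanPostProcessor side of the hook, written in Scala (assumes spring-beans on the classpath; the class name is illustrative):

```scala
import org.springframework.beans.factory.config.BeanPostProcessor

// A post-processor that logs each bean around its initialization.
class LoggingPostProcessor extends BeanPostProcessor {
  override def postProcessBeforeInitialization(bean: AnyRef, beanName: String): AnyRef = {
    println(s"before init: $beanName")
    bean // return the bean (possibly wrapped) so the pipeline continues
  }

  override def postProcessAfterInitialization(bean: AnyRef, beanName: String): AnyRef = {
    println(s"after init: $beanName")
    bean
  }
}
```

Registering such a class as a bean is enough for Spring to invoke it around every other bean's initialization.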

Posted by webhamster on Sun, 08 Mar 2020 22:36:17 -0700

RDD common operations in pyspark

Preparation: import pyspark; from pyspark import SparkContext; from pyspark import SparkConf; conf = SparkConf().setAppName("lg").setMaster('local[4]')  # local[4] means run locally with 4 cores; sc = SparkContext.getOrCreate(conf). 1. parallelize and collect. The parallelize function converts the list obj ...

Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800

Master-Slave cluster deployment of ActiveMQ high availability solution (HA)

In the previous documents, we demonstrated how to use shared files and a shared database to build an ActiveMQ cluster; see also Master-Slave cluster deployment of ActiveMQ high availability solution (HA) (I). In this section, we demonstrate how to implement clustering through LevelDB + ZooKeeper. O ...
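For orientation, a sketch of the relevant broker configuration: the replicated LevelDB persistence adapter in activemq.xml is what ties the brokers to ZooKeeper (all attribute values below, including host names and ports, are illustrative):

```xml
<persistenceAdapter>
  <replicatedLevelDB
      directory="${activemq.data}/leveldb"
      replicas="3"
      bind="tcp://0.0.0.0:61619"
      zkAddress="zk1:2181,zk2:2181,zk3:2181"
      zkPath="/activemq/leveldb-stores"
      hostname="broker1"/>
</persistenceAdapter>
```

Each broker in the cluster uses the same zkAddress and zkPath; ZooKeeper then elects one master while the others replicate its store.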

Posted by gjdunga on Fri, 14 Feb 2020 05:31:00 -0800

Scala learning day 1: Variables

Learning objectives: syntax format; defining a variable in the interpreter; val and var variables; defining variables with type inference; lazy assignment. Syntax format: a Java variable definition looks like int a = 0; in Scala, you can use val or var to define variables. The syntax format is as follows: ...
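A small sketch covering the points in the objectives above (names are illustrative):

```scala
object VariableDemo extends App {
  val a: Int = 0        // val: immutable, explicit type (like Java's int a = 0;)
  var b = 1             // var: mutable; the type Int is inferred
  b += 1
  lazy val c = b * 10   // lazy: not evaluated until first accessed
  println(a + b)        // prints 2
  println(c)            // evaluated here: prints 20
}
```

Reassigning a val is a compile error, which is why Scala code prefers val by default.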

Posted by thor erik on Sun, 09 Feb 2020 02:40:44 -0800

Common Maven plugins

1. maven-compiler-plugin 1. Used to set the JDK version Maven uses when compiling and packaging. Maven targets Java, so this setting applies only to the JDK; Scala needs to be configured separately. 2. Usage 2.1. Configuring the plugin <plugin> ...
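A sketch of the plugin configuration the excerpt begins to show (the version and JDK level are illustrative):

```xml
<!-- Pin the JDK source/target level for compilation -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>3.8.1</version>
  <configuration>
    <source>1.8</source>
    <target>1.8</target>
  </configuration>
</plugin>
```

This goes under build/plugins in pom.xml; Scala sources would instead be compiled by a separate plugin such as scala-maven-plugin.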

Posted by Rohan Shenoy on Sun, 09 Feb 2020 00:46:26 -0800

Find the number of adjacent words in a large amount of data

This problem is similar to some of the search problems on LeetCode. The task is: for each word, count how often each adjacent word occurs next to it. If the words are w1,w2,w3,w4,w5,w6, then: the final output is (word, neighbor, frequency). We implement it in five ways: MapReduce; Spark; the Spark SQL method; the Scala method; Spark SQL for Scala. MapReduce ...
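The core idea can be sketched in plain Scala with a sliding window; the sample input below is illustrative:

```scala
val words = Seq("w1", "w2", "w3", "w2", "w3")
val freq = words.sliding(2).toSeq
  .map { case Seq(a, b) => (a, b) }                // each adjacent (word, neighbor) pair
  .groupBy(identity)
  .map { case ((w, n), occ) => (w, n, occ.size) }  // (word, neighbor, frequency)
println(freq.toSet) // contains ("w2","w3",2): w2 is followed by w3 twice
```

The distributed versions replace the sliding window and groupBy with the corresponding MapReduce or Spark shuffle operations.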

Posted by olechka on Sun, 02 Feb 2020 08:18:59 -0800

Spark SQL/DataFrame/DataSet operations: reading data

1. Reading a data source. (1) Reading JSON with spark.read. Note: paths are resolved against HDFS by default; to read a local file you need the file:// prefix, as follows: scala> val people = spark.read.format("json").load("file:///opt/software/data/people.json") people: org.apache.spark.sql.DataFrame = [age: bigint, name: string] scal ...

Posted by Pie on Sun, 02 Feb 2020 08:18:33 -0800

Big data learning: Flink

Catalog: 1. Introduction 2. Why Flink 3. Which industries need it 4. Features of Flink 5. Differences from Spark Streaming 6. Getting started with development 7. Flink configuration notes 8. Environment 9. Runtime components. 1. Introduction: Flink is a framework and distributed com ...

Posted by stodge on Fri, 17 Jan 2020 01:18:24 -0800