Scala: defining arrays, enhanced for to traverse arrays, until to generate subscripts for array traversal, array conversion, and common array algorithms 05
1. Fixed-length arrays and variable-length arrays
Format of a fixed-length array definition:
val arr = new Array[T](length)
Format of a variable-length array definition:
val arr = ArrayBuffer[T]()
Note that you need to import the package: import scala.collection.mutable.ArrayBuffer
The code is as follows
import scala.colle ...
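The excerpt is cut off here; as a rough orientation, a minimal self-contained sketch of both array styles and both traversal forms (values are illustrative):

import scala.collection.mutable.ArrayBuffer

object ArrayDemo {
  def main(args: Array[String]): Unit = {
    // fixed-length array: elements start at the element type's default (0 for Int)
    val arr = new Array[Int](5)
    arr(0) = 10
    // until generates the subscripts 0, 1, ..., arr.length - 1
    for (i <- 0 until arr.length) println(arr(i))
    // variable-length array: needs import scala.collection.mutable.ArrayBuffer
    val buf = ArrayBuffer[Int]()
    buf += 1
    buf += 2
    // enhanced for: traverses elements directly, no subscripts
    for (elem <- buf) println(elem)
    // conversion in both directions
    val asArray = buf.toArray
    val asBuffer = arr.toBuffer
  }
}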
Posted by sashi34u on Sat, 16 Nov 2019 07:37:04 -0800
2. Principle and Use of Spark -- Spark Core
1. Some basic terms in Spark
RDD: Resilient Distributed Dataset, the core abstraction of Spark
Operators: functions for manipulating RDDs
application: a user-written Spark program (Driver program + Executor program)
job: an operation triggered by an action-class operator
stage: a set of tasks; a job is divided into several stages based on dependencie ...
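As a quick orientation, a minimal sketch of how these terms map to code, assuming a local SparkContext (names are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object SparkTermsDemo {
  def main(args: Array[String]): Unit = {
    // the Driver program builds the SparkContext; Executors run the tasks
    val sc = new SparkContext(new SparkConf().setAppName("terms").setMaster("local[*]"))
    // RDD: the distributed dataset
    val rdd = sc.parallelize(1 to 100)
    // transformation operators (filter, map, ...) only build the lineage, lazily
    val evens = rdd.filter(_ % 2 == 0)
    // an action operator triggers a job, which is split into stages at shuffle boundaries
    println(evens.count())
    sc.stop()
  }
}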
Posted by FeeBle on Fri, 15 Nov 2019 22:22:07 -0800
Troubleshooting a Spark error -- Error initializing SparkContext
Spark reported an error when submitting a Spark job
./spark-shell
19/05/14 05:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel) ...
Posted by motofzr1000 on Sun, 10 Nov 2019 08:02:46 -0800
GraphX processing JanusGraph data: an implementation
Declarations:
This scheme is a fallback in case direct execution of the gremlinSQL scheme by Spark is blocked. It involves no work secrets and carries no risk of leaking confidential information; it is purely a personal reflection that I hope makes a contribution
Scheme:
Converts the query result of gremlinSql to startGraph, th ...
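The excerpt is cut off; as a rough sketch of the general idea, building a GraphX graph from already-extracted query results (the vertex and edge values are illustrative, not the article's actual pipeline):

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

object GraphFromQueryResults {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("graph").setMaster("local[*]"))
    // vertices: (vertexId, property) pairs, e.g. extracted from a gremlin query result
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    // edges: source id, destination id, edge property
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "knows")))
    // GraphX can then run its operators over the assembled graph
    val graph = Graph(vertices, edges)
    println(graph.triplets.count())
    sc.stop()
  }
}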
Posted by lovelys on Thu, 07 Nov 2019 09:49:15 -0800
Spark SQL uses beeline to access the Hive warehouse
I. Add hive-site.xml
Add the hive-site.xml configuration file under $SPARK_HOME/conf so that Hive metadata can be accessed normally
vim hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true ...
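The configuration is truncated here. Once the metastore is wired up, one quick way to verify it, shown as a hedged sketch separate from the beeline route, is to query Hive from Spark SQL directly:

import org.apache.spark.sql.SparkSession

object HiveAccessCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-access-check")
      .enableHiveSupport()  // picks up hive-site.xml from $SPARK_HOME/conf
      .getOrCreate()
    // if the metastore connection works, the Hive databases are listed
    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}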
Posted by mrodrigues on Wed, 06 Nov 2019 14:06:19 -0800
Window functions in Spark
I. Introduction
The window function row_number() groups by one field and then takes the first N values sorted by another field, which is equivalent to a grouped top-N. If a window function is used in a SQL statement, the statement must be executed with HiveContext.
II. Code practice [using HiveContext]
package big.dat ...
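The code excerpt is cut off; a minimal sketch of a grouped top-N with row_number() under a Spark 1.x HiveContext, assuming an illustrative scores table (name, class, score):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object GroupedTopN {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("topn"))
    // window functions require HiveContext rather than plain SQLContext in Spark 1.x
    val hiveContext = new HiveContext(sc)
    // row_number() numbers the rows inside each class partition by descending score;
    // keeping rank <= 3 yields the top 3 per group
    hiveContext.sql(
      """SELECT name, class, score FROM (
        |  SELECT name, class, score,
        |         row_number() OVER (PARTITION BY class ORDER BY score DESC) AS rank
        |  FROM scores
        |) t WHERE t.rank <= 3""".stripMargin).show()
    sc.stop()
  }
}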
Posted by nezbo on Fri, 01 Nov 2019 16:08:24 -0700
I. MapReduce basic principles
I. MapReduce overview
1. Definition
MapReduce is a distributed computing programming framework. Its core function is to integrate the business logic code written by the user with default built-in components into a complete distributed program that runs concurrently on a Hadoop cluster.
2. Advantages and disadvantages
(1) Advantages
1> Easy to program: wi ...
Posted by young_coder on Thu, 17 Oct 2019 18:18:22 -0700
Spark Structured Streaming processing mechanism
Fault tolerance
End-to-end guarantees are one of the key goals of the Structured Streaming design.
Structured Streaming sources, sinks, etc. are designed to track exact processing progress, allowing restarts or reruns to handle any failure.
A streaming source uses Kafka-like offsets to track the read position in the stream. The execution engine u ...
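As a rough sketch of how this progress tracking is used in practice, a minimal query with a checkpoint location, assuming an illustrative local socket source:

import org.apache.spark.sql.SparkSession

object FaultToleranceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("fault-tolerance").master("local[*]").getOrCreate()
    // socket source: feed it with nc -lk 9999
    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
    // the checkpoint location persists offsets and state, so a restarted
    // query resumes from the last committed read position
    val query = lines.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/ft-checkpoint")
      .start()
    query.awaitTermination()
  }
}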
Posted by Acs on Thu, 10 Oct 2019 04:23:17 -0700
Spark Streaming manually saves offsets to ZK: Java implementation
Article directory
Preface
POM dependency versions
Demo
Preface
There are some examples on the Internet of manually managing Kafka offsets, but most of them target version 0.8 of Kafka and are written in Scala. The kafka-0.10 version is rarely mentioned, or only covered incompletely. Version 0.10 is compatible ...
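The article's implementation is in Java and persists offsets to ZooKeeper; as a hedged Scala sketch of the kafka-0-10 direct stream's offset handling (broker, group, and topic names are illustrative), with commitAsync shown where the article would write to ZK:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object ManualOffsets {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("offsets").setMaster("local[2]"), Seconds(5))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "demo-group",
      // disable auto-commit so offsets advance only after processing succeeds
      "enable.auto.commit" -> (false: java.lang.Boolean))
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("demo-topic"), kafkaParams))
    stream.foreachRDD { rdd =>
      // each batch RDD carries the exact Kafka offset ranges it consumed
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreach(record => println(record.value))
      // the article persists ranges to ZooKeeper at this point;
      // commitAsync stores them back in Kafka instead
      stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
    }
    ssc.start()
    ssc.awaitTermination()
  }
}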
Posted by artin on Tue, 08 Oct 2019 21:41:57 -0700
Two methods of implementing a real-time WordCount program with Spark Streaming and writing the data into MySQL, using the netcat tool
First, a few classes you need to understand (a sketch follows this list)
StreamingContext
How to read data
DStream
Functions for processing data
A DStream stores many RDDs
PairDStreamFunctions
When the data being processed consists of tuples,
DStream is automatically and implicitly converted to PairDStreamFunctions
RDD
Output fun ...
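A minimal sketch tying these classes together: a streaming WordCount fed by nc -lk 9999 that writes each batch to MySQL (connection URL, credentials, and table are illustrative; the MySQL JDBC driver must be on the classpath):

import java.sql.DriverManager
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCountToMysql {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("wc").setMaster("local[2]"), Seconds(5))
    // StreamingContext reads the data: here a socket stream fed by nc -lk 9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // mapping to tuples lets the implicit conversion to PairDStreamFunctions supply reduceByKey
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { iter =>
        // one connection per partition, not per record
        val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "root")
        val stmt = conn.prepareStatement("INSERT INTO wordcount(word, cnt) VALUES (?, ?)")
        iter.foreach { case (word, cnt) =>
          stmt.setString(1, word)
          stmt.setInt(2, cnt)
          stmt.executeUpdate()
        }
        conn.close()
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}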
Posted by dale282 on Mon, 07 Oct 2019 02:29:52 -0700