Scala: defining arrays, enhanced for to traverse arrays, until to generate subscripts for array traversal, array conversion, and common array algorithms 05
1. Fixed-length arrays and variable-length arrays
Format of a fixed-length array definition:
val arr = new Array[T](length)
Format of a variable-length array definition:
val arr = ArrayBuffer[T]()
Note that you need to import the package: import scala.collection.mutable.ArrayBuffer
The code is as follows
import scala.colle ...
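The excerpt is cut off here; as a rough orientation, a minimal self-contained sketch of both array styles and both traversal forms (values are illustrative):

import scala.collection.mutable.ArrayBuffer

object ArrayDemo {
  def main(args: Array[String]): Unit = {
    // fixed-length array: elements start at the element type's default (0 for Int)
    val arr = new Array[Int](5)
    arr(0) = 10
    // until generates the subscripts 0, 1, ..., arr.length - 1
    for (i <- 0 until arr.length) println(arr(i))
    // variable-length array: needs import scala.collection.mutable.ArrayBuffer
    val buf = ArrayBuffer[Int]()
    buf += 1
    buf += 2
    // enhanced for: traverses elements directly, no subscripts
    for (elem <- buf) println(elem)
    // conversion in both directions
    val asArray = buf.toArray
    val asBuffer = arr.toBuffer
  }
}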
Posted by sashi34u on Sat, 16 Nov 2019 07:37:04 -0800
2. Principle and Use of Spark -- Spark Core
1. Some basic terms in Spark
RDD: Resilient Distributed Dataset, the core abstraction of Spark
Operators: functions for manipulating RDDs
application: a user-written Spark program (Driver program + Executor program)
job: an operation triggered by an action-class operator
stage: a set of tasks; a job is divided into several stages based on dependencie ...
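As a quick orientation, a minimal sketch of how these terms map to code, assuming a local SparkContext (names are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

object SparkTermsDemo {
  def main(args: Array[String]): Unit = {
    // the Driver program builds the SparkContext; Executors run the tasks
    val sc = new SparkContext(new SparkConf().setAppName("terms").setMaster("local[*]"))
    // RDD: the distributed dataset
    val rdd = sc.parallelize(1 to 100)
    // transformation operators (filter, map, ...) only build the lineage, lazily
    val evens = rdd.filter(_ % 2 == 0)
    // an action operator triggers a job, which is split into stages at shuffle boundaries
    println(evens.count())
    sc.stop()
  }
}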
Posted by FeeBle on Fri, 15 Nov 2019 22:22:07 -0800
Troubleshooting a Spark error -- Error initializing SparkContext
Spark reported an error when submitting a Spark job
./spark-shell
19/05/14 05:37:40 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel) ...
Posted by motofzr1000 on Sun, 10 Nov 2019 08:02:46 -0800
GraphX processing JanusGraph data: an implementation
Declarations:
This scheme is a fallback in case direct execution of the gremlinSQL scheme by Spark is blocked. It involves no work secrets and carries no risk of leaking confidential information; it is purely a personal reflection that I hope makes a contribution
Scheme:
Converts the query result of gremlinSql to startGraph, th ...
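The excerpt is cut off; as a rough sketch of the general idea, building a GraphX graph from already-extracted query results (the vertex and edge values are illustrative, not the article's actual pipeline):

import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.{SparkConf, SparkContext}

object GraphFromQueryResults {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("graph").setMaster("local[*]"))
    // vertices: (vertexId, property) pairs, e.g. extracted from a gremlin query result
    val vertices = sc.parallelize(Seq((1L, "alice"), (2L, "bob")))
    // edges: source id, destination id, edge property
    val edges = sc.parallelize(Seq(Edge(1L, 2L, "knows")))
    // GraphX can then run its operators over the assembled graph
    val graph = Graph(vertices, edges)
    println(graph.triplets.count())
    sc.stop()
  }
}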
Posted by lovelys on Thu, 07 Nov 2019 09:49:15 -0800
Spark SQL uses beeline to access the Hive warehouse
I. Add hive-site.xml
Add the hive-site.xml configuration file under $SPARK_HOME/conf so that Hive metadata can be accessed normally
vim hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.201:3306/hiveDB?createDatabaseIfNotExist=true ...
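The configuration is truncated here. Once the metastore is wired up, one quick way to verify it, shown as a hedged sketch separate from the beeline route, is to query Hive from Spark SQL directly:

import org.apache.spark.sql.SparkSession

object HiveAccessCheck {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-access-check")
      .enableHiveSupport()  // picks up hive-site.xml from $SPARK_HOME/conf
      .getOrCreate()
    // if the metastore connection works, the Hive databases are listed
    spark.sql("SHOW DATABASES").show()
    spark.stop()
  }
}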
Posted by mrodrigues on Wed, 06 Nov 2019 14:06:19 -0800
Window functions in Spark
I. Introduction
The window function row_number() groups by one field and then takes the first N values sorted by another field, which is equivalent to a grouped top-N. If a window function is used in a SQL statement, the statement must be executed with HiveContext.
II. Code practice [using HiveContext]
package big.dat ...
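The code excerpt is cut off; a minimal sketch of a grouped top-N with row_number() under a Spark 1.x HiveContext, assuming an illustrative scores table (name, class, score):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object GroupedTopN {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("topn"))
    // window functions require HiveContext rather than plain SQLContext in Spark 1.x
    val hiveContext = new HiveContext(sc)
    // row_number() numbers the rows inside each class partition by descending score;
    // keeping rank <= 3 yields the top 3 per group
    hiveContext.sql(
      """SELECT name, class, score FROM (
        |  SELECT name, class, score,
        |         row_number() OVER (PARTITION BY class ORDER BY score DESC) AS rank
        |  FROM scores
        |) t WHERE t.rank <= 3""".stripMargin).show()
    sc.stop()
  }
}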
Posted by nezbo on Fri, 01 Nov 2019 16:08:24 -0700
I. MapReduce basic principles
I. MapReduce overview
1. Definition
MapReduce is a distributed computing programming framework. Its core function is to integrate the business logic code written by the user with default built-in components into a complete distributed program that runs concurrently on a Hadoop cluster.
2. Advantages and disadvantages
(1) Advantages
1> Easy to program: wi ...
Posted by young_coder on Thu, 17 Oct 2019 18:18:22 -0700
Spark Structured Streaming processing mechanism
Fault tolerance
End-to-end guarantees are one of the key goals of the Structured Streaming design.
Structured Streaming sources, sinks, etc. are designed to track exact processing progress, allowing restarts or reruns to handle any failure.
A streaming source uses Kafka-like offsets to track the read position in the stream. The execution engine u ...
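As a rough sketch of how this progress tracking is used in practice, a minimal query with a checkpoint location, assuming an illustrative local socket source:

import org.apache.spark.sql.SparkSession

object FaultToleranceDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("fault-tolerance").master("local[*]").getOrCreate()
    // socket source: feed it with nc -lk 9999
    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
    // the checkpoint location persists offsets and state, so a restarted
    // query resumes from the last committed read position
    val query = lines.writeStream
      .format("console")
      .option("checkpointLocation", "/tmp/ft-checkpoint")
      .start()
    query.awaitTermination()
  }
}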
Posted by Acs on Thu, 10 Oct 2019 04:23:17 -0700
Spark Streaming manually saves offsets to ZK: Java implementation
Article directory
Preface
POM dependency versions
Demo
Preface
There are some examples on the Internet of manually managing Kafka offsets, but most of them target version 0.8 of Kafka and are written in Scala. The kafka-0.10 version is rarely mentioned, or only covered incompletely. Version 0.10 is compatible ...
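The article's implementation is in Java and persists offsets to ZooKeeper; as a hedged Scala sketch of the kafka-0-10 direct stream's offset handling (broker, group, and topic names are illustrative), with commitAsync shown where the article would write to ZK:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges, KafkaUtils}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object ManualOffsets {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("offsets").setMaster("local[2]"), Seconds(5))
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "demo-group",
      // disable auto-commit so offsets advance only after processing succeeds
      "enable.auto.commit" -> (false: java.lang.Boolean))
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("demo-topic"), kafkaParams))
    stream.foreachRDD { rdd =>
      // each batch RDD carries the exact Kafka offset ranges it consumed
      val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreach(record => println(record.value))
      // the article persists ranges to ZooKeeper at this point;
      // commitAsync stores them back in Kafka instead
      stream.asInstanceOf[CanCommitOffsets].commitAsync(ranges)
    }
    ssc.start()
    ssc.awaitTermination()
  }
}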
Posted by artin on Tue, 08 Oct 2019 21:41:57 -0700
Two methods of implementing a real-time WordCount program with Spark Streaming and writing the data into MySQL, using the netcat tool
First, a few classes you need to understand (a sketch follows this list)
StreamingContext
How to read data
DStream
Functions for processing data
A DStream stores many RDDs
PairDStreamFunctions
When the data being processed consists of tuples,
DStream is automatically and implicitly converted to PairDStreamFunctions
RDD
Output fun ...
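A minimal sketch tying these classes together: a streaming WordCount fed by nc -lk 9999 that writes each batch to MySQL (connection URL, credentials, and table are illustrative; the MySQL JDBC driver must be on the classpath):

import java.sql.DriverManager
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCountToMysql {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("wc").setMaster("local[2]"), Seconds(5))
    // StreamingContext reads the data: here a socket stream fed by nc -lk 9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // mapping to tuples lets the implicit conversion to PairDStreamFunctions supply reduceByKey
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.foreachRDD { rdd =>
      rdd.foreachPartition { iter =>
        // one connection per partition, not per record
        val conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/test", "root", "root")
        val stmt = conn.prepareStatement("INSERT INTO wordcount(word, cnt) VALUES (?, ?)")
        iter.foreach { case (word, cnt) =>
          stmt.setString(1, word)
          stmt.setInt(2, cnt)
          stmt.executeUpdate()
        }
        conn.close()
      }
    }
    ssc.start()
    ssc.awaitTermination()
  }
}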
Posted by dale282 on Mon, 07 Oct 2019 02:29:52 -0700