Practice of Real-time Operation and Maintenance Technology of Public Security Big Data Based on Spark
Source of the article: https://www.iteblog.com/archives/1956.html
There are tens of thousands of front-end and back-end devices in the public security industry. Front-end devices include cameras, detectors, and sensors; back-end devices include servers, application servers, network equipment, and power systems in the central computer ...
Posted by Joseph07 on Sun, 24 Mar 2019 02:42:27 -0700
The basic RDD operators of Spark programming: join, rightOuterJoin, leftOuterJoin
1) join
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
Performs an inner join with another RDD of key-value pairs, matching elements on equal keys. The value type retu ...
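As a quick illustration (a minimal sketch with made-up sample data, assuming an existing SparkContext sc), join keeps only the keys present in both RDDs, while the outer variants wrap the possibly-missing side in Option:

val left  = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
val right = sc.parallelize(Seq(("a", "x"), ("b", "y"), ("d", "z")))

// Inner join: only keys present on both sides survive.
left.join(right).collect()           // Array((a,(1,x)), (b,(2,y)))

// Left/right outer joins keep unmatched keys, with None on the missing side.
left.leftOuterJoin(right).collect()  // includes (c,(3,None))
left.rightOuterJoin(right).collect() // includes (d,(None,z))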
Posted by LucienFB on Wed, 06 Feb 2019 20:03:16 -0800
Running Principle of Spark Streaming
Sequence diagram
1. NetworkWordCount
2. Initialize StreamingContext
3. Create InputDStream
4. Start Job Scheduler
1. NetworkWordCount
package yk.streaming
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
object NetworkWordCount { ...
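The excerpt cuts off at the object definition; a minimal sketch of the classic NetworkWordCount body (continuing the imports above; the hostname, port, and batch interval are illustrative) looks like:

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")
    // 2. Initialize the StreamingContext with a 1-second batch interval.
    val ssc = new StreamingContext(conf, Seconds(1))

    // 3. Create an InputDStream from a TCP socket source.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)

    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()

    // 4. Start the job scheduler, then block until the stream is stopped.
    ssc.start()
    ssc.awaitTermination()
  }
}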
Posted by bsarika on Sun, 03 Feb 2019 15:03:16 -0800
Yarn tuning
1. Yarn Common Commands:
[rachel@bigdata-senior01 bin]$ ./yarn
Usage: yarn [--config confdir] COMMAND
where COMMAND is one of:
  resourcemanager    run the ResourceManager
  nodemanager        run a nodemanager on each slave
  timelineserver     run the timeline server
  rmadmin            admin tools
  version ...
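Beyond the daemon launchers, a few everyday invocations (standard YARN CLI subcommands; the application ID is a placeholder):

yarn application -list                                     # list running applications
yarn logs -applicationId application_1549000000000_0001    # fetch aggregated logs (ID is illustrative)
yarn node -list                                            # show NodeManager status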
Posted by soloslinger on Sun, 03 Feb 2019 08:15:18 -0800
Spark textfile reads HDFS file partitions [compressed and uncompressed]
sc.textFile("/blabla/{*.gz}")
When we create a SparkContext and read files with textFile, what determines the partitioning, and how large is each partition? It depends on:
whether the file is compressed, and in which format
the file size and the HDFS block size
textFile creates a Hadoop RDD that uses ...
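A quick way to observe this (illustrative HDFS paths): gzip is not splittable, so each .gz file yields exactly one partition, while an uncompressed file is split according to the HDFS block size and the minPartitions hint:

val plain = sc.textFile("hdfs:///data/logs/plain.txt", 4)  // minPartitions hint
println(plain.partitions.length)                           // typically >= 4 for an uncompressed file

val gz = sc.textFile("hdfs:///data/logs/{*.gz}")
println(gz.partitions.length)                              // one partition per .gz file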
Posted by MK27 on Sat, 02 Feb 2019 17:54:15 -0800
Spark Learning Notes (1) - Introduction to Spark, Cluster Installation
1 Spark Introduction
Spark is a fast, general-purpose, and scalable big data analytics engine. It was born in 2009 at AMPLab, University of California, Berkeley, was open-sourced in 2010, entered the Apache Incubator in June 2013, and became a top-level Apache project in February 2014. Today the Spark ecosystem has grown into a collecti ...
Posted by All4172 on Sat, 02 Feb 2019 01:21:15 -0800
pyspark's Little Knowledge Points in Work
1. df.na.fill({'field_name_1': 'default_1', 'field_name_2': 'default_2'}) replaces null values in the named columns with the given defaults.
2. df.dropDuplicates() deduplicates rows by the given column names; called with no arguments it considers all columns.
3. df.subtract(df1) returns the rows that appear in the current DataFrame but not in df1 (like SQL EXCEPT).
4. print time.localtime([ti ...
Posted by nvidia on Fri, 01 Feb 2019 13:21:15 -0800
Spark SQL Notes (3): Load and Save Functions and Spark SQL Functions
Load and save functions
Data loading (json file, jdbc) and saving (json, jdbc)
The test code is as follows:
package cn.xpleaf.bigdata.spark.scala.sql.p1
import java.util.Properties
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}
/* ...
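The excerpt stops at the opening comment; a minimal load/save round trip in the style this snippet sets up (paths, JDBC URL, table name, and credentials are placeholders) might look like:

val conf = new SparkConf().setAppName("LoadSaveOps").setMaster("local[*]")
val sqlContext = new SQLContext(new SparkContext(conf))
Logger.getLogger("org.apache.spark").setLevel(Level.OFF)  // quiet the logs

// Load a DataFrame from a JSON file (path is illustrative).
val df = sqlContext.read.json("hdfs:///data/people.json")

// Save as JSON, overwriting any previous output.
df.write.mode(SaveMode.Overwrite).json("hdfs:///out/people")

// Save to a JDBC table (URL, table, and credentials are placeholders).
val props = new Properties()
props.put("user", "root")
props.put("password", "secret")
df.write.mode(SaveMode.Append).jdbc("jdbc:mysql://localhost:3306/test", "people", props)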
Posted by danielrs1 on Fri, 01 Feb 2019 00:09:16 -0800
Spark Learning Notes (3) - Spark Operator
1 Spark Operator
1.1 Operators are divided into two categories
1.1.1 Transformation
Transformations are lazily executed: they only record metadata (the lineage), and the actual computation starts when an Action triggers the job.
1.1.2 Action
An Action triggers the actual computation and returns the result to the driver program (or writes it to storage).
1.2 Two Ways to Create RDD
RDDs can be created from files in any HDFS-compatible file system. The ...
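To make the laziness concrete (a small sketch; the input path is illustrative), the transformations below only record lineage, and nothing is read or computed until the action runs:

// Transformations: lazily recorded, no data is touched yet.
val counts = sc.textFile("hdfs:///data/input.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

// Action: triggers the actual computation and returns results to the driver.
counts.collect().take(5).foreach(println)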
Posted by gauravupadhyaya on Thu, 31 Jan 2019 22:39:16 -0800
One of the introductory cases of SparkSQL (SparkSQL 1.x)
The SparkSQL programming API changed between 1.x and 2.x, and both versions are used in enterprises, so we will study each through a case.
Let's start with a Spark SQL 1.x case.
IDEA+Maven+Scala
1. Import the SparkSQL pom dependency
On top of the pom from the previous blog's Spark case, the following dependency is added to th ...
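For reference, a typical SparkSQL 1.x Maven coordinate looks like this (the Scala suffix and version number are illustrative; match them to your cluster):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.3</version>
</dependency>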
Posted by fernado1283 on Thu, 31 Jan 2019 21:21:15 -0800