Practice of Real-time Operation and Maintenance Technology of Public Security Big Data Based on Spark
Source of the article: https://www.iteblog.com/archives/1956.html
There are tens of thousands of front-end and back-end devices in the public security industry. Front-end devices include cameras, detectors, and sensors; back-end devices include servers, application servers, network equipment, and power systems in the central computer ...
Posted by Joseph07 on Sun, 24 Mar 2019 02:42:27 -0700
The basic RDD operators of Spark programming: join, rightOuterJoin, leftOuterJoin
1) join
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], numPartitions: Int): RDD[(K, (V, W))]
def join[W](other: RDD[(K, W)], partitioner: Partitioner): RDD[(K, (V, W))]
Performs an inner join with another RDD of key-value pairs, matching elements on equal keys. The value type retu ...
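As a quick illustration (a minimal sketch with made-up sample data, assuming an existing SparkContext sc), join keeps only the keys present in both RDDs, while the outer variants wrap the possibly-missing side in Option:

val left  = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
val right = sc.parallelize(Seq(("a", "x"), ("b", "y"), ("d", "z")))

// Inner join: only keys present on both sides survive.
left.join(right).collect()           // Array((a,(1,x)), (b,(2,y)))

// Left/right outer joins keep unmatched keys, with None on the missing side.
left.leftOuterJoin(right).collect()  // includes (c,(3,None))
left.rightOuterJoin(right).collect() // includes (d,(None,z))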
Posted by LucienFB on Wed, 06 Feb 2019 20:03:16 -0800
Running Principle of Spark Streaming
Sequence diagram
1. NetworkWordCount
2. Initialize StreamingContext
3. Create InputDStream
4. Start Job Scheduler
1. NetworkWordCount
package yk.streaming
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
object NetworkWordCount { ...
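The excerpt cuts off at the object definition; a minimal sketch of the classic NetworkWordCount body (continuing the imports above; the hostname, port, and batch interval are illustrative) looks like:

object NetworkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("NetworkWordCount").setMaster("local[2]")
    // 2. Initialize the StreamingContext with a 1-second batch interval.
    val ssc = new StreamingContext(conf, Seconds(1))

    // 3. Create an InputDStream from a TCP socket source.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)

    val wordCounts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    wordCounts.print()

    // 4. Start the job scheduler, then block until the stream is stopped.
    ssc.start()
    ssc.awaitTermination()
  }
}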
Posted by bsarika on Sun, 03 Feb 2019 15:03:16 -0800
Yarn tuning
1. Yarn Common Commands:
[rachel@bigdata-senior01 bin]$ ./yarn
Usage: yarn [--config confdir] COMMAND
where COMMAND is one of:
  resourcemanager    run the ResourceManager
  nodemanager        run a nodemanager on each slave
  timelineserver     run the timeline server
  rmadmin            admin tools
  version ...
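Beyond the daemon launchers, a few everyday invocations (standard YARN CLI subcommands; the application ID is a placeholder):

yarn application -list                                     # list running applications
yarn logs -applicationId application_1549000000000_0001    # fetch aggregated logs (ID is illustrative)
yarn node -list                                            # show NodeManager status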
Posted by soloslinger on Sun, 03 Feb 2019 08:15:18 -0800
Spark textfile reads HDFS file partitions [compressed and uncompressed]
sc.textFile("/blabla/{*.gz}")
When we create a SparkContext and read files with textFile, what determines the partitioning, and how large is each partition? It depends on:
whether the file is compressed, and in which format
the file size and the HDFS block size
textFile creates a Hadoop RDD that uses ...
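A quick way to observe this (illustrative HDFS paths): gzip is not splittable, so each .gz file yields exactly one partition, while an uncompressed file is split according to the HDFS block size and the minPartitions hint:

val plain = sc.textFile("hdfs:///data/logs/plain.txt", 4)  // minPartitions hint
println(plain.partitions.length)                           // typically >= 4 for an uncompressed file

val gz = sc.textFile("hdfs:///data/logs/{*.gz}")
println(gz.partitions.length)                              // one partition per .gz file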
Posted by MK27 on Sat, 02 Feb 2019 17:54:15 -0800
Spark Learning Notes (1) - Introduction to Spark, Cluster Installation
1 Spark Introduction
Spark is a fast, general-purpose, and scalable big data analytics engine. It was born in 2009 at AMPLab, University of California, Berkeley, was open-sourced in 2010, entered the Apache Incubator in June 2013, and became a top-level Apache project in February 2014. Today the Spark ecosystem has grown into a collecti ...
Posted by All4172 on Sat, 02 Feb 2019 01:21:15 -0800
pyspark's Little Knowledge Points in Work
1. df.na.fill({'field_name_1': 'default_1', 'field_name_2': 'default_2'}) replaces null values in the named columns with the given defaults.
2. df.dropDuplicates() deduplicates rows by the given column names; called with no arguments it considers all columns.
3. df.subtract(df1) returns the rows that appear in the current DataFrame but not in df1 (like SQL EXCEPT).
4. print time.localtime([ti ...
Posted by nvidia on Fri, 01 Feb 2019 13:21:15 -0800
Spark SQL Notes (3): Load and Save Functions and Spark SQL Functions
Load and save functions
Data loading (json file, jdbc) and saving (json, jdbc)
The test code is as follows:
package cn.xpleaf.bigdata.spark.scala.sql.p1
import java.util.Properties
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}
/* ...
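The excerpt stops at the opening comment; a minimal load/save round trip in the style this snippet sets up (paths, JDBC URL, table name, and credentials are placeholders) might look like:

val conf = new SparkConf().setAppName("LoadSaveOps").setMaster("local[*]")
val sqlContext = new SQLContext(new SparkContext(conf))
Logger.getLogger("org.apache.spark").setLevel(Level.OFF)  // quiet the logs

// Load a DataFrame from a JSON file (path is illustrative).
val df = sqlContext.read.json("hdfs:///data/people.json")

// Save as JSON, overwriting any previous output.
df.write.mode(SaveMode.Overwrite).json("hdfs:///out/people")

// Save to a JDBC table (URL, table, and credentials are placeholders).
val props = new Properties()
props.put("user", "root")
props.put("password", "secret")
df.write.mode(SaveMode.Append).jdbc("jdbc:mysql://localhost:3306/test", "people", props)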
Posted by danielrs1 on Fri, 01 Feb 2019 00:09:16 -0800
Spark Learning Notes (3) - Spark Operator
1 Spark Operator
1.1 Operators are divided into two categories
1.1.1 Transformation
Transformations are lazily executed: they only record metadata (the lineage), and the actual computation starts when an Action triggers the job.
1.1.2 Action
An Action triggers the actual computation and returns the result to the driver program (or writes it to storage).
1.2 Two Ways to Create RDD
RDDs can be created from files in any HDFS-compatible file system. The ...
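To make the laziness concrete (a small sketch; the input path is illustrative), the transformations below only record lineage, and nothing is read or computed until the action runs:

// Transformations: lazily recorded, no data is touched yet.
val counts = sc.textFile("hdfs:///data/input.txt")
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)

// Action: triggers the actual computation and returns results to the driver.
counts.collect().take(5).foreach(println)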
Posted by gauravupadhyaya on Thu, 31 Jan 2019 22:39:16 -0800
One of the introductory cases of SparkSQL (SparkSQL 1.x)
The SparkSQL programming API changed between 1.x and 2.x, and both versions are used in enterprises, so we will study each through a case.
Let's start with a Spark SQL 1.x case.
IDEA+Maven+Scala
1. Import the SparkSQL pom dependency
On top of the pom from the previous blog's Spark case, the following dependency is added to th ...
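For reference, a typical SparkSQL 1.x Maven coordinate looks like this (the Scala suffix and version number are illustrative; match them to your cluster):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.3</version>
</dependency>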
Posted by fernado1283 on Thu, 31 Jan 2019 21:21:15 -0800