Big data technology: Spark Streaming

1. Overview. 1.1 Definition: Spark Streaming is used to process streaming data. It supports many input sources, such as Kafka, Flume, Twitter, ZeroMQ and plain TCP sockets. Once data has been ingested, you can use Spark's high-level primitives such ...

Posted by croakingtoad on Mon, 10 Feb 2020 07:28:21 -0800
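
In its DStream model, Spark Streaming splits the incoming stream into small batches at a fixed batch interval and runs ordinary Spark operations on each batch. The sketch below is a pure-Python simulation of that idea, not the Spark API. It assumes events arrive as hypothetical (timestamp, line) pairs and word-counts each batch independently, mirroring a flatMap, map, reduceByKey pipeline. The function name is illustrative only.

```python
from collections import Counter
from typing import Iterable


def micro_batch_word_counts(
    events: Iterable[tuple[float, str]], batch_interval: float
) -> dict[int, Counter]:
    """Hypothetical helper: assign each (timestamp, line) event to a
    fixed-length batch and word-count every batch independently."""
    batches: dict[int, Counter] = {}
    for ts, line in events:
        # Index of the batch interval this event falls into.
        batch_id = int(ts // batch_interval)
        counts = batches.setdefault(batch_id, Counter())
        # flatMap(split) + map(word -> 1) + reduceByKey(+), within one batch.
        counts.update(line.split())
    return batches
```

For example, with a 1-second interval, events at 0.2 s and 0.9 s land in batch 0 and an event at 1.1 s lands in batch 1. Each batch's counts never mix with another batch's, which matches the stateless per-batch processing of a basic DStream job.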

How to quickly build a distributed Spark cluster for big data

Building our Spark platform from scratch. 1. Preparing the CentOS environment: to build a real cluster and achieve a highly available architecture, we need at least three virtual machines as cluster nodes, so I bought three Alibaba Cloud servers to serve as our nodes. ...

Posted by knelson on Tue, 04 Feb 2020 23:47:57 -0800

Counting adjacent words (neighbors) in large amounts of data

This problem is similar to some of the search problems on LeetCode. The task is to count the words adjacent to each word (its neighbors). Given the words w1, w2, w3, w4, w5, w6, then: the final output is (word, neighbor, frequency). We implement it in five ways: MapReduce, Spark, Spark SQL, Scala, and Spark SQL with Scala. MapReduce ...

Posted by olechka on Sun, 02 Feb 2020 08:18:59 -0800
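
The excerpt cuts off before it defines exactly which positions count as neighbors. The single-machine sketch below therefore treats the neighborhood size as a parameter, `window`, which is an assumption and not taken from the post. It produces the (word, neighbor, frequency) triples described above. It is not one of the post's five implementations. In a MapReduce version, the inner loop would be the mapper emitting ((word, neighbor), 1), and the summing would happen in the reducer.

```python
from collections import Counter


def neighbor_frequencies(words: list[str], window: int) -> list[tuple[str, str, int]]:
    """Hypothetical helper: return (word, neighbor, frequency) triples, where
    the neighbors of words[i] are the words at positions i-window .. i+window,
    excluding position i itself."""
    counts: Counter = Counter()
    for i, word in enumerate(words):
        lo = max(0, i - window)
        hi = min(len(words), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, words[j])] += 1
    # Sort so the output order is deterministic.
    return sorted((w, n, c) for (w, n), c in counts.items())
```

With `window=2` and the sequence w1 ... w6, the neighbors of w3 are w1, w2, w4 and w5. With `window=1`, they are only w2 and w4.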

Spark SQL/DataFrame/Dataset operations: reading data

1. Reading a data source. (1) Reading JSON with spark.read. Note: paths are resolved against HDFS by default. To read a local file, prefix the path with file://, as follows: scala> val people = spark.read.format("json").load("file:///opt/software/data/people.json") people: org.apache.spark.sql.DataFrame = [age: bigint, name: string] scal ...

Posted by Pie on Sun, 02 Feb 2020 08:18:33 -0800
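
By default, Spark's JSON source expects JSON Lines input: one complete JSON object per line. It also infers a schema, which is why whole numbers appear as bigint above. The sketch below is a plain-Python illustration of that input format and a simplified type inference. It is not Spark's implementation. The type-name mapping, the alphabetical field order (chosen to match the printed schema in the excerpt), and the sample data are all assumptions for illustration.

```python
import json

# Hypothetical mapping from Python JSON value types to Spark SQL type names.
SPARK_TYPE = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def read_json_lines(text: str) -> tuple[list[tuple[str, str]], list[dict]]:
    """Parse JSON Lines (one object per line) and infer a flat schema.
    Returns ([(column, type), ...], rows with missing fields set to None)."""
    rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    schema: dict[str, str] = {}
    for row in rows:
        for key, value in row.items():
            # The first non-null value seen decides the column type.
            if value is not None and key not in schema:
                schema[key] = SPARK_TYPE.get(type(value), "string")
    columns = sorted(schema)
    # Absent fields become None, the analogue of SQL NULL.
    table = [{c: row.get(c) for c in columns} for row in rows]
    return [(c, schema[c]) for c in columns], table
```

A record without an "age" field still yields a row, with None in that column. This mirrors how a DataFrame shows null for missing JSON fields.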

BitTorrent (.torrent) file structure and encoding analysis

1. Data types. There are four data types: string, integer, list and dictionary. Strings are encoded as <string length>:<string>. For example, 4:test represents the string "test", and 8:examples represents the string "examples". The string length is counted in bytes ...

Posted by inkdrop on Sat, 01 Feb 2020 05:03:09 -0800
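
The four bencode types can be sketched as a small encoder and decoder. Strings are <length>:<bytes>, integers are i<n>e, lists are l ... e, and dictionaries are d ... e with keys in sorted order. The sketch below is a minimal illustration under these rules. The function names are hypothetical and not taken from the post or any library. Decoded strings are returned as bytes, because fields such as the piece hashes in a torrent file are binary.

```python
def bencode(value) -> bytes:
    """Encode str/bytes, int, list and dict values as bencode."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # <length in bytes>:<raw bytes>
        return str(len(value)).encode() + b":" + value
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings emitted in sorted (raw byte) order.
        keyed = {(k.encode("utf-8") if isinstance(k, str) else k): v for k, v in value.items()}
        return b"d" + b"".join(bencode(k) + bencode(keyed[k]) for k in sorted(keyed)) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def bdecode(data: bytes):
    """Decode one complete bencoded value; reject trailing bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    c = data[i:i + 1]
    if c == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            result[key], i = _decode(data, i)
        return result, i + 1
    if c.isdigit():
        colon = data.index(b":", i)
        start = colon + 1
        length = int(data[i:colon])
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode at offset {i}")
```

For example, `bencode(["spam", 1])` gives `b"l4:spami1ee"`. This decoder is deliberately lenient: it does not reject non-canonical forms such as integers with leading zeros, which a strict parser would.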

A Broadcast Join pitfall in Spark SQL 2.x (the hint does not take effect)

Problem description: a hint is used to specify the broadcast table, but the specified broadcast is not performed. Preparation: hive> select * from test.tmp_demo_small; OK tmp_demo_small.pas_phone tmp_demo_small.age 156 20 157 22 158 15 hive> analyze table test.tmp_demo_small compute statis ...

Posted by cbullock on Fri, 17 Jan 2020 06:02:22 -0800
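
To see what the hint is supposed to achieve, consider a broadcast hash join. It builds a hash table from the small table once, gives every task a copy of it, and streams each partition of the large table against that table, so the large side does not have to be shuffled. The sketch below is a single-process illustration of that mechanism only. It is not Spark code and does not reproduce or explain the hint problem in the post. The small table reuses the rows shown above. The large-side rows and the "n" column are made up.

```python
from collections import defaultdict


def broadcast_hash_join(small_rows, large_partitions, key):
    """Inner-join every partition of the large side against a hash table
    built once from the small side (the 'broadcast' copy)."""
    # Build phase: index the small table by join key.
    table = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    # Probe phase: each partition is processed independently against the
    # same read-only table; .get() avoids inserting keys into the defaultdict.
    for partition in large_partitions:
        for big in partition:
            for small in table.get(big[key], []):
                yield {**big, **small}
```

The design only pays off when the small side fits comfortably in each executor's memory. That is why Spark gates automatic broadcasting on table-size statistics, such as those gathered by the `analyze table ... compute statistics` step above.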

Big data learning: Flink

Contents: 1. Introduction 2. Why Flink 3. Which industries need it 4. Features of Flink 5. Differences from Spark Streaming 6. Getting started with development 7. Flink configuration 8. Environment 9. Runtime components. 1. Introduction: Flink is a framework and distributed com ...

Posted by stodge on Fri, 17 Jan 2020 01:18:24 -0800

Hadoop Part 2: MapReduce

MapReduce (3). Project address: https://github.com/KingBobTitan/hadoop.git. MR's shuffle explained, and join implementation. First, a review: 1. MapReduce's job history service, JobHistoryServer. Function: monitors information about all MapReduce programs that have run on YARN. Configure log ...

Posted by nick1 on Tue, 14 Jan 2020 02:21:13 -0800
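
A common way to implement a join in MapReduce is a reduce-side join. The mapper tags each record with its source and emits it under the join key. The shuffle groups all records with the same key onto one reducer. The reducer then combines the records from the two sources. The sketch below simulates those three phases in plain Python. It is not Hadoop code and not necessarily the approach used in the linked project. The `users`/`orders` tables and the `user_id` key are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(orders, users):
    """Mapper: emit (join_key, (tag, record)) so the reducer knows each record's source."""
    for order in orders:
        yield order["user_id"], ("order", order)
    for user in users:
        yield user["user_id"], ("user", user)


def shuffle(pairs):
    """Shuffle: group all values by key and hand keys to reducers in sorted order."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return sorted(groups.items(), key=lambda kv: kv[0])


def reduce_phase(grouped):
    """Reducer: pair every user record with every order record for the key (inner join)."""
    for _key, values in grouped:
        users = [r for tag, r in values if tag == "user"]
        orders = [r for tag, r in values if tag == "order"]
        for user, order in product(users, orders):
            yield {**user, **order}
```

Keys that appear on only one side (for example, an order whose user is missing) produce no output, which is inner-join semantics. The whole dataset crosses the shuffle, which is the cost a map-side join avoids when one table is small.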

Filtering and processing log files with Waterdrop

Waterdrop filters and processes log files and stores the data. Installing Waterdrop: download the installation package with wget (wget xxxxx), then extract it to the directory you need: unzip XXX (package location) -d XXX (extraction location). If unzip reports an error, install the unzip tool yourself. Set the dependency env ...

Posted by PhantomCube on Mon, 13 Jan 2020 01:04:18 -0800

Grouping and frame control in several common Hive window functions

Brief introduction: the regular window functions are simple and need little explanation. This post introduces grouping, focusing on how ROWS BETWEEN is used after partitioning and ordering. The key is to understand the keywords in ROWS BETWEEN: Keyword / Meaning: PRECEDING means rows before the current row; FOLLOWING means rows after the current row; CURRENT ...

Posted by skyxmen on Thu, 09 Jan 2020 07:26:16 -0800
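
The effect of a ROWS BETWEEN frame can be checked by hand with a small emulation. The sketch below computes SUM(value) OVER (PARTITION BY p ORDER BY o ROWS BETWEEN n PRECEDING AND m FOLLOWING) in plain Python. None stands for UNBOUNDED and 0 stands for CURRENT ROW. It is an illustrative sketch, not Hive's implementation. It covers only frames that start at or before the current row and end at or after it. The cookie/day/pv columns are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def windowed_sum(rows, partition_by, order_by, value, preceding, following):
    """Emulate SUM(value) OVER (PARTITION BY partition_by ORDER BY order_by
    ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING).
    Pass None for UNBOUNDED; 0 means CURRENT ROW."""
    out = []
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    for _, group in groupby(ordered, key=itemgetter(partition_by)):
        part = list(group)
        for i, row in enumerate(part):
            # The frame is counted in physical rows, relative to position i.
            lo = 0 if preceding is None else max(0, i - preceding)
            hi = len(part) if following is None else min(len(part), i + following + 1)
            out.append({**row, "win_sum": sum(r[value] for r in part[lo:hi])})
    return out
```

For one cookie with pv values 1, 5, 7 on days 1 to 3, "1 PRECEDING AND CURRENT ROW" gives 1, 6, 12. "UNBOUNDED PRECEDING AND CURRENT ROW" gives the running total 1, 6, 13. "CURRENT ROW AND UNBOUNDED FOLLOWING" gives 13, 12, 7. Each partition restarts independently.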