How window functions are implemented in Spark and Hive

Window functions are used often at work and come up often in interviews, but do you know how they are implemented? Starting from a problem encountered in a business scenario, this article discusses how data flows through window functions in Hive SQL and gives a solution to the problem at the end. 1. Business background Fi ...

Posted by moiseszaragoza on Mon, 06 Apr 2020 04:05:56 -0700

Security settings when building a cluster on Baidu cloud server

After moving the Hadoop cluster from a local virtual machine to a Baidu cloud server, I found many unknown IP addresses logging in to my server. I had kept the firewall turned off locally, but in a real deployment that is far too unsafe. So I spent two hours setting up the firewall of t ...

Posted by wonderman on Sun, 15 Mar 2020 02:23:32 -0700

Special symbols commonly used in Scala

1. => (anonymous function) In Scala, a function is also an object that can be assigned to a variable. Format of an anonymous function definition: (parameter list) => {function body} So => creates an anonymous function instance. For example: (x: Int) => x + 1 2. <- (collection traversal) Loop trav ...

Posted by ctimmer on Thu, 12 Mar 2020 04:54:52 -0700

Spark -- Transformation operators

Contents: Transformation operators. Basic operators: 1. map(func) 2. filter(func) 3. flatMap 4. Set operations (union, intersection, distinct) 5. Grouping (groupByKey, reduceByKey, cogroup) 6. Sorting (sortBy, sortByKey). Advanced operators: 1. mapPartitionsWithIndex(func) 2. aggregate 3. aggreg ...

Posted by brashquido on Thu, 12 Mar 2020 01:07:43 -0700

Practical machine learning for Java programmers -- starting with clustering algorithms

This article is aimed at programmers with programming experience. It is a machine learning "Hello world!"; readers without much theoretical background may want to skip it. Preface Artificial intelligence is undoubtedly one of the hottest technical topics in recent years. The artificial intelligence technology represented b ...

Posted by nic9 on Mon, 09 Mar 2020 02:51:39 -0700

Spark SQL: DataFrame, DataSet and RDD

Contents: DataFrame; DataSet; RDD; conversion between DataFrame, DataSet and RDD; relationship between DataFrame, DataSet and RDD; similarities and differences between DataFrame, DataSet and RDD. 1. Spark SQL: Spark SQL is the Spark module for processing structured data. It provides two progr ...

Posted by tracivia on Tue, 03 Mar 2020 19:25:20 -0800

PySpark startup process demystified

Original author: Li Haiqiang, from the retail big data team of Ping An Bank. Preface As a data engineer, you may encounter many ways to start PySpark. You may not understand what they have in common, how they differ, and what impact each method has on program development and deploymen ...

Posted by phant0m on Sat, 29 Feb 2020 23:24:31 -0800

NVIDIA RAPIDS cuGraph model

The RAPIDS cuGraph library is a collection of graph analytics that process data held in GPU DataFrames (see cuDF). cuGraph is designed to provide a NetworkX-like API that is familiar to data scientists, so they can build GPU-accelerated workflows more easily. Official documents: rapidsai/cugraph, cuGraph API Re ...

Posted by florida_guy99 on Tue, 25 Feb 2020 06:53:28 -0800

RDD common operations in pyspark

Preparation:
import pyspark
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf().setAppName("lg").setMaster('local[4]')  # local[4] runs Spark locally with 4 worker threads
sc = SparkContext.getOrCreate(conf)
1. parallelize and collect: The parallelize function converts the list obj ...

Posted by moomsdad on Fri, 21 Feb 2020 02:13:19 -0800

Big data technology: Spark Streaming

1: Overview 1. Definition: Spark Streaming is used for streaming data processing. It supports many data input sources, such as Kafka, Flume, Twitter, ZeroMQ and simple TCP sockets. After data is ingested, you can use Spark's high-level primitives such ...

Posted by croakingtoad on Mon, 10 Feb 2020 07:28:21 -0800