Autoencoder-based anomaly detection with Keras for fraud identification
Credit card fraud can be treated as an anomaly and detected with an autoencoder implemented in Keras.
I recently read an article called "Using Autoencoders for Anomaly Detection", which tests the approach on generated data, and it seemed a good idea to apply it ...
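The core idea in the teaser, flagging records whose reconstruction error is unusually high, can be sketched without a trained network. This hypothetical example uses a linear projection in NumPy as a stand-in for the encoder/decoder pair; in the article itself that mapping would be a Keras autoencoder, and all data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Normal" records cluster in a low-dimensional pattern; a handful of
# fraudulent records sit far away from it.
normal = rng.normal(0.0, 1.0, size=(200, 5))
fraud = rng.normal(8.0, 1.0, size=(5, 5))
data = np.vstack([normal, fraud])

# Stand-in for the trained encoder/decoder: project onto the top two
# principal directions of the NORMAL data and back again.
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
components = vt[:2]                                # the "bottleneck"
reconstructed = (data - mean) @ components.T @ components + mean

# Records a normal-only model cannot reconstruct well are flagged.
errors = ((data - reconstructed) ** 2).mean(axis=1)
threshold = np.percentile(errors[:200], 99)        # cut-off from normal data
flagged = np.where(errors > threshold)[0]
```

A real autoencoder replaces the SVD projection with learned nonlinear encode/decode layers, but the thresholding of reconstruction error works the same way.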
Posted by Masterchief07 on Sun, 21 Jun 2020 19:57:59 -0700
Big data Hadoop cluster construction
1. Environment
Server configuration:
CPU model: Intel ® Xeon ® CPU E5-2620 v4 @ 2.10GHz
CPU cores: 16
Memory: 64GB
Operating system: CentOS Linux release 7.5.1804 (Core)
Host list:

IP              host name
192.168.1.101   node1
192.168.1.102   node2
1 ...
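With the hosts listed above, name resolution for the cluster is typically handled by /etc/hosts entries replicated to every node; a minimal fragment matching the table (the third host is truncated in the excerpt, so only node1 and node2 appear) might look like:

```
# /etc/hosts on every cluster node
192.168.1.101  node1
192.168.1.102  node2
```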
Posted by SL-Cowsrule on Sun, 21 Jun 2020 17:57:54 -0700
Real-time data storage: a Spark + TDengine application in China Telecom's power dynamometer system monitoring platform
Small T guide: the power dynamometer system monitoring platform was designed and developed on top of the device data acquisition and Equipment Bank V2.0 applications of China Telecom Shanghai Ideal Information Industry (Group) Co., Ltd. The real-time data collected from the devices is stored in TDengine.
Application background: The p ...
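TDengine stores time-series data in sub-tables of a super table, one sub-table per device. A hypothetical schema for dynamometer readings, with illustrative column and tag names not taken from the platform itself, could look like:

```sql
-- Super table: one shared schema for all dynamometer devices
CREATE STABLE dyno_readings (
  ts     TIMESTAMP,
  torque FLOAT,
  speed  FLOAT
) TAGS (
  device_id BINARY(32),
  station   BINARY(64)
);

-- One sub-table per physical device, created automatically on first insert
INSERT INTO d_1001 USING dyno_readings
  TAGS ('1001', 'shanghai_station_a')
  VALUES (NOW, 52.3, 1480.0);
```

The TAGS columns carry per-device metadata, so queries can aggregate across devices by station or device id without scanning a separate dimension table.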
Posted by ainoy31 on Thu, 18 Jun 2020 20:14:38 -0700
Using ibis, impyla, pyhive, and pyspark to connect to Kerberos-secured Hive and Impala from Python
There are many ways to connect to Hive and Impala from Python, including pyhive, impyla, pyspark, and ibis. In this article we introduce how to use these packages to connect to Hive or Impala, and how to pass Kerberos authentication.
Kerberos
If the cluster does not enable Kerberos authentication, the ...
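As a sketch of the impyla route the article describes: the connect call below uses impyla's documented GSSAPI options, while the host, port, and query are placeholder values; the import is deferred into the function so the sketch reads without a cluster at hand. This assumes a valid Kerberos ticket already exists (i.e. kinit has been run):

```python
def connect_impala_kerberos(host="impala.example.com", port=21050):
    # Deferred import: impyla is only needed when actually connecting.
    from impala.dbapi import connect

    # With a valid Kerberos ticket, GSSAPI performs the authentication;
    # no username/password is passed.
    return connect(
        host=host,
        port=port,
        auth_mechanism="GSSAPI",
        kerberos_service_name="impala",
    )


def run_query(sql):
    # Placeholder usage: open a connection, run one statement, fetch rows.
    conn = connect_impala_kerberos()
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

pyhive's `hive.Connection` takes analogous arguments (`auth='KERBEROS'`, `kerberos_service_name='hive'`), so the same pattern carries over.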
Posted by RunningUtes on Mon, 08 Jun 2020 23:22:07 -0700
Spark Structured Streaming: creating streaming DataFrames and streaming Datasets
Creating streaming DataFrames and streaming Datasets
Streaming DataFrames are created through the DataStreamReader interface (Scala/Java/Python docs) returned by SparkSession.readStream().
Input Sources
Common built-in Sources
File source: Reads files from a specified directory as st ...
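A minimal sketch of the file-source path just described, with the pyspark import deferred inside the function so the snippet reads without a Spark installation; the directory path and schema are hypothetical:

```python
def build_file_stream(input_dir="/tmp/stream-input"):
    # Deferred import: pyspark is only needed when the stream is built.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType, StringType, StructType

    spark = SparkSession.builder.appName("file-source-demo").getOrCreate()

    # Streaming file sources require an explicit schema up front.
    schema = (
        StructType()
        .add("name", StringType())
        .add("count", IntegerType())
    )

    # readStream returns a DataStreamReader; load() yields a streaming
    # DataFrame that picks up new CSV files dropped into input_dir.
    return (
        spark.readStream
        .schema(schema)
        .format("csv")
        .load(input_dir)
    )
```

The returned DataFrame is queried like a static one, but only starts consuming data once a streaming query is started with `writeStream`.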
Posted by IwnfuM on Sun, 07 Jun 2020 18:13:27 -0700
Comparison of updateStateByKey and mapWithState
What is a state management function
The state management functions in Spark Streaming, updateStateByKey and mapWithState, are used to track changes in the state of each key across the whole stream. They reduce the data in each DStream batch by key, then fold each batch's result into the accumulated state as new data arrives or existing entries are updated. To keep users in whatever shape t ...
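The practical difference is which keys the update function touches each batch: updateStateByKey runs it over every key ever seen, while mapWithState visits only the keys present in the incoming batch. A pure-Python simulation of those semantics (this illustrates the contract, it is not Spark code):

```python
def update_state_by_key(state, batch):
    # Visits every known key each batch, whether or not it has new data.
    touched = set(state) | set(batch)
    new_state = {}
    for key in touched:
        new_values = batch.get(key, [])
        new_state[key] = state.get(key, 0) + sum(new_values)
    return new_state, touched


def map_with_state(state, batch):
    # Visits only the keys that actually appear in the incoming batch.
    new_state = dict(state)
    for key, values in batch.items():
        new_state[key] = new_state.get(key, 0) + sum(values)
    return new_state, set(batch)


# Two micro-batches of (word -> occurrences); "spark" is absent from batch 2.
batch1 = {"spark": [1], "flink": [1, 1]}
batch2 = {"flink": [1]}

s1, t1 = update_state_by_key({}, batch1)
s2, t2 = update_state_by_key(s1, batch2)   # still touches "spark"

m1, u1 = map_with_state({}, batch1)
m2, u2 = map_with_state(m1, batch2)        # touches only "flink"
```

Both end with the same counts, but updateStateByKey's cost grows with the total number of keys ever seen, which is why mapWithState scales better for large, mostly idle key spaces.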
Posted by w00kie on Mon, 01 Jun 2020 20:53:52 -0700
Examples of transformation operations on Spark Streaming's core DStream
Transformation operations on DStream
The DStream API provides the following methods related to transformation operations:
Examples of transform(func) and updateStateByKey(func) methods are given below:
(1), transform(func) method
The transform method, and the similar transformWith(func) method, allow any RDD-to-RDD function to be applied to a DStream and can ...
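transform's contract, applying an arbitrary RDD-to-RDD function to every micro-batch, can be simulated in plain Python with lists standing in for RDDs (a sketch of the semantics, not the PySpark API):

```python
def transform(batches, func):
    # Apply an arbitrary batch-to-batch function to each micro-batch,
    # the way DStream.transform applies an RDD-to-RDD function.
    return [func(batch) for batch in batches]


# Example: per-batch filtering, including logic that the fixed DStream
# operators could not express in a single step.
batches = [["error: disk", "info: ok"], ["warn: slow", "error: net"]]
errors_only = transform(
    batches,
    lambda b: [line for line in b if line.startswith("error")],
)
```

In real Spark the function receives each batch's RDD, so any RDD operation, including joins against static RDDs, is available inside it.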
Posted by buildakicker on Sat, 23 May 2020 12:15:37 -0700
The difference between SQL JOIN ON and WHERE
explain select count(1) from cellinfo_20171124 ci join shandong_lac_ci slc on concat(cast(ci.lac as string),",",cast(ci.ci as string)) =slc.lac_ci;
| == Physical Plan ==
*HashAggregate(keys=[], functions=[count(1)])
+- Exchange SinglePartition
+- *HashAggregate(keys=[ ...
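For an inner join like the one in the plan above, a predicate behaves the same whether it sits in ON or in WHERE; the difference appears with outer joins. A small sqlite3 demonstration, with table names and rows made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE cell (lac_ci TEXT);
    CREATE TABLE region (lac_ci TEXT, name TEXT);
    INSERT INTO cell VALUES ('100,1'), ('100,2');
    INSERT INTO region VALUES ('100,1', 'shandong');
""")

# Extra predicate in ON: unmatched left rows are kept, padded with NULLs.
on_rows = cur.execute(
    "SELECT cell.lac_ci, region.name FROM cell "
    "LEFT JOIN region ON cell.lac_ci = region.lac_ci "
    "AND region.name = 'shandong'"
).fetchall()

# Same predicate in WHERE: it filters AFTER the join, dropping those rows.
where_rows = cur.execute(
    "SELECT cell.lac_ci, region.name FROM cell "
    "LEFT JOIN region ON cell.lac_ci = region.lac_ci "
    "WHERE region.name = 'shandong'"
).fetchall()
```

The ON version returns both cell rows (one with a NULL name); the WHERE version returns only the matched row, effectively turning the outer join back into an inner join.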
Posted by lmaster on Sun, 03 May 2020 04:01:25 -0700
Java implementation of Spark Streaming and Kafka integration for stream computing
Added on June 26, 2017: I took over the search system, and in the past half year I have gained a lot of new experience. I'm too lazy to rewrite this rough article; to make sense of the crude code below, have a look at this newer post: http://blog.csdn.net/yujishi2 ...
Posted by rednax on Thu, 30 Apr 2020 23:48:23 -0700
Lesson 02: Flink starter WordCount and SQL implementation
In this lesson we introduce Flink's entry-level program and its SQL implementation.
In the previous lesson we covered Flink's common application scenarios and its architectural model. In this lesson we start from a simple WordCount case and implement it in SQL form as well, laying a solid foundation for the l ...
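The WordCount pipeline the lesson builds, split lines into words, key by word, sum per key, can be mirrored in plain Python; this shows the logic of the job, not the Flink API:

```python
from collections import Counter


def word_count(lines):
    # flatMap: split each line into words; keyBy + sum: count per word.
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)


result = word_count(["Hello Flink", "hello world"])
```

The SQL form of the same job expresses the keyBy + sum step as `SELECT word, COUNT(*) FROM words GROUP BY word`, with the flatMap step done by a table function that splits lines into words.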
Posted by scoppc on Tue, 28 Apr 2020 03:13:19 -0700