Big Data - Programmer Group - a programming skills sharing group

Big Data

Flink sink Elasticsearch prevents task interruption

Preface: We started building our Flink real-time computing platform about half a year ago. Elasticsearch is used in parts of the storage layer, and I came to Flink from scratch. Over these six months, while moving from traditional development to big data development, I have run into many pitfalls. Elasticsearch contains a variety of circuit breakers to prev ...

Posted by banzaimonkey on Mon, 06 Dec 2021 15:50:31 -0800

Pyspark machine learning library ml learning notes: Breast Cancer Wisconsin (Diagnostic) Data Set

Data attribute information: 1) ID number; 2) Diagnosis (M = malignant, B = benign). Ten real-valued features are computed for each cell nucleus: a) radius (mean distance from the center to points on the perimeter); b) texture (standard deviation of gray-scale values); c) perimeter; d) area; e) smoothness (local variation in radius lengths); f) compactness (perimeter ^ ...

Posted by ajlisowski on Sun, 05 Dec 2021 02:35:38 -0800

[practice of association rules in data mining] Intelligent Recommendation Algorithm of association rules

Data description. Data parameters: OrderNumber: customer nickname; LineNumber: purchase order (for example, the first three rows represent three items purchased by the same customer); Model: product name. Problem description: apply association-rule-based intelligent recommendation to shopping basket data. Three basic ...

Posted by Skara on Sat, 04 Dec 2021 22:41:39 -0800
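
The data layout described in the association rules entry above (OrderNumber, LineNumber, Model) can be grouped into baskets and scored with common association-rule metrics. The following is a minimal sketch, not the post's own code: the sample rows are invented for illustration, and support, confidence, and lift are the standard textbook metrics, assumed here because the excerpt is cut off before it names its own.

```python
from collections import defaultdict

# Hypothetical rows in the (OrderNumber, LineNumber, Model) layout; all values are invented.
rows = [
    ("cust1", 1, "bike"), ("cust1", 2, "helmet"), ("cust1", 3, "gloves"),
    ("cust2", 1, "bike"), ("cust2", 2, "helmet"),
    ("cust3", 1, "bike"),
    ("cust4", 1, "helmet"), ("cust4", 2, "gloves"),
]


def build_baskets(rows):
    """Group product names by OrderNumber so each customer becomes one basket."""
    baskets = defaultdict(set)
    for order_number, _line_number, model in rows:
        baskets[order_number].add(model)
    return list(baskets.values())


def rule_metrics(baskets, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    count_a = sum(1 for b in baskets if antecedent in b)
    count_c = sum(1 for b in baskets if consequent in b)
    count_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    support = count_both / n
    confidence = count_both / count_a if count_a else 0.0
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


baskets = build_baskets(rows)
support, confidence, lift = rule_metrics(baskets, "bike", "helmet")
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

A lift below 1 for a rule suggests the two items appear together less often than independence would predict, so such a rule would usually not be used for recommendations.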

Experiment 8 project case - e-commerce data analysis

Level 1: User churn statistics. Task description: write a MapReduce program that counts user churn from user behavior data. Relevant knowledge: this exercise is an intermediate-difficulty MapReduce programming task that simulates statistical analysis of e-commerce data in a real-world setting. Therefore, it is ...

Posted by kane007 on Sat, 04 Dec 2021 19:59:57 -0800

The CDH6.1 installation and deployment document comes with the installation package (three nodes as an example)

1. Download the installation package. Link: https://pan.baidu.com/s/1G6V9u5PDyxlixZ2PwGWdJA Extraction code: q8mb. Note: the installation package is a zip archive containing all the packages needed to install CDH6.1; unzip it after downloading. 2. Upload the installation package to the master node. Note: the installation directory here sh ...

Posted by HIV on Fri, 03 Dec 2021 07:08:42 -0800

Construction of 3 data warehouses in the actual combat of shangsilicon Valley data warehouse

@ Warehouse notes: Detailed explanation of data warehouse and data mart: ODS, DW, DWD, DWM, DWS, ADS; Project requirements and architecture design of Shang Silicon Valley data warehouse; Shang Silicon Valley data warehouse: 2 data warehouse layering + dimension modeling; Construction of 3 data warehouses in the actual combat of Shang Silicon Valley ...

Posted by mogster on Fri, 03 Dec 2021 05:27:35 -0800

A summary of the use of clickhouse

ClickHouse is known as a column-oriented database for OLAP scenarios with large data volumes. I was lucky enough to use it in a real project, so this article shares some simple experience with it. 1. Overall description: I won't say much about architecture, columnar storage, large data volumes, or high performance. See off ...

Posted by arlabbafi on Fri, 03 Dec 2021 00:26:30 -0800

Big data learning tutorial SD version - Part 1 [shell]

1. Shell: the shell is a command-line interpreter and the Linux scripting language. 1.1 Variables. Common system variables: $HOME $PWD $SHELL $USER. Strict space rules: there must be no spaces on either side of the equal sign; if a variable's value contains spaces, wrap it in "" or (); there must be spaces around expr operators; there should b ...

Posted by bumbar on Wed, 01 Dec 2021 13:51:37 -0800

Hadoop yarn source code analysis AsyncDispatcher event asynchronous distributor 2021SC@SDUSC

2021SC@SDUSC 1. AsyncDispatcher overview: AsyncDispatcher is the asynchronous event dispatcher in YARN, a component in the RM that dispatches events from a blocking queue. It dispatches events on a dedicated single thread and hands each one to the corresponding EventHandler registered with AsyncDispatcher for processi ...

Posted by cyberrate on Wed, 01 Dec 2021 07:10:26 -0800
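
The pattern described in the AsyncDispatcher entry above (a blocking queue drained by a single thread that routes each event to a registered handler) can be sketched in a few lines. This is an illustrative toy in Python, not YARN's Java implementation: the class name `MiniAsyncDispatcher`, its methods, and the event type names are all hypothetical.

```python
import queue
import threading


class MiniAsyncDispatcher:
    """Toy dispatcher: one blocking queue, one dispatch thread, one handler per event type."""

    _STOP = object()  # sentinel that tells the dispatch thread to exit

    def __init__(self):
        self._events = queue.Queue()
        self._handlers = {}
        self._thread = threading.Thread(target=self._run, daemon=True)

    def register(self, event_type, handler):
        """Associate a handler callable with an event type."""
        self._handlers[event_type] = handler

    def start(self):
        self._thread.start()

    def dispatch(self, event_type, payload):
        """Producers only enqueue; handling happens later on the dispatch thread."""
        self._events.put((event_type, payload))

    def stop(self):
        """Enqueue the sentinel and wait until earlier events have been handled."""
        self._events.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            item = self._events.get()  # blocks until an event is available
            if item is self._STOP:
                break
            event_type, payload = item
            handler = self._handlers.get(event_type)
            if handler is not None:
                handler(payload)


handled = []
dispatcher = MiniAsyncDispatcher()
# Hypothetical event types, chosen only for the demonstration.
dispatcher.register("NODE_ADDED", lambda p: handled.append(("NODE_ADDED", p)))
dispatcher.register("APP_SUBMITTED", lambda p: handled.append(("APP_SUBMITTED", p)))
dispatcher.start()
dispatcher.dispatch("NODE_ADDED", "node-1")
dispatcher.dispatch("APP_SUBMITTED", "app-42")
dispatcher.stop()
print(handled)
```

Because a single thread drains a FIFO queue, events are handled in the order they were enqueued, and a slow handler delays every event behind it.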

Construction of 3 digital warehouses in shangsilicon Valley

@ Warehouse notes: Detailed explanation of data warehouse and data mart: ODS, DW, DWD, DWM, DWS, ADS; Project requirements and architecture design of Shang Silicon Valley digital warehouse practice 1; Shang Silicon Valley digital warehouse actual combat 2 digital warehouse layered + dimensional modeling; Construction of 3 digital warehouses in shangsilic ...

Posted by nou on Wed, 01 Dec 2021 05:25:31 -0800
