Spark - the source of all things WordCount
N methods of implementing WordCount in Spark
Hello, everyone. I won't introduce myself here. Let's talk about WordCount, that is, word-frequency counting. You may have learned from various channels that WordCount is the first example most data processing tutorials reach for. Why? Because WordCount is simple, yet it describes data processing and data stati ...
Posted by mash on Mon, 22 Nov 2021 11:21:45 -0800
Flink of big data
Preface
In Flink of big data (Part I), we introduced the characteristics, architecture, two-phase commit and data flow of Flink. This article introduces Flink's unique operators and an example of implementing WordCount with Flink.
1, split and select operators
The split operator splits a DataStream into two or more ...
Posted by hoogeebear on Mon, 22 Nov 2021 07:22:57 -0800
Running machine learning based on the TensorFlow framework on Apache Hadoop YARN
An introduction to the stages involved in developing and applying machine learning, and to the method and process of running machine learning based on the TensorFlow framework on Apache Hadoop YARN ...
Posted by dan182skater on Sun, 21 Nov 2021 19:55:32 -0800
HanLP Chinese word segmentation, person name recognition and place name recognition
Purpose of the experiment
Download and install the HanLP natural language processing package from the Internet; become familiar with the basic functions of the HanLP natural language processing package; using the information obtained by the web crawler, call the HanLP API for Ch ...
Posted by Tjorriemorrie on Sun, 21 Nov 2021 18:00:35 -0800
GitLab CI/CD automated build and release practice
Process introduction
CI/CD is a method to frequently deliver applications to customers by introducing automation in the application development phase. The core concepts of CI/CD are continuous integration, continuous delivery and continuous deployment. In this article, I will introduce the practice of automated build and release based on GitLa ...
Posted by SwarleyAUS on Sun, 21 Nov 2021 11:12:41 -0800
Elasticsearch dynamic template
1, Dynamic mapping
When we first used ES, we probably didn't know much about mapping and didn't define one. So why could we still add documents normally? Because ES supports dynamic mapping: when a document is added, fields that do not yet exist are dynamically added to the mapping. The following are some of the default mapping rules.
|Value | if missing, add ...
Posted by cornelombaard on Sat, 20 Nov 2021 10:46:15 -0800
Example verification and thinking of Markowitz model (including Python code)
catalogue
Brief introduction of Markowitz model
1, Principle
2, Relevant formulas
Example verification
1, Method selection
2, Train of thought
① Assumptions
② Set
③ Attention
3, Practice
① Data preparation
② The combination weight is calculated by Monte Carlo method
③ Select the effective frontier combination
④ Stock performan ...
Posted by dammitjanet on Wed, 17 Nov 2021 20:12:37 -0800
HBase operations in detail (illustrated and super complete ~ ~ ~)
Purpose: (1) Understand the role of HBase in the Hadoop architecture. (2) Become proficient in common HBase Shell commands. Objectives: (1) Be familiar with HBase-related operations, and master creating tables, modifying tables, querying tables, deleting tables, etc. (2) You can create a table by yourself, be familiar ...
Posted by djBuilder on Wed, 17 Nov 2021 08:45:22 -0800
Experiment 6 MapReduce data cleaning - meteorological data cleaning
Level 1: data cleaning
Task description
This task: clean the data according to certain rules.
Programming requirements
Following the prompts, add code in the editor on the right to clean the data according to the given rules. The data is described as follows: file: a.txt; delimiter: one or more spaces; data location: /user/test/ ...
Posted by zoobie on Fri, 12 Nov 2021 02:26:04 -0800
Palo Doris five-minute quick start!
This article is reproduced from Baidu Developer Center: https://developer.baidu.com/article/detail.html?id=294225 In this tutorial section, I will walk you through using Palo UI to quickly try out Palo queries. Public cloud users should refer to the documentation to create a Palo cluster. Open source users nee ...
Posted by ratebuster on Thu, 11 Nov 2021 16:11:21 -0800