Spark - the source of all things WordCount

N ways to implement WordCount in Spark. Hello, everyone. I'll skip the self-introduction. Let's talk about WordCount, that is, word frequency counting. You may have learned from various sources that WordCount is usually the first example you meet in data processing. Why? Because WordCount is simple, yet it describes data processing and data stati ...

Posted by mash on Mon, 22 Nov 2021 11:21:45 -0800

Flink of big data

Preface: In Flink of big data (Part I), we introduced the characteristics, architecture, two-phase commit and data flow of Flink. This article introduces Flink's distinctive operators and a case of implementing WordCount with Flink. 1. split and select operators: The split operator splits a DataStream into two or more ...

Posted by hoogeebear on Mon, 22 Nov 2021 07:22:57 -0800

Running machine learning based on the TensorFlow framework on Apache Hadoop YARN

An introduction to the stages involved in developing and applying machine learning, and to the method and process of running machine learning based on the TensorFlow framework on Apache Hadoop YARN ...

Posted by dan182skater on Sun, 21 Nov 2021 19:55:32 -0800

HanLP Chinese word segmentation, person name recognition and place name recognition

Experiment purpose: Download and install the HanLP natural language processing package from the Internet; become familiar with the basic functions of the HanLP natural language processing package; using the information obtained by the web crawler, call the API of HanLP for Ch ...

Posted by Tjorriemorrie on Sun, 21 Nov 2021 18:00:35 -0800

GitLab CI/CD automated build and release practice

Process introduction: CI/CD is a method of frequently delivering applications to customers by introducing automation into the application development stages. The core concepts of CI/CD are continuous integration, continuous delivery and continuous deployment. In this article, I will introduce the practice of automated build and release based on GitLa ...

Posted by SwarleyAUS on Sun, 21 Nov 2021 11:12:41 -0800

Elasticsearch dynamic template

1. Dynamic mapping: When we first use ES, we probably don't know much about mappings and don't define one. Why can we still add documents normally? Because ES supports dynamic mapping: when a document is added, fields that do not yet exist are dynamically added to the mapping. The following are some of the default mapping rules. |Value | if missing, add ...

Posted by cornelombaard on Sat, 20 Nov 2021 10:46:15 -0800

Worked example and reflections on the Markowitz model (including Python code)

Contents. Brief introduction to the Markowitz model: 1. Principle; 2. Related formulas. Example verification: 1. Method selection; 2. Approach: ① Assumptions ② Settings ③ Notes; 3. Practice: ① Data preparation ② Computing the portfolio weights by the Monte Carlo method ③ Selecting the efficient frontier portfolios ④ Stock performan ...

Posted by dammitjanet on Wed, 17 Nov 2021 20:12:37 -0800

HBase operations in detail (illustrated and super complete ~ ~ ~)

Purpose: (1) Understand the role of HBase in the Hadoop architecture. (2) Become proficient with common HBase Shell commands. Objectives: (1) Be familiar with HBase-related operations, and master creating, modifying, querying, and deleting tables. (2) Be able to create a table on your own, be familiar ...

Posted by djBuilder on Wed, 17 Nov 2021 08:45:22 -0800

Experiment 6 MapReduce data cleaning - meteorological data cleaning

Level 1: data cleaning. Task description: clean the data according to certain rules. Programming requirements: following the prompts, add code in the editor on the right to clean the data according to certain rules. Data description: a.txt; field delimiter: one or more spaces; data location: /user/test/ ...

Posted by zoobie on Fri, 12 Nov 2021 02:26:04 -0800

Palo Doris five-minute quick start!

This article is reproduced from Baidu Developer Center: https://developer.baidu.com/article/detail.html?id=294225 In this tutorial section, I will walk you through using the Palo UI to quickly try out Palo queries. For public cloud users, please refer to the documentation to create a Palo cluster. Open source users nee ...

Posted by ratebuster on Thu, 11 Nov 2021 16:11:21 -0800