Hadoop Part 2: MapReduce

MapReduce (3) Project address: https://github.com/KingBobTitan/hadoop.git An explanation of MapReduce's Shuffle phase and a Join implementation. First, a review: 1. MapReduce's history monitoring service, JobHistoryServer. Function: monitors the information of all MapReduce programs that have run on YARN. Configure log ...
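The JobHistoryServer mentioned above is normally configured in mapred-site.xml. A minimal sketch, assuming the server runs on a host named master (the hostname and the default ports 10020/19888 are placeholders to adapt to your cluster):

```xml
<!-- mapred-site.xml: JobHistoryServer addresses (hostname "master" is a placeholder) -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
```

After distributing the config, the daemon can be started with `mr-jobhistory-daemon.sh start historyserver`, and the web UI is then served on the webapp port.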

Posted by nick1 on Tue, 14 Jan 2020 02:21:13 -0800

Filtering and processing log files with waterdrop

waterdrop filters and processes log files and stores the data. Installing waterdrop: download the waterdrop installation package with wget: wget xxxxx. Extract it to the directory you need: unzip XXX (package location) XXX (decompression location). If unzip reports an error, install the missing command yourself. Set the dependency env ...

Posted by PhantomCube on Mon, 13 Jan 2020 01:04:18 -0800

Grouped control with several common window functions in Hive

Brief introduction: there is little to say about regular window functions; they are very simple. This post introduces grouping, focusing on the usage of ROWS BETWEEN after grouping and sorting. The key is to understand the meaning of the keywords in ROWS BETWEEN: PRECEDING (rows before the current row), FOLLOWING (rows after the current row), CURRENT ...
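The ROWS BETWEEN semantics can also be illustrated outside Hive. A minimal Java sketch (not Hive code; class and method names are mine) of what `SUM(v) OVER (... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` computes over one already-sorted partition:

```java
import java.util.ArrayList;
import java.util.List;

public class RowsBetweenSketch {
    // For each row i, sum the values in the window [i-2, i], mirroring
    // SUM(v) OVER (... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW).
    static List<Integer> windowSums(int[] values) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            int sum = 0;
            // 2 PRECEDING is clamped at the partition start, as in Hive
            for (int j = Math.max(0, i - 2); j <= i; j++) {
                sum += values[j];
            }
            out.add(sum);
        }
        return out;
    }
}
```

For the input 1, 2, 3, 4 this yields 1, 3, 6, 9: each output row sums at most three rows ending at the current one.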

Posted by skyxmen on Thu, 09 Jan 2020 07:26:16 -0800

The way of Hadoop learning: completing WordCount with a MapReduce program

Test text data used by the program: Dear River Dear River Bear Spark Car Dear Car Bear Car Dear Car River Car Spark Spark Dear Spark. 1. Main classes. (1) The Mapper class. First is the custom Mapper class code: public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> { public void map(LongWritable key, Text val ...
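Since the excerpt cuts off mid-snippet, here is a plain-Java sketch of the same map/shuffle/reduce logic without Hadoop dependencies (class and method names are mine, not the article's): the map phase conceptually emits (word, 1) pairs, and the reduce phase sums the counts per word.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // Map phase: split the text into words; reduce phase: sum a 1 per occurrence.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            counts.merge(word, 1, Integer::sum); // shuffle+reduce collapsed into a map merge
        }
        return counts;
    }
}
```

On the test data above this yields, for example, Dear = 5, Car = 5, Spark = 4, River = 3, Bear = 2.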

Posted by SundayDriver on Fri, 27 Dec 2019 02:19:33 -0800

Scala functional programming: functional data structures

Previously in this Guide to functional programming in Scala: Scala functional programming (2), an introduction to Scala basic syntax; Scala functional programming (3), Scala collections and functions; Scala functional programming (4), functional data structures. 1. List code analysis. Today's content mainly supplements the Scala functional data st ...

Posted by stev979 on Thu, 19 Dec 2019 03:45:09 -0800

Real-time analysis of IP access counts in logs with Flume + Kafka + Spark Streaming + Redis + MySQL

I am a novice learner; if there are mistakes, please correct me, thank you! 1. Start ZooKeeper and Kafka, and create a topic named test fkss (for convenience of observation, I created it through Kafka Manager). 2. Configure Flume and start it. The monitored file is /home/czh/docker-public-file/testplume.log, which is sent to Kafka a ...
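The final aggregation step of this pipeline, counting accesses per IP from log lines, can be sketched independently of Flume/Kafka/Redis. A minimal Java sketch, assuming the IP is the first whitespace-separated field of each line (a common log layout, not necessarily the article's):

```java
import java.util.HashMap;
import java.util.Map;

public class IpAccessCount {
    // Count accesses per IP, taking the first field of each log line as the IP.
    static Map<String, Integer> countByIp(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String ip = line.trim().split("\\s+")[0];
            counts.merge(ip, 1, Integer::sum);
        }
        return counts;
    }
}
```

In the real pipeline the same per-key summation is done incrementally by Spark Streaming, with Redis/MySQL holding the running totals.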

Posted by ksduded on Sat, 14 Dec 2019 10:51:38 -0800

Using Docker to install Hadoop and Spark

Installing Hadoop and Spark with Docker, using a separate image for each. Installing the Hadoop image: I selected this mirror address; the Hadoop version it provides is relatively new, and JDK 8 is pre-installed, which supports installing the latest version of Spark. docker pull uhopp ...

Posted by shantred on Tue, 10 Dec 2019 22:46:53 -0800

Reading Hive data with Spark in Java

Requirement: read the data in Hive and write it into Elasticsearch. Environment: Spark 2.0.2. 1. Set enableHiveSupport() on the SparkSession: SparkConf conf = new SparkConf().setAppName("appName").setMaster("local[*]"); SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic exam ...

Posted by NSW42 on Tue, 10 Dec 2019 14:30:32 -0800

Spark: an upgraded JDBC data source (2)

Spark's built-in JDBC data source only supports the save modes Append, Overwrite, ErrorIfExists, and Ignore. But almost all of our online business needs upsert: existing rows must be updated in place rather than the whole table overwritten. In MySQL we use ON DUPLICATE KEY UPDATE for this. Is there such an implementation in Spark? Officially: sorry, no. dounine: I have it. You can ...
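The MySQL statement the post refers to can be generated programmatically. A hedged sketch (class, method, table, and column names are mine, for illustration only) that builds an INSERT ... ON DUPLICATE KEY UPDATE statement for a table and column list, which is the kind of SQL a JDBC upsert writer would execute per batch:

```java
import java.util.List;
import java.util.stream.Collectors;

public class UpsertSql {
    // Build "INSERT INTO t (c1,c2) VALUES (?,?)
    //        ON DUPLICATE KEY UPDATE c1=VALUES(c1),c2=VALUES(c2)"
    static String build(String table, List<String> columns) {
        String cols = String.join(",", columns);
        String placeholders = columns.stream()
                .map(c -> "?")
                .collect(Collectors.joining(","));
        String updates = columns.stream()
                .map(c -> c + "=VALUES(" + c + ")")
                .collect(Collectors.joining(","));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + placeholders + ")"
                + " ON DUPLICATE KEY UPDATE " + updates;
    }
}
```

Rows whose unique/primary key already exists are then updated instead of raising a duplicate-key error, which is exactly the upsert behavior Spark's stock JDBC modes lack.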

Posted by smordue on Fri, 22 Nov 2019 07:03:24 -0800

Data analysis services in a Spark project

Business logic: peers. To judge whether two objects are peers, we can use longitude and latitude to check whether they have passed through several identical places. Each monitoring device can also be given a marker: when an object passes a monitoring device, it is captured by that device. Tra ...
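The peer judgment described above can be sketched with the device markers: two objects are candidate peers if the sets of monitoring devices that captured them overlap in at least k places. A minimal Java sketch under that assumption (the threshold and all names are mine, not the project's):

```java
import java.util.HashSet;
import java.util.Set;

public class PeerSketch {
    // Two objects are candidate peers if at least `threshold` of the same
    // monitoring devices captured both of them.
    static boolean arePeers(Set<String> devicesA, Set<String> devicesB, int threshold) {
        Set<String> common = new HashSet<>(devicesA);
        common.retainAll(devicesB); // intersection of device ids
        return common.size() >= threshold;
    }
}
```

In a real Spark job the capture records would be grouped by object id first, then pairs of objects compared by the size of their device-set intersection.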

Posted by aztec on Sun, 17 Nov 2019 08:26:39 -0800