MapReduce in Python and running in Hadoop environment
Catalog
Zero. Code first
I. Running in Linux
II. Running in Hadoop environment
Zero. Code first
I. Running in Linux
First, create the following directory on Linux, leave it empty for now, and then enter it:
/home/hadoopuser/mydoc/py
Then create a ddd.txt file in it
Write the foll ...
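The elided content presumably continues with the Python map script. As a hedged sketch (the file name mapper.py and the sample words are assumptions, not from the article), a minimal Hadoop Streaming mapper reads lines from stdin and prints tab-separated (word, 1) pairs:

```python
# mapper.py (hypothetical name) -- minimal Hadoop Streaming mapper sketch.
# Hadoop Streaming passes input lines on stdin and expects "key<TAB>value"
# pairs on stdout; the framework sorts and groups them for the reducer.
import sys

def map_line(line):
    # Emit a (word, 1) pair for every whitespace-separated word.
    return [(word, 1) for word in line.strip().split()]

def run(stream):
    for line in stream:
        for word, count in map_line(line):
            print(f"{word}\t{count}")

if __name__ == "__main__":
    run(sys.stdin)  # under Hadoop Streaming, input arrives on stdin
```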
Posted by Robban on Sun, 03 Nov 2019 06:46:55 -0800
Pit that Hive stepped on during use
Error 1 when Hive starts
Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT...
// Error on startup
Caused by: javax.jdo.JDOException: Couldn't obtain a new sequence (unique id): Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage ...
Posted by FortMyersDrew on Mon, 28 Oct 2019 14:17:31 -0700
Flink development issues summary
When installing through brew on macOS, the default local installation path is /usr/local/Cellar/apache-flink/1.5.1
1. Can graph algorithms be called in Flink?
They can be called on a DataSet; on a DataStream you need to write the method yourself. It can be implemented in Scala by yourself, just by r ...
Posted by Rommeo on Sun, 27 Oct 2019 03:13:17 -0700
Get active nn and replace hue.ini
namenodelists="nnip1,nnip2"
nn1=$(echo "$namenodelists" | cut -d "," -f 1)
nn2=$(echo "$namenodelists" | cut -d "," -f 2)
# Query each NameNode's JMX NameNodeStatus bean and count "active" matches
nn1state=$(curl -s "http://$nn1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep -c active)
nn2state=$(curl -s "http://$nn2:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep -c active)
source /etc ...
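The curl-and-grep check matches any occurrence of the string "active" in the raw response. A slightly more robust sketch (same JMX endpoint as the script; the helper names are hypothetical) parses the JSON and reads the State field of the NameNodeStatus bean:

```python
# Hypothetical helpers: determine whether a NameNode is active by parsing
# its JMX NameNodeStatus bean instead of grepping the raw response.
import json
from urllib.request import urlopen

def namenode_state(jmx_json):
    # The bean exposes a "State" attribute: "active" or "standby".
    beans = json.loads(jmx_json)["beans"]
    return beans[0]["State"]

def fetch_state(host, port=50070):
    # Same endpoint the shell script queries with curl.
    url = f"http://{host}:{port}/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus"
    with urlopen(url) as resp:
        return namenode_state(resp.read().decode())
```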
Posted by Padgoi on Sat, 19 Oct 2019 11:11:32 -0700
I. MapReduce basic principle
I. MapReduce overview
1. Definition
MapReduce is a distributed computing programming framework. Its core function is to integrate the business-logic code written by the user with built-in default components into a complete distributed program that runs concurrently on a Hadoop cluster.
2. Advantages and disadvantages
(1) Advantages
1> Easy to program: wi ...
Posted by young_coder on Thu, 17 Oct 2019 18:18:22 -0700
Add LZO compression support for Hadoop
Enabling lzo compression is very useful for small-scale clusters: compressed logs shrink to roughly 1/3 of their original size, and decompression is fast.
install
Prepare the jar package
1) First download the hadoop-lzo jar project: https://github.com/twitter/hadoop-lzo/archive/master.zip
2) the na ...
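After the jar is in place, Hadoop must be told about the lzo codecs. A typical core-site.xml addition looks like the following (a sketch based on the hadoop-lzo project's documented properties; the exact codec list should match your cluster's needs):

```xml
<!-- core-site.xml: register the lzo codecs shipped by hadoop-lzo -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,
         org.apache.hadoop.io.compress.DefaultCodec,
         com.hadoop.compression.lzo.LzoCodec,
         com.hadoop.compression.lzo.LzopCodec,
         org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```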
Posted by fastfingertips on Tue, 15 Oct 2019 10:36:22 -0700
Hadoop environment deployment document
I. Deployment environment
Local IP: 192.168.0.222
System: CentOS Linux release 7.6.1810 (Core)
Kernel: 3.10.0-957.el7.x86_64
II. Installation of docker-ce
yum install -y yum-utils device-mapper-persistent-data lvm2 && yum-config-manager --add-repo https://download.docker.com/linux/centos/doc ...
Posted by wannasub on Fri, 11 Oct 2019 12:32:49 -0700
MapReduce programming in detail
Writing MapReduce Program
Writing wordcount Program
Scenario: there are many files in which words are stored, one word per line.
Task: count the number of occurrences of each word.
Similar application scenarios:
Counting the top K most popular search terms in a search engine
Stati ...
Posted by Avochelm on Tue, 08 Oct 2019 16:02:03 -0700
Inverted Index for MapReduce Programming Development
An inverted index is a variant of word-frequency counting: it is still a word-frequency count, but each count must also record the name of the file it came from. Inverted indexes are widely used in full-text retrieval. The final result of the inverted index is a collection of the number o ...
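A minimal local sketch of that idea (the sample documents and the function name are made up for illustration): for each word, build a mapping from file name to the word's count in that file:

```python
from collections import defaultdict

def inverted_index(docs):
    # docs: {filename: text}. Returns {word: {filename: count}} --
    # word frequency, keyed additionally by the file the word came from.
    index = defaultdict(lambda: defaultdict(int))
    for filename, text in docs.items():
        for word in text.split():
            index[word][filename] += 1
    return {word: dict(files) for word, files in index.items()}

idx = inverted_index({"a.txt": "hello world hello", "b.txt": "hello"})
```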
Posted by cmaclennan on Mon, 07 Oct 2019 18:50:03 -0700
Spark implements a slightly more complex business scenario by customizing InputFormat to read HDFS files
Link to the original text: https://www.oipapio.com/cn/article-2689341
Business scenario
Spark decides how to read files according to an InputFormat. By default, it reads one record per line. In some specific cases, Spark's default Inpu ...
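To illustrate what a custom InputFormat changes, here is a plain-Python sketch (not the Spark API; the delimiter and function name are illustrative) of carving a file's raw content into records by a custom delimiter instead of the default newline, which is conceptually what a custom RecordReader does:

```python
def split_records(content, delimiter="\n"):
    # What a RecordReader conceptually does: carve raw file content into
    # records. Swapping the delimiter mimics a custom InputFormat that
    # reads multi-line records separated by, e.g., "||".
    return [rec for rec in content.split(delimiter) if rec]

default_records = split_records("a\nb\nc")         # line-based, like Spark's default
custom_records = split_records("x||y z||w", "||")  # custom record delimiter
```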
Posted by garyb_44 on Sun, 06 Oct 2019 17:05:10 -0700