MapReduce in Python, running in a Hadoop environment

Contents: Zero. The code; I. Running in Linux; II. Running in the Hadoop environment. I. Running in Linux: first, create the following directory in Linux, put nothing in it, and then enter it: /home/hadoopuser/mydoc/py. Then create a ddd.txt file in it and write the foll ...
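As a minimal sketch of the kind of program the article walks through (assuming the classic wordcount task and the ddd.txt file mentioned above; all file names and paths here are illustrative), a Hadoop Streaming mapper and reducer in Python could look like this:

    # mapper.py -- read lines from stdin, emit "word<TAB>1" for every word
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    # reducer.py -- streaming input arrives sorted by key, so all counts
    # for one word are adjacent and can be summed in a single pass
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print("%s\t%d" % (current, count))
            current, count = word, 0
        count += int(n)
    if current is not None:
        print("%s\t%d" % (current, count))

Running in Linux (section I) amounts to simulating the shuffle with sort: cat ddd.txt | python mapper.py | sort | python reducer.py. For the Hadoop environment (section II), the scripts go through the streaming jar; its exact path varies by Hadoop version, so treat this as a template:

    hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
        -files mapper.py,reducer.py \
        -mapper "python mapper.py" -reducer "python reducer.py" \
        -input /mydoc/py/ddd.txt -output /mydoc/py/out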

Posted by Robban on Sun, 03 Nov 2019 06:46:55 -0800

Pitfalls encountered while using Hive

Error 1 when Hive starts: Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT... // error on startup Caused by: javax.jdo.JDOException: Couldn't obtain a new sequence (unique id): Cannot execute statement: impossible to write to binary log since BINLOG_FORMAT = STATEMENT and at least one table uses a storage ...
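This error comes from the MySQL instance backing the Hive metastore: with BINLOG_FORMAT = STATEMENT, MySQL refuses statements that are unsafe for statement-based replication. A common workaround is to switch the binlog format to MIXED (or ROW). A minimal sketch of doing that from Python with pymysql; the host, user and password are placeholders, and the change should also be made permanent in my.cnf:

    import pymysql

    # connect to the MySQL database that backs the Hive metastore
    # (host, user and password are placeholders)
    conn = pymysql.connect(host="metastore-db", user="root", password="***")
    with conn.cursor() as cur:
        cur.execute("SHOW VARIABLES LIKE 'binlog_format'")
        print(cur.fetchone())          # e.g. ('binlog_format', 'STATEMENT')
        cur.execute("SET GLOBAL binlog_format = 'MIXED'")  # new sessions only
    conn.close()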

Posted by FortMyersDrew on Mon, 28 Oct 2019 14:17:31 -0700

Flink development issues summary

When Flink is installed through brew on macOS, the default local installation path is /usr/local/Cellar/apache-flink/1.5.1. 1. Can graph algorithms be called in Flink? They can be called on a DataSet; for a DataStream you need to write the method yourself. It can be implemented in Scala by yourself, just by r ...

Posted by Rommeo on Sun, 27 Oct 2019 03:13:17 -0700

Get active nn and replace hue.ini

namenodelists="nnip1,nnip2"
nn1=$(echo $namenodelists | cut -d "," -f 1)
nn2=$(echo $namenodelists | cut -d "," -f 2)
nn1state=$(curl "http://$nn1:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep -c active)
nn2state=$(curl "http://$nn2:50070/jmx?qry=Hadoop:service=NameNode,name=NameNodeStatus" | grep -c active)
source /etc ...
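For reference, the same active-NameNode probe can be written in Python by parsing the JMX JSON instead of grepping it; a sketch assuming the same two NameNode hosts and the default 50070 HTTP port:

    import json
    from urllib.request import urlopen

    namenodes = ["nnip1", "nnip2"]   # same hosts as in the script above
    active = None
    for nn in namenodes:
        url = ("http://%s:50070/jmx"
               "?qry=Hadoop:service=NameNode,name=NameNodeStatus" % nn)
        beans = json.load(urlopen(url))["beans"]
        # the NameNodeStatus bean reports State as "active" or "standby"
        if beans and beans[0].get("State") == "active":
            active = nn
            break
    print("active namenode:", active)

The resolved host is then what the rest of the script substitutes into hue.ini.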

Posted by Padgoi on Sat, 19 Oct 2019 11:11:32 -0700

I. MapReduce basic principle

I. MapReduce overview. 1. Definition: MapReduce is a distributed computing programming framework. Its core function is to integrate the business logic code written by the user with the default components into a complete distributed program that runs concurrently on a Hadoop cluster. 2. Advantages and disadvantages: (1) Advantages: 1> Easy to program: wi ...

Posted by young_coder on Thu, 17 Oct 2019 18:18:22 -0700

Add LZO compression support for Hadoop

Enabling lzo compression is very useful for small-scale clusters: it compresses logs to about 1/3 of their original size, and decompression is fast. Install: prepare the jar package. 1) First download the hadoop-lzo jar project: https://github.com/twitter/hadoop-lzo/archive/master.zip 2) the na ...
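Once the jar and native libraries are in place, the codec still has to be registered in core-site.xml. The property names below are the standard ones used by the hadoop-lzo project; treat the exact codec list as an example to adapt:

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
    </property>
    <property>
      <name>io.compression.codec.lzo.class</name>
      <value>com.hadoop.compression.lzo.LzoCodec</value>
    </property>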

Posted by fastfingertips on Tue, 15 Oct 2019 10:36:22 -0700

Hadoop environment deployment document

I. Deployment environment: Local IP: 192.168.0.222; System: CentOS Linux release 7.6.1810 (Core); Kernel: 3.10.0-957.el7.x86_64. II. Installation of docker-ce: yum install -y yum-utils device-mapper-persistent-data lvm2 && yum-config-manager --add-repo https://download.docker.com/linux/centos/doc ...

Posted by wannasub on Fri, 11 Oct 2019 12:32:49 -0700

MapReduce programming in detail

Writing a MapReduce program: the wordcount program. Scenario: there are many files in which words are stored, one word per line. Task: count the number of occurrences of each word. Similar application scenarios: counting the K most popular search terms in a search engine; stati ...
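The logic being distributed is small; a single-machine Python sketch of the count, and of the top-K variant mentioned above (file names are illustrative):

    import heapq
    from collections import Counter

    def wordcount(paths):
        counts = Counter()
        for path in paths:
            with open(path) as f:
                # one word per line, as in the scenario above
                counts.update(line.strip() for line in f if line.strip())
        return counts

    counts = wordcount(["words1.txt", "words2.txt"])  # illustrative names
    print(heapq.nlargest(10, counts.items(), key=lambda kv: kv[1]))

MapReduce distributes exactly these two steps: mappers emit (word, 1) pairs, and reducers sum them per key.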

Posted by Avochelm on Tue, 08 Oct 2019 16:02:03 -0700

Inverted Index for MapReduce Programming Development

An inverted index is a variant of word-frequency statistics; in fact it is still a word-frequency count, but each count has to carry the name of the file it occurs in. Inverted indexes are widely used in full-text retrieval. The final result of an inverted index is a collection of the number o ...
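A sketch of that idea as a Hadoop Streaming mapper in Python: each word is tagged with its source file, which streaming exposes through the mapreduce_map_input_file environment variable (older releases used map_input_file); the reducer is then the same per-key summing reducer as in an ordinary wordcount:

    import os, sys

    # file currently being mapped, provided by Hadoop Streaming
    fname = os.path.basename(os.environ.get("mapreduce_map_input_file",
                                            "unknown"))
    for line in sys.stdin:
        for word in line.strip().split():
            # key is "word:filename", so counts are kept per file
            print("%s:%s\t%d" % (word, fname, 1))

A final pass (or a reducer keyed on the word alone) then folds the per-file counts into one posting list per word, which is the collection the excerpt describes.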

Posted by cmaclennan on Mon, 07 Oct 2019 18:50:03 -0700

Spark implements a slightly more complex business scenario by customizing InputFormat to read HDFS files

Link to the original text: https://www.oipapio.com/cn/article-2689341 Business scenario: Spark decides how to read files according to an InputFormat; by default it reads a file one line at a time. In some specific cases, Spark's default Inpu ...
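The linked article implements a custom InputFormat itself; as a lighter-weight comparison, PySpark can already change the record boundary without a new InputFormat by passing textinputformat.record.delimiter to the stock TextInputFormat (the path and delimiter below are illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="custom-delimiter")
    # records separated by blank lines instead of single newlines
    conf = {"textinputformat.record.delimiter": "\n\n"}
    rdd = sc.newAPIHadoopFile(
        "hdfs:///data/records.txt",   # illustrative path
        "org.apache.hadoop.mapreduce.lib.input.TextInputFormat",
        "org.apache.hadoop.io.LongWritable",
        "org.apache.hadoop.io.Text",
        conf=conf,
    )
    records = rdd.map(lambda kv: kv[1])   # drop the byte-offset key
    print(records.take(3))

A fully custom InputFormat, as in the article, is still needed when record boundaries cannot be expressed as a fixed delimiter.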

Posted by garyb_44 on Sun, 06 Oct 2019 17:05:10 -0700