Developing and Using Hive UDFs
A recent data-mining task required counting the number of items within n kilometers of a given longitude and latitude. That count depends on the distance between two points on the Earth, so a UDF has to be written to calculate it.
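The excerpt does not show which distance formula the UDF uses. One common choice is the haversine great-circle formula, sketched below in Python for illustration only (a real Hive UDF would be written against Hive's Java API). The function names, the 6371 km mean Earth radius, and the spherical-Earth approximation are all assumptions, not details from the original post.

```python
import math

# Assumption: mean Earth radius in kilometers (spherical approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def within_km(center, point, n_km):
    """True if `point` lies within n_km of `center`; both are (lat, lon) tuples."""
    return haversine_km(center[0], center[1], point[0], point[1]) <= n_km


def count_within(center, points, n_km):
    """Count the points that fall within n_km of center (hypothetical helper)."""
    return sum(1 for p in points if within_km(center, p, n_km))
```

A call such as `count_within((39.9, 116.4), points, 5)` would then give the number of points within 5 km of the center, which is the statistic the post describes.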
I. UDF Writing
A ...
Posted by HairyScotsman on Sun, 20 Jan 2019 08:12:12 -0800
Flume data acquisition preparation
Flume is a highly available, reliable, distributed system from Cloudera for collecting, aggregating, and transferring massive volumes of log data. Flume supports customizable data senders in the logging system for collecting data. It can also perform simple processing on the data and write it to various data rec ...
Posted by ron814 on Sat, 19 Jan 2019 07:21:13 -0800
Installing Java, Elasticsearch 5.4.x, Chinese word segmentation, synonyms, and Logstash 5.4.x log collection on Ubuntu 16.x server
Environment
Ubuntu 16.x server
Memory: minimum 8G
lanmps environment suite (http://www.lanmps.com)
PHP Version: 5.6
MYSQL Version: 5.6
NGINX Version: Latest
Elasticsearch version: 5.4
Logstash version: 5.4
Java installation
Method 1
The Java version used here is 1.8.0_131
Install the Java version and download the corresp ...
Posted by TheBeginner on Sun, 06 Jan 2019 18:00:09 -0800
Java: Submitting MapReduce Jobs to a Cluster from a Local Machine to Implement Word Count
As I keep saying, reading what other people have written always wears me out. Paste the code, package it, upload it to Hadoop, and run it, just like that? Is writing a test program (the Hello World of MapReduce) really that much trouble? I also built the jar locally and copied it to Linux. Finally, I ran the jar package with the jar command ...
Posted by sunilj20 on Fri, 04 Jan 2019 10:33:10 -0800
Apache Ambari 01 - Ambari Mirror installation deployment
This article comes from: Everyday Learning IT - Knowledge Base
Cluster machines
Host            IP address
ambari-mirror   192.168.99.181
ambari-server   192.168.99.101
ambari-agent1   192.168.99.106
ambari-agent2   192.168.99.107
Create the management user hadoop (run on all nodes)
useradd hadoop
Edit the hosts file on each machine (run on all nodes)
ec ...
Posted by killerofet on Thu, 03 Jan 2019 08:39:10 -0800
Cloudera Manager HBase Thrift interface Go/Python client
Background
A recent requirement was to write a data query interface for data stored in HBase on a Hadoop cluster built with CDH. I have always been a committed Python user (actually just lazy), but this year, after gradually getting to know and experimenting with Go, I have found it very appealing. In addition to the company's disgusting operation and maintenance ...
Posted by supergrame on Sun, 23 Dec 2018 03:24:06 -0800
Installing a Spark Cluster on Linux (CentOS 7 + Spark 2.1.1 + Hadoop 2.8.0)
1 Install Scala, which Spark depends on
1.1 Download and extract Scala
1.2 Configure environment variables
1.3 Verify Scala
2 Download and extract Spark
2.1 Download the Spark archive
2.2 Extract Spark
3 Spark-related configuration
3.1 Configuring environment variable ...
Posted by micklerlop on Sat, 22 Dec 2018 02:21:06 -0800
HBase Day 1 (Introduction to HBase, Environment Setup, HBase Shell Commands)
Why use HBase?
HBase is short for Hadoop Database. Its design comes from the Bigtable paper (a NoSQL database built on GFS). HDFS supports the storage of massive data, but it does not support record-level data modification or immediate access to massive data. Generally, if you want to random read and write la ...
Posted by kanth1 on Fri, 21 Dec 2018 13:36:05 -0800