Developing and Using Hive UDFs

A recent data mining task required counting the number of items within n kilometers of a given longitude and latitude. That means calculating the distance between two points on the earth, which calls for a custom UDF. I. Writing the UDF A ...

Posted by HairyScotsman on Sun, 20 Jan 2019 08:12:12 -0800
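
The distance calculation mentioned in the Hive UDF entry above can be sketched in plain Java. The excerpt is truncated, so this is not the post's own code: it is a minimal illustration using the haversine formula, assuming a spherical earth with a mean radius of 6371 km. The class name `GeoDistance` and the methods `haversineKm` and `withinKm` are hypothetical names chosen for this sketch. In an actual Hive UDF, logic like this would typically sit behind an `evaluate` method on a class registered with Hive.

```java
// Hypothetical helper; names and the radius constant are assumptions for illustration.
public final class GeoDistance {

    // Mean earth radius in kilometers (spherical approximation).
    private static final double EARTH_RADIUS_KM = 6371.0;

    private GeoDistance() {
        // Utility class, not meant to be instantiated.
    }

    // Great-circle distance in kilometers between two points given in decimal degrees.
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        // Haversine term: sin^2(dPhi/2) + cos(phi1) * cos(phi2) * sin^2(dLambda/2)
        double sinHalfDPhi = Math.sin(dPhi / 2.0);
        double sinHalfDLambda = Math.sin(dLambda / 2.0);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;

        // Central angle between the two points, in radians.
        double c = 2.0 * Math.atan2(Math.sqrt(a), Math.sqrt(1.0 - a));
        return EARTH_RADIUS_KM * c;
    }

    // True if the second point lies within radiusKm kilometers of the first.
    public static boolean withinKm(double lat1, double lon1,
                                   double lat2, double lon2, double radiusKm) {
        return haversineKm(lat1, lon1, lat2, lon2) <= radiusKm;
    }
}
```

Under these assumptions, one degree of longitude along the equator comes out at roughly 111 km, so `withinKm(0.0, 0.0, 0.0, 1.0, 112.0)` is true. Counting items within n kilometers then reduces to filtering rows on `withinKm` and counting the matches.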

Preparing for Flume Data Collection

  Flume is a highly available, reliable, distributed system from Cloudera for collecting, aggregating, and transferring massive volumes of log data. Flume supports custom data senders in the logging system for collecting data. It can also perform simple processing on the data and write it to various data rec ...

Posted by ron814 on Sat, 19 Jan 2019 07:21:13 -0800

Installing Java, Elasticsearch 5.4.x, Chinese Word Segmentation, Synonyms, and Logstash 5.4.x Log Collection on Ubuntu 16.x Server

Environment: Ubuntu 16.x server; memory: minimum 8 GB; LANMPS environment suite (http://www.lanmps.com); PHP version: 5.6; MySQL version: 5.6; NGINX version: latest; Elasticsearch version: 5.4; Logstash version: 5.4. Java installation, method one: the Java version used here is 1.8.0_131. Install Java and download the corresp ...

Posted by TheBeginner on Sun, 06 Jan 2019 18:00:09 -0800

Java: Submitting MapReduce Jobs from a Local Machine to a Cluster to Implement Word Count

As I keep saying, reading what other people have written always wears me out. Paste the code, package it, throw it on Hadoop, and run it, and that's it? Is writing a test program (the Hello World of MapReduce) really this much trouble? I also built the JAR package locally and copied it to Linux. Finally, I ran the JAR package with the jar command ...

Posted by sunilj20 on Fri, 04 Jan 2019 10:33:10 -0800

Apache Ambari 01 - Installing and Deploying the Ambari Mirror

This article comes from: Everyday Learning IT - Knowledge Base. Cluster machines (IP address, host): 192.168.99.181 ambari-mirror; 192.168.99.101 ambari-server; 192.168.99.106 ambari-agent1; 192.168.99.107 ambari-agent2. Create the management user hadoop (on all nodes): useradd hadoop. Modify each machine's hosts file (on all nodes): ec ...

Posted by killerofet on Thu, 03 Jan 2019 08:39:10 -0800

Cloudera Manager HBase Thrift interface Go/Python client

Background: A recent requirement was to write a data query interface, with the data stored in HBase on a Hadoop cluster built with CDH. I have always been a committed Pythonista (really just lazy), but this year, after gradually getting to know and experimenting with Go, I find it very much to my taste. In addition to the company's frustrating operations and maintenance ...

Posted by supergrame on Sun, 23 Dec 2018 03:24:06 -0800

Installing a Spark Cluster on Linux (CentOS 7 + Spark 2.1.1 + Hadoop 2.8.0)

1 Install Scala (required by Spark): 1.1 Download and extract Scala; 1.2 Configure environment variables; 1.3 Verify Scala. 2 Download and extract Spark: 2.1 Download the Spark archive; 2.2 Extract Spark. 3 Spark-related configuration: 3.1 Configuring environment variable ...

Posted by micklerlop on Sat, 22 Dec 2018 02:21:06 -0800

HBase Day 1 (Introduction to HBase, Environment Setup, HBase Shell Commands)

Why use HBase? HBase is known as the Hadoop database. Its design comes from Google's Bigtable paper (a NoSQL database built on GFS). HDFS supports storing massive amounts of data, but it does not support record-level modification or real-time access to that data. Generally, if you want random reads and writes of la ...

Posted by kanth1 on Fri, 21 Dec 2018 13:36:05 -0800