Compiling Hive from source with a built-in UDF

Keywords: Hadoop hive Java Apache



Friendly tip: to reduce the chance of Maven build errors on the server, you can first open the source locally in IDEA, modify it, and compile it there. The local IDEA build may still fail in the end. Even so, it confirms that the code you changed is correct, and it leaves your local Maven repository holding almost all the required jars. Then package the local repository and upload it to the server. Finally, copy your modified source files over the corresponding files in the unpacked source tree on the server.

1. Download the source code

My Hadoop environment is hadoop-2.6.0-cdh5.7.0, so I downloaded the matching Hive source, hive-1.1.0-cdh5.7.0, from the CDH component repository.

2. Compile Hive with UDF Support

Hive is built with Maven. Installing and configuring Maven is omitted here. I first tried to compile Hive directly in IDEA, but after several hours it still failed with errors, so in the end I went back to compiling it with mvn on the server.

2.1 Upload and Unzip
#upload
[hadoop@hadoop001 source]$ rz
[hadoop@hadoop001 source]$ ll
-rw-r--r--.  1 hadoop hadoop 14652104 Apr 18  2019 hive-1.1.0-cdh5.7.0-src.tar.gz

#Decompress
[hadoop@hadoop001 source]$ tar -zxvf hive-1.1.0-cdh5.7.0-src.tar.gz -C ~/source/

2.2 Add the UDF Class

HelloUDF.java is a UDF class I wrote earlier. For the details of UDF programming, see my previous blog post. Copy the class into the hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf directory, and change the package declaration in the class to match.

#The previously written UDF classes (HelloUDF.java, AddPre.java, RemovePre.java) are now in the udf directory
[hadoop@hadoop001 udf]$ ll
total 344
-rw-r--r--. 1 hadoop hadoop   567 Apr 20  2019 AddPre.java
drwxrwxr-x. 2 hadoop hadoop 12288 Mar 24  2016 generic
-rw-r--r--. 1 hadoop hadoop   409 Apr 20  2019 HelloUDF.java
drwxrwxr-x. 2 hadoop hadoop  4096 Mar 24  2016 ptf
-rw-r--r--. 1 hadoop hadoop   649 Apr 20  2019 RemovePre.java

#Modify the package name of HelloUDF.java to org.apache.hadoop.hive.ql.udf
[hadoop@hadoop001 udf]$ cd ~/source/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/
[hadoop@hadoop001 udf]$ vim HelloUDF.java 
package org.apache.hadoop.hive.ql.udf;
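HelloUDF.java itself is not shown. Judging from the hello:666 result in section 4, it is presumably an old-style UDF whose evaluate method prefixes its argument with "hello:". Below is a sketch under that assumption, not the author's class. Because the class is registered with registerUDF, the real file must extend org.apache.hadoop.hive.ql.exec.UDF, and it would typically use org.apache.hadoop.io.Text. This version uses String and a hypothetical class name, HelloUdfSketch, so that it compiles without the Hive jars.

```java
// Hypothetical, standalone sketch of the logic in HelloUDF.java.
// The real class would start with: package org.apache.hadoop.hive.ql.udf;
// and be declared as: public class HelloUDF extends org.apache.hadoop.hive.ql.exec.UDF
// Hive/Hadoop types are replaced with String so this compiles without Hive jars.
class HelloUdfSketch {

    // Old-style Hive UDFs expose public methods named "evaluate",
    // which Hive resolves by reflection. The real class would typically
    // take and return org.apache.hadoop.io.Text instead of String.
    public String evaluate(String input) {
        // Common UDF convention: a NULL argument yields a NULL result.
        if (input == null) {
            return null;
        }
        return "hello:" + input;
    }

    public static void main(String[] args) {
        HelloUdfSketch udf = new HelloUdfSketch();
        System.out.println(udf.evaluate("666")); // prints hello:666
    }
}
```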

2.3 Register the Function

Modify FunctionRegistry.java to register the class as a built-in function named say_hell0 (the last character is the digit 0).

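[hadoop@hadoop001 exec]$ cd ~/source/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/exec/
[hadoop@hadoop001 exec]$ vim FunctionRegistry.java 
#FunctionRegistry is in a different package, so import the UDF class next to the other org.apache.hadoop.hive.ql.udf imports
import org.apache.hadoop.hive.ql.udf.HelloUDF;
#Add the registration in the static code block (the third argument, false, means the function is not an operator)
    system.registerUDF("say_hell0", HelloUDF.class, false);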
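Why does this remove the need for ADD JAR? The static block in FunctionRegistry runs when the class is loaded, so every Hive session starts with say_hell0 already registered as a built-in. The sketch below is a deliberately simplified, hypothetical model of that idea: a static map from lower-cased name to implementation, filled in a static block. It is not Hive's actual Registry code, and BuiltinRegistrySketch, register, and call are made-up names for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Simplified, hypothetical model of a built-in function registry (NOT Hive's code).
class BuiltinRegistrySketch {

    // Declared before the static block, so it is initialized first.
    private static final Map<String, UnaryOperator<String>> FUNCTIONS = new HashMap<>();

    // Runs once when the class is loaded, in the spirit of FunctionRegistry's static block.
    static {
        register("say_hell0", s -> s == null ? null : "hello:" + s);
    }

    static void register(String name, UnaryOperator<String> fn) {
        // Hive function names are case-insensitive; model that by normalizing keys.
        FUNCTIONS.put(name.toLowerCase(Locale.ROOT), fn);
    }

    static String call(String name, String arg) {
        UnaryOperator<String> fn = FUNCTIONS.get(name.toLowerCase(Locale.ROOT));
        if (fn == null) {
            throw new IllegalArgumentException("unknown function: " + name);
        }
        return fn.apply(arg);
    }

    public static void main(String[] args) {
        System.out.println(call("say_hell0", "666")); // prints hello:666
        System.out.println(call("SAY_HELL0", "666")); // prints hello:666
    }
}
```

The point of the model is only the lifecycle: registration happens once in compiled code, so no per-session ADD JAR or CREATE FUNCTION step is involved.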

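2.4 Compile Hive

Build from the source root with mvn clean package -DskipTests -Phadoop-2 -Pdist. Here, -DskipTests skips running the tests, -Phadoop-2 selects the Hadoop 2 build profile, and -Pdist produces the binary distribution tarball under packaging/target.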

[hadoop@hadoop001 ~]$ cd ~/source/hive-1.1.0-cdh5.7.0

#Compile. The first build takes a long time because Maven downloads all dependencies; be patient until BUILD SUCCESS appears
[hadoop@hadoop001 hive-1.1.0-cdh5.7.0]$ mvn clean package -DskipTests -Phadoop-2 -Pdist

#Inspect the build output; apache-hive-1.1.0-cdh5.7.0-bin.tar.gz is the package we need
[hadoop@hadoop001 target]$ cd ~/source/hive-1.1.0-cdh5.7.0/packaging/target/
[hadoop@hadoop001 target]$ ll 
total 129092
drwxrwxr-x. 2 hadoop hadoop      4096 Apr 15 10:19 antrun
drwxrwxr-x. 3 hadoop hadoop      4096 Apr 15 10:20 apache-hive-1.1.0-cdh5.7.0-bin
-rw-rw-r--. 1 hadoop hadoop 105725582 Apr 15 10:21 apache-hive-1.1.0-cdh5.7.0-bin.tar.gz
-rw-rw-r--. 1 hadoop hadoop  12610961 Apr 15 10:21 apache-hive-1.1.0-cdh5.7.0-jdbc.jar
-rw-rw-r--. 1 hadoop hadoop  13826134 Apr 15 10:21 apache-hive-1.1.0-cdh5.7.0-src.tar.gz
drwxrwxr-x. 2 hadoop hadoop      4096 Apr 15 10:20 archive-tmp
drwxrwxr-x. 3 hadoop hadoop      4096 Apr 15 10:19 maven-shared-archive-resources
drwxrwxr-x. 3 hadoop hadoop      4096 Apr 15 10:19 tmp
drwxrwxr-x. 2 hadoop hadoop      4096 Apr 15 10:19 warehouse


3. Deployment and installation

Omitted here; please refer to my earlier article Hive Quick Start and Installation Deployment.
During deployment, take care not to conflict with an already installed Hive. In particular, make sure the hive command you run starts the newly compiled build rather than the previously installed one (check HIVE_HOME and PATH).

4. Testing UDF

#View functions
hive> show functions;
#say_hell0 now appears in the list, so the custom function can be used without adding a jar (no ADD JAR or CREATE FUNCTION)
#Run the function
hive> select say_hell0("666");
OK
hello:666
Time taken: 0.888 seconds, Fetched: 1 row(s)
hive> select rm_pre("3_wsk");
OK
wsk
Time taken: 0.183 seconds, Fetched: 1 row(s)
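The rm_pre call relies on RemovePre.java, whose source and registration are not shown above. Presumably it was registered in FunctionRegistry the same way as say_hell0. From the 3_wsk to wsk result, a plausible reading is that it strips a prefix ending in an underscore. The sketch below assumes the prefix is everything up to and including the first "_". RemovePreSketch is a hypothetical name, and String stands in for Hadoop's Text.

```java
// Hypothetical sketch of the logic behind rm_pre (RemovePre.java).
// Assumption: the prefix is everything up to and including the first '_',
// e.g. the "3_" in "3_wsk".
class RemovePreSketch {

    public String evaluate(String input) {
        if (input == null) {
            return null;
        }
        int idx = input.indexOf('_');
        // indexOf returns -1 when there is no '_', so substring(0) keeps the whole string.
        return input.substring(idx + 1);
    }

    public static void main(String[] args) {
        RemovePreSketch udf = new RemovePreSketch();
        System.out.println(udf.evaluate("3_wsk")); // prints wsk
    }
}
```

Under this assumption, a value without an underscore passes through unchanged.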

Together with the earlier UDF programming article, this completes the whole process: writing a UDF, registering it as a function, and compiling it into the Hive source.

Posted by alant on Thu, 09 May 2019 08:26:40 -0700