Hive custom UDF function


In Hive, you sometimes need custom functions to meet business requirements. The steps to create one are as follows.
1. Create a new Maven project and add the hive-exec dependency to the project's pom.xml (use the version that matches your Hive installation):

        <dependency>
            <groupId>org.apache.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>3.1.2</version>
        </dependency>

2. Create a new class that extends UDF and implement an evaluate() method. The example below adds a random single-digit prefix and an underscore in front of a field value:

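package com.wxx.bigdata.hive.udf;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.udf.UDFType;

import java.util.Random;

@Description(
        name = "add_prefix",
        value = "_FUNC_(expr) - add a random digit (0-9) and '_' before the expr"
)
// The output is random, so Hive must not treat the function as deterministic
@UDFType(deterministic = false)
public class AddPrefixUDF extends UDF {

    // Reuse one Random instance instead of creating a new one for every row
    private final Random random = new Random();

    public String evaluate(String input) {
        // Pass NULL through unchanged
        if (input == null) {
            return null;
        }
        int num = random.nextInt(10); // 0..9
        return num + "_" + input;
    }

    // Quick local check without a Hive session
    public static void main(String[] args) {
        AddPrefixUDF addPrefixUDF = new AddPrefixUDF();
        String result = addPrefixUDF.evaluate("test");
        System.out.println(result);
    }
}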
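Because the prefix is random, the main() check above prints a different value on each run. To check the logic repeatably without Hive on the classpath, here is a minimal sketch that mirrors the evaluate() body in plain Java. The class name AddPrefixLogic and its seeded constructor are hypothetical, added only for testing, and the NULL pass-through is an assumption rather than part of the original evaluate().

```java
import java.util.Random;
import java.util.regex.Pattern;

// Hypothetical plain-Java twin of AddPrefixUDF.evaluate() (no Hive dependency),
// seeded so that the generated prefixes are repeatable
public class AddPrefixLogic {

    // Expected output shape: one digit, an underscore, then the original input
    static final Pattern PREFIXED = Pattern.compile("\\d_.*", Pattern.DOTALL);

    private final Random random;

    AddPrefixLogic(long seed) {
        this.random = new Random(seed); // fixed seed -> same sequence every run
    }

    String evaluate(String input) {
        if (input == null) {
            return null; // assumption: NULL in, NULL out
        }
        return random.nextInt(10) + "_" + input;
    }

    public static void main(String[] args) {
        AddPrefixLogic a = new AddPrefixLogic(42L);
        AddPrefixLogic b = new AddPrefixLogic(42L);
        String[] rows = {"Android", "MAC os", "WIN"};
        for (String row : rows) {
            String out = a.evaluate(row);
            // Two instances with the same seed must produce the same prefix
            if (!out.equals(b.evaluate(row))) throw new AssertionError("seeded runs differ");
            if (!PREFIXED.matcher(out).matches()) throw new AssertionError("bad format: " + out);
            if (!out.endsWith("_" + row)) throw new AssertionError("input not preserved: " + out);
            System.out.println(out);
        }
        if (a.evaluate(null) != null) throw new AssertionError("NULL should pass through");
        System.out.println("all checks passed");
    }
}
```

Fixing the seed is only for testing; the real UDF keeps an unseeded Random so that each query run produces fresh prefixes.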

3. Package the project with Maven (for example, mvn clean package) and upload the jar to a directory on the Linux server.
4. In the Hive CLI, add the jar file and create the function:

hive (ruozedata_ba)> add jar /home/hadoop/lib/hadoop-project-1.0.jar;
Added [/home/hadoop/lib/hadoop-project-1.0.jar] to class path
Added resources: [/home/hadoop/lib/hadoop-project-1.0.jar]
hive (ruozedata_ba)> create TEMPORARY  function add_prefix as 'com.wxx.bigdata.hive.udf.AddPrefixUDF';
OK
Time taken: 0.101 seconds
hive (ruozedata_ba)> show functions;
OK
tab_name
!
!=
%
&
*
+
-
/
<
<=
<=>
<>
=
==
>
>=
^
abs
acos
add_months
add_prefix
...
hive (ruozedata_ba)> select add_prefix(platform) from platform_stat;
OK
_c0
4_Android
6_MAC os
1_WIN
4_iOS
8_windows mobile
6_windows phone
Time taken: 0.332 seconds, Fetched: 6 row(s)

5. A function created with CREATE TEMPORARY FUNCTION only exists in the current Hive session. If you open a new session and run show functions;, the temporary functions created above are no longer listed.
6. Create a permanent function.
6.1 Upload the jar file to HDFS.

[hadoop@hadoop000 lib]$ hdfs dfs -mkdir /lib
[hadoop@hadoop000 lib]$ hdfs dfs -put /home/hadoop/lib/hadoop-project-1.0.jar /lib
[hadoop@hadoop000 lib]$ hdfs dfs -ls /lib
Found 1 items
-rw-r--r--   1 hadoop supergroup      50187 2019-09-25 17:37 /lib/hadoop-project-1.0.jar
[hadoop@hadoop000 lib]$ 

6.2 Create the permanent functions in Hive. Without a database prefix, each function is registered in the current database:

CREATE FUNCTION add_prefix_new AS "com.wxx.bigdata.hive.udf.AddPrefixUDF"
USING JAR "hdfs://hadoop000:8020/lib/hadoop-project-1.0.jar";

CREATE FUNCTION remove_prefix_new AS "com.wxx.bigdata.hive.udf.RemovePrefixUDF"
USING JAR "hdfs://hadoop000:8020/lib/hadoop-project-1.0.jar";
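The post does not show the source of RemovePrefixUDF. Below is a minimal sketch of logic it might contain, assuming it reverses add_prefix by stripping a leading "<digit>_" from the value. The class name RemovePrefixLogic is hypothetical, and it is written in plain Java so it runs without Hive; in the project, the same evaluate() body would go into a class extending UDF, like AddPrefixUDF.

```java
// Hypothetical logic for RemovePrefixUDF (its source is not shown in the post):
// undo add_prefix by stripping a leading "<digit>_" when one is present
public class RemovePrefixLogic {

    String evaluate(String input) {
        if (input == null) {
            return null; // keep NULL semantics consistent with add_prefix
        }
        int idx = input.indexOf('_');
        // Only strip when the part before the first '_' is a single digit, so
        // values that never had a prefix (e.g. "my_table") come back unchanged
        if (idx == 1 && Character.isDigit(input.charAt(0))) {
            return input.substring(idx + 1);
        }
        return input;
    }

    public static void main(String[] args) {
        RemovePrefixLogic udf = new RemovePrefixLogic();
        System.out.println(udf.evaluate("4_Android"));        // Android
        System.out.println(udf.evaluate("8_windows mobile")); // windows mobile
        System.out.println(udf.evaluate("my_table"));         // my_table
    }
}
```

Matching exactly one digit follows from add_prefix using nextInt(10); if the prefix range changed, this check would have to change with it.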

6.3 Find the metastore database configured in hive-site.xml (the javax.jdo.option.ConnectionURL property), log in to it, and check that the permanent functions were created; they are recorded in the metastore's FUNCS table.

6.4 Test the function. Once created, a permanent function is also available in newly opened sessions:

hive (ruozedata_ba)> select add_prefix_new(platform) from platform_stat;
OK
_c0
6_Android
7_MAC os
1_WIN
9_iOS
6_windows mobile
3_windows phone
Time taken: 0.658 seconds, Fetched: 6 row(s)
hive (ruozedata_ba)> 

Posted by ded on Mon, 30 Sep 2019 03:59:50 -0700