[Hard Hive] Hive Foundation (19): Hive functions: user-defined functions / custom UDF / custom UDTF

Keywords: Big Data, Hive

This article was originally written by Wang Zhiwu and first published on the CSDN blog; reprinting without permission from the author is prohibited.

This article is the Hive section of the [Hard Big Data Learning Route] learning guide for going from zero to big data expert (fully upgraded version).

3 User-defined functions

1) Hive ships with some built-in functions, such as max/min, but their number is limited, so Hive can be conveniently extended with custom UDFs.

2) When the built-in functions provided by Hive cannot meet your business processing needs, you can consider writing user-defined functions (UDF: user-defined function).

3) There are three types of user-defined functions (built-in analogues are sketched right after this list):

(1) UDF (User-Defined Function)

One row in, one row out.

(2) UDAF (User-Defined Aggregation Function)

An aggregate function: many rows in, one row out.

Similar to: count/max/min

(3) UDTF (User-Defined Table-Generating Function)

One row in, many rows out.

Such as lateral view explode()
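
For a quick feel of the three shapes, the built-in functions below behave the same way; this is only an illustrative sketch, and emp/ename is the demo table queried later in this article:

hive (default)> select length(ename) from emp;    -- UDF: one row in, one row out
hive (default)> select count(*) from emp;         -- UDAF: many rows in, one row out
hive (default)> select explode(array(1,2,3));     -- UDTF: one row in, many rows out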

4) Official document address

https://cwiki.apache.org/confluence/display/Hive/HivePlugins

5) Programming steps:

(1) Extend one of the classes provided by Hive:

org.apache.hadoop.hive.ql.udf.generic.GenericUDF

org.apache.hadoop.hive.ql.udf.generic.GenericUDTF

(2) Implement the abstract methods of the class

(3) Create a function in hive's command line window

Add jar

add jar linux_jar_path

Create function

create [temporary] function [dbname.]function_name AS class_name;

(4) Delete the function in hive's command line window

drop [temporary] function [if exists] [dbname.]function_name;
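
Note that when temporary is omitted, the function is permanent and is registered in the metastore. Since Hive 0.13 the jar can also be attached directly from a shared location such as HDFS (hdfs_jar_path below is a placeholder, mirroring linux_jar_path above):

create function [dbname.]function_name AS class_name using jar 'hdfs_jar_path';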

4 Custom UDF functions

0) Requirement:

Write a custom UDF that calculates the length of a given string, for example:

hive(default)> select my_len("abcd"); 
4

1) Create a Maven project named Hive

2) Import dependency

<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>
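
Because the cluster already provides hive-exec at run time, it is common (though optional, and not part of the original steps) to add <scope>provided</scope> to this dependency so that Hive's classes are not bundled into the jar you build in step 4.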

3) Create a class

package com.atguigu.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * A custom UDF must extend the GenericUDF class.
 * Requirement: calculate the length of the given string.
 */
public class MyStringLength extends GenericUDF {

    /**
     * @param arguments object inspectors for the input parameters
     * @return object inspector for the return type
     * @throws UDFArgumentException
     */
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // Check the number of input parameters
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("Input Args Length Error!!!");
        }
        // Check the type of the input parameter
        if (!arguments[0].getCategory().equals(ObjectInspector.Category.PRIMITIVE)) {
            throw new UDFArgumentTypeException(0, "Input Args Type Error!!!");
        }
        // The function returns an int, so return an int object inspector
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    /**
     * The processing logic of the function.
     * @param arguments input parameters
     * @return the length of the input string
     * @throws HiveException
     */
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        if (arguments[0].get() == null) {
            return 0;
        }
        return arguments[0].get().toString().length();
    }

    @Override
    public String getDisplayString(String[] children) {
        return "";
    }
}
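
Before packaging, the class can be smoke-tested off-cluster. The driver below is my own addition, not part of the original steps; it assumes hive-exec is on the classpath and calls initialize() and evaluate() directly, mirroring select my_len("abcd"):

package com.atguigu.hive;

import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredJavaObject;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

public class MyStringLengthLocalTest {
    public static void main(String[] args) throws Exception {
        MyStringLength udf = new MyStringLength();
        // Declare a single string argument, as the UDF expects
        ObjectInspector[] argOIs = {
                PrimitiveObjectInspectorFactory.javaStringObjectInspector
        };
        udf.initialize(argOIs);
        // Wrap the value the way Hive would hand it to evaluate()
        DeferredObject[] input = { new DeferredJavaObject("abcd") };
        System.out.println(udf.evaluate(input)); // prints 4
    }
}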

4) Package the code into a jar and upload it to the server as /opt/module/data/myudf.jar

5) Add the jar package to hive's classpath

hive (default)> add jar /opt/module/data/myudf.jar;

6) Create a temporary function associated with the developed Java class

hive (default)> create temporary function my_len as "com.atguigu.hive.MyStringLength";

7) You can now use the custom function in HQL

hive (default)> select ename,my_len(ename) ename_len from emp;
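
Because evaluate() returns 0 when the argument is NULL (the null check above), a NULL input should yield 0 rather than NULL:

hive (default)> select my_len(null);
0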

5 Custom UDTF functions

0) Requirement

Write a custom UDTF that uses an arbitrary separator to split a string into independent words, for example:

hive(default)> select myudtf("hello,world,hadoop,hive", ",");
hello
world
hadoop
hive

1) Code implementation

package com.atguigu.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;

public class MyUDTF extends GenericUDTF {

    private ArrayList<String> outList = new ArrayList<>();

    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        // 1. Define the column names and types of the output data
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        // 2. Add the column name and type of the output data
        fieldNames.add("lineToWord");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        // 1. Get the original data
        String arg = args[0].toString();
        // 2. Get the second parameter, the separator
        String splitKey = args[1].toString();
        // 3. Split the original data on the separator
        String[] fields = arg.split(splitKey);
        // 4. Iterate over the split results and write each one out
        for (String field : fields) {
            // The collection is reused, so clear it first
            outList.clear();
            // Add the word to the collection
            outList.add(field);
            // Write the contents of the collection out as one row
            forward(outList);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
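
As with the UDF, the class can be smoke-tested off-cluster before packaging. The driver below is my own addition, not part of the original steps; it assumes hive-exec is on the classpath, installs a Collector by hand, and feeds process() directly (bypassing initialize()), so each forwarded row is printed as it is produced:

package com.atguigu.udtf;

import java.util.List;

public class MyUDTFLocalTest {
    public static void main(String[] args) throws Exception {
        MyUDTF udtf = new MyUDTF();
        // The Collector receives every row passed to forward(); here each
        // row is the single-column ArrayList built in process()
        udtf.setCollector(row -> System.out.println(((List<?>) row).get(0)));
        // Call process() directly with plain strings; in Hive these would
        // arrive via the object inspectors declared in initialize()
        udtf.process(new Object[]{"hello,world,hadoop,hive", ","});
        // Expected output: hello, world, hadoop, hive (one per line)
    }
}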

2) Package the code into a jar and upload it to the server as /opt/module/hive/data/myudtf.jar

3) Add the jar package to hive's classpath

hive (default)> add jar /opt/module/hive/data/myudtf.jar;

4) Create a temporary function associated with the developed Java class (note that the class name must match the package declared above, com.atguigu.udtf)

hive (default)> create temporary function myudtf as "com.atguigu.udtf.MyUDTF";

5) Use the custom function

hive (default)> select myudtf("hello,world,hadoop,hive",",");
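
In real queries a UDTF is typically combined with lateral view so that the generated rows can be joined back to the source table's columns; some_table and line below are placeholder names:

hive (default)> select t.word
              > from some_table
              > lateral view myudtf(line, ",") t as word;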

 
