This article was originally written by [Wang Zhiwu] and first published on his CSDN blog; reprinting without permission from CSDN and the author is strictly prohibited.
It is the Hive section of the learning guide [Hard-core Big Data Learning Route] From Zero to Big Data Expert (Fully Upgraded Edition).
3 User-Defined Functions
1) Hive ships with some built-in functions, such as max/min, but their number is limited, so Hive can be conveniently extended with custom UDFs.
2) When the built-in functions provided by Hive cannot meet your business needs, consider writing a user-defined function (UDF: user-defined function).
3) There are three kinds of user-defined functions:
(1) UDF (User-Defined Function)
One row in, one row out.
(2) UDAF (User-Defined Aggregation Function)
Aggregate function: many rows in, one row out.
Examples: count/max/min
(3) UDTF (User-Defined Table-Generating Function)
One row in, many rows out.
Example: lateral view explode(), as in the sketch below.
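For instance, the built-in explode() UDTF turns one row holding an array into one row per element; the table and column names here are illustrative, not from the original text:
hive (default)> select word from tmp_lines lateral view explode(split(line, ',')) t as word;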
4) Official document address
https://cwiki.apache.org/confluence/display/Hive/HivePlugins
5) Programming steps:
(1) Inherit one of the classes provided by Hive:
org.apache.hadoop.hive.ql.udf.generic.GenericUDF
org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
(2) Implement the abstract methods of the class.
(3) Create the function in Hive's command-line window:
Add the jar:
add jar linux_jar_path
Create the function:
create [temporary] function [dbname.]function_name AS class_name;
(4) Delete the function in Hive's command-line window:
drop [temporary] function [if exists] [dbname.]function_name;
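Putting the steps together, a full round trip might look like this (jar path, function name, and class name are all illustrative):
hive (default)> add jar /opt/module/data/demo_udf.jar;
hive (default)> create temporary function my_upper as "com.example.hive.MyUpper";
hive (default)> drop temporary function if exists my_upper;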
4 Custom UDF Function
0) Requirement:
Write a UDF that returns the length of a given string, for example:
hive (default)> select my_len("abcd");
4
1) Create a Maven project named Hive
2) Import the dependency:
<dependencies>
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>
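Since the Hive server already ships with hive-exec, it is common (though not required by the original text) to mark the dependency as provided so it is not bundled into your jar:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-exec</artifactId>
    <version>3.1.2</version>
    <scope>provided</scope>
</dependency>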
3) Create a class
package com.atguigu.hive;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

/**
 * A custom UDF must extend the GenericUDF class.
 * Requirement: calculate the length of the given string.
 */
public class MyStringLength extends GenericUDF {

    /**
     * @param arguments object inspectors describing the input argument types
     * @return object inspector describing the return type
     * @throws UDFArgumentException if the arguments are invalid
     */
    @Override
    public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
        // Check the number of input arguments
        if (arguments.length != 1) {
            throw new UDFArgumentLengthException("Input Args Length Error!!!");
        }
        // Check the type of the input argument
        if (!arguments[0].getCategory().equals(ObjectInspector.Category.PRIMITIVE)) {
            throw new UDFArgumentTypeException(0, "Input Args Type Error!!!");
        }
        // The function returns an int, so return an int object inspector
        return PrimitiveObjectInspectorFactory.javaIntObjectInspector;
    }

    /**
     * The function's processing logic.
     * @param arguments input arguments
     * @return the length of the input string
     * @throws HiveException on processing errors
     */
    @Override
    public Object evaluate(DeferredObject[] arguments) throws HiveException {
        if (arguments[0].get() == null) {
            return 0;
        }
        return arguments[0].get().toString().length();
    }

    @Override
    public String getDisplayString(String[] children) {
        return "";
    }
}
4) Package the code into a jar and upload it to the server at /opt/module/data/myudf.jar
5) Add the jar package to Hive's classpath
hive (default)> add jar /opt/module/data/myudf.jar;
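To confirm the jar was registered in the current session, Hive's standard list command can be used:
hive (default)> list jars;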
6) Create a temporary function associated with the developed Java class
hive (default)> create temporary function my_len as "com.atguigu.hive.MyStringLength";
7) You can now use the custom function in HQL
hive (default)> select ename,my_len(ename) ename_len from emp;
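A temporary function disappears when the session ends. If you need it across sessions, Hive also supports permanent functions backed by a jar on HDFS; the HDFS URI below is illustrative:
hive (default)> create function my_len as "com.atguigu.hive.MyStringLength" using jar "hdfs:///user/hive/jars/myudf.jar";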
5 Custom UDTF Function
0) Requirement:
Write a UDTF that splits a string on an arbitrary delimiter into independent words, for example:
hive (default)> select myudtf("hello,world,hadoop,hive", ",");
hello
world
hadoop
hive
1) Code implementation
package com.atguigu.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

import java.util.ArrayList;
import java.util.List;

public class MyUDTF extends GenericUDTF {

    private ArrayList<String> outList = new ArrayList<>();

    @Override
    public StructObjectInspector initialize(StructObjectInspector argOIs) throws UDFArgumentException {
        // 1. Define the column names and types of the output data
        List<String> fieldNames = new ArrayList<>();
        List<ObjectInspector> fieldOIs = new ArrayList<>();
        // 2. Add the output column name and type
        fieldNames.add("lineToWord");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] args) throws HiveException {
        // 1. Get the original data
        String arg = args[0].toString();
        // 2. Get the second argument, which is the separator
        String splitKey = args[1].toString();
        // 3. Split the original data on the given separator
        String[] fields = arg.split(splitKey);
        // 4. Iterate over the split results and write each one out
        for (String field : fields) {
            // The collection is reused, so clear it first
            outList.clear();
            // Add the word to the collection
            outList.add(field);
            // Emit the contents of the collection as one output row
            forward(outList);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
2) Package the code into a jar and upload it to the server at /opt/module/hive/data/myudtf.jar
3) Add the jar package to Hive's classpath
hive (default)> add jar /opt/module/hive/data/myudtf.jar;
4) Create a temporary function associated with the developed Java class
hive (default)> create temporary function myudtf as "com.atguigu.udtf.MyUDTF";
5) Use the custom function
hive (default)> select myudtf("hello,world,hadoop,hive",",");
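The expected output is one word per row: hello, world, hadoop, hive. A UDTF is also commonly combined with lateral view to join the generated rows back to the source table; the table and column names here are illustrative:
hive (default)> select id, word from t lateral view myudtf(line, ",") tmp as word;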