Recently a data-mining task came up that required counting how many records fall within n kilometers of a given longitude and latitude. Since this boils down to computing the distance between two points on the Earth's surface, a Hive UDF had to be written for that calculation.
I. UDF Writing
There are plenty of methods online for computing the distance between two points from their longitude and latitude. After trying several, I found a blog post whose method has a relatively small error and whose derivation is laid out in detail, so that is the method adopted here.
import com.ai.hive.udf.topdomain.StringUtil;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;
import org.apache.log4j.Logger;

/**
 * Function: calculate the distance between two points from their longitude and latitude
 * create temporary function LocDistanceCalUDF as 'com.ai.hive.udf.util.LocDistanceCalUDF';
 * @author olicity
 */
public class LocDistanceCalUDF extends UDF {

    private static Logger log = Logger.getLogger(LocDistanceCalUDF.class);
    private Text nullText = new Text("");

    /**
     * Calculate the distance between two points on the Earth from their longitude and latitude
     */
    private static double distanceCal(double lng1, double lat1, double lng2, double lat2) {
        double dx = lng1 - lng2;            // longitude difference
        double dy = lat1 - lat2;            // latitude difference
        double b = (lat1 + lat2) / 2.0;     // mean latitude
        double Lx = Math.toRadians(dx) * 6367000.0 * Math.cos(Math.toRadians(b)); // east-west distance
        double Ly = 6367000.0 * Math.toRadians(dy);                               // north-south distance
        return Math.sqrt(Lx * Lx + Ly * Ly); // total distance (m): diagonal of the planar right triangle
    }

    /**
     * evaluate overload: four arguments (lng1, lat1, lng2, lat2)
     */
    public Text evaluate(Text longitudeText1, Text latitudeText1, Text longitudeText2, Text latitudeText2) {
        try {
            if (longitudeText1 == null || latitudeText1 == null
                    || longitudeText2 == null || latitudeText2 == null) {
                return nullText;
            }
            if (StringUtil.isEmpty(longitudeText1.toString()) || StringUtil.isEmpty(latitudeText1.toString())
                    || StringUtil.isEmpty(longitudeText2.toString()) || StringUtil.isEmpty(latitudeText2.toString())) {
                return nullText;
            }
            double lng1 = Double.valueOf(longitudeText1.toString());
            double lat1 = Double.valueOf(latitudeText1.toString());
            double lng2 = Double.valueOf(longitudeText2.toString());
            double lat2 = Double.valueOf(latitudeText2.toString());
            double dis = distanceCal(lng1, lat1, lng2, lat2);
            return new Text(String.valueOf(dis));
        } catch (Exception e) {
            return nullText;
        }
    }

    /**
     * evaluate overload: two arguments, each in "lng,lat" form
     */
    public Text evaluate(Text locationA, Text locationB) {
        try {
            if (locationA == null || locationB == null) {
                return nullText;
            }
            if (StringUtil.isEmpty(locationA.toString()) || StringUtil.isEmpty(locationB.toString())) {
                return nullText;
            }
            String locationA2String = locationA.toString();
            String locationB2String = locationB.toString();
            double lng1 = Double.valueOf(locationA2String.split(",")[0]);
            double lat1 = Double.valueOf(locationA2String.split(",")[1]);
            double lng2 = Double.valueOf(locationB2String.split(",")[0]);
            double lat2 = Double.valueOf(locationB2String.split(",")[1]);
            double dis = distanceCal(lng1, lat1, lng2, lat2);
            return new Text(String.valueOf(dis));
        } catch (Exception e) {
            return nullText;
        }
    }
}
A UDF class extends org.apache.hadoop.hive.ql.exec.UDF and implements one or more evaluate methods. When the custom UDF is used in Hive, Hive calls the evaluate overload that matches the arguments to do the actual work.
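For example, once the function has been registered (as described in the next section), the two overloads above correspond to two ways of calling it from HiveQL; the table and column names below are just placeholders:

-- two-argument form: each column holds a "lng,lat" string
select LocDistanceCalUDF(loc_a, loc_b) from some_table;
-- four-argument form: separate longitude/latitude string columns
select LocDistanceCalUDF(lng_a, lat_a, lng_b, lat_b) from some_table;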
II. UDF Import
1. Upload the jar package
Right-click the class name, choose Copy Reference, and copy the fully qualified name com.ai.hive.udf.util.LocDistanceCalUDF. Package the compiled classes into a jar and upload it to the server, to a path such as /user/olicity/hive/UDF.
2. Add the jar package to the classpath
Enter the Hive CLI and add the jar by executing:
add jar /user/olicity/hive/UDF/udf.jar;
View the imported jar package
list jars;
3. Create the function
Create a temporary function named LocDistanceCalUDF and associate it with the class in the jar:
create temporary function LocDistanceCalUDF as 'com.ai.hive.udf.util.LocDistanceCalUDF';
View the created functions
show functions;
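Before using the function in a real query, it can be sanity-checked with a simple select (in Hive versions that allow SELECT without a FROM clause); the coordinates below are arbitrary sample values:

select LocDistanceCalUDF('116.40,39.90', '121.47,31.23');
-- or, equivalently, the four-argument form
select LocDistanceCalUDF('116.40', '39.90', '121.47', '31.23');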
4. Notes
The method above only lasts for the current session. If a permanent function tied to a permanent path is needed, create it like this:
create function locUDF.LocDistanceCalUDF as 'com.ai.hive.udf.util.LocDistanceCalUDF' using jar 'hdfs://<hdfs path>/udf.jar';
use LocDistanceCalUDF;
The jar package has to be put on HDFS first, and the function is then created against that path.
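Note that a permanent function created this way belongs to a specific database (locUDF in the statement above). When the current database is a different one, it can be referenced with the database prefix; the coordinates here are arbitrary sample values:

select locUDF.LocDistanceCalUDF('116.40,39.90', '121.47,31.23');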
I have also seen another approach: configuring hive.aux.jars.path in the hive-site.xml file.
The configuration reference is as follows:

<property>
    <name>hive.aux.jars.path</name>
    <value>file:///home/hdfs/fangjs/DefTextInputFormat.jar,file:///jarpath/test.jar</value>
</property>
III. Use of UDF
With the preparation in place, we can start querying the tables. Emmmm, this is just a simple query over two tables; the table structures and data are not shown here and no sample tables are actually built. The table holding the given longitude/latitude points is called A, the table to be searched is called B, and the temporary intermediate table is called C. The longitude/latitude column is named loc, and the distance threshold is taken to be 2 kilometers.
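For concreteness only, the two source tables could look something like this; these definitions are purely illustrative and not the real ones (the id column in B is just a stand-in for whatever columns B actually carries):

-- hypothetical minimal table layouts, for illustration only
create table A (loc string);              -- given points, loc = "lng,lat"
create table B (id string, loc string);   -- records to be searched, loc = "lng,lat"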
create table C as
select B.*
from B join A
where LocDistanceCalUDF(A.loc, B.loc) <= 2000;
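If the coordinates were stored as separate longitude and latitude string columns rather than a single "lng,lat" value, the four-argument overload could be used instead; the lng/lat column names below are placeholders. Also note that a join with no ON condition is a Cartesian product, which is expensive on large tables and, depending on the Hive version and strict-mode settings, may need something like set hive.mapred.mode=nonstrict; to be allowed to run.

create table C as
select B.*
from B cross join A
where LocDistanceCalUDF(B.lng, B.lat, A.lng, A.lat) <= 2000;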
OK.
IV. Summary
More practice with SQL is needed, especially multi-table join queries. There is still a long way to go with Hadoop.