[teacher Zhao Qiang] use Load statement to Load data in Hive

Keywords: Big Data hive

1, Syntax description of load statement in Hive

Hive Load statement will not do any conversion work when loading data, but simply copy / move the data file to the address corresponding to hive table. The syntax format is as follows:

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename \
[PARTITION (partcol1=val1, partcol2=val2 ...)]

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename \
[PARTITION (partcol1=val1, partcol2=val2 ...)] \
[INPUTFORMAT 'inputformat' SERDE 'serde'] (3.0 or later)

Several notes:

  • If LOCAL is included in the command, the file path can be relative or absolute when loading data from the LOCAL file system. In this case, first copy the file locally to the corresponding location of hdfs, and then move it to the hive table. At this time, the original data file exists in the path under the hive table.
  • If the LOCAL keyword is not included, move the HDFS file to the destination table.
  • filepath can be a relative path or an absolute path. It can be a file or a folder directory. If it is a directory, all files under the folder will be loaded at this time.
  • If there is overwirte in the command, it means that the target table will be emptied before loading data. Otherwise, it is the way of appending.
  • If the table is a PARTITION table, the PARTITION clause must be specified. Otherwise, the following error will be reported:
    FAILED:SemanticException [Error 10062]: Need to specify partition columns because the destination table is partitioned

2, Sample load statement

  • Create the following table structure (employee table)
create table emp
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int,
deptno int)
row format delimited fields terminated by ',';
  • The test data are as follows:( emp.csv )
7369,SMITH,CLERK,7902,1980/12/17,800,0,20
7499,ALLEN,SALESMAN,7698,1981/2/20,1600,300,30
7521,WARD,SALESMAN,7698,1981/2/22,1250,500,30
7566,JONES,MANAGER,7839,1981/4/2,2975,0,20
7654,MARTIN,SALESMAN,7698,1981/9/28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981/5/1,2850,0,30
7782,CLARK,MANAGER,7839,1981/6/9,2450,0,10
7788,SCOTT,ANALYST,7566,1987/4/19,3000,0,20
7839,KING,PRESIDENT,-1,1981/11/17,5000,0,10
7844,TURNER,SALESMAN,7698,1981/9/8,1500,0,30
7876,ADAMS,CLERK,7788,1987/5/23,1100,0,20
7900,JAMES,CLERK,7698,1981/12/3,950,0,30
7902,FORD,ANALYST,7566,1981/12/3,3000,0,20
7934,MILLER,CLERK,7782,1982/1/23,1300,0,10
  • Load HDFS data to Hive's table
load data inpath '/scott/emp.csv' into table emp;
  • Load local data to Hive's table
load data local inpath '/root/temp/emp.csv' into table emp;

Of course, we can also use insert statements to load data. For example, we create the following partition table:

create table emp_part_1
(empno int,
ename string,
job string,
mgr int,
hiredate string,
sal int,
comm int)
partitioned by (deptno int)
row format delimited fields terminated by ',';

Use the insert statement to insert the data into the corresponding partition.

Insert employee data for department 10
insert into table emp_part_1 partition(deptno=10) 
select empno,ename,job,mgr,hiredate,sal,comm from emp where deptno=10;

//Insert employee data for department 20
insert into table emp_part_1 partition(deptno=20) 
select empno,ename,job,mgr,hiredate,sal,comm from emp where deptno=20;

//Insert employee data for department 30
insert into table emp_part_1 partition(deptno=30) 
select empno,ename,job,mgr,hiredate,sal,comm from emp where deptno=30;

Here we use a sub query to find out the employee data of the corresponding department, and then use the insert statement to insert it into the corresponding partition.

Posted by whizzykid on Thu, 21 May 2020 21:08:45 -0700