Common sql statements in hive

Keywords: hive Database Apache Hadoop

data base

Create a database

hive> CREATE DATABASE financials;


hive> CREATE DATABASE IF NOT EXISTS financials;

Display existing databases

hive> SHOW DATABASES;
default
financials
hive> CREATE DATABASE human_resources;
hive> SHOW DATABASES;
default
financials
human_resources

Conditional Query Database

hive> SHOW DATABASES LIKE 'h.*';
human_resources
hive> ...

Create a database that specifies the location of the file to be stored

hive> CREATE DATABASE financials
    > LOCATION '/my/preferred/directory';

Add annotation information when creating database

hive> CREATE DATABASE financials
    > COMMENT 'Holds all financial tables';
hive> DESCRIBE DATABASE financials;
financials   Holds all financial tables
  hdfs://master-server/user/hive/warehouse/financials.db

Using databases

hive> USE financials;

Delete the database

hive> DROP DATABASE IF EXISTS financials;

When a database has tables, delete the tables before deleting the database

When there are tables in the database, it can not be deleted directly. It will be prompted that the database is not empty and there are tables. At this time, CASCADE keyword can be used.

hive> DROP DATABASE IF EXISTS financials CASCADE;

Data sheet

Create table

CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY(col_name[ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...]) [STORED AS DIRECTORIES]
  [ [ROW FORMAT row_format] [STORED AS file_format]
   | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  ]
  [LOCATION hdfs_path]
  [TBLPROPERTIES (property_name=property_value, ...)] 
  [AS select_statement] 


//Example:
CREATE TABLE IF NOT EXISTS mydb.employees (
  name         STRING COMMENT 'Employee name',
  salary       FLOAT  COMMENT 'Employee salary',
  subordinates ARRAY<STRING> COMMENT 'Names of subordinates',
  deductions   MAP<STRING, FLOAT>
               COMMENT 'Keys are deductions names, values are percentages
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
               COMMENT 'Home address')
COMMENT 'Description of the table'
TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...)
LOCATION '/user/hive/warehouse/mydb.db/employees';
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
  LIKE existing_table_or_view_name
  [LOCATION hdfs_path];


//Example:
CREATE TABLE IF NOT EXISTS mydb.employees2
LIKE mydb.employees;

Display tables in a database

hive> USE mydb;
hive> SHOW TABLES;
employees
table1
table2

Displays the name of the specified filter condition table

hive> USE mydb;
hive> SHOW TABLES 'empl.*';
employees

Display table extension information

hive> DESCRIBE EXTENDED mydb.employees;
name    string  Employee name
salary  float   Employee salary
subordinates    array<string>   Names of subordinates
deductions      map<string,float> Keys are deductions names, values are percentages
address struct<street:string,city:string,state:string,zip:int>  Home address
Detailed Table Information      Table(tableName:employees, dbName:mydb, owner:me,
...
location:hdfs://master-server/user/hive/warehouse/mydb.db/employees,
parameters:{creator=me, created_at='2012-01-02 10:00:00',
            last_modified_user=me, last_modified_time=1337544510,
            comment:Description of the table, ...}, ...)

Specifies the information to display a field

hive> DESCRIBE mydb.employees.salary;
salary  float   Employee salary

External tables, delete tables without deleting data

CREATE EXTERNAL TABLE IF NOT EXISTS stocks (
  exchange        STRING,
  symbol          STRING,
  ymd             STRING,
  price_open      FLOAT,
  price_high      FLOAT,
  price_low       FLOAT,
  price_close     FLOAT,
  volume          INT,
  price_adj_close FLOAT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/stocks';

Replication Table Structure Warehouse External Tables

CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3
LIKE mydb.employees
LOCATION '/path/to/data';

Partition table

CREATE TABLE employees (
  name         STRING,
  salary       FLOAT,
  subordinates ARRAY<STRING>,
  deductions   MAP<STRING, FLOAT>,
  address      STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
)
PARTITIONED BY (country STRING, state STRING);

HIVE can set partitioned tables to a "strict" mode, forbidding queries on partitioned tables without a WHERE clause

hive> set hive.mapred.mode=strict;
hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
FAILED: Error in semantic analysis: No partition predicate found for
 Alias "e" Table "employees"
hive> set hive.mapred.mode=nonstrict;
hive> SELECT e.name, e.salary FROM employees e LIMIT 100;

View existing partitions

hive> SHOW PARTITIONS employees;
...
Country=CA/state=AB
country=CA/state=BC
...
country=US/state=AL
country=US/state=AK

View partition detail partition key

hive> SHOW PARTITIONS employees PARTITION(country='US');
country=US/state=AL
country=US/state=AK
...
hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK');
country=US/state=AK

Display partition keys through DESC

hive> DESCRIBE EXTENDED employees;
name         string,
salary       float,
...
address      struct<...>,
country      string,
state        string
Detailed Table Information...
partitionKeys:[FieldSchema(name:country, type:string, comment:null),
FieldSchema(name:state, type:string, comment:null)],
...

View table building statements

hive> show create table score;
OK
CREATE  TABLE `score`(
  `num` string,
  `name` string,
  `score` float)
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://hdpcls1/user/hive/warehouse/score'
TBLPROPERTIES (
  'transient_lastDdlTime'='1489117674')
Time taken: 0.35 seconds, Fetched: 19 row(s)

Read data from tables into tables

INSERT OVERWRITE TABLE employees
PARTITION (country = 'US', state = 'OR')
SELECT * FROM staged_employees se
WHERE se.cnty = 'US' AND se.st = 'OR';

Read partition tables from files

From local
LOAD DATA LOCAL INPATH '${env:HOME}/california-employees'
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

//From hdfs
LOAD DATA  INPATH '/california-employees'
INTO TABLE employees
PARTITION (country = 'US', state = 'CA');

Adding specified partitions to external tables

ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2)
LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';

Derived data

INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees'
SELECT name, salary, address
FROM employees
WHERE se.state = 'CA';

Bucket table

CREATE TABLE par_table(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User')
 COMMENT 'This is the page view table'
 PARTITIONED BY(date STRING, pos STRING)
 CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS
 ROW FORMAT DELIMITED '\t'
   FIELDS TERMINATED BY '\n'
STORED AS SEQUENCEFILE;

Delete table

DROP TABLE IF EXISTS employees;

Modify table structure

ALTER TABLE modifies  table metadata  only. The  data  for  the  table  is
untouched. It's up to you to ensure that any modifications are consistent
with the actual data.

Modify table name

ALTER TABLE log_messages RENAME TO logmsgs;

Add, modify, delete table partitions

ALTER TABLE log_messages ADD IF NOT EXISTS
PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01'
PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02'
PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03'
ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2)
SET LOCATION 's3n://ourbucket/logs/2011/01/02';
ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);

Modify columns

ALTER TABLE log_messages
CHANGE COLUMN hms hours_minutes_seconds INT
COMMENT 'The hours, minutes, and seconds part of the timestamp'
AFTER severity;

This command allows you to change column names, data types, annotations, column locations, or any combination of them.

Add column

ALTER TABLE log_messages ADD COLUMNS (
 app_name   STRING COMMENT 'Application name',
 session_id LONG   COMMENT 'The current session id');

The field position is behind all columns (before the partition column)

Delete the replacement column

ALTER TABLE log_messages REPLACE COLUMNS (
 hours_mins_secs INT    COMMENT 'hour, minute, seconds from timestamp',
 severity        STRING COMMENT 'The message severity'
 message         STRING COMMENT 'The rest of the message');

Posted by johnsonzhang on Mon, 15 Apr 2019 14:42:32 -0700