data base
Create a database
hive> CREATE DATABASE financials; hive> CREATE DATABASE IF NOT EXISTS financials;
Display existing databases
hive> SHOW DATABASES; default financials hive> CREATE DATABASE human_resources; hive> SHOW DATABASES; default financials human_resources
Conditional Query Database
hive> SHOW DATABASES LIKE 'h.*'; human_resources hive> ...
Create a database that specifies the location of the file to be stored
hive> CREATE DATABASE financials > LOCATION '/my/preferred/directory';
Add annotation information when creating database
hive> CREATE DATABASE financials > COMMENT 'Holds all financial tables'; hive> DESCRIBE DATABASE financials; financials Holds all financial tables hdfs://master-server/user/hive/warehouse/financials.db
Using databases
hive> USE financials;
Delete the database
hive> DROP DATABASE IF EXISTS financials;
When a database has tables, delete the tables before deleting the database
When there are tables in the database, it can not be deleted directly. It will be prompted that the database is not empty and there are tables. At this time, CASCADE keyword can be used.
hive> DROP DATABASE IF EXISTS financials CASCADE;
Data sheet
Create table
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name [(col_name data_type [COMMENT col_comment], ...)] [COMMENT table_comment] [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] [CLUSTERED BY (col_name, col_name, ...) [SORTED BY(col_name[ASC|DESC], ...)] INTO num_buckets BUCKETS] [SKEWED BY (col_name, col_name, ...) ON ([(col_value, col_value, ...), ...|col_value, col_value, ...]) [STORED AS DIRECTORIES] [ [ROW FORMAT row_format] [STORED AS file_format] | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] ] [LOCATION hdfs_path] [TBLPROPERTIES (property_name=property_value, ...)] [AS select_statement] //Example: CREATE TABLE IF NOT EXISTS mydb.employees ( name STRING COMMENT 'Employee name', salary FLOAT COMMENT 'Employee salary', subordinates ARRAY<STRING> COMMENT 'Names of subordinates', deductions MAP<STRING, FLOAT> COMMENT 'Keys are deductions names, values are percentages address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> COMMENT 'Home address') COMMENT 'Description of the table' TBLPROPERTIES ('creator'='me', 'created_at'='2012-01-02 10:00:00', ...) LOCATION '/user/hive/warehouse/mydb.db/employees';
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name LIKE existing_table_or_view_name [LOCATION hdfs_path]; //Example: CREATE TABLE IF NOT EXISTS mydb.employees2 LIKE mydb.employees;
Display tables in a database
hive> USE mydb; hive> SHOW TABLES; employees table1 table2
Displays the name of the specified filter condition table
hive> USE mydb; hive> SHOW TABLES 'empl.*'; employees
Display table extension information
hive> DESCRIBE EXTENDED mydb.employees; name string Employee name salary float Employee salary subordinates array<string> Names of subordinates deductions map<string,float> Keys are deductions names, values are percentages address struct<street:string,city:string,state:string,zip:int> Home address Detailed Table Information Table(tableName:employees, dbName:mydb, owner:me, ... location:hdfs://master-server/user/hive/warehouse/mydb.db/employees, parameters:{creator=me, created_at='2012-01-02 10:00:00', last_modified_user=me, last_modified_time=1337544510, comment:Description of the table, ...}, ...)
Specifies the information to display a field
hive> DESCRIBE mydb.employees.salary; salary float Employee salary
External tables, delete tables without deleting data
CREATE EXTERNAL TABLE IF NOT EXISTS stocks ( exchange STRING, symbol STRING, ymd STRING, price_open FLOAT, price_high FLOAT, price_low FLOAT, price_close FLOAT, volume INT, price_adj_close FLOAT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/data/stocks';
Replication Table Structure Warehouse External Tables
CREATE EXTERNAL TABLE IF NOT EXISTS mydb.employees3 LIKE mydb.employees LOCATION '/path/to/data';
Partition table
CREATE TABLE employees ( name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT> ) PARTITIONED BY (country STRING, state STRING);
HIVE can set partitioned tables to a "strict" mode, forbidding queries on partitioned tables without a WHERE clause
hive> set hive.mapred.mode=strict; hive> SELECT e.name, e.salary FROM employees e LIMIT 100; FAILED: Error in semantic analysis: No partition predicate found for Alias "e" Table "employees" hive> set hive.mapred.mode=nonstrict; hive> SELECT e.name, e.salary FROM employees e LIMIT 100;
View existing partitions
hive> SHOW PARTITIONS employees; ... Country=CA/state=AB country=CA/state=BC ... country=US/state=AL country=US/state=AK
View partition detail partition key
hive> SHOW PARTITIONS employees PARTITION(country='US'); country=US/state=AL country=US/state=AK ... hive> SHOW PARTITIONS employees PARTITION(country='US', state='AK'); country=US/state=AK
Display partition keys through DESC
hive> DESCRIBE EXTENDED employees; name string, salary float, ... address struct<...>, country string, state string Detailed Table Information... partitionKeys:[FieldSchema(name:country, type:string, comment:null), FieldSchema(name:state, type:string, comment:null)], ...
View table building statements
hive> show create table score; OK CREATE TABLE `score`( `num` string, `name` string, `score` float) PARTITIONED BY ( `year` string, `month` string, `day` string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'hdfs://hdpcls1/user/hive/warehouse/score' TBLPROPERTIES ( 'transient_lastDdlTime'='1489117674') Time taken: 0.35 seconds, Fetched: 19 row(s)
Read data from tables into tables
INSERT OVERWRITE TABLE employees PARTITION (country = 'US', state = 'OR') SELECT * FROM staged_employees se WHERE se.cnty = 'US' AND se.st = 'OR';
Read partition tables from files
From local LOAD DATA LOCAL INPATH '${env:HOME}/california-employees' INTO TABLE employees PARTITION (country = 'US', state = 'CA'); //From hdfs LOAD DATA INPATH '/california-employees' INTO TABLE employees PARTITION (country = 'US', state = 'CA');
Adding specified partitions to external tables
ALTER TABLE log_messages ADD PARTITION(year = 2012, month = 1, day = 2) LOCATION 'hdfs://master_server/data/log_messages/2012/01/02';
Derived data
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/ca_employees' SELECT name, salary, address FROM employees WHERE se.state = 'CA';
Bucket table
CREATE TABLE par_table(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User') COMMENT 'This is the page view table' PARTITIONED BY(date STRING, pos STRING) CLUSTERED BY(userid) SORTED BY(viewTime) INTO 32 BUCKETS ROW FORMAT DELIMITED '\t' FIELDS TERMINATED BY '\n' STORED AS SEQUENCEFILE;
Delete table
DROP TABLE IF EXISTS employees;
Modify table structure
ALTER TABLE modifies table metadata only. The data for the table is
untouched. It's up to you to ensure that any modifications are consistent
with the actual data.
Modify table name
ALTER TABLE log_messages RENAME TO logmsgs;
Add, modify, delete table partitions
ALTER TABLE log_messages ADD IF NOT EXISTS PARTITION (year = 2011, month = 1, day = 1) LOCATION '/logs/2011/01/01' PARTITION (year = 2011, month = 1, day = 2) LOCATION '/logs/2011/01/02' PARTITION (year = 2011, month = 1, day = 3) LOCATION '/logs/2011/01/03'
ALTER TABLE log_messages PARTITION(year = 2011, month = 12, day = 2) SET LOCATION 's3n://ourbucket/logs/2011/01/02';
ALTER TABLE log_messages DROP IF EXISTS PARTITION(year = 2011, month = 12, day = 2);
Modify columns
ALTER TABLE log_messages CHANGE COLUMN hms hours_minutes_seconds INT COMMENT 'The hours, minutes, and seconds part of the timestamp' AFTER severity;
This command allows you to change column names, data types, annotations, column locations, or any combination of them.
Add column
ALTER TABLE log_messages ADD COLUMNS ( app_name STRING COMMENT 'Application name', session_id LONG COMMENT 'The current session id');
The field position is behind all columns (before the partition column)
Delete the replacement column
ALTER TABLE log_messages REPLACE COLUMNS ( hours_mins_secs INT COMMENT 'hour, minute, seconds from timestamp', severity STRING COMMENT 'The message severity' message STRING COMMENT 'The rest of the message');