Experiment 6: Getting Familiar with Basic Hive Operations

Keywords: Big Data, Hive

1. Experimental Purpose

(1) Understand Hive's role as a data warehouse in the Hadoop architecture.
(2) Become proficient with commonly used HiveQL statements.

2. Experimental Platform

Operating system: Ubuntu 18.04 (or Ubuntu 16.04);
Hadoop version: 3.1.3;
Hive version: 3.1.2;
JDK version: 1.8.

3. Data Sets

Preparation:

The data sets are provided with the book Hive Programming Guide (the Chinese edition of O'Reilly's Programming Hive, published by People's Posts and Telecommunications Press). Download address:
https://raw.githubusercontent.com/oreillymedia/programming_hive/master/prog-hive-1st-ed-data.zip
Alternate download address:
https://www.cocobolo.top/FileServer/prog-hive-1st-ed-data.zip
If the download is slow, you can use the copy I uploaded instead: Forest Rain Hive Dataset Download

After decompressing the archive, use the two files stocks.csv and dividends.csv for this experiment.

Go to your Downloads folder, right-click the downloaded archive to extract it, open the extracted prog-hive-1st-ed-data folder, and right-click to open a terminal there:

cd ~/Downloads/prog-hive-1st-ed-data
sudo cp ./data/stocks/stocks.csv /usr/local/hive
sudo cp ./data/dividends/dividends.csv /usr/local/hive

Change to the Hadoop directory and start Hadoop (HDFS):

cd /usr/local/hadoop
sbin/start-dfs.sh

Start MySQL:

sudo service mysql start

Switch to the Hive directory and start Hive:

cd /usr/local/hive
bin/hive

4. Experimental Steps

(1) Create an internal (managed) table stocks, using the ASCII comma as the field delimiter, with the following structure:

stocks table structure:

col_name	data_type
exchange	string
symbol	string
ymd	string
price_open	float
price_high	float
price_low	float
price_close	float
volume	int
price_adj_close	float

Code:

create table if not exists stocks
(
`exchange` string,
`symbol` string,
`ymd` string,
`price_open` float,
`price_high` float,
`price_low` float,
`price_close` float,
`volume` int,
`price_adj_close` float
)
row format delimited fields terminated by ',';
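To make the effect of "row format delimited fields terminated by ','" concrete, here is a small Python sketch (a local model, not Hive itself) of how one line of stocks.csv is mapped onto the nine declared columns by position. It assumes, as Hive's default text format does, that a value which cannot be parsed as the declared type is read as NULL (None here) instead of causing an error. The helper name parse_row is hypothetical, and Python's int and float are wider than Hive's 32-bit int and float, so range limits are not modeled.

```python
# Local model of a Hive text table, not Hive itself: one line of stocks.csv
# is split on ',' and the fields are converted by POSITION into the nine
# columns declared in CREATE TABLE stocks.
# Assumption: a value that cannot be parsed as the declared type becomes
# NULL (None), mirroring what Hive shows for malformed text fields.

STOCKS_SCHEMA = [
    ("exchange", str),
    ("symbol", str),
    ("ymd", str),
    ("price_open", float),
    ("price_high", float),
    ("price_low", float),
    ("price_close", float),
    ("volume", int),
    ("price_adj_close", float),
]


def parse_row(line, schema=STOCKS_SCHEMA, delimiter=","):
    """Map one delimited line onto the schema (hypothetical helper)."""
    fields = line.rstrip("\n").split(delimiter)
    row = {}
    for i, (name, to_type) in enumerate(schema):
        if i >= len(fields):
            row[name] = None  # missing trailing field -> NULL
            continue
        try:
            row[name] = to_type(fields[i])
        except ValueError:
            row[name] = None  # unparseable value -> NULL
    # Extra fields beyond the schema are simply ignored.
    return row
```

For example, a line with "n/a" in the price_open position yields None for price_open while the remaining columns parse normally, which is why malformed or header lines show up as rows full of NULLs rather than as load errors.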

View the table:

hive> describe stocks;
OK
exchange            	string              	                    
symbol              	string              	                    
ymd                 	string              	                    
price_open          	float               	                    
price_high          	float               	                    
price_low           	float               	                    
price_close         	float               	                    
volume              	int                 	                    
price_adj_close     	float               	                    
Time taken: 0.062 seconds, Fetched: 9 row(s)
hive>

(2) Create an external partitioned table dividends, partitioned by exchange and symbol, using the ASCII comma as the field delimiter, with the following structure:

dividends table structure

col_name	data_type
ymd	string
dividend	float
exchange	string
symbol	string

Code:

create external table if not exists dividends
(
`ymd` string,
`dividend` float
)
partitioned by (`exchange` string, `symbol` string)
row format delimited fields terminated by ',';
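Although DESCRIBE lists exchange and symbol as columns, a partitioned table stores them differently: each partition is a subdirectory named key=value under the table directory, and the data files inside hold only the non-partition columns (ymd and dividend). The Python sketch below models that layout. It assumes the table lives at the default warehouse path /user/hive/warehouse/dividends (no LOCATION clause was given) and that partition values contain no characters Hive would need to escape; the helper names are hypothetical.

```python
# Local model of the storage layout of the partitioned table dividends,
# not Hive code. Each partition is a key=value subdirectory; partition
# columns are NOT written into the data files.
# Assumptions: default warehouse location, and partition values that need
# no escaping.

from posixpath import join

TABLE_DIR = "/user/hive/warehouse/dividends"  # assumed default location
PARTITION_KEYS = ("exchange", "symbol")


def partition_dir(values, table_dir=TABLE_DIR, keys=PARTITION_KEYS):
    """Directory for one partition, with values given in key order."""
    parts = [f"{k}={v}" for k, v in zip(keys, values)]
    return join(table_dir, *parts)


def data_file_line(ymd, dividend):
    """Only the non-partition columns end up in the file."""
    return f"{ymd},{dividend}"
```

So a hypothetical record with exchange NASDAQ and symbol XYZ would be stored under .../dividends/exchange=NASDAQ/symbol=XYZ as a line containing just its ymd and dividend.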

View the table:

hive> describe dividends;
OK
ymd                 	string              	                    
dividend            	float               	                    
exchange            	string              	                    
symbol              	string              	                    
	 	 
# Partition Information	 	 
# col_name            	data_type           	comment             
exchange            	string              	                    
symbol              	string              	                    
Time taken: 0.106 seconds, Fetched: 9 row(s)
hive>

(3) Import data from the stocks.csv file into the stocks table:

Code:

load data local inpath '/usr/local/hive/stocks.csv' overwrite into table stocks;
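LOAD DATA does not parse the file: it is a file copy into the table's directory (LOCAL means the source is on the local file system and is copied, not moved), and OVERWRITE deletes the table's existing files first, whereas without it the new file is added next to them. The Python sketch below models this with an ordinary directory standing in for the table's HDFS directory. The helper name load_data_local is hypothetical, and the _copy_N renaming for a name clash is only an approximation of how Hive keeps duplicate file names apart.

```python
# Local model of LOAD DATA LOCAL INPATH ... [OVERWRITE] INTO TABLE, using
# an ordinary directory in place of the table's HDFS directory.
# Loading is a pure file copy: rows are not parsed or checked here.

import os
import shutil


def load_data_local(local_path, table_dir, overwrite=False):
    """Hypothetical helper; returns the table directory's file names."""
    os.makedirs(table_dir, exist_ok=True)
    if overwrite:
        # OVERWRITE: the table's existing files are deleted first.
        for name in os.listdir(table_dir):
            path = os.path.join(table_dir, name)
            if os.path.isfile(path):
                os.remove(path)
    base, ext = os.path.splitext(os.path.basename(local_path))
    dest = os.path.join(table_dir, base + ext)
    n = 1
    while os.path.exists(dest):
        # Without OVERWRITE the old file is kept and the new copy is renamed
        # with a _copy_N suffix (an approximation of Hive's naming).
        dest = os.path.join(table_dir, f"{base}_copy_{n}{ext}")
        n += 1
    # LOCAL means copy, so the source file on the local disk stays in place.
    shutil.copy(local_path, dest)
    return sorted(os.listdir(table_dir))
```

Because nothing is validated at load time, a file with the wrong delimiter or column order loads "successfully"; the problem only becomes visible as NULLs when the table is queried.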

(4) Create an unpartitioned external table dividends_unpartitioned with the following structure, and load the data from dividends.csv into it:

dividends_unpartitioned table structure

col_name	data_type
exchange	string
symbol	string
ymd	string
dividend	float

Code:

create external table if not exists dividends_unpartitioned
(
`exchange` string,
`symbol` string,
`ymd` string,
`dividend` float
)
row format delimited fields terminated by ',';
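The column order in the CREATE TABLE statement above matters because a text table maps fields by position, not by name. The Python sketch below (a local model, not Hive) reads the same line with two schemas: the order used in the statement above, and the order ymd, dividend, exchange, symbol. It assumes dividends.csv is laid out as exchange,symbol,ymd,dividend, matching that statement; the sample record and the helper name read_fields are hypothetical.

```python
# Local model, not Hive code: text tables assign fields by POSITION, so the
# declared column order must follow the field order in the file.
# Assumption: dividends.csv lines look like exchange,symbol,ymd,dividend.

def read_fields(line, schema, delimiter=","):
    """Hypothetical helper: convert fields positionally, NULL on failure."""
    fields = line.strip().split(delimiter)
    row = {}
    for (name, to_type), raw in zip(schema, fields):
        try:
            row[name] = to_type(raw)
        except ValueError:
            row[name] = None  # unparseable value -> NULL, as Hive would show
    return row


FILE_ORDER = [("exchange", str), ("symbol", str), ("ymd", str), ("dividend", float)]
WRONG_ORDER = [("ymd", str), ("dividend", float), ("exchange", str), ("symbol", str)]

line = "NASDAQ,XYZ,2010-02-08,0.25"  # hypothetical record
good = read_fields(line, FILE_ORDER)
bad = read_fields(line, WRONG_ORDER)
```

With the wrong order, ymd receives "NASDAQ", dividend becomes NULL (the text "XYZ" is not a number) and symbol receives "0.25", which is why the table is declared in the file's own order.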

Import data:

load data local inpath '/usr/local/hive/dividends.csv' overwrite into table dividends_unpartitioned;

(5) Query dividends_unpartitioned and use Hive's dynamic (automatic) partitioning to insert each row into the corresponding partition of the partitioned table dividends.

Code:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions.pernode=1000;
insert overwrite table dividends partition(`exchange`, `symbol`)
select `ymd`, `dividend`, `exchange`, `symbol`
from dividends_unpartitioned;
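In a dynamic-partition insert, Hive takes the partition values from the last columns of the SELECT list, by position, in the order named in partition(`exchange`, `symbol`); the preceding columns become the stored data. The nonstrict mode is needed here because both partition columns are dynamic (strict mode requires at least one static partition value). The Python sketch below models that grouping; the limit check loosely mirrors hive.exec.max.dynamic.partitions.pernode, simplified to a single node holding all rows. The helper name and the sample rows are hypothetical.

```python
# Local model of the dynamic-partition INSERT, not Hive code.
# The last num_partition_cols values of each selected row pick the
# partition; the remaining values are what gets written to the files.

from collections import defaultdict


def dynamic_partition_insert(select_rows, num_partition_cols, max_partitions=1000):
    """Hypothetical helper: group selected rows into partitions."""
    partitions = defaultdict(list)
    for row in select_rows:
        key = tuple(row[-num_partition_cols:])           # partition values
        partitions[key].append(tuple(row[:-num_partition_cols]))  # data columns
    # Simplified stand-in for hive.exec.max.dynamic.partitions.pernode.
    if len(partitions) > max_partitions:
        raise RuntimeError(
            f"{len(partitions)} partitions exceed the limit of {max_partitions}"
        )
    return dict(partitions)


source = [  # hypothetical rows of dividends_unpartitioned: exchange, symbol, ymd, dividend
    ("NASDAQ", "AAA", "2009-05-01", 0.10),
    ("NYSE", "BBB", "2009-06-01", 0.20),
    ("NASDAQ", "AAA", "2009-08-01", 0.11),
]
# Mirrors: select `ymd`, `dividend`, `exchange`, `symbol` from dividends_unpartitioned
selected = [(ymd, div, ex, sym) for ex, sym, ymd, div in source]
result = dynamic_partition_insert(selected, num_partition_cols=2)
```

Reordering the SELECT list (for example putting exchange first) would silently change which values are used as partition keys, so the partition columns must come last and in the same order as in the partition clause.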

(6) Query IBM's closing price (price_close, with symbol = IBM) on every dividend payment day since 2000, that is, on every date that has a corresponding IBM record in the dividends table.
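Step (6) is essentially a join between stocks and dividends on symbol and date, restricted to IBM and to dates from 2000 onward. The Python sketch below models only that logic with inner-join semantics (it is not the HiveQL solution): a dividend day with no matching stocks row is dropped. It assumes ymd is stored as a 'YYYY-MM-DD' string in both tables, so plain string comparison orders dates correctly; the helper name and the sample rows are hypothetical.

```python
# Local model of the logic in step (6), not the HiveQL solution:
# inner-join dividend records to stock records on (symbol, ymd), keep IBM
# and dates from 2000-01-01 on, and return (ymd, price_close) pairs.
# Assumption: ymd is a 'YYYY-MM-DD' string in both tables.

def ibm_close_on_dividend_days(stocks, dividends, since="2000-01-01"):
    """Hypothetical helper; stocks and dividends are lists of dicts."""
    # Join key -> closing price, built from the stocks side.
    close_by_day = {(s["symbol"], s["ymd"]): s["price_close"] for s in stocks}
    result = []
    for d in dividends:
        key = (d["symbol"], d["ymd"])
        # A dividend day without a matching stocks row is dropped (inner join).
        if d["symbol"] == "IBM" and d["ymd"] >= since and key in close_by_day:
            result.append((d["ymd"], close_by_day[key]))
    return sorted(result)


if __name__ == "__main__":
    stocks = [  # hypothetical rows
        {"symbol": "IBM", "ymd": "1999-11-08", "price_close": 100.0},
        {"symbol": "IBM", "ymd": "2000-02-08", "price_close": 110.0},
    ]
    dividends = [
        {"symbol": "IBM", "ymd": "1999-11-08"},
        {"symbol": "IBM", "ymd": "2000-02-08"},
    ]
    print(ibm_close_on_dividend_days(stocks, dividends))  # [('2000-02-08', 110.0)]
```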