Reproduced from: https://www.cnblogs.com/MOBIN/p/5704001.html
1. Create HBase tables from Hive
Create a Live table pointing to HBase using the HQL statement
CREATE TABLE hbase_table_1(key int, value string) //Table name hbase_table_1 in Hive STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' //Designated Storage Processor WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") //Declare column family, column name TBLPROPERTIES ("hbase.table.name" = "xyz", "hbase.mapred.output.outputtable" = "xyz"); //hbase.table.name declares the HBase table name, which defaults to the same table name as Hive for optional attributes. //Hbase. mapred. output. output table specifies the table to be written when inserting data, and if you need to insert data into the table later, you need to specify the value.
The properties of the newly created Base table can be viewed through the HBase shell
$ hbase shell HBase Shell; enter 'help<RETURN>' for list of supported commands. Version: 0.20.3, r902334, Mon Jan 25 13:13:08 PST 2010 hbase(main):001:0> list xyz 1 row(s) in 0.0530 seconds hbase(main):002:0> describe "xyz" DESCRIPTION ENABLED {NAME => 'xyz', FAMILIES => [{NAME => 'cf1', COMPRESSION => 'NONE', VE true RSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} 1 row(s) in 0.0220 seconds hbase(main):003:0> scan "xyz" ROW COLUMN+CELL 0 row(s) in 0.0060 seconds
Insert data into HBase tables using HQL
INSERT OVERWRITE TABLE hbase_table_1 SELECT * FROM pokes WHERE foo=98;
View the inserted data on the HBase side
hbase(main):009:0> scan "xyz" ROW COLUMN+CELL 98 column=cf1:val, timestamp=1267737987733, value=val_98 1 row(s) in 0.0110 seconds
2. Mapping HBase from Hive
Create a Hive table that points to an existing HBase table
CREATE EXTERNAL TABLE hbase_table_2(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cf1:val") TBLPROPERTIES("hbase.table.name" = "some_existing_table", "hbase.mapred.output.outputtable" = "some_existing_table");
The Hive table has an external table, so deleting the table does not delete the data in the HBase table
Be careful:
- If no key is specified when creating a table or mapping table, the first column defaults to row keys.
- There is no concept of timestamp in the Hive table corresponding to HBase. The default return is the latest version of the value.
- Since there is no data type information in HBase, it is converted to String type when storing data.
3. Mapping of Multi-column and Multi-column Families
The following table: value1 and value2 are from the b c column corresponding to column family a, and value3 is from the column corresponding to column family d
CREATE TABLE hbase_table_1(key int, value1 string, value2 int, value3 int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,a:b,a:c,d:e" ); INSERT OVERWRITE TABLE hbase_table_1 SELECT foo, bar, foo+1, foo+2 FROM pokes WHERE foo=98 OR foo=100;
4. Mapping rules of Hive Map type in HBase
The following table: Map the HBase table through Hive's Map data type so that each row can have different column combinations, column names correspond to key s in maps, and column values correspond to values in maps
CREATE TABLE hbase_table_1(value map<string,int>, row_key int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "cf:,:key" ); INSERT OVERWRITE TABLE hbase_table_1 SELECT map(bar, foo), foo FROM pokes WHERE foo=98 OR foo=100;
cf is a column family whose column name corresponds to bar in map and column value corresponds to foo in map.
Viewing data under HBase
hbase(main):012:0> scan "hbase_table_1" ROW COLUMN+CELL 100 column=cf:val_100, timestamp=1267739509194, value=100 98 column=cf:val_98, timestamp=1267739509194, value=98 2 row(s) in 0.0080 seconds
View data under Hive
hive> select * from hbase_table_1; Total MapReduce jobs = 1 Launching Job 1 out of 1 ... OK {"val_100":100} 100 {"val_98":98} 98 Time taken: 3.808 seconds
Note: Because the key in the map is used as the column name of HBase, the key type in the map must be String type.
The following mapping statements all report errors
a.
CREATE TABLE hbase_table_1(key int, value map<int,int>) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:" );
Reason: The key in the map must be String
b.
CREATE TABLE hbase_table_1(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = ":key,cf:" );
Cause: When the column family in hbase.columns.mapping is empty (like cf:), the corresponding data type in Hive is map, and the corresponding String in this statement causes an error.
5.Hive also supports simple composite line keys
Here's how to create a Live table pointing to HBase with two fields for row keys and ~separating between fields
CREATE EXTERNAL TABLE delimited_example(key struct<f1:string, f2:string>, value string) ROW FORMAT DELIMITED COLLECTION ITEMS TERMINATED BY '~' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( 'hbase.columns.mapping'=':key,f:c1');
6. Attention should be paid to integrating HBase tables with Hive
Pre-partitioning the HBase table, increasing the parallelism of its MapReduce job and designing rowkey reasonably to distribute it as far as possible on the pre-partitioned Region by setting hbase. client. scanner. caching a reasonable scan cache
Reference material: