Using Sqoop to import data from MySQL into HDFS, HBase, and Hive

Keywords: MySQL, HBase, JDBC, Hive

The first part of this post is a reprinted article that introduces how to transfer data from MySQL to HDFS, HBase, and Hive. It is followed by an example from one of my own projects of importing MySQL data into HDFS, for readers' reference.

1. Test the MySQL connection

bin/sqoop list-databases --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111'
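
A side note: --password passes the password in plain text on the command line. As a sketch (assuming the password has been written to a file such as /user/xdata/mysql.pwd on HDFS, a hypothetical path), Sqoop can read it with --password-file instead:

bin/sqoop list-databases --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password-file /user/xdata/mysql.pwd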

2. Verify a SQL statement

bin/sqoop eval --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --query "SELECT * FROM TB_REGION WHERE REGION_ID = '00A1719A489D4F49906A8CA9661CCBE8'"
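
The eval subcommand is also handy for checking the row count before importing; for example (same connection details assumed):

bin/sqoop eval --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --query "SELECT COUNT(*) FROM TB_REGION"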

3. Import into HDFS

3.1 Import

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --target-dir /sqoop/mysql/trade_dev/tb_region -m 5 --columns "code,name,category,farthercode,visible,regionlevel,region_id" --direct
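
The import fails if the target directory already exists. When re-running an import, one option (a sketch using Sqoop's --delete-target-dir flag) is to let Sqoop remove the old directory first:

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --delete-target-dir --target-dir /sqoop/mysql/trade_dev/tb_region -m 5 --columns "code,name,category,farthercode,visible,regionlevel,region_id" --direct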

3.2 Validation

hdfs dfs -cat /sqoop/mysql/trade_dev/tb_region/*01

4. Import into HBase

4.1 Create the HBase table

hbase shell
create 'mysql_trade_dev', 'region'

4.2 Import MySQL data into HBase

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --hbase-table mysql_trade_dev --hbase-row-key REGION_ID --column-family region
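
If you would rather not create the HBase table by hand as in 4.1, Sqoop can create it for you with --hbase-create-table (the same flag used in the whole-database import in section 7); a sketch:

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --hbase-create-table --hbase-table mysql_trade_dev --hbase-row-key REGION_ID --column-family region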

4.3 Validation

scan 'mysql_trade_dev'
count 'mysql_trade_dev'

5. Import into Hive

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username 'mysql' --password '111111' --table TB_REGION --hive-import --create-hive-table --target-dir /user/hive/warehouse/tb_region --hive-table tb_region
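
To verify the Hive import, a quick count from the Hive CLI should match the source table; a sketch, assuming the table was created in the default database:

hive -e "SELECT COUNT(*) FROM tb_region"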

6. Incremental import into Hive

6.1 Initial import into HDFS

bin/sqoop import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username mysql --password 111111 --table TB_DICTIONARY -m 1 --target-dir /sqoop/mysql/trade_dev/tb_dic --incremental append --check-column DIC_ID

The output ends with the parameters to use for the next incremental run:

16/09/07 10:27:06 INFO tool.ImportTool: --incremental append
16/09/07 10:27:06 INFO tool.ImportTool: --check-column DIC_ID
16/09/07 10:27:06 INFO tool.ImportTool: --last-value 287
16/09/07 10:27:06 INFO tool.ImportTool: (Consider saving this with 'sqoop job --create')

6.2 Create the Hive external table

CREATE EXTERNAL TABLE tb_dic (
 DIC_ID int,
 DOMAIN_ID STRING,
 DIC_TYPE_ID int,
 DESCRIPTION STRING,
 CODE int,
 NAME STRING,
 MNEMONIC STRING,
 ATTRIBUTE STRING,
 MARK_FOR_DEFAULT int,
 MARK_FOR_DELETE int,
 OPT_COUNTER int,
 CREATE_DATE STRING,
 CREATE_BY STRING,
 LAST_MODIFIED_DATE STRING,
 LAST_MODIFIED_BY STRING,
 ATTRIBUTE1 int,
 ATTRIBUTE2 int,
 ATTRIBUTE3 STRING,
 ATTRIBUTE4 STRING,
 ATTRIBUTE5 STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/sqoop/mysql/trade_dev/tb_dic';

For reference, the MySQL table definition:
DROP TABLE IF EXISTS `TB_DICTIONARY`;
CREATE TABLE `TB_DICTIONARY` (
 `DIC_ID` int(11) NOT NULL AUTO_INCREMENT COMMENT 'Dictionary ID',
 `DOMAIN_ID` varchar(45) NOT NULL DEFAULT 'domain1' COMMENT 'Service Domain Differentiation ID',
 `DIC_TYPE_ID` int(11) NOT NULL COMMENT 'Dictionary type ID-foreign key-TB_DICTIONARY_TYPE',
 `DESCRIPTION` varchar(1024) NOT NULL COMMENT 'Description of the escape code',
 `CODE` tinyint(2) NOT NULL COMMENT 'Escape code',
 `NAME` varchar(45) NOT NULL COMMENT 'The corresponding meaning of escape codes',
 `MNEMONIC` varchar(45) DEFAULT NULL COMMENT 'Mnemonic code',
 `ATTRIBUTE` varchar(45) DEFAULT NULL COMMENT 'Current dictionary attributes: e.g. dimension types of measurement units',
 `MARK_FOR_DEFAULT` tinyint(2) NOT NULL DEFAULT '0' COMMENT 'Default flag (1 is default, 0 is non-default)',
 `MARK_FOR_DELETE` tinyint(2) NOT NULL DEFAULT '1' COMMENT 'Whether valid (1: valid, 0: invalid)',
 `OPT_COUNTER` int(5) DEFAULT NULL COMMENT 'Version management flags',
 `CREATE_DATE` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Date of creation',
 `CREATE_BY` varchar(45) NOT NULL DEFAULT 'admin' COMMENT 'Creator ID',
 `LAST_MODIFIED_DATE` datetime DEFAULT NULL COMMENT 'modification date',
 `LAST_MODIFIED_BY` varchar(45) DEFAULT NULL COMMENT 'Modifier ID',
 `ATTRIBUTE1` int(11) DEFAULT NULL,
 `ATTRIBUTE2` int(11) DEFAULT NULL,
 `ATTRIBUTE3` varchar(45) DEFAULT NULL,
 `ATTRIBUTE4` varchar(45) DEFAULT NULL,
 `ATTRIBUTE5` date DEFAULT NULL,
 PRIMARY KEY (`DIC_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=290 DEFAULT CHARSET=utf8 COMMENT='Dictionary table-Si Yu Li';

SET FOREIGN_KEY_CHECKS = 1;

6.3 Import the incremental data

bin/sqoop job --create incjob -- import --connect jdbc:mysql://192.168.1.187:3306/trade_dev --username mysql --password 111111 --table TB_DICTIONARY -m 1 --target-dir /sqoop/mysql/trade_dev/tb_dic --incremental append --check-column DIC_ID --last-value 287
bin/sqoop job --exec incjob
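
The saved job remembers the last imported DIC_ID, so each --exec appends only the new rows. The list of saved jobs and the parameters stored for incjob (including the current last-value) can be inspected with:

bin/sqoop job --list
bin/sqoop job --show incjob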

6.4 Validation

select count(*) from tb_dic;

Output:
First run:

Time taken: 0.068 seconds, Fetched: 489 row(s)

Second run:

Time taken: 0.068 seconds, Fetched: 490 row(s)

7. Whole-database import test

7.1 Create the HBase table

hbase shell
create 'new_table','data'

7.2 Import all MySQL tables into HBase

bin/sqoop import-all-tables --connect jdbc:mysql://192.168.1.187:3306/new_schema --username mysql --password 111111 --hbase-create-table --hbase-table new_table --column-family data --hbase-bulkload

Note

Whole-database import requires every table to have a primary key; otherwise the import fails with an error like the following:

16/09/08 15:03:50 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-xdata/compile/070fa1eda1e77fc70eaa0c532cfa94b8/nopk.jar
16/09/08 15:03:50 ERROR tool.ImportAllTablesTool: Error during import: No primary key could be found for table nopk. Please specify one with --split-by or perform a sequential import with '-m 1'.

Alternatively, if every table shares a column (such as id) that can serve as the row key, add --hbase-row-key id to the command.
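
If only a few tables lack a primary key, another option (a sketch; nopk is the offending table from the error above) is to skip them with the --exclude-tables option of import-all-tables:

bin/sqoop import-all-tables --connect jdbc:mysql://192.168.1.187:3306/new_schema --username mysql --password 111111 --exclude-tables nopk --hbase-create-table --hbase-table new_table --column-family data --hbase-bulkload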

7.3 Validation

scan 'new_table'
count 'new_table'

The introduction above is reprinted from the author Shuita.
Link: https://www.jianshu.com/p/5dcb893f2843
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.

Project practice

Next is how I migrated MySQL data to HDFS in an actual project. Private information has been changed or masked out; readers can fill in the values for their own environment.

1. MySQL table

CREATE TABLE `auth_service_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment ID',
`userid` int(11) unsigned NOT NULL COMMENT 'user ID',
`appid` varchar(40) NOT NULL COMMENT 'appid',
`codedesc` varchar(50) NOT NULL COMMENT 'agreed error code',
`type` int(4) NOT NULL COMMENT '1: identity authentication, 2: academic certification',
`code` int(4) NOT NULL COMMENT 'authentication result',
`insertTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
`updateTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
`isActive` tinyint(1) NOT NULL DEFAULT '1' COMMENT 'whether valid',
`idNum` varchar(40) DEFAULT NULL COMMENT 'ID number',
`requestIp` bigint(15) DEFAULT NULL,
`responseIp` bigint(15) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `idx_inserttime` (`insertTime`),
KEY `idx_updatetime` (`updateTime`),
KEY `idx_userid` (`userid`),
KEY `idx_codedesc` (`codedesc`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Note: in the MySQL table definition, insertTime, updateTime, and isActive are all columns with default values.

2. Phoenix table and indexes

CREATE TABLE IF NOT EXISTS PERSON.AuthServiceLog (
 UserID INTEGER NOT NULL,
 AppId VARCHAR(40) NOT NULL,
 Type INTEGER NOT NULL,
 InsertTime DATE NOT NULL,
 Codedesc VARCHAR(100) NULL,
 Code INTEGER NULL,
 IdNum VARCHAR(40) NULL,
 RequestIp BIGINT NULL,
 ResponseIp BIGINT NULL,
 IsActive BOOLEAN NULL,
 UpdateTime DATE NULL,
 CONSTRAINT pk_AuthServiceLog PRIMARY KEY (UserId, AppId, Type, InsertTime)
) default_column_family='uas', SALT_BUCKETS=32;
CREATE INDEX IF NOT EXISTS idx_AuthServiceLog_UpdateTime ON PERSON.AuthServiceLog (UpdateTime)
 INCLUDE (UserId, AppId,Type,InsertTime,Codedesc,Code,IdNum,RequestIp,ResponseIp,IsActive)
 SALT_BUCKETS=32;
CREATE INDEX IF NOT EXISTS idx_AuthServiceLog_codedesc ON PERSON.AuthServiceLog (codedesc)
 INCLUDE (UserId, AppId,Type,InsertTime,Code,IdNum,RequestIp,ResponseIp,IsActive,UpdateTime)
 SALT_BUCKETS=32;

3. Sqoop import command

./sqoop import  --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'   --connect 'jdbc:mysql://url:port/***' --username=*** --password=*** --query "SELECT UserId
,IF(AppId='',' ' , AppId) as APPID
,Type
,DATE_SUB(InsertTime,interval 8 hour ) InsertTime
,replace(replace(replace(replace(replace(replace(replace(IFNULL(codedesc,''),'\"',''),',',''),'''',''),'\r',''),'\n',''), '\\u00a0',''),'\u200B','') as codedesc
,code
,replace(replace(replace(replace(replace(replace(replace(IFNULL(IdNum,''),'\"',''),',',''),'''',''),'\r',''),'\n',''), '\\u00a0',''),'\u200B','') as IdNum
,IFNULL(RequestIp,'')
,IFNULL(ResponseIp ,'')
,IsActive
,DATE_SUB(updateTime,interval 8 hour ) updateTime
 FROM auth_service_log where  \$CONDITIONS "  --target-dir /AuthServiceLog -m 1

The nested replace() calls strip double quotes, commas, single quotes, line breaks, and invisible space characters out of the free-text columns so they cannot break the comma-delimited output, and DATE_SUB(..., interval 8 hour) shifts the timestamps, presumably to compensate for the 8-hour timezone offset. In this way, the data is imported from MySQL into the corresponding HDFS directory.
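
The Phoenix table defined earlier can then be populated from these comma-delimited HDFS files. The following is only a sketch (the client jar name and exact paths depend on the local Phoenix installation) using Phoenix's CsvBulkLoadTool:

hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --schema PERSON --table AUTHSERVICELOG --input /AuthServiceLog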

In addition, importing data from SQL Server into HDFS follows essentially the same steps; the final Sqoop command looks like this:

 ./sqoop import  --fields-terminated-by , --escaped-by \\ --enclosed-by '\"'   --connect 'jdbc:sqlserver://url:port;database=***' --username=*** --password=*** --query "SELECT UserId
,Event_type
,DATEADD(HOUR,-8,CreationDate) CreationDate
,Action
,replace(replace(replace(replace(replace(replace(replace(replace(isnull(Description,''),'\"',''),',',''),'\',''),'''',''),'\r',''),'\n',''), '\\u00a0',''),'\u200B','') as Description
,IP
,IPLong
FROM Person_Activity where  \$CONDITIONS "  --target-dir /Person_Activity2 -m 1

    
