Import data (the cluster is the destination)
In Sqoop, "import" refers to transferring data from a non-big-data cluster (an RDBMS) into the big data cluster (HDFS, Hive, HBase). This direction of transfer is called an import and uses the import keyword.
1 RDBMS to HDFS
1) Make sure the MySQL service is running normally
2) Create a new table in MySQL and insert some data
$ mysql -uroot -p000000
mysql> create database company;
mysql> create table company.staff(
    id int(4) primary key not null auto_increment,
    name varchar(255),
    sex varchar(255));
mysql> insert into company.staff(name, sex) values('Thomas', 'Male');
mysql> insert into company.staff(name, sex) values('Catalina', 'FeMale');
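Optionally, query the table to confirm the rows were inserted before running the import:

mysql> select * from company.staff;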
3) import data
(1) import all
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--table staff \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
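After the job finishes, you can verify the result on HDFS. A quick check, assuming the single-mapper output file is named part-m-00000 (the file name may differ in your environment):

$ hdfs dfs -ls /user/company
$ hdfs dfs -cat /user/company/part-m-00000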
(2) query import
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select name,sex from staff where id <=1 and $CONDITIONS;'
Tip: the query must contain '$CONDITIONS' in the WHERE clause; Sqoop replaces it with the split condition for each mapper, which guarantees the consistency of the data that is finally written.
If the query is wrapped in double quotation marks, you must put an escape character before $CONDITIONS (i.e. \$CONDITIONS) to prevent the shell from interpreting it as a shell variable.
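For example, the same query import written with double quotation marks would look roughly like this (note the backslash before $CONDITIONS):

$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "select name,sex from staff where id <=1 and \$CONDITIONS;"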
(3) import the specified column
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns id,sex \
--table staff
Tip: if --columns lists multiple columns, separate them with commas and do not add spaces between them.
(4) filter the imported data using the sqoop --where keyword
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table staff \
--where "id=1"
2 RDBMS to Hive
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--table staff \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive
Tip: this process runs in two steps. The first step imports the data to HDFS; the second step moves the data from HDFS into the Hive warehouse. The default temporary directory used by the first step is /user/<your user name>/<table name>.
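Once the job completes, the imported data can be checked from the Hive CLI, for example:

hive> select * from staff_hive;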
3 RDBMS to HBase
$ bin/sqoop import \
--connect jdbc:mysql://hadoop102:3306/company \
--username root \
--password 000000 \
--table staff \
--columns "id,name,sex" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_company" \
--num-mappers 1 \
--split-by id
Note: Sqoop 1.4.6 only supports automatically creating the HBase table for HBase versions prior to 1.0.1.
Solution: create the HBase table manually
hbase> create 'hbase_company','info'
(5) scan the table in HBase to view the imported data
hbase> scan 'hbase_company'