Sqoop for big data
1, Introduction to Sqoop
Sqoop is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, or export data from HDFS back into a relational database.
The Sqoop project started in 2009. It first existed as a third-party module of Hadoop. Later, to let users deploy it quickly and developers iterate on it independently, Sqoop became a standalone Apache project.
The latest version of Sqoop2 is 1.99.7. Note that Sqoop2 is incompatible with Sqoop1, its feature set is incomplete, and it is not intended for production deployment.
2, Sqoop principle
Sqoop translates the import or export command into a MapReduce program.
In the generated MapReduce job, the customization lies mainly in the InputFormat and OutputFormat.
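For example, a command like the following (reusing the connection details from the examples later in this article) is compiled into a map-only MapReduce job; there is no reduce phase:
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--split-by id \
--num-mappers 4
Sqoop generates a record class for the staff table, splits the source query on the --split-by column so that each of the 4 map tasks reads one id range, and writes the rows to HDFS through its customized InputFormat/OutputFormat.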
3, Sqoop installation
See the following link for installation steps
https://www.cnblogs.com/botaoli/p/15516924.html
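If the link is unavailable, the installation goes roughly as follows (the version numbers and paths below are assumptions, not taken from the linked post):
$ tar -zxf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
$ cd /opt/module/sqoop-1.4.6.bin__hadoop-2.0.4-alpha
$ cp conf/sqoop-env-template.sh conf/sqoop-env.sh
Edit conf/sqoop-env.sh to point HADOOP_COMMON_HOME, HADOOP_MAPRED_HOME, HIVE_HOME and HBASE_HOME at the existing installations, copy the MySQL JDBC driver (e.g. mysql-connector-java-5.1.27-bin.jar) into the lib/ directory, then verify:
$ bin/sqoop version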
4, Simple use case of Sqoop
See the following link for simple use
https://www.cnblogs.com/botaoli/p/12753346.html
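A quick sanity check before running the examples below is to list the databases visible through the JDBC connection (same connection details as the import examples):
$ bin/sqoop list-databases \
--connect jdbc:mysql://hadoop1:3306/ \
--username root \
--password 000000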
4.1 Import data
In Sqoop, "import" means transferring data from a non-big-data cluster (an RDBMS) into the big data cluster (HDFS, Hive, HBase); hence the import keyword is used.
4.1.1 RDBMS to HDFS
1) Make sure the MySQL service is running
2) Create a new table in MySQL and insert some data
$ mysql -uroot -p000000
mysql> create database company;
mysql> create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));
mysql> insert into company.staff(name, sex) values('Thomas', 'Male');
mysql> insert into company.staff(name, sex) values('Catalina', 'FeMale');
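The inserted rows can be verified before importing:
mysql> select * from company.staff;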
3) Import data
(1) Import the whole table
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
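The result can be checked on HDFS; with --num-mappers 1 the data lands in a single part-m-00000 file under the target directory (this file name follows the default MapReduce naming):
$ hadoop fs -ls /user/company
$ hadoop fs -cat /user/company/part-m-00000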
(2) Query import
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select name,sex from staff where id <=1 and $CONDITIONS;'
Tip: the where clause must contain '$CONDITIONS'.
If double quotation marks are used around the query, an escape character must be added before $CONDITIONS (i.e. \$CONDITIONS) to prevent the shell from interpreting it as a shell variable.
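For example, the double-quoted form of the same query import looks like this (only the quoting and the escape before $CONDITIONS differ):
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "select name,sex from staff where id <=1 and \$CONDITIONS;"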
(3) Import specified columns
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns id,sex \
--table staff
(4) Use the where keyword to filter the imported data
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table staff \
--where "id=1"
4.1.2 RDBMS to Hive
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive
Tip: this process consists of two steps. The first step imports the data into HDFS; the second step migrates the data from HDFS into the Hive warehouse. The default temporary directory used in the first step is /user/<current user>/<table name> (e.g. /user/atguigu/staff).
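Once the job finishes, the imported table can be checked from the Hive CLI:
hive> select * from staff_hive;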
4.1.3 RDBMS to Hbase
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--columns "id,name,sex" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_company" \
--num-mappers 1 \
--split-by id
Note: Sqoop 1.4.6 only supports automatic creation of HBase tables for HBase versions earlier than 1.0.1.
Solution: create the HBase table manually, then scan it after the import:
hbase> create 'hbase_company','info'
hbase> scan 'hbase_company'
4.2 Export data
In Sqoop, "export" means transferring data from the big data cluster (HDFS, Hive, HBase) to a non-big-data cluster (an RDBMS); hence the export keyword is used.
4.2.1 HIVE/HDFS to RDBMS
$ bin/sqoop export \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"
Tip: if the target table does not exist in MySQL, it will not be created automatically.
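So if necessary, create the target table first, for example with the same DDL used in 4.1.1:
mysql> create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));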
4.3 Script packaging
Package the sqoop command into a file in .opt format and execute it
1) Create a .opt file
$ mkdir opt
$ touch opt/job_HDFS2RDBMS.opt
2) Write the sqoop script
$ vi opt/job_HDFS2RDBMS.opt
export
--connect
jdbc:mysql://hadoop1:3306/company
--username
root
--password
000000
--table
staff
--num-mappers
1
--export-dir
/user/hive/warehouse/staff_hive
--input-fields-terminated-by
"\t"
3) Execute the script
$ bin/sqoop --options-file opt/job_HDFS2RDBMS.opt
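The same mechanism works for imports. As a sketch, an import job file built from the options used in 4.1.1 (the file name is arbitrary) would look like:
$ vi opt/job_RDBMS2HDFS.opt
import
--connect
jdbc:mysql://hadoop1:3306/company
--username
root
--password
000000
--table
staff
--target-dir
/user/company
--delete-target-dir
--num-mappers
1
--fields-terminated-by
"\t"
$ bin/sqoop --options-file opt/job_RDBMS2HDFS.opt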