Sqoop for Big Data

1, Introduction to Sqoop

Sqoop is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, etc.). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, or export data from HDFS back into a relational database.
The Sqoop project started in 2009, first as a third-party module for Hadoop. Later, to let users deploy quickly and developers iterate rapidly, Sqoop became an independent Apache project.
The latest Sqoop 2 release is 1.99.7. Note that Sqoop 2 is incompatible with Sqoop 1, its feature set is incomplete, and it is not intended for production deployment.

 

2, Sqoop principle

Sqoop translates the import or export command into a MapReduce program. In the generated MapReduce job, it is mainly the InputFormat and OutputFormat that are customized.
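To see this translation in action, Sqoop's codegen tool generates the same ORM class that an import job compiles and ships with its map-only MapReduce job. A minimal sketch, assuming the company.staff table and connection settings used in the examples below:

$ bin/sqoop codegen \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff
# Sqoop logs where it writes the generated staff.java / staff.jar
# (typically under /tmp/sqoop-<user>/compile/...); an import job generates
# and uses the same class before launching its map tasks.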

 

3, Sqoop installation

See the following link for installation steps

  https://www.cnblogs.com/botaoli/p/15516924.html

4, Simple use case of Sqoop

See the following link for simple use

  https://www.cnblogs.com/botaoli/p/12753346.html

4.1 Import data

In Sqoop, "import" refers to transferring data from a non-big-data cluster (an RDBMS) into the big data cluster (HDFS, Hive, HBase); hence the import keyword is used.

4.1.1 RDBMS to HDFS

1) Make sure the MySQL service is running normally
2) Create a new table in MySQL and insert some data

$ mysql -uroot -p000000
mysql> create database company;
mysql> create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));
mysql> insert into company.staff(name, sex) values('Thomas', 'Male');
mysql> insert into company.staff(name, sex) values('Catalina', 'FeMale');
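
A quick sanity check before importing; assuming the two inserts above are the only rows, auto_increment assigns them ids 1 and 2:

mysql> select * from company.staff;
+----+----------+--------+
| id | name     | sex    |
+----+----------+--------+
|  1 | Thomas   | Male   |
|  2 | Catalina | FeMale |
+----+----------+--------+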

3) Import data

(1) Import all
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
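
When the job finishes, the result can be checked directly on HDFS. A minimal check, assuming the single mapper above wrote the default part-m-00000 file under the target directory:

$ hdfs dfs -ls /user/company
$ hdfs dfs -cat /user/company/part-m-00000
# Each line is one row, tab-separated as requested by --fields-terminated-by "\t":
# 1	Thomas	Male
# 2	Catalina	FeMale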

 

(2) Query import
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select name,sex from staff where id <=1 and $CONDITIONS;'

Tip: the WHERE clause must contain '$CONDITIONS'.
If the query is wrapped in double quotes, an escape character must be added before $CONDITIONS (\$CONDITIONS) to prevent the shell from treating it as a shell variable.
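For example, the same query written with double quotes needs the backslash escape before $CONDITIONS:

$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query "select name,sex from staff where id <=1 and \$CONDITIONS;"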

 

(3) Import specified columns

$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--columns id,sex \
--table staff
Tip: if multiple columns are specified in --columns, separate them with commas and do not add spaces around the commas.
 
(4) Filter the imported data using the where keyword
$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--target-dir /user/company \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--table staff \
--where "id=1"

4.1.2 RDBMS to Hive

$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--num-mappers 1 \
--hive-import \
--fields-terminated-by "\t" \
--hive-overwrite \
--hive-table staff_hive

Tip: this process happens in two steps. The first step imports the data into HDFS; the second step moves the data from HDFS into the Hive warehouse. The default temporary HDFS directory used in the first step is /user/atguigu/<table name>.
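
After both steps complete, the table can be checked from the Hive CLI; a quick look, assuming the table landed in the default Hive database:

hive> show tables;
hive> select * from staff_hive;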

 

4.1.3 RDBMS to HBase

 

$ bin/sqoop import \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--columns "id,name,sex" \
--column-family "info" \
--hbase-create-table \
--hbase-row-key "id" \
--hbase-table "hbase_company" \
--num-mappers 1 \
--split-by id

  

Note: Sqoop 1.4.6 only supports automatically creating HBase tables with HBase versions earlier than 1.0.1.
Solution: create the HBase table manually

 

hbase> create 'hbase_company','info'
hbase> scan 'hbase_company'

 

4.2 Export data

In Sqoop, "export" refers to transferring data from the big data cluster (HDFS, Hive, HBase) to a non-big-data cluster (an RDBMS); hence the export keyword is used.

4.2.1 Hive/HDFS to RDBMS

 

$ bin/sqoop export \
--connect jdbc:mysql://hadoop1:3306/company \
--username root \
--password 000000 \
--table staff \
--num-mappers 1 \
--export-dir /user/hive/warehouse/staff_hive \
--input-fields-terminated-by "\t"

Tip: if the table does not exist in MySQL, it will not be created automatically.
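
So the target table has to be created in MySQL up front; for the staff data used here that is the same DDL as in 4.1.1:

mysql> create table company.staff(id int(4) primary key not null auto_increment, name varchar(255), sex varchar(255));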

4.3 Script packaging

Package the sqoop command in an .opt file and execute it

1) Create a .opt file

$ mkdir opt
$ touch opt/job_HDFS2RDBMS.opt

2) Write the sqoop script

$ vi opt/job_HDFS2RDBMS.opt
export
--connect
jdbc:mysql://hadoop1:3306/company
--username
root
--password
000000
--table
staff
--num-mappers
1
--export-dir
/user/hive/warehouse/staff_hive
--input-fields-terminated-by
"\t"

 

3) Execute the script

$ bin/sqoop --options-file opt/job_HDFS2RDBMS.opt

 

 
