E-commerce data warehouse of data warehouse -- 2. Business data collection platform

Keywords: MySQL Big Data hive Data Warehouse sqoop

1, Introduction to e-commerce business

1.1 e-commerce business process

The business process of e-commerce can be illustrated by taking the browsing footprint of an ordinary user as an example. Users click the home page of e-commerce to start browsing. They may find their favorite products through classified query or full-text search. These products are undoubtedly stored in the background management system.

When users find the goods they like, they may want to buy. After adding the goods to the shopping cart, they find that they need to log in. After logging in, they settle the goods. At this time, the management of the shopping cart and the generation of commodity order information will have an impact on the business database and generate corresponding order data and payment data.

After the order is formally generated, the order will be tracked until all orders are completed.
The main business processes of e-commerce include the management of commodity details when users browse commodities at the front desk, the management of user's personal Center & payment service when users add commodities to the shopping cart for payment, and the management of order background service after users pay. These processes involve more than a dozen or even dozens of business data tables, or even more.

1.2 e-commerce knowledge (SKU, SPU)

SKU=StockKeepingUni(t basic unit of inventory): now it has been extended to the abbreviation of unified product number. Each product corresponds to a unique SKU number.
SPU(Standard Product Unit): it is the smallest unit of commodity information aggregation and a set of reusable and easy to retrieve standardized information collection.

For example, an iPhone x phone is a SPU. A silver, 128G memory iPhoneX that supports China Unicom Network is the SKU.

tips:
In short, SKU and SPU refer to a class of products; SPU refers to a class of goods with the same commodity name and model; SKU refers to a kind of commodity with the same commodity name, model and parameter attributes.
SPU represents a class of goods. The advantage is that you can share commodity pictures, posters, sales attributes, etc.

1.3 structure of e-commerce system table

The business data table structure relationship involved in the e-commerce warehouse system consists of 34 tables. Centered on the order table, user table, SKU commodity table, activity table and coupon table, it extends the coupon collection table, payment flow table, activity order table, order details table, order status table, commodity comment table, code dictionary table, chargeback table, SPU commodity table, etc, The user table provides the details of the user, the payment flow table provides the payment details of the order, the order details table provides the quantity of goods in the order, and the commodity table provides the details of goods to the order details table.

E-commerce business table

Background management system

1.3.1 activity_info

1.3.2 activity_rule

1.3.3 activity_sku

1.3.4 platform attribute table (base_attr_info)

1.3.5 platform attribute value table (base_attr_value)

1.3.6 base_category 1

1.3.7 base_category 2

1.3.8 base_category 3

1.3.9 dictionary table (base_dic)

1.3.10 base_province

1.3.11 base_region

1.3.12 base_trademark

1.3.13 cart_info

1.3.14 comment_info

1.3.15 coupon information form (coupon_info)

1.3.16 coupon_range

1.3.17 coupon collection form (coupon_use)

1.3.18 favorite_info

1.3.19 order_detail

1.3.20 order_detail_activity

1.3.21 order_detail_coupon

1.3.22 order_info

1.3.23 order_refund_info

1.3.24 order_status_log

1.3.25 payment_info

1.3.26 refund_payment

1.3.27 SKU platform attribute table (sku_attr_info)

1.3.28 SKU information table (sku_info)

1.3.29 SKU sales attribute table (sku_sale_attr_value)

1.3.30 SPU information table (spu_info)

1.3.31 SPU sales attribute table (spu_sale_attr)

1.3.32 SPU sales attribute value table (spu_sale_attr_value)

1.3.33 user_address table

1.3.34 user information table (user_info)

2, Business data acquisition module

2.1 MySQL installation

2.1.1 installing MySQL

  1. As shown in the figure, upload the installation package and JDBC driver to / opt/software/mysql:
  2. Uninstall the built-in MySQL LIBS
[xiaobai@hadoop102 mysql]$ rpm -qa | grep -i -E mysql\ | mariadb | xargs =-n1 sudo rpm -e --nodeps

tips: query the built-in rpm package - i ignore case - e regular expression query mysql/mariadb; xargs=-n1 means that the output of the previous command is used as the input of the next command sudo rpm -e --nodeps.

If you encounter some MySQL stamps that are difficult to delete, here = = >

  1. Install MySQL dependencies:
[xiaobai@hadoop102 mysql]$ sudo rpm -ivh 01_mysql-community-common-5.7.16-1.el7.x86_64.rpm 
[xiaobai@hadoop102 mysql]$ sudo rpm -ivh 02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm 
[xiaobai@hadoop102 mysql]$ sudo rpm -ivh 03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm 
  1. To install MySQL client:
[xiaobai@hadoop102 mysql]$ sudo rpm -ivh 04_mysql-community-client-5.7.16-1.el7.x86_64.rpm 
  1. To install MySQL server:
[xiaobai@hadoop102 mysql]$ sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm 

notes ⚠️:
If the following error is reported, it is caused by the installation of the old version of GPG keys by yum. After rpm version 4.1, the signature of the software package will be automatically checked when installing or upgrading the software package.

Solution: add -- force --nodeps after the install package command:

[xiaobai@hadoop102 mysql]$ sudo rpm -ivh 02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm --force --nodeps
  1. Start MySQL:
[xiaobai@hadoop102 mysql]$ sudo systemctl start mysqld
  1. View MySQL password:
[xiaobai@hadoop102 mysql]$ sudo cat /var/log/mysqld.log | grep password

2.1.2 configuring MySQL

notes ⚠️:
As long as the configuration is root + password, you can log in to MySQL database on any host.

  1. Enter MySQL with the password just queried:

If you cannot access mysql using this method, you will be prompted with ERROR 1045 (28000): Access denied for user 'root' @ 'localhost. Please stamp here = = >

  1. Set complex password (MySQL password policy):
mysql> set password=password("H232%sd=55");
  1. Change MySQL password policy:
mysql> set global validate_password_length=4;
mysql> set global validate_password_policy=0;
  1. Set your own password:
mysql> set password=password("******");
  1. Enter MySQL Library:
mysql> use mysql
  1. Query user table:
mysql> select user, host from user;
+-----------+-----------+
| user      | host      |
+-----------+-----------+
| mysql.sys | localhost |
| root      | localhost |
+-----------+-----------+
2 rows in set (0.00 sec)
  1. Modify the user table and the Host table to%:
mysql> update user set host="%" where user="root";
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select user,host from user;
+-----------+-----------+
| user      | host      |
+-----------+-----------+
| root      | %         |
| mysql.sys | localhost |
+-----------+-----------+
2 rows in set (0.00 sec)
  1. Refresh:
mysql> flush privileges;
  1. sign out:
mysql> quit

2.2 business data generation

2.2.1 connecting to MySQL

To connect to MySQL database through Navicat Premium, click connection test to test:

2.2.2 table creation statement

  1. Create database gmall through Navicat;

  2. As shown in the figure, set the database code: the default character set is set to utf8, and the default sorting rule is set to utf8_general_ci:

  3. Import database structure script (gmall.sql): open database – > Run sql file – > GMALL. sql – > Start:
    notes ⚠️: Code selection utf-8;

  4. As shown in the figure, right click "table" and "Refresh" to see the table in the database:

2.2.3 generate business data

  1. Create dB in / opt/module / directory of Hadoop 102_ Log folder:
[xiaobai@hadoop102 module]$ mkdir db_log/
  1. Upload the business data generator, configuration files gmall2020-mock-db-2021-01-22.jar and application.properties to / opt / module / DB of Hadoop 102_ Log path;

  2. Modify the relevant configuration of application.properties as required:

logging.level.root=info


spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://hadoop102:3306/gmall?characterEncoding=utf-8&useSSL=false&serverTimezone=GMT%2B8
spring.datasource.username=root
spring.datasource.password=Remember to change the password!

logging.pattern.console=%m%n


mybatis-plus.global-config.db-config.field-strategy=not_null


#Business date
mock.date=2020-06-14
#Reset note: the first execution must be set to 1, and subsequent reset is not required, so it does not need to be set to 1
mock.clear=1
#Whether to reset the user note: the first execution must be set to 1, and subsequent reset is not required, so it does not need to be set to 1
mock.clear.user=1

#Number of new users generated
mock.user.count=100
#Male ratio
mock.user.male-rate=20
#User data change probability
mock.user.update-rate:20

#Collection cancellation ratio
mock.favor.cancel-rate=10
#Collection quantity
mock.favor.count=100

#Probability of adding shopping cart per user
mock.cart.user-rate=50
#How many items can each user add to the shopping cart at most each time
mock.cart.max-sku-count=8 
#How many items can I buy at most
mock.cart.max-sku-num=3 

#Shopping cart source, user query, commodity promotion, intelligent recommendation, promotion activities
mock.cart.source-type-rate=60:20:10:10

#User order proportion
mock.order.user-rate=50
#Proportion of goods purchased by users from shopping
mock.order.sku-rate=50
#Whether to participate in the activity
mock.order.join-activity=1
#Whether to use shopping vouchers
mock.order.use-coupon=1
#Number of people receiving shopping vouchers
mock.coupon.user-count=100

#Payment proportion
mock.payment.rate=70
#Payment method: Alipay: WeChat: UnionPay
mock.payment.payment-type=30:60:10


#Evaluation proportion: Good: medium: Poor: automatic
mock.comment.appraise-rate=30:10:10:50

#Refund reason proportion: quality problem, the description of the goods is inconsistent with the actual description, the out of stock number is inappropriate, the shooting is wrong, and I don't want to buy other products
mock.refund.reason-rate=30:10:20:5:15:5:5

2.3 Sqoop installation

2.3.1 download and unzip

  1. Download address: http://mirrors.hust.edu.cn/apache/sqoop/1.4.6/
  2. Upload the installation package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to / opt/software/sqoop path of Hadoop 102:
[xiaobai@hadoop102 sqoop]$ ll
total 16476
-rw-r--r--. 1 root root 16870735 Oct  3 21:02 sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
  1. Unzip the sqoop installation package to the specified directory / opt/module /:
[xiaobai@hadoop102 sqoop]$ tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /opt/module/
  1. Put sqoop-1.4.6.bin in / opt/module / directory__ Hadoop-2.0.4-alpha is renamed sqoop:
[xiaobai@hadoop102 module]$ mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop

2.3.2 modify configuration file

  1. Enter the / opt/module/sqoop/conf directory and rename the configuration file sqoop-env-template.sh to sqoop-env.sh:
[xiaobai@hadoop102 conf]$ mv sqoop-env-template.sh sqoop-env.sh
  1. Modify profile
[xiaobai@hadoop102 conf]$ vim sqoop-env.sh 

Add the following:

export HADOOP_COMMON_HOME=/opt/module/hadoop-3.2.2
export HADOOP_MAPRED_HOME=/opt/module/hadoop-3.2.2
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.5.7
export ZOOCFGDIR=/opt/module/zookeeper-3.5.7/conf

2.3.3 copy JDBC Driver

Copy the JDBC driver mysql-connector-java-5.1.27-bin.jar originally uploaded to / opt/software/mysql to / opt/module/sqoop/lib /

[xiaobai@hadoop102 conf]$ cp /opt/software/mysql/mysql-connector-java-5.1.27-bin.jar /opt/module/sqoop/lib/

2.3.4 verify Sqoop

You can use some command s to verify whether the sqoop configuration is correct, such as:

[xiaobai@hadoop102 sqoop]$ bin/sqoop help

As shown in the figure, some warning and help command outputs will appear:

2.3.5 test whether Sqoop can successfully connect to the database

Under the root directory of sqoop, use the following command to test whether sqoop can connect to the database correctly:

[xiaobai@hadoop102 sqoop]$ bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306/ --username root --password ******

2.3.6 basic use of sqoop

Add SKU to mysql_ Import the info table data into the path of HDSF:

[xiaobai@hadoop102 sqoop]$ bin/sqoop import  \
> --connect jdbc:mysql://hadoop102:3306/gmall \
> --username root \
> --password mysql123 \
> --table sku_info \
> --columns id,sku_name \
> --where 'id>=1 and id<=20' \
> --target-dir /sku_info \
> --delete-target-dir \
> --fields-terminated-by '\t' \
> --num-mappers 2 \
> --split-by id

notes ⚠️: Because the mapreduce of sqoop only has a map and no reduce, the number of final output files is the number of maps.

2.4 synchronization strategy

The types of data synchronization strategies include: full synchronization, incremental synchronization, new and change synchronization, and special cases;

Full scale: store complete data.
Incremental table: stores newly added data.
Add and change table: stores newly added data and changed data.
Special tables: only need to be stored once.

2.4.1 full synchronization strategy

Daily full volume: store a complete data every day as a partition.
Applicable scenario: it is applicable to scenarios where the amount of table data is small and there are both new data insertion and old data modification every day.
For example: code dictionary table, brand table, commodity three-level classification, commodity two-level classification, commodity one-level classification, preferential rule table, activity table, activity participating commodity table, additional purchase table, commodity collection table, preferential volume table, SKU commodity table and SPU commodity table.

2.4.2 incremental synchronization strategy

Daily increment: store an incremental data every day as a partition.
Applicable scenario: it is applicable to scenarios where there is a large amount of table data and only new data will be inserted every day.
For example: chargeback table, order status table, payment flow table, order details table, activity and order association table, and product comment table.

2.4.3 new and change strategies

Daily addition and change: storage creation time and operation time are today's data.
Applicable scenario: the table has a large amount of data, which will be added and changed.
For example: user table, order table, coupon collection table.

2.4.4 special strategies

Special strategy: for some special dimension tables, it is not necessary to follow the above synchronization strategy.

  1. Objective world dimension: the dimensions of the objective world that have not changed (such as gender, region, nationality, political composition, shoe size) can only have a fixed value.
  2. Date dimension: date dimension can import data of one year or several years at one time.

2.5 business data import into HDFS

2.5.1 analysis table synchronization strategy

In the production environment, individual small companies import all tables in full for simple processing.
Due to the large amount of data, medium and large companies still import data in strict accordance with the synchronization strategy.

Analyze table synchronization policy:

2.5.2 first day synchronization script of business data

  1. Create MySQL in the / home/xiaobai/bin directory_ to_ hdfs_ init.sh:
[xiaobai@hadoop102 bin]$ vim mysql_to_hdfs_init.sh

Add the following content to the script:

#! /bin/bash

APP=gmall
sqoop=/opt/module/sqoop/bin/sqoop

if [ -n "$2" ] ;then
   do_date=$2
else 
   echo "Please pass in the date parameter"
   exit
fi 

import_data(){
$sqoop import \
--connect jdbc:mysql://hadoop102:3306/$APP \
--username root \
--password ****** \
--target-dir /origin_data/$APP/db/$1/$do_date \
--delete-target-dir \
--query "$2 where \$CONDITIONS" \
--num-mappers 1 \
--fields-terminated-by '\t' \
--compress \
--compression-codec lzop \
--null-string '\\N' \
--null-non-string '\\N'

hadoop jar /opt/module/hadoop-3.2.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/$APP/db/$1/$do_date
}

import_order_info(){
  import_data order_info "select
                            id, 
                            total_amount, 
                            order_status, 
                            user_id, 
                            payment_way,
                            delivery_address,
                            out_trade_no, 
                            create_time, 
                            operate_time,
                            expire_time,
                            tracking_no,
                            province_id,
                            activity_reduce_amount,
                            coupon_reduce_amount,                            
                            original_total_amount,
                            feight_fee,
                            feight_fee_reduce      
                        from order_info"
}

import_coupon_use(){
  import_data coupon_use "select
                          id,
                          coupon_id,
                          user_id,
                          order_id,
                          coupon_status,
                          get_time,
                          using_time,
                          used_time,
                          expire_time
                        from coupon_use"
}

import_order_status_log(){
  import_data order_status_log "select
                                  id,
                                  order_id,
                                  order_status,
                                  operate_time
                                from order_status_log"
}

import_user_info(){
  import_data "user_info" "select 
                            id,
                            login_name,
                            nick_name,
                            name,
                            phone_num,
                            email,
                            user_level, 
                            birthday,
                            gender,
  create_time,
                            operate_time
                          from user_info"
}

import_order_detail(){
  import_data order_detail "select 
                              id,
                              order_id, 
                              sku_id,
                              sku_name,
                              order_price,
                              sku_num, 
                              create_time,
                              source_type,
                              source_id,
                              split_total_amount,
                              split_activity_amount,
                              split_coupon_amount
                            from order_detail"
}

import_payment_info(){
  import_data "payment_info"  "select 
                                id,  
                                out_trade_no, 
                                order_id, 
                                user_id, 
                                payment_type, 
                                trade_no, 
                                total_amount,  
                                subject, 
                                payment_status,
                                create_time,
                                callback_time 
                              from payment_info"
}

import_comment_info(){
  import_data comment_info "select
                              id,
                              user_id,
                              sku_id,
                              spu_id,
                              order_id,
                              appraise,
                              create_time
                            from comment_info"
}

import_order_refund_info(){
  import_data order_refund_info "select
                                id,
                                user_id,
                                order_id,
                                sku_id,
                                refund_type,
                                refund_num, refund_amount,
                                refund_reason_type,
                                refund_status,
                                create_time
                              from order_refund_info"
}

import_sku_info(){
  import_data sku_info "select 
                          id,
                          spu_id,
                          price,
                          sku_name,
                          sku_desc,
                          weight,
                          tm_id,
                          category3_id,
                          is_sale,
                          create_time
                        from sku_info"
}

import_base_category1(){
  import_data "base_category1" "select 
                                  id,
                                  name 
                                from base_category1"
}

import_base_category2(){
  import_data "base_category2" "select
                                  id,
                                  name,
                                  category1_id 
                                from base_category2"
}

import_base_category3(){
  import_data "base_category3" "select
                                  id,
                                  name,
                                  category2_id
                                from base_category3"
}

import_base_province(){
  import_data base_province "select
                              id,
                              name,
                              region_id,
                              area_code,
                              iso_code,
                              iso_3166_2
                            from base_province"
}

import_base_region(){
  import_data base_region "select id,
                              region_name
                            from base_region"
}

import_base_trademark(){
  import_data base_trademark "select
                                id,
                                tm_name
                              from base_trademark"
}

import_spu_info(){
  import_data spu_info "select
                            id,
                            spu_name,
                            category3_id,
                            tm_id
                          from spu_info"
}

import_favor_info(){
  import_data favor_info "select
                          id,
                          user_id,
                          sku_id,
                          spu_id,
                          is_cancel,
                          create_time,
                          cancel_time
                        from favor_info"
}

import_cart_info(){
  import_data cart_info "select
                        id,
                        user_id,
                        sku_id,
                        cart_price,
                        sku_num,
                        sku_name,
                        create_time,
                        operate_time,
                        is_ordered,
                        order_time,
                        source_type,
                        source_id
                      from cart_info"
}

import_coupon_info(){
  import_data coupon_info "select
                          id,
                          coupon_name,
                          coupon_type,
                          condition_amount,
                          condition_num,
                          activity_id,
  benefit_amount,
                          benefit_discount,
                          create_time,
                          range_type,
                          limit_num,
                          taken_count,
                          start_time,
                          end_time,
                          operate_time,
                          expire_time
                        from coupon_info"
}

import_activity_info(){
  import_data activity_info "select
                              id,
                              activity_name,
                              activity_type,
                              start_time,
                              end_time,
                              create_time
                            from activity_info"
}

import_activity_rule(){
    import_data activity_rule "select
                                    id,
                                    activity_id,
                                    activity_type,
                                    condition_amount,
                                    condition_num,
                                    benefit_amount,
                                    benefit_discount,
                                    benefit_level
                                from activity_rule"
}

import_base_dic(){
    import_data base_dic "select
                            dic_code,
                            dic_name,
                            parent_code,
                            create_time,
                            operate_time
                          from base_dic"
}


import_order_detail_activity(){
    import_data order_detail_activity "select
                                                                id,
                                                                order_id,
                                                                order_detail_id,
                                                                activity_id,activity_rule_id,
                                                                sku_id,
                                                                create_time
                                                            from order_detail_activity"
}


import_order_detail_coupon(){
    import_data order_detail_coupon "select
                                                                id,
								                                                order_id,
                                                                order_detail_id,
                                                                coupon_id,
                                                                coupon_use_id,
                                                                sku_id,
                                                                create_time
                                                            from order_detail_coupon"
}


import_refund_payment(){
    import_data refund_payment "select
                                                        id,
                                                        out_trade_no,
                                                        order_id,
                                                        sku_id,
                                                        payment_type,
                                                        trade_no,
                                                        total_amount,
                                                        subject,
                                                        refund_status,
                                                        create_time,
                                                        callback_time
                                                    from refund_payment"                                                    

}

import_sku_attr_value(){
    import_data sku_attr_value "select
       id,
                                                    attr_id,
                                                    value_id,
                                                    sku_id,
                                                    attr_name,
                                                    value_name
                                                from sku_attr_value"
}


import_sku_sale_attr_value(){
    import_data sku_sale_attr_value "select
                                                            id,
                                                            sku_id,
                                                            spu_id,
                                                            sale_attr_value_id,
                                                            sale_attr_id,
                                                            sale_attr_name,
                                                            sale_attr_value_name
                                                        from sku_sale_attr_value"
}

case $1 in
  "order_info")
     import_order_info
;;
  "base_category1")
     import_base_category1
;;
  "base_category2")
     import_base_category2
;;
  "base_category3")
     import_base_category3
;;
  "order_detail")
     import_order_detail
;;
  "sku_info")
     import_sku_info
;;
  "user_info")
     import_user_info
;;
  "payment_info")
     import_payment_info
;;
  "base_province")
     import_base_province
;;
 "base_region")
     import_base_region
;;
  "base_trademark")
     import_base_trademark
;;
  "activity_info")
      import_activity_info
;;
  "cart_info")
      import_cart_info
;;
  "comment_info")
      import_comment_info
;;
  "coupon_info")
      import_coupon_info
;;
  "coupon_use")
      import_coupon_use
;;
  "favor_info")
      import_favor_info
;;
  "order_refund_info")
      import_order_refund_info
;;
  "order_status_log")
      import_order_status_log
;;
  "spu_info")
      import_spu_info
;;
  "activity_rule")
      import_activity_rule
;;
  "base_dic")
      import_base_dic
;;
  "order_detail_activity")
      import_order_detail_activity
;;
  "order_detail_coupon")
      import_order_detail_coupon
;;
  "refund_payment")
      import_refund_payment
;;
  "sku_attr_value")
      import_sku_attr_value
;;
  "sku_sale_attr_value")
      import_sku_sale_attr_value
;;
  "all")
   import_base_category1
   import_base_category2
   import_base_category3
import_order_info
   import_order_detail
   import_sku_info
   import_user_info
   import_payment_info
   import_base_region
   import_base_province
   import_base_trademark
   import_activity_info
   import_cart_info
   import_comment_info
   import_coupon_use
   import_coupon_info
   import_favor_info
   import_order_refund_info
   import_order_status_log
   import_spu_info
   import_activity_rule
   import_base_dic
   import_order_detail_activity
   import_order_detail_coupon
   import_refund_payment
   import_sku_attr_value
   import_sku_sale_attr_value
;;
esac
  1. Add execution permission to the script, see 2.5.3 ⬇️

2.5.3 daily synchronization script of business data

  1. Create MySQL in the / home/xiaobai/bin directory_ to_ hdfs.sh:
    Add the following to the script:
#! /bin/bash

APP=gmall
sqoop=/opt/module/sqoop/bin/sqoop

if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d '-1 day' +%F`
fi
import_data(){
$sqoop import \
--connect jdbc:mysql://hadoop102:3306/$APP \
--username root \
--password ****** \
--target-dir /origin_data/$APP/db/$1/$do_date \
--delete-target-dir \
--query "$2 and  \$CONDITIONS" \
--num-mappers 1 \
--fields-terminated-by '\t' \
--compress \
--compression-codec lzop \
--null-string '\\N' \
--null-non-string '\\N'

hadoop jar /opt/module/hadoop-3.2.2/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/$APP/db/$1/$do_date
}

import_order_info(){
  import_data order_info "select
                            id, 
                            total_amount, 
                            order_status, 
                            user_id, 
                            payment_way,
                            delivery_address,
                            out_trade_no, 
                            create_time, 
                            operate_time,
                            expire_time,
                            tracking_no,
                            province_id,
                            activity_reduce_amount,
                            coupon_reduce_amount,                            
                            original_total_amount,
                            feight_fee,
                            feight_fee_reduce      
                        from order_info
                        where (date_format(create_time,'%Y-%m-%d')='$do_date' 
                        or date_format(operate_time,'%Y-%m-%d')='$do_date')"
}

import_coupon_use(){
  import_data coupon_use "select
                          id,
                          coupon_id,
                          user_id,
                          order_id,
                          coupon_status,
                          get_time,
                          using_time,
                          used_time,
                          expire_time
                        from coupon_use
 where (date_format(get_time,'%Y-%m-%d')='$do_date'
                        or date_format(using_time,'%Y-%m-%d')='$do_date'
                        or date_format(used_time,'%Y-%m-%d')='$do_date'
                        or date_format(expire_time,'%Y-%m-%d')='$do_date')"
}

import_order_status_log(){
  import_data order_status_log "select
                                  id,
                                  order_id,
                                  order_status,
                                  operate_time
                                from order_status_log
                                where date_format(operate_time,'%Y-%m-%d')='$do_date'"
}

import_user_info(){
  import_data "user_info" "select 
                            id,
                            login_name,
                            nick_name,
                            name,
                            phone_num,
                            email,
                            user_level, 
                            birthday,
                            gender,
                            create_time,
                            operate_time
                          from user_info 
                          where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' 
                          or DATE_FORMAT(operate_time,'%Y-%m-%d')='$do_date')"
}

import_order_detail(){
  import_data order_detail "select 
                              id,
                              order_id, 
                              sku_id,
                              sku_name,
                              order_price,
                              sku_num, 
                              create_time,
                              source_type,
                              source_id,
                              split_total_amount,
                              split_activity_amount,
                              split_coupon_amount
                            from order_detail 
                            where DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date'"}

import_payment_info(){
  import_data "payment_info"  "select 
                                id,  
                                out_trade_no, 
                                order_id, 
                                user_id, 
                                payment_type, 
                                trade_no, 
                                total_amount,  
                                subject, 
                                payment_status,
                                create_time,
                                callback_time 
                              from payment_info 
                              where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' 
                              or DATE_FORMAT(callback_time,'%Y-%m-%d')='$do_date')"
}

import_comment_info(){
  import_data comment_info "select
                              id,
                              user_id,
                              sku_id,
                              spu_id,
                              order_id,
                              appraise,
                              create_time
                            from comment_info
                            where date_format(create_time,'%Y-%m-%d')='$do_date'"
}

import_order_refund_info(){
  import_data order_refund_info "select
                                id,
                                user_id,
                                order_id,
                                sku_id,
                                refund_type,
                                refund_num,
                                refund_amount,
                                refund_reason_type,
                                refund_status,
                                create_time
                              from order_refund_info
                              where date_format(create_time,'%Y-%m-%d')='$do_date'"
}

import_sku_info(){
  import_data sku_info "select 
                          id,
                          spu_id,
                          price,
 sku_name,
                          sku_desc,
                          weight,
                          tm_id,
                          category3_id,
                          is_sale,
                          create_time
                        from sku_info where 1=1"
}

import_base_category1(){
  import_data "base_category1" "select 
                                  id,
                                  name 
                                from base_category1 where 1=1"
}

import_base_category2(){
  import_data "base_category2" "select
                                  id,
                                  name,
                                  category1_id 
                                from base_category2 where 1=1"
}

import_base_category3(){
  import_data "base_category3" "select
                                  id,
                                  name,
                                  category2_id
                                from base_category3 where 1=1"
}

import_base_province(){
  import_data base_province "select
                              id,
                              name,
                              region_id,
                              area_code,
                              iso_code,
                              iso_3166_2
                            from base_province
                            where 1=1"
}

import_base_region(){
  import_data base_region "select
                              id,
                              region_name
                            from base_region
                            where 1=1"
}

import_base_trademark(){
  import_data base_trademark "select
                                id,
                                tm_name
                              from base_trademark   where 1=1"
}

import_spu_info(){
  import_data spu_info "select
                            id,
                            spu_name,
                            category3_id,
                            tm_id
                          from spu_info
                          where 1=1"
}

import_favor_info(){
  import_data favor_info "select
                          id,
                          user_id,
                          sku_id,
                          spu_id,
                          is_cancel,
                          create_time,
                          cancel_time
                        from favor_info
                        where 1=1"
}

import_cart_info(){
  import_data cart_info "select
                        id,
                        user_id,
                        sku_id,
                        cart_price,
                        sku_num,
                        sku_name,
                        create_time,
                        operate_time,
                        is_ordered,
                        order_time,
                        source_type,
                        source_id
                      from cart_info
                      where 1=1"
}

import_coupon_info(){
  import_data coupon_info "select
                          id,
                          coupon_name,
                          coupon_type,
                          condition_amount,
                          condition_num,
                          activity_id,
                          benefit_amount,
                          benefit_discount,
                          create_time,
                          range_type,
                          limit_num,
                          taken_count,
    start_time,
                          end_time,
                          operate_time,
                          expire_time
                        from coupon_info
                        where 1=1"
}

import_activity_info(){
  import_data activity_info "select
                              id,
                              activity_name,
                              activity_type,
                              start_time,
                              end_time,
                              create_time
                            from activity_info
                            where 1=1"
}

import_activity_rule(){
    import_data activity_rule "select
                                    id,
                                    activity_id,
                                    activity_type,
                                    condition_amount,
                                    condition_num,
                                    benefit_amount,
                                    benefit_discount,
                                    benefit_level
                                from activity_rule
                                where 1=1"
}

import_base_dic(){
    import_data base_dic "select
                            dic_code,
                            dic_name,
                            parent_code,
                            create_time,
                            operate_time
                          from base_dic
                          where 1=1"
}


import_order_detail_activity(){
    import_data order_detail_activity "select
                                                                id,
                                                                order_id,
                                                                order_detail_id,
                                                                activity_id,
                                                                activity_rule_id,sku_id,
                                                                create_time
                                                            from order_detail_activity
                                                            where date_format(create_time,'%Y-%m-%d')='$do_date'"
}


import_order_detail_coupon(){
    import_data order_detail_coupon "select
                                                                id,
                                                                order_id,
                                                                order_detail_id,
                                                                coupon_id,
                                                                coupon_use_id,
                                                                sku_id,
                                                                create_time
                                                            from order_detail_coupon
                                                            where date_format(create_time,'%Y-%m-%d')='$do_date'"
}


import_refund_payment(){
    import_data refund_payment "select
                                                        id,
                                                        out_trade_no,
                                                        order_id,
                                                        sku_id,
                                                        payment_type,
                                                        trade_no,
                                                        total_amount,
                                                        subject,
                                                        refund_status,
                                                        create_time,
                                                        callback_time
                                                    from refund_payment
                                                    where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' 
                                                    or DATE_FORMAT(callback_time,'%Y-%m-%d')='$do_date')"                                                    

}

import_sku_attr_value(){
    import_data sku_attr_value "select
                                                    id,
                                                    attr_id,
                                                    value_id,
                                                    sku_id,
                                                    attr_name,
                                                    value_name
                                                from sku_attr_value
                                                where 1=1"
}


import_sku_sale_attr_value(){
    import_data sku_sale_attr_value "select
                                                            id,
                                                            sku_id,
                                                            spu_id,
                                                            sale_attr_value_id,
                                                            sale_attr_id,
                                                            sale_attr_name,
                                                            sale_attr_value_name
                                                        from sku_sale_attr_value
                                                        where 1=1"
}

case $1 in
  "order_info")
     import_order_info
;;
  "base_category1")
     import_base_category1
;;
  "base_category2")
     import_base_category2
;;
  "base_category3")
     import_base_category3
;;
  "order_detail")
     import_order_detail
;;
  "sku_info")
     import_sku_info
;;
 "user_info")
     import_user_info
;;
  "payment_info")
     import_payment_info
;;
  "base_province")
     import_base_province
;;
  "activity_info")
      import_activity_info
;;
  "cart_info")
      import_cart_info
;;
  "comment_info")
      import_comment_info
;;
  "coupon_info")
      import_coupon_info
;;
  "coupon_use")
      import_coupon_use
;;
  "favor_info")
      import_favor_info
;;
  "order_refund_info")
      import_order_refund_info
;;
  "order_status_log")
      import_order_status_log
;;
  "spu_info")
      import_spu_info
;;
  "activity_rule")
      import_activity_rule
;;
  "base_dic")
      import_base_dic
;;
  "order_detail_activity")
      import_order_detail_activity
;;
  "order_detail_coupon")
      import_order_detail_coupon
;;
  "refund_payment")
      import_refund_payment
;;
  "sku_attr_value")
      import_sku_attr_value
;;
  "sku_sale_attr_value")
      import_sku_sale_attr_value
;;
"all")
 import_base_category1
   import_base_category2
   import_base_category3
   import_order_info
   import_order_detail
   import_sku_info
   import_user_info
   import_payment_info
   import_base_trademark
   import_activity_info
   import_cart_info
   import_comment_info
   import_coupon_use
   import_coupon_info
   import_favor_info
   import_order_refund_info
   import_order_status_log
   import_spu_info
   import_activity_rule
   import_base_dic
   import_order_detail_activity
   import_order_detail_coupon
   import_refund_payment
   import_sku_attr_value
   import_sku_sale_attr_value
;;
esac
[xiaobai@hadoop102 bin]$ vim mysql_to_hdfs.sh
  1. MySQL for the above two scripts_ to_ hdfs_ init.sh,mysql_to_hdfs.sh add execution permission:
[xiaobai@hadoop102 bin]$ chmod +x mysql_to_hdfs*

  1. Initial import script:
[xiaobai@hadoop102 bin]$ ./mysql_to_hdfs_init.sh all 2020-06-14

The output of the following indicates that the command execution is completed!

After the above command is executed, as shown in the figure, open http://hadoop102:9870/ You can view the data synchronized to hdfs:

  1. Daily import script (for subsequent additions or modifications):
[xiaobai@hadoop102 bin]$ ./mysql_to_hdfs.sh all 2020-06-15

notes ⚠️:

[- n variable value] judge whether the value of the variable is empty:

– the value of the variable, which is not empty and returns true;
– the value of the variable, which is null and returns false.

  1. Check the use of the date command[ xiaobai@hadoop102 ~]$ date --help

do_date=\`date -d -1 day' +%F\`	

=Later commands ➕`` The function of is to assign the result of = later date - D - 1 day '+% F to the previous variable do_date!

if it were not for ➕`` , The command after = will be assigned as a string and directly to the previous variable do_date is not the result we want.

2.5.4 project experience

Nulls in Hive are stored in "\ N" at the bottom, while nulls in MySQL are nulls at the bottom. In order to ensure the consistency of data at both ends. When exporting data, two parameters – input Null string and – input Null non string are used. When importing data, use – Null string and – Null non string.

3, Data environment preparation

3.1 Hive installation and deployment

  1. As shown in the figure, upload apache-hive-3.1.2-bin.tar.gz to the / opt/software directory of Linux:
  2. Unzip apache-hive-3.1.2-bin.tar.gz to the directory / opt/module /
[xiaobai@hadoop102 hive]$ tar -zxvf apache-hive-3.1.2-bin.tar.gz -C /opt/module/
  1. Change the name of apache-hive-3.1.2-bin.tar.gz to hive:
[xiaobai@hadoop102 module]$ mv apache-hive-3.1.2-bin/ hive
  1. Modify / etc/profile.d/my_env.sh, add environment variable:
[xiaobai@hadoop102 module]$ sudo vim /etc/profile.d/my_env.sh 

As shown in the figure, add the following:

#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin!

  1. source click / etc/profile.d/my_env.sh file to make environment variables effective:
[xiaobai@hadoop102 module]$ source /etc/profile.d/my_env.sh

3.2 configure hive original data to MySQL

3.2.1 copy driver

Copy the JDBC driver of MySQL to the lib directory of Hive:

[xiaobai@hadoop102 module]$ cp /opt/software/mysql/mysql-connector-java-5.1.27-bin.jar /opt/module/hive/lib/

3.2.2 configuring Metastore to MySQL

Create a new hive-site.xml file in the / opt/module/hive/conf / Directory:

[xiaobai@hadoop102 conf]$ vim hive-site.xml

Add the following:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?useSSL=false</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>******</value>
    </property>

    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>

    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>

    <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    </property>

    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>hadoop102</value>
    </property>

    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>

    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>

3.3 start Hive

3.3.1 initialize metabase

  1. Log in to MySQL:
[xiaobai@hadoop102 conf]$ mysql -uroot -p
Enter password: 
  1. Create a Hive metabase:
mysql> create database metastore;
Query OK, 1 row affected (0.03 sec)
  1. Initialize Hive metabase:
[xiaobai@hadoop102 conf]$ schematool -initSchema -dbType mysql -verbose

The following message indicates that initialization is complete!

0: jdbc:mysql://hadoop102:3306/metastore> !closeall
Closing: 0: jdbc:mysql://hadoop102:3306/metastore?useSSL=false
beeline> 
beeline> Initialization script completed
schemaTool completed

3.3.2 start Hive client

  1. Start Hive client:
[xiaobai@hadoop102 conf]$ hive
  1. Check the database:
hive (default)> show databases;
OK
database_name
default
Time taken: 2.399 seconds, Fetched: 1 row(s)

Posted by BANDYCANDY on Mon, 04 Oct 2021 15:39:02 -0700