E-commerce Offline Data Warehouse Project in Practice

Keywords: Hadoop, Hive, Kafka, Spark, data warehouse

E-commerce Offline Data Warehouse Project in Practice (Part 2)

E-commerce analysis - core transactions

1, Business requirements

Selected indicators: order count, product quantity, and payment amount; these indicators are analyzed by sales region and by product category.

2, Business database table structure

1. Relationship between database tables

2. Business database - data source
  • trade_orders
  • order_product
  • product_info
  • product_category
  • Shops table (shops)
  • Regional organization table (shop_admin_org)
  • Payments table (payments)
3. Database table structure design
3.1 trade orders table
DROP TABLE IF EXISTS lagou_trade_orders;
CREATE TABLE `lagou_trade_orders` (
	`orderId` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'order id',
	`orderNo` varchar(20) NOT NULL COMMENT 'Order No',
	`userId` bigint(11) NOT NULL COMMENT 'user id',
	`status` tinyint(4) NOT NULL DEFAULT '-2' COMMENT 'Order status -3:User Reject  -2:Unpaid orders -1: User canceled 0:To be shipped 1:In delivery 2:User confirms receipt',
	`productMoney` decimal(11, 2) NOT NULL COMMENT 'Commodity amount',
	`totalMoney` decimal(11, 2) NOT NULL COMMENT 'Order amount (including freight)',
	`payMethod` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Payment method 0:unknown 1:Alipay 2:WeChat 3:cash 4:other',
	`isPay` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Pay 0:Unpaid 1:Paid',
	`areaId` int(11) NOT NULL COMMENT 'Regional lowest level',
	`tradeSrc` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Order source 0:Mall 1:Wechat 2:Mobile version 3:Android App 4:Apple App',
	`tradeType` int(11) DEFAULT '0' COMMENT 'Order type',
	`isRefund` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Refund 0:No 1: Yes',
	`dataFlag` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Order validity flag -1: Delete 1:Effective',
	`createTime` varchar(25) NOT NULL COMMENT 'Order time ',
	`payTime` varchar(25) DEFAULT NULL COMMENT 'Payment time',
	`modifiedTime` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP  COMMENT 'Order update time',
	PRIMARY KEY (`orderId`)
) ENGINE = InnoDB AUTO_INCREMENT = 355 CHARSET = utf8;

Remarks:

Records order information.

status: the order status.

createTime, payTime, modifiedTime: the creation time, payment time, and last-modification time.

3.2 order product table
DROP TABLE IF EXISTS lagou_order_product;
CREATE TABLE `lagou_order_product` (
	`id` bigint(11) NOT NULL AUTO_INCREMENT,
	`orderId` bigint(11) NOT NULL COMMENT 'order id',
	`productId` bigint(11) NOT NULL COMMENT 'commodity id',
	`productNum` bigint(11) NOT NULL DEFAULT '0' COMMENT 'Quantity of goods',
	`productPrice` decimal(11, 2) NOT NULL DEFAULT '0.00' COMMENT 'commodity price',
	`money` decimal(11, 2) DEFAULT '0.00' COMMENT 'Payment amount',
	`extra` text COMMENT 'Additional information',
	`createTime` varchar(25) DEFAULT NULL COMMENT 'Creation time',
	PRIMARY KEY (`id`),
	KEY `orderId` (`orderId`),
	KEY `goodsId` (`productId`)
) ENGINE = InnoDB AUTO_INCREMENT = 1260 CHARSET = utf8;

Remarks:

Records the products purchased in an order, including product quantity, unit price, etc.

3.3 product information table
DROP TABLE IF EXISTS lagou_product_info;
CREATE TABLE `lagou_product_info` (    
    `productId` bigint(11) NOT NULL AUTO_INCREMENT COMMENT 'commodity id',    
    `productName` varchar(200) NOT NULL COMMENT 'Trade name',    
    `shopId` bigint(11) NOT NULL COMMENT 'store ID',    
    `price` decimal(11, 2) NOT NULL DEFAULT '0.00' COMMENT 'Store price',    
    `isSale` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Whether to put on the shelf 0:No shelf 1:Put on the shelf',    
    `status` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'New product 0:No 1:yes',    
    `categoryId` int(11) NOT NULL COMMENT 'goodsCatId Last level commodity classification ID',    
    `createTime` varchar(25) NOT NULL,    
    `modifyTime` datetime DEFAULT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'Modification time',    
    PRIMARY KEY (`productId`),    
    KEY `shopId` USING BTREE (`shopId`),    
    KEY `goodsStatus` (`isSale`)
) ENGINE = InnoDB AUTO_INCREMENT = 115909 CHARSET = utf8;

Remarks:

Records product details, together with the shop ID and product attributes (whether it is new, whether it is on the shelf).

3.4 product classification table
DROP TABLE IF EXISTS lagou_product_category;
CREATE TABLE `lagou_product_category` (
	`catId` int(11) NOT NULL AUTO_INCREMENT COMMENT 'category ID',
	`parentId` int(11) NOT NULL COMMENT 'father ID',
	`catName` varchar(20) NOT NULL COMMENT 'Classification name',
	`isShow` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Show 0:Hide 1:display',
	`sortNum` int(11) NOT NULL DEFAULT '0' COMMENT 'Sort number',
	`isDel` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Delete flag 1:Effective -1: delete',
	`createTime` varchar(25) NOT NULL COMMENT 'Establishment time',
	`level` tinyint(4) DEFAULT '0' COMMENT 'Classification level (3 levels in total)',
	PRIMARY KEY (`catId`),
	KEY `parentId` (`parentId`, `isShow`, `isDel`)
) ENGINE = InnoDB AUTO_INCREMENT = 10442 CHARSET = utf8;

Remarks:

The product classification table has three levels.

-- Level-1 product categories
select catName, catid from lagou_product_category where level =1;
-- Subcategories of Computer & Office (the level-2 categories)
select catName, catid from lagou_product_category where level =2 and parentId = 32;
-- Subcategories of Complete Computers (the level-3 categories)
select catName, catid from lagou_product_category where level =3 and parentId = 10250;
3.5 shops table
DROP TABLE IF EXISTS lagou_shops;
CREATE TABLE `lagou_shops` (
	`shopId` int(11) NOT NULL AUTO_INCREMENT COMMENT 'shop ID, auto-increment',
	`userId` int(11) NOT NULL COMMENT 'Shop contact ID',
	`areaId` int(11) DEFAULT '0',
	`shopName` varchar(100) DEFAULT '' COMMENT 'Shop name',
	`shopLevel` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Store level',
	`status` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Shop status',
	`createTime` date DEFAULT NULL,
	`modifyTime` datetime DEFAULT NULL COMMENT 'Modification time',
	PRIMARY KEY (`shopId`),
	KEY `shopStatus` (`status`)
) ENGINE = InnoDB AUTO_INCREMENT = 105317 CHARSET = utf8;

Remarks:

Records store details.

3.6 regional organization table
DROP TABLE IF EXISTS lagou_shop_admin_org;
CREATE TABLE `lagou_shop_admin_org` (
	`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'organization ID',
	`parentId` int(11) NOT NULL COMMENT 'father ID',
	`orgName` varchar(100) NOT NULL COMMENT 'Organization name',
	`orgLevel` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Organization level 1:headquarters and regional departments 2:departments under the headquarters 3:specific working departments',
	`isDelete` tinyint(4) NOT NULL DEFAULT '0' COMMENT 'Delete flag,1:delete;0:Effective',
	`createTime` varchar(25) DEFAULT NULL COMMENT 'Creation time',
	`updateTime` varchar(25) DEFAULT NULL COMMENT 'Last modification time',
	`isShow` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Show,0: Yes 1:no',
	`orgType` tinyint(4) NOT NULL DEFAULT '1' COMMENT 'Organization type 0:president''s office 1:R&D 2:sales 3:operations 4:product',
	PRIMARY KEY (`id`),
	KEY `parentId` (`parentId`)
) ENGINE = InnoDB AUTO_INCREMENT = 100332 CHARSET = utf8;

Remarks:

Records the region an organization (store) belongs to.

3.7 payment method table
DROP TABLE IF EXISTS lagou_payments;
CREATE TABLE `lagou_payments` (
	`id` int(11) NOT NULL,
	`payMethod` varchar(20) DEFAULT NULL,
	`payName` varchar(255) DEFAULT NULL,
	`description` varchar(255) DEFAULT NULL,
	`payOrder` int(11) DEFAULT '0',
	`online` tinyint(4) DEFAULT NULL,
	PRIMARY KEY (`id`),
	KEY `payCode` (`payMethod`)
) ENGINE = InnoDB CHARSET = utf8;

Remarks:

Records the available payment methods.

3, Data import

MySQL export modes:

  • Full export
  • Incremental export (export the previous day's data)

The business data is stored in MySQL; every morning the previous day's table data is imported.

  • Tables with a small data volume are exported from MySQL in full.

  • Tables with a large data volume, where a column can identify each day's new or changed rows, are exported from MySQL incrementally (a sketch of the condition follows this list).

  • Three incremental tables

    • Orders table (trade_orders)
    • Order product table (order_product)
    • Product information table (product_info)
  • Four full tables

    • Product classification table (product_category)
    • Shops table (shops)
    • Regional organization table (shop_admin_org)
    • Payment method table (payments)
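
As a sketch of the incremental condition mentioned above (the real queries appear in the DataX job files later in this section), the filter compares the change-tracking column against the target day. A minimal MySQL example, assuming the orders table and its modifiedTime column:

-- Incremental extraction condition (sketch; the actual query lives in the DataX querySql below)
select orderId, orderNo, status, modifiedTime
from lagou_trade_orders
where date_format(modifiedTime, '%Y-%m-%d') = '2020-07-12';
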
3.1 full data import

MySQL => HDFS => Hive

Load the full data every day, each load forming a new partition (this also guides how the ODS tables are created).

MySQLReader =====> HdfsReader

ebiz.lagou_product_category ===> ods.ods_trade_product_category

3.1.1 product classification table

Exporting with DataX requires writing a JSON job file:

/root/data/lagoudw/json/product_category.json

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "root",
					"password": "12345678",
					"column": [
						"catId",
						"parentId",
						"catName",
						"isShow",
						"sortNum",
						"isDel",
						"createTime",
						"level"
					],
					"connection": [{
						"table": [
							"lagou_product_category"
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/product_category/dt=$do_date",
					"fileName": "product_category_$do_date",
					"column": [{
							"name": "catId",
							"type": "INT"
						},
						{
							"name": "parentId",
							"type": "INT"
						},
						{
							"name": "catName",
							"type": "STRING"
						},
						{
							"name": "isShow",
							"type": "TINYINT"
						},
						{
							"name": "sortNum",
							"type": "INT"
						},
						{
							"name": "isDel",
							"type": "TINYINT"
						},
						{
							"name": "createTime",
							"type": "STRING"
						},
						{
							"name": "level",
							"type": "TINYINT"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}

Remarks:

  • It is unnecessary to use multiple channels for tables with a small data volume; multiple channels produce multiple small files

  • Before executing the command, create the corresponding directory on HDFS: /user/data/trade.db/product_category/dt=yyyy-mm-dd

  • DataX is executed on the server where it is installed

    do_date='2020-07-01'
    # Create directory
    hdfs dfs -mkdir -p /user/data/trade.db/product_category/dt=$do_date
    # data migration 
    python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/product_category.json
    # Load data
    hive -e "alter table ods.ods_trade_product_category add partition(dt='$do_date')"
    
3.1.2 shops table

lagou_shops =====> ods.ods_trade_shops

​ /root/data/lagoudw/json/shops.json

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "hive",
					"password": "12345678",
					"column": [
						"shopId",
						"userId",
						"areaId",
						"shopName",
						"shopLevel",
						"status",
						"createTime",
						"modifyTime"
					],
					"connection": [{
						"table": [
							"lagou_shops"
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/shops/dt=$do_date",
					"fileName": "shops_$do_date",
					"column": [{
							"name": "shopId",
							"type": "INT"
						},
						{
							"name": "userId",
							"type": "INT"
						},
						{
							"name": "areaId",
							"type": "INT"
						},
						{
							"name": "shopName",
							"type": "STRING"
						},
						{
							"name": "shopLevel",
							"type": "TINYINT"
						},
						{
							"name": "status",
							"type": "TINYINT"
						},
						{
							"name": "createTime",
							"type": "STRING"
						},
						{
							"name": "modifyTime",
							"type": "STRING"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}
do_date='2020-07-01'
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/shops/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/shops.json
# Load data
hive -e "alter table ods.ods_trade_shops add partition(dt='$do_date')"
3.1.3 regional organization table of merchants

​ lagou_shop_admin_org =====> ods.ods_trade_shop_admin_org

​ /root/data/lagoudw/json/shop_org.json

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "root",
					"password": "12345678",
					"column": [
						"id",
						"parentId",
						"orgName",
						"orgLevel",
						"isDelete",
						"createTime",
						"updateTime",
						"isShow",
						"orgType"
					],
					"connection": [{
						"table": [
							"lagou_shop_admin_org"
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/shop_org/dt=$do_date",
					"fileName": "shop_admin_org_$do_date.dat",
					"column": [{
							"name": "id",
							"type": "INT"
						},
						{
							"name": "parentId",
							"type": "INT"
						},
						{
							"name": "orgName",
							"type": "STRING"
						},
						{
							"name": "orgLevel",
							"type": "TINYINT"
						},
						{
							"name": "isDelete",
							"type": "TINYINT"
						},
						{
							"name": "createTime",
							"type": "STRING"
						},
						{
							"name": "updateTime",
							"type": "STRING"
						},
						{
							"name": "isShow",
							"type": "TINYINT"
						},
						{
							"name": "orgType",
							"type": "TINYINT"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}
do_date='2020-07-01'
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/shop_org/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/shop_org.json
# Load data
hive -e "alter table ods.ods_trade_shop_admin_org add partition(dt='$do_date')"
3.1.4 payment method table

lagou_payments =====> ods.ods_trade_payments

​ /root/data/lagoudw/json/payment.json

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "root",
					"password": "12345678",
					"column": [
						"id",
						"payMethod",
						"payName",
						"description",
						"payOrder",
						"online"
					],
					"connection": [{
						"table": [
							"lagou_payments"
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/trade_payments/dt=$do_date",
					"fileName": "payments_$do_date.dat",
					"column": [{
							"name": "id",
							"type": "INT"
						},
						{
							"name": "payMethod",
							"type": "STRING"
						},
						{
							"name": "payName",
							"type": "STRING"
						},
						{
							"name": "description",
							"type": "STRING"
						},
						{
							"name": "payOrder",
							"type": "INT"
						},
						{
							"name": "online",
							"type": "TINYINT"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}
do_date='2020-07-01'
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/trade_payments/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/payment.json
# Load data
hive -e "alter table ods.ods_trade_payments add partition(dt='$do_date')"
3.2 incremental data import

Initial data load (performed only once); the previous full load can serve as the initialization.

Load incremental data every day (each day's data forms a new partition).

3.2.1 orders table

lagou_trade_orders =====> ods.ods_trade_orders

/root/data/lagoudw/json/orders.json

Note: the incremental filter condition is on the modifiedTime column.

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "root",
					"password": "12345678",
					"connection": [{
						"querySql": [
							"select orderId, orderNo, userId,status, productMoney, totalMoney, payMethod, isPay, areaId,tradeSrc, tradeType, isRefund, dataFlag, createTime, payTime,modifiedTime from lagou_trade_orders where date_format(modifiedTime, '%Y-%m-%d')='$do_date'"
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/orders/dt=$do_date",
					"fileName": "orders_$do_date",
					"column": [{
							"name": "orderId",
							"type": "INT"
						},
						{
							"name": "orderNo",
							"type": "STRING"
						},
						{
							"name": "userId",
							"type": "BIGINT"
						},
						{
							"name": "status",
							"type": "TINYINT"
						},
						{
							"name": "productMoney",
							"type": "Float"
						},
						{
							"name": "totalMoney",
							"type": "Float"
						},
						{
							"name": "payMethod",
							"type": "TINYINT"
						},
						{
							"name": "isPay",
							"type": "TINYINT"
						},
						{
							"name": "areaId",
							"type": "INT"
						},
						{
							"name": "tradeSrc",
							"type": "TINYINT"
						},
						{
							"name": "tradeType",
							"type": "INT"
						},
						{
							"name": "isRefund",
							"type": "TINYINT"
						},
						{
							"name": "dataFlag",
							"type": "TINYINT"
						},
						{
							"name": "createTime",
							"type": "STRING"
						},
						{
							"name": "payTime",
							"type": "STRING"
						},
						{
							"name": "modifiedTime",
							"type": "STRING"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}
-- Date formatting in MySQL (check the daily increments)
select date_format(createTime, '%Y-%m-%d'), count(*)
from lagou_trade_orders
group by date_format(createTime, '%Y-%m-%d');
do_date='2020-07-12'
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/orders/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/orders.json
# Load data
hive -e "alter table ods.ods_trade_orders add partition(dt='$do_date')"
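
As a quick sanity check after the load (a manual verification, not part of the pipeline), the row count of the newly added partition can be inspected from Hive:

-- Count the rows that landed in the partition just added
select count(*) from ods.ods_trade_orders where dt='2020-07-12';
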
3.2.2 order details

​ lagou_order_product ====> ods.ods_trade_order_product

​ /root/data/lagoudw/json/order_product.json

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "root",
					"password": "12345678",
					"connection": [{
						"querySql": [
							"select id, orderId, productId,productNum, productPrice, money, extra, createTime from lagou_order_product where date_format(createTime, '%Y-%m-%d')= '$do_date' "
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/order_product/dt=$do_date",
					"fileName": "order_product_$do_date.dat",
					"column": [{
							"name": "id",
							"type": "INT"
						},
						{
							"name": "orderId",
							"type": "INT"
						},
						{
							"name": "productId",
							"type": "INT"
						},
						{
							"name": "productNum",
							"type": "INT"
						},
						{
							"name": "productPrice",
							"type": "Float"
						},
						{
							"name": "money",
							"type": "Float"
						},
						{
							"name": "extra",
							"type": "STRING"
						},
						{
							"name": "createTime",
							"type": "STRING"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}
do_date='2020-07-12'
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/order_product/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/order_product.json
# Load data
hive -e "alter table ods.ods_trade_order_product add partition(dt='$do_date')"
3.2.3 product information table

​ lagou_product_info =====> ods.ods_trade_product_info

​ /root/data/lagoudw/json/product_info.json

{
	"job": {
		"setting": {
			"speed": {
				"channel": 1
			},
			"errorLimit": {
				"record": 0
			}
		},
		"content": [{
			"reader": {
				"name": "mysqlreader",
				"parameter": {
					"username": "root",
					"password": "12345678",
					"connection": [{
						"querySql": [
							"select productid, productname,shopid, price, issale, status, categoryid, createtime,modifytime from lagou_product_info where date_format(modifyTime, '%Y-%m-%d') = '$do_date' "
						],
						"jdbcUrl": [
							"jdbc:mysql://linux123:3306/ebiz"
						]
					}]
				}
			},
			"writer": {
				"name": "hdfswriter",
				"parameter": {
					"defaultFS": "hdfs://linux121:9000",
					"fileType": "text",
					"path": "/user/data/trade.db/product_info/dt=$do_date",
					"fileName": "product_info_$do_date.dat",
					"column": [{
							"name": "productid",
							"type": "BIGINT"
						},
						{
							"name": "productname",
							"type": "STRING"
						},
						{
							"name": "shopid",
							"type": "STRING"
						},
						{
							"name": "price",
							"type": "FLOAT"
						},
						{
							"name": "issale",
							"type": "TINYINT"
						},
						{
							"name": "status",
							"type": "TINYINT"
						},
						{
							"name": "categoryid",
							"type": "STRING"
						},
						{
							"name": "createTime",
							"type": "STRING"
						},
						{
							"name": "modifytime",
							"type": "STRING"
						}
					],
					"writeMode": "append",
					"fieldDelimiter": ","
				}
			}
		}]
	}
}
do_date='2020-07-12'
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/product_info/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/product_info.json
# Load data
hive -e "alter table ods.ods_trade_product_info add partition(dt='$do_date')"

4, ODS layer table creation and data loading

ODS layer table creation:

  • The ODS layer table structure basically mirrors the data source (column names and data types)
  • ODS layer table names follow a unified naming convention
4.1 ODS layer table creation

All tables are partitioned tables; the field separator is ','; an explicit location is specified for each table's data files.

DROP TABLE IF EXISTS `ods.ods_trade_orders`;
CREATE EXTERNAL TABLE `ods.ods_trade_orders`(
`orderid` int,
`orderno` string,
`userid` bigint,
`status` tinyint,
`productmoney` decimal(10, 0),
`totalmoney` decimal(10, 0),
`paymethod` tinyint,
`ispay` tinyint,
`areaid` int,
`tradesrc` tinyint,
`tradetype` int,
`isrefund` tinyint,
`dataflag` tinyint,
`createtime` string,
`paytime` string,
`modifiedtime` string)
COMMENT 'Order form'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/orders/';


DROP TABLE IF EXISTS `ods.ods_trade_order_product`;
CREATE EXTERNAL TABLE `ods.ods_trade_order_product`(
`id` string,
`orderid` decimal(10,2),
`productid` string,
`productnum` string,
`productprice` string,
`money` string,
`extra` string,
`createtime` string)
COMMENT 'Order Details '
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/order_product/';


DROP TABLE IF EXISTS `ods.ods_trade_product_info`;
CREATE EXTERNAL TABLE `ods.ods_trade_product_info`(
`productid` bigint,
`productname` string,
`shopid` string,
`price` decimal(10,0),
`issale` tinyint,
`status` tinyint,
`categoryid` string,
`createtime` string,
`modifytime` string)
COMMENT 'Product information sheet'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/product_info/';


DROP TABLE IF EXISTS `ods.ods_trade_product_category`;
CREATE EXTERNAL TABLE `ods.ods_trade_product_category`(
`catid` int,
`parentid` int,
`catname` string,
`isshow` tinyint,
`sortnum` int,
`isdel` tinyint,
`createtime` string,
`level` tinyint)
COMMENT 'Product classification table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/product_category';


DROP TABLE IF EXISTS `ods.ods_trade_shops`;
CREATE EXTERNAL TABLE `ods.ods_trade_shops`(
`shopid` int,
`userid` int,
`areaid` int,
`shopname` string,
`shoplevel` tinyint,
`status` tinyint,
`createtime` string,
`modifytime` string)
COMMENT 'Merchant store list'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/shops';

DROP TABLE IF EXISTS `ods.ods_trade_shop_admin_org`;
CREATE EXTERNAL TABLE `ods.ods_trade_shop_admin_org`(
`id` int,
`parentid` int,
`orgname` string,
`orglevel` tinyint,
`isdelete` tinyint,
`createtime` string,
`updatetime` string,
`isshow` tinyint,
`orgType` tinyint)
COMMENT 'Merchant regional organization table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/shop_org/';

DROP TABLE IF EXISTS `ods.ods_trade_payments`;
CREATE EXTERNAL TABLE `ods.ods_trade_payments`(
`id` string,
`paymethod` string,
`payname` string,
`description` string,
`payorder` int,
`online` tinyint)
COMMENT 'Payment method table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by ','
location '/user/data/trade.db/payments/';
4.2 ODS layer data loading

DataX only imports the data into HDFS; the data is not yet associated with the Hive tables.

Script tasks: data migration, then loading the data into the ODS layer.

For the incremental tables, the initial (historical) load is executed only once and is not part of the script.

​ /root/data/lagoudw/script/core_trade/ods_load_trade.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
# Create directory
hdfs dfs -mkdir -p /user/data/trade.db/product_category/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/shops/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/shop_org/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/payments/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/orders/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/order_product/dt=$do_date
hdfs dfs -mkdir -p /user/data/trade.db/product_info/dt=$do_date
# data migration 
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/product_category.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/shops.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/shop_org.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/payment.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/orders.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/order_product.json
python $DATAX_HOME/bin/datax.py -p "-Ddo_date=$do_date" /root/data/lagoudw/json/product_info.json
# Loading ODS layer data
sql="
alter table ods.ods_trade_orders add partition(dt='$do_date');
alter table ods.ods_trade_order_product add partition(dt='$do_date');
alter table ods.ods_trade_product_info add partition(dt='$do_date');
alter table ods.ods_trade_product_category add partition(dt='$do_date');
alter table ods.ods_trade_shops add partition(dt='$do_date');
alter table ods.ods_trade_shop_admin_org add partition(dt='$do_date');
alter table ods.ods_trade_payments add partition(dt='$do_date');
"
hive -e "$sql"
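
After a run, the newly added partitions can be verified from Hive; a quick manual check (not part of the script):

-- Each successful run should add one dt=... partition per table
show partitions ods.ods_trade_orders;
show partitions ods.ods_trade_payments;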

Characteristics:

Heavy workload, tedious, and error-prone; this is the nature of the data-acquisition work.

5, Slowly changing dimensions and periodic fact tables

5.1 slowly changing dimensions

Slowly changing dimensions (SCD): in the real world, dimension attributes change slowly over time ("slow" is relative to the fact table, whose data changes much faster than the dimension table's).

The problem of handling a dimension table's historical change information is called the slowly changing dimension problem, or SCD problem for short. There are several ways to handle it:

  • Keep the original value (not commonly used)
  • Direct overwrite (not commonly used)
  • Add a new attribute column (not commonly used)
  • Snapshot table
  • Zipper table
5.1.1 retain original value

The dimension attribute value is never changed; the original value is retained.

For example, the time a product first went on the shelves: a product may go on sale, come off the shelves for some reason, and then go back on, producing multiple on-shelf times. If the business cares about the time the product was first put on the shelves, this method is used.

5.1.2 direct overwrite

Overwrite the dimension attribute directly with the latest value; no history is retained.

For example, the category a product belongs to: when the product's category changes, it is simply rewritten to the new category, as sketched below.
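
A minimal HiveQL sketch of direct overwrite, assuming a hypothetical dimension dim.dim_product and a hypothetical ODS change feed ods.ods_product_change carrying the day's new category values:

-- SCD "direct overwrite" sketch (both tables are hypothetical): keep only the latest categoryid
insert overwrite table dim.dim_product
select p.productid,
p.productname,
coalesce(c.categoryid, p.categoryid) as categoryid -- take the new value when a change arrived
from dim.dim_product p
left join ods.ods_product_change c
on p.productid = c.productid;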

5.1.3 add new attribute column

Add a new column to the dimension table: the original attribute column stores the previous version's value, and the current attribute column stores the current version's value. A further column can record the change time.

Disadvantage: only the most recent change is recorded (see the sketch below).
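
A minimal sketch of this structure, using a hypothetical product dimension where only the category attribute is tracked:

-- SCD "new attribute column" sketch (hypothetical table): one extra column per tracked attribute
create table dim.dim_product_scd3(
productid bigint,
categoryid int, -- current value of the attribute
old_categoryid int, -- value of the previous version (only the last change survives)
change_time string -- when the change happened
);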

5.1.4 snapshot table

Keep a full copy of the data every day.

Simple and efficient; the disadvantage is that information is duplicated, wasting disk space.

Applicable range: the dimension must not be too large.

This method is widely applicable in practice, since dimension tables are generally not large. A minimal sketch follows.
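
A minimal sketch of the daily snapshot approach, reusing the ODS shops table and a hypothetical target table dim.dim_trade_shops_snap partitioned by dt:

-- Daily full snapshot sketch: each dt partition holds a complete copy of the dimension
insert overwrite table dim.dim_trade_shops_snap partition(dt='2020-07-01')
select shopid, userid, areaid, shopname, shoplevel, status
from ods.ods_trade_shops
where dt='2020-07-01';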

5.1.5 zipper table

The zipper table suits tables that are large and that receive inserts and updates, but where most rows do not change from day to day (the changed percentage is small) and change slowly; for example, the basic user attributes in an e-commerce user table do not change every day. Its main purpose is to save storage space.

Applicable scenarios:

  • The table has a large amount of data
  • Some fields in the table will be updated
  • The proportion of changes recorded in the table is not high
  • Historical information needs to be retained
5.2 dimension table application case

5.2.1 create table and load data (preparation)
-- User information
DROP TABLE IF EXISTS test.userinfo;
CREATE TABLE test.userinfo(
userid STRING COMMENT 'User number',
mobile STRING COMMENT 'phone number',
regdate STRING COMMENT 'Date of registration')
COMMENT 'User information'
PARTITIONED BY (dt string)
row format delimited fields terminated by ',';
-- Zipper table (stores the users' historical information)
-- The zipper table is not a partitioned table; it has two extra fields: start_date, end_date
DROP TABLE IF EXISTS test.userhis;
CREATE TABLE test.userhis(
userid STRING COMMENT 'User number',
mobile STRING COMMENT 'phone number',
regdate STRING COMMENT 'Date of registration',
start_date STRING,
end_date STRING)
COMMENT 'User information zipper table'
row format delimited fields terminated by ',';
-- data(/root/data/lagoudw/data/userinfo.dat)
001,13551111111,2020-03-01,2020-06-20
002,13561111111,2020-04-01,2020-06-20
003,13571111111,2020-05-01,2020-06-20
004,13581111111,2020-06-01,2020-06-20
002,13562222222,2020-04-01,2020-06-21
004,13582222222,2020-06-01,2020-06-21
005,13552222222,2020-06-21,2020-06-21
004,13333333333,2020-06-01,2020-06-22
005,13533333333,2020-06-21,2020-06-22
006,13733333333,2020-06-22,2020-06-22
001,13554444444,2020-03-01,2020-06-23
003,13574444444,2020-05-01,2020-06-23
005,13555554444,2020-06-21,2020-06-23
007,18600744444,2020-06-23,2020-06-23
008,18600844444,2020-06-23,2020-06-23

-- Static partition data loading (omitted)
/root/data/lagoudw/data/userinfo0620.dat
001,13551111111,2020-03-01
002,13561111111,2020-04-01
003,13571111111,2020-05-01
004,13581111111,2020-06-01
load data local inpath '/root/data/lagoudw/data/userinfo0620.dat' into table test.userinfo partition(dt='2020-06-20');

Dynamic partition data loading

-- Dynamic partition data loading: the partition value is not fixed and is determined by the input data
-- Create an intermediate table (non-partitioned)
drop table if exists test.tmp1;
create table test.tmp1 as
select * from test.userinfo;
-- tmp1 is non-partitioned; it uses the system default field delimiter '\001', so change it to ','
alter table test.tmp1 set serdeproperties('field.delim'=',');
-- Load the data into the intermediate table
load data local inpath '/root/data/lagoudw/data/userinfo.dat' into table test.tmp1;
-- Load the data from the intermediate table into the partitioned table
-- Inserting directly into the userinfo table raises the following error:
-- FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
-- Solution: switch the dynamic partition mode to nonstrict
set hive.exec.dynamic.partition.mode=nonstrict;
insert into table test.userinfo partition(dt) select * from test.tmp1;

Parameters related to dynamic partitioning

  • hive.exec.dynamic.partition

    • Default Value: false prior to Hive 0.9.0; true in Hive 0.9.0 and later
    • Added In: Hive 0.6.0
    • Whether or not to allow dynamic partitions in DML/DDL indicates whether the dynamic partitions function is enabled
  • hive.exec.dynamic.partition.mode

    • Default Value: strict

    • Added In: Hive 0.6.0

    • In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions. In nonstrict mode all partitions are allowed to be dynamic.

    • Set to nonstrict to support INSERT ... VALUES, UPDATE, and DELETE transactions (Hive 0.14.0 and later).

    • strict requires at least one static partition

    • nonstrict can be all dynamic partitions

  • hive.exec.max.dynamic.partitions

    • Default Value: 1000

    • Added In: Hive 0.6.0

    • Maximum number of dynamic partitions allowed to be created in total

      Indicates the maximum number of dynamic partitions that can be created by a dynamic partition statement, exceeding which an error is reported

  • hive.exec.max.dynamic.partitions.pernode

    • Default Value: 100

    • Added In: Hive 0.6.0

    • Maximum number of dynamic partitions allowed to be created in each mapper/reducer node

      Indicates the maximum number of dynamic partitions that can be created by each mapper / reducer. The default is 100. If it exceeds 100, an error will be reported.

  • hive.exec.max.created.files

    • Default Value: 100000

    • Added In: Hive 0.7.0

    • Maximum number of HDFS files created by all mappers/reducers in a MapReduce job.

      Indicates the maximum number of files that can be created by an MR job, exceeding which an error is reported.
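
Putting these parameters together, a session that loads through dynamic partitions typically looks like the following (the values shown are the defaults quoted above):

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=1000;
set hive.exec.max.dynamic.partitions.pernode=100;
-- the dt partition value is taken from the last column of the select
insert into table test.userinfo partition(dt) select * from test.tmp1;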

5.2.2 implementation of the zipper table

userinfo (partitioned table) => userid, mobile, regdate => daily changed data (modified + new) / historical data (on the first day)

userhis (zipper table) => two extra fields: start_date / end_date

-- Steps
-- 1. Initialize userinfo (2020-06-20): the historical data
001,13551111111,2020-03-01,2020-06-20
002,13561111111,2020-04-01,2020-06-20
003,13571111111,2020-05-01,2020-06-20
004,13581111111,2020-06-01,2020-06-20
-- 2. Initialize the zipper table (2020-06-20): userinfo => userhis
insert overwrite table test.userhis  
select userid,mobile,regdate,dt as start_date,'9999-12-31' as end_date
from test.userinfo where dt='2020-06-20'

-- 3. New data arrives (2020-06-21):
002,13562222222,2020-04-01,2020-06-21
004,13582222222,2020-06-01,2020-06-21
005,13552222222,2020-06-21,2020-06-21

-- 4. Build the zipper table (userhis) (2020-06-21) [core]: userinfo(2020-06-21) + userhis => userhis
-- userinfo: the new data
-- userhis: the historical data

-- Step 1: process the new data [userinfo] (same logic as loading the historical data)
select userid,mobile,regdate,dt as start_date,'9999-12-31' as end_date
from test.userinfo where dt='2020-06-21'

-- Step 2: process the historical data [userhis] (historical data has two parts: changed and unchanged)
-- Changed rows: start_date unchanged; end_date = incoming date - 1
-- Unchanged rows: left as-is
-- Examine the data

select A.userid, B.userid, B.mobile, B.regdate, B.start_Date,
B.end_date from
	(select * from test.userinfo where dt='2020-06-21') A 
right join test.userhis B
on A.userid=B.userid;
-- =========================================================================================
a.userid        b.userid          b.mobile        b.regdate       b.start_date    b.end_date
NULL    		001     		13551111111     2020-03-01      2020-06-20      9999-12-31
002     		002     		13561111111     2020-04-01      2020-06-20      9999-12-31
NULL    		003     		13571111111     2020-05-01      2020-06-20      9999-12-31
004     		004     		13581111111     2020-06-01      2020-06-20      9999-12-31
-- =========================================================================================
-- Write the SQL that processes the historical data
select B.userid, B.mobile, B.regdate, B.start_Date,
CASE WHEN B.end_date ='9999-12-31' AND A.userid is not null
then date_add('2020-06-21',-1) else B.end_date end as end_date
from (select * from test.userinfo where dt='2020-06-21') A 
right join test.userhis B
on A.userid=B.userid;
-- =========================================================================
b.userid          b.mobile        b.regdate       b.start_date    end_date
001     		13551111111     2020-03-01      2020-06-20      9999-12-31
002     		13561111111     2020-04-01      2020-06-20      2020-06-20
003     		13571111111     2020-05-01      2020-06-20      9999-12-31
004    			13581111111     2020-06-01      2020-06-20      2020-06-20
-- =========================================================================
-- Step 3: final processing = (new data) + (processed historical data)
insert overwrite table test.userhis
select userid,mobile,regdate,dt as start_date,'9999-12-31' as end_date
from test.userinfo where dt='2020-06-21'
union all
select B.userid, B.mobile, B.regdate, B.start_Date,
CASE WHEN B.end_date ='9999-12-31' AND A.userid is not null
then date_add('2020-06-21',-1) else B.end_date end as end_date
from (select * from test.userinfo where dt='2020-06-21') A 
right join test.userhis B
on A.userid=B.userid;
-- 5. New data on day 3 (2020-06-22):
004,13333333333,2020-06-01,2020-06-22
005,13533333333,2020-06-21,2020-06-22
006,13733333333,2020-06-22,2020-06-22
-- 6. Build the zipper table (2020-06-22): userinfo(2020-06-22) + userhis => userhis
-- 7. New data on day 4 (2020-06-23):
001,13554444444,2020-03-01,2020-06-23
003,13574444444,2020-05-01,2020-06-23
005,13555554444,2020-06-21,2020-06-23
007,18600744444,2020-06-23,2020-06-23
008,18600844444,2020-06-23,2020-06-23
-- 8. Build the zipper table (2020-06-23)

Script for processing the zipper table (test script): /root/data/lagoudw/data/userzipper.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="insert overwrite table test.userhis
select userid, mobile, regdate, dt as start_date, '9999-12-
31' as end_date
from test.userinfo
where dt='$do_date'
union all
select B.userid,
B.mobile,
B.regdate,
B.start_Date,
case when B.end_date='9999-12-31' and A.userid is not
null
then date_add('$do_date', -1)
else B.end_date
end as end_date
from (select * from test.userinfo where dt='$do_date') A
right join test.userhis B
on A.userid=B.userid;
"
hive -e "$sql"

Using the zipper table

-- View the latest data in the zipper table (the state from 2020-06-23 onward)
select * from test.userhis where end_date='9999-12-31';
-- View the zipper table as of a given date ('2020-06-22')
select * from test.userhis where start_date <= '2020-06-22' and end_date >= '2020-06-22';
-- View the zipper table as of a given date ('2020-06-21')
select * from test.userhis where start_date <= '2020-06-21' and end_date >= '2020-06-21';
-- View the zipper table as of a given date ('2020-06-20')
select * from test.userhis where start_date <= '2020-06-20' and end_date >= '2020-06-20';
5.2.3 rollback of zipper table

Sometimes, for various reasons, the zipper table must be restored to its state as of rollback_date. In that case:

Rows with end_date < rollback_date were closed before the rollback date; they are retained as-is.

Rows with start_date <= rollback_date <= end_date were still open on the rollback date but were modified afterwards; change their end_date back to 9999-12-31.

All other rows need no attention.

Code for the above scheme:

  • Process the data with end_date < rollback_date and keep it
select userid, mobile, regdate, start_date, end_date, '1' as tag
from test.userhis
where end_date < '2020-06-22';
  • Process the data with start_date <= rollback_date <= end_date, and set end_date = 9999-12-31
select userid, mobile, regdate, start_date, '9999-12-31' as end_date, '2' as tag
from test.userhis
where start_date <= '2020-06-22' and end_date >= '2020-06-22';
  • Write the data of the previous two steps into the temporary table tmp (zipper table)
drop table test.tmp;
create table test.tmp as
select userid, mobile, regdate, start_date, end_date, '1' as tag
from test.userhis
where end_date < '2020-06-22'
union all
select userid, mobile, regdate, start_date, '9999-12-31' as end_date, '2' as tag
from test.userhis
where start_date <= '2020-06-22' and end_date >= '2020-06-22';
-- Query results
select * from test.tmp cluster by userid, start_date;
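
To complete the rollback, the temporary table is written back to the zipper table, dropping the helper tag column; a minimal sketch:

-- Restore the zipper table from the rollback result
insert overwrite table test.userhis
select userid, mobile, regdate, start_date, end_date
from test.tmp;
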
5.2.4 simulation script

​ /root/data/lagoudw/data/zippertmp.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ] ;then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
drop table test.tmp;
create table test.tmp as
select userid, mobile, regdate, start_date, end_date, '1' as tag
from test.userhis
where end_date < '$do_date'
union all
select userid, mobile, regdate, start_date, '9999-12-31' as end_date, '2' as tag
from test.userhis
where start_date <= '$do_date' and end_date >= '$do_date';
"
hive -e "$sql"

Roll back day by day and check the data.

Scheme 2: keep the incremental data (userinfo) for a period of time and back up the zipper table regularly (for example, once a month); to roll back, simply replay the incremental data against the backed-up zipper table. The processing is simpler.

5.3 periodic fact table

Consider the following orders table. On June 20 there are three records (001 / 002 / 003):

On June 21 there are 5 records: 2 new (004 / 005) and 1 modified (001).

On June 22 there are 6 records: 1 new (006) and 2 modified (003 / 005):

Options for processing the order fact table:

  • Keep only a single full copy: the data matches the June 22 records, so a request to view order 001's status as of June 21 cannot be satisfied;
  • Keep a full copy every day: all historical information is available in the warehouse, but the data volume is large and heavily duplicated, wasting a great deal of storage;

A zipper table can instead be used to save the historical information: it both meets the need to retain history and saves storage resources.

5.3.1 preconditions
  • The order table is refreshed once a day; each refresh pulls the previous day's incremental data;
  • If an order's status changes several times within one day, only the last status is recorded;
  • There are three order statuses: create, pay, complete;
  • Creation time and modification time are kept at day granularity. If the source order table had no status-modification time, extracting the increment would be troublesome; a mechanism is needed to guarantee that the daily incremental data can be extracted;

The ODS layer of the data warehouse has an order table, and the data is divided by day to store the daily incremental data:

DROP TABLE test.ods_orders;
CREATE TABLE test.ods_orders(
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING
) PARTITIONED BY (dt STRING)
row format delimited fields terminated by ',';

The DWD layer of the data warehouse has an order zipper table to store the historical status data of the order:

DROP TABLE test.dwd_orders;
CREATE TABLE test.dwd_orders(
orderid INT,
createtime STRING,
modifiedtime STRING,
status STRING,
start_date STRING,
end_date STRING
)
row format delimited fields terminated by ',';
5.3.2 implementation of the periodic fact table
  • Full initialization
-- data file order1.dat
-- /root/data/lagoudw/data/order1.dat
001,2020-06-20,2020-06-20,create
002,2020-06-20,2020-06-20,create
003,2020-06-20,2020-06-20,pay


load data local inpath '/root/data/lagoudw/data/order1.dat' into table test.ods_orders partition(dt='2020-06-20');
INSERT overwrite TABLE test.dwd_orders
SELECT orderid, createtime, modifiedtime, status,
createtime AS start_date,
'9999-12-31' AS end_date
FROM test.ods_orders
WHERE dt='2020-06-20';

Incremental extraction

-- data file order2.dat
001,2020-06-20,2020-06-21,pay
004,2020-06-21,2020-06-21,create
005,2020-06-21,2020-06-21,create

load data local inpath '/root/data/lagoudw/data/order2.dat' into table test.ods_orders partition(dt='2020-06-21');

Incremental refresh of historical data

-- The zipper table is built from two parts: the new data (ods_orders) and the historical data (dwd_orders)
-- Process the new data
SELECT orderid,
createtime,
modifiedtime,
status,
modifiedtime AS start_date,
'9999-12-31' AS end_date
FROM test.ods_orders
where dt='2020-06-21';

-- Process the historical data; it includes both modified and unmodified rows
-- Join dwd_orders with ods_orders
-- Matched rows: the data was modified
-- Unmatched rows: the data was not modified
select A.orderid,
A.createtime,
A.modifiedtime,
A.status,
A.start_date,
case when B.orderid is not null and A.end_date>'2020-06-21'
then '2020-06-20'
else A.end_date
end end_date
from test.dwd_orders A
left join (select * from test.ods_orders where dt='2020-06-21') B
on A.orderid=B.orderid;

-- Overwrite the zipper table with the above information
insert overwrite table test.dwd_orders
SELECT orderid,
createtime,
modifiedtime,
status,
modifiedtime AS start_date,
'9999-12-31' AS end_date
FROM test.ods_orders
where dt='2020-06-21'
union all
select A.orderid,
A.createtime,
A.modifiedtime,
A.status,
A.start_date,
case when B.orderid is not null and A.end_date>'2020-06-21'
then '2020-06-20'
else A.end_date
end end_date
from test.dwd_orders A
left join (select * from test.ods_orders where dt='2020-06-21') B
on A.orderid=B.orderid;


5.4 summary of zipper table

6, DIM layer table creation and data loading

First, determine which tables are fact tables and which are dimension tables:

  • Green: fact tables

  • Gray: dimension tables

    How should each dimension table be handled: daily snapshot or zipper table?

  • Small tables get a daily snapshot: product classification table, merchant shops table, merchant regional organization table, and payment method table

  • The large table gets a zipper table: product information table

6.1 commodity classification table

The data in the database is normalized (it satisfies the three normal forms), but normalized data is inconvenient to query.

Note: the product classification dimension table is denormalized here; irrelevant information is dropped and the three levels are flattened into one wide table.

DROP TABLE IF EXISTS dim.dim_trade_product_cat;
create table if not exists dim.dim_trade_product_cat(
firstId int, -- level-1 category id
firstName string, -- level-1 category name
secondId int, -- level-2 category id
secondName string, -- level-2 category name
thirdId int, -- level-3 category id
thirdName string -- level-3 category name
)
partitioned by (dt string)
STORED AS PARQUET;

Implementation

select T1.catid, T1.catname, T2.catid, T2.catname, T3.catid,
T3.catname
from (select catid, catname, parentid
from ods.ods_trade_product_category
where level=3 and dt='2020-07-01') T3
left join
(select catid, catname, parentid
from ods.ods_trade_product_category
where level=2 and dt='2020-07-01') T2
on T3.parentid=T2.catid
left join
(select catid, catname, parentid
from ods.ods_trade_product_category
where level=1 and dt='2020-07-01') T1
on T2.parentid=T1.catid;

Data loading:

​ /root/data/lagoudw/script/trade/dim_load_product_cat.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table dim.dim_trade_product_cat
partition(dt='$do_date')
select
t1.catid, -- Primary classification id
t1.catname, -- Primary classification name
t2.catid, -- Secondary classification id
t2.catname, -- Secondary classification name
t3.catid, -- Three level classification id
t3.catname -- Three level classification name
from
-- Commodity tertiary classification data
(select catid, catname, parentid
from ods.ods_trade_product_category
where level=3 and dt='$do_date') t3
left join
-- Commodity secondary classification data
(select catid, catname, parentid
from ods.ods_trade_product_category
where level=2 and dt='$do_date') t2
on t3.parentid = t2.catid
left join
-- Commodity primary classification data
(select catid, catname, parentid
from ods.ods_trade_product_category
where level=1 and dt='$do_date') t1
on t2.parentid = t1.catid;
"
hive -e "$sql"
6.2 merchant regional organization table

Merchant shops table + merchant regional organization table => one dimension table

This is also a denormalized design: the shops table and the regional organization table are combined into one wide table. A single row carries the shop information, city information, and region information, each with both id and name.

drop table if exists dim.dim_trade_shops_org;
create table dim.dim_trade_shops_org(
shopid int,
shopName string,
cityId int,
cityName string ,
regionId int ,
regionName string
)
partitioned by (dt string)
STORED AS PARQUET;

Implementation

select T1.shopid, T1.shopname, T2.id cityid, T2.orgname
cityname, T3.id regionid, T3.orgname regionname
from
(select shopid, shopname, areaid
from ods.ods_trade_shops
where dt='2020-07-01') T1
left join
(select id, parentid, orgname, orglevel
from ods.ods_trade_shop_admin_org
where orglevel=2 and dt='2020-07-01') T2
on T1.areaid=T2.id
left join
(select id, orgname, orglevel
from ods.ods_trade_shop_admin_org
where orglevel=1 and dt='2020-07-01') T3
on T2.parentid=T3.id
limit 10;

/root/data/lagoudw/script/trade/dim_load_shop_org.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table dim.dim_trade_shops_org
partition(dt='$do_date')
select t1.shopid,
t1.shopname,
t2.id as cityid,
t2.orgname as cityName,
t3.id as region_id,
t3.orgname as region_name
from (select shopId, shopName, areaId
from ods.ods_trade_shops
where dt='$do_date') t1
left join
(select id, parentId, orgname, orglevel
from ods.ods_trade_shop_admin_org
where orglevel=2 and dt='$do_date') t2
on t1.areaid = t2.id
left join
(select id, parentId, orgname, orglevel
from ods.ods_trade_shop_admin_org
where orglevel=1 and dt='$do_date') t3
on t2.parentid = t3.id;
"
hive -e "$sql"
6.3 payment method table

The ODS table is trimmed; only the necessary columns are retained.

drop table if exists dim.dim_trade_payment;
create table if not exists dim.dim_trade_payment(
paymentId string, -- Payment method id
paymentName string -- Name of payment method
)
partitioned by (dt string)
STORED AS PARQUET;

/root/data/lagoudw/script/trade/dim_load_payment.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table dim.dim_trade_payment
partition(dt='$do_date')
select id, payName
from ods.ods_trade_payments
where dt='$do_date';
"
hive -e "$sql"
6.4 commodity information table

The product information table is processed as a zipper table.

  • Historical data => initialize the zipper table (start date: that day; end date: 9999-12-31) [executed only once]
  • Daily zipper processing [executed on every data load]
    • New data: the day's new data (ODS) => start date: that day; end date: 9999-12-31
    • Historical data: the zipper table (DIM) is left-joined with the day's new data (ODS)
      • Matched: the data changed; end date: that day
      • Unmatched: the data did not change; the end date stays the same
6.4.1 create dimension table

The zipper table adds two columns to record the effective date and the expiration date, respectively.

drop table if exists dim.dim_trade_product_info;
create table dim.dim_trade_product_info(
`productId` bigint,
`productName` string,
`shopId` string,
`price` decimal,
`isSale` tinyint,
`status` tinyint,
`categoryId` string,
`createTime` string,
`modifyTime` string,
`start_dt` string,
`end_dt` string
) COMMENT 'Product list'
STORED AS PARQUET;
6.4.2 initial data loading (historical data loading, only once)
insert overwrite table dim.dim_trade_product_info
select productId,
productName,
shopId,
price,
isSale,
status,
categoryId,
createTime,
modifyTime,
-- If modifyTime is not null take modifyTime, otherwise take createTime; substr extracts the date part
case when modifyTime is not null
then substr(modifyTime, 0, 10)
else substr(createTime, 0, 10)
end as start_dt,
'9999-12-31' as end_dt
from ods.ods_trade_product_info
where dt = '2020-07-12';
6.4.3 incremental data import (executed repeatedly, on every data load)

​ /root/data/lagoudw/script/trade/dim_load_product_info.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table dim.dim_trade_product_info
select productId,
productName,
shopId,
price,
isSale,
status,
categoryId,
createTime,
modifyTime,
case when modifyTime is not null
then substr(modifyTime,0,10)
else substr(createTime,0,10)
end as start_dt,
'9999-12-31' as end_dt
from ods.ods_trade_product_info
where dt='$do_date'
union all
select dim.productId,
dim.productName,
dim.shopId,
dim.price,
dim.isSale,
dim.status,
dim.categoryId,
dim.createTime,
dim.modifyTime,
dim.start_dt,
case when dim.end_dt >= '9999-12-31' and ods.productId
is not null
then '$do_date'
else dim.end_dt
end as end_dt
from dim.dim_trade_product_info dim left join
(select *
from ods.ods_trade_product_info
where dt='$do_date' ) ods
on dim.productId = ods.productId
"
hive -e "$sql"

7, DWD layer table creation and data loading

There are two tables to process: the order table and the order product table. Of these:

  • The order table is a periodic fact table; to keep the history of order status, it is processed as a zipper table;

  • The order product table is an ordinary fact table, processed in the conventional way;

    • If the business requires data cleaning or data conversion: ODS => DWD
    • If no cleaning or conversion is required, the data stays in ODS unchanged; that is the approach taken in this project.

    Order status

    • -3: user rejected
    • -2: unpaid order
    • -1: user canceled
    • 0: to be shipped
    • 1: in delivery
    • 2: user confirmed receipt

    There is a time limit between order creation and completion; the business does not allow an order's status to still be changing one month later.

7.1 DWD layer table creation

Remarks:

  • Unlike the dimension tables, the order fact table has a large number of records
  • An order has a life cycle; its status cannot keep changing forever (the life cycle is generally about 15 days)
  • The order table is both a zipper table and a partition table
  • Purpose of partitioning: once an order reaches a terminal state, it is never recomputed
  • Partition key: the order's creation date, which guarantees that all versions of one order land in the same partition
-- Order fact table (zipper table)
DROP TABLE IF EXISTS dwd.dwd_trade_orders;
create table dwd.dwd_trade_orders(
`orderId` int,
`orderNo` string,
`userId` bigint,
`status` tinyint,
`productMoney` decimal,
`totalMoney` decimal,
`payMethod` tinyint,
`isPay` tinyint,
`areaId` int,
`tradeSrc` tinyint,
`tradeType` int,
`isRefund` tinyint,
`dataFlag` tinyint,
`createTime` string,
`payTime` string,
`modifiedTime` string,
`start_date` string,
`end_date` string
) COMMENT 'Order fact table'
partitioned by (dt string)
STORED AS PARQUET;
7.2 DWD layer data loading
-- Note: time date format conversion
-- 'yyyy-MM-dd HH:mm:ss' => timestamp => 'yyyy-MM-dd'
select unix_timestamp(modifiedtime, 'yyyy-MM-dd HH:mm:ss')
from ods.ods_trade_orders limit 10;
select from_unixtime(unix_timestamp(modifiedtime, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd')
from ods.ods_trade_orders limit 10;

/root/data/lagoudw/script/trade/dwd_load_trade_orders.sh

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
INSERT OVERWRITE TABLE dwd.dwd_trade_orders
partition(dt)
SELECT orderId,orderNo,userId,status,productMoney,totalMoney,payMethod,isPay,areaId,tradeSrc,tradeType,isRefund,
dataFlag,createTime,payTime,modifiedTime, case when modifiedTime is not null
then from_unixtime(unix_timestamp(modifiedTime,'yyyy-MM-dd HH:mm:ss'),'yyyy-MM-dd')
else from_unixtime(unix_timestamp(createTime,'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd')
end as start_date,'9999-12-31' as end_date,
from_unixtime(unix_timestamp(createTime, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd') as dt
FROM ods.ods_trade_orders WHERE dt='$do_date' union all
SELECT A.orderId,A.orderNo,A.userId,A.status,A.productMoney,A.totalMoney,A.payMethod,A.isPay,A.areaId,A.tradeSrc,
A.tradeType,A.isRefund,A.dataFlag,A.createTime,A.payTime,A.modifiedTime,A.start_date,
CASE WHEN B.orderid IS NOT NULL AND A.end_date >'$do_date' THEN date_add('$do_date', -1)
ELSE A.end_date END AS end_date,
from_unixtime(unix_timestamp(A.createTime, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd') as dt
FROM (SELECT * FROM dwd.dwd_trade_orders WHERE dt>date_add('$do_date', -15)) A
left outer join (SELECT * FROM ods.ods_trade_orders WHERE dt='$do_date') B ON A.orderId = B.orderId;
"
hive -e "$sql"
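Reading the zipper back out: a row is valid on day d when start_date <= d <= end_date, so the status of every order as it stood on a given day (here a placeholder date) can be fetched as follows:

-- snapshot of all orders as of 2020-07-12
-- (optionally add a dt filter to prune partitions)
select orderId, status, createTime, modifiedTime
from dwd.dwd_trade_orders
where start_date <= '2020-07-12'
and end_date >= '2020-07-12';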

8, DWS layer data loading

DIM, DWD => DWS: this step applies standard data warehouse layering theory.

Requirement: for each day, compute order information at the following grains:

  • Nationwide: all orders

  • Nationwide, by first-level commodity category

  • Nationwide, by second-level commodity category

  • Per region: all orders

  • Per region, by first-level commodity category

  • Per region, by second-level commodity category

  • Per city: all orders

  • Per city, by first-level commodity category

  • Per city, by second-level commodity category

    Required inputs: order table, order product table, commodity information dimension table, commodity category dimension table, shop region dimension table

    Order table => order id, order status

    Order product table => order id, commodity id, shop id, unit price, quantity

    Commodity information dimension table => commodity id, third-level category id

    Commodity category dimension table => level-1 category name and id, level-2 category name and id, level-3 category name and id
    Shop region dimension table => shop id, region name and id, city name and id

    Order table + order product table + commodity information dimension table => order id, commodity id, shop id, third-level category id, unit price, quantity (order details)

    Order details + commodity category dimension table + shop region dimension table => order id, commodity id, shop id, category names at all three levels, unit price, quantity, region, city (wide order details)

8.1 DWS layer table creation
  • dws.dws_trade_orders is lightly aggregated from the following tables:
    • dwd.dwd_trade_orders (zipper table, partition table)
    • ods.ods_trade_order_product (partition table)
    • dim.dim_trade_product_info (dimension table, zipper table)
  • dws.dws_trade_orders_w (wide order details) is built from the following tables:
    • dws.dws_trade_orders (partition table)
    • dim.dim_trade_product_cat (partition table)
    • dim.dim_trade_shops_org (partition table)
-- Order details (lightly aggregated fact table): one row per order line
DROP TABLE IF EXISTS dws.dws_trade_orders;
create table if not exists dws.dws_trade_orders(
orderid string, -- order id
cat_3rd_id string, -- Three level classification of commodities id
shopid string, -- shop id
paymethod tinyint, -- Payment method
productsnum bigint, -- Quantity of goods
paymoney double, -- Order item detail amount
paytime string -- Order time
)
partitioned by (dt string)
STORED AS PARQUET;

-- Wide order detail table (order details joined with category and shop-region dimensions)
DROP TABLE IF EXISTS dws.dws_trade_orders_w;
create table if not exists dws.dws_trade_orders_w(
orderid string, -- order id
cat_3rd_id string, -- Three level classification of commodities id
thirdname string, -- Commodity class III classification name
secondname string, -- Commodity secondary classification name
firstname string, -- Commodity primary classification name
shopid string, -- shop id
shopname string, -- Shop name
regionname string, -- Store area
cityname string, -- Store City
paymethod tinyint, -- Payment method
productsnum bigint, -- Quantity of goods
paymoney double, -- Order detail amount
paytime string -- Order time
)
partitioned by (dt string)
STORED AS PARQUET;
8.2 DWS layer data loading

/root/data/lagoudw/script/trade/dws_load_trade_orders.sh

Remarks: a single order may produce multiple records in dws_trade_orders / dws_trade_orders_w

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
insert overwrite table dws.dws_trade_orders
partition(dt='$do_date')
select t1.orderid as orderid,t3.categoryid as cat_3rd_id,t3.shopid as shopid,t1.paymethod as paymethod,
t2.productnum as productsnum,t2.productnum*t2.productprice as pay_money,t1.paytime as paytime
from (select orderid, paymethod, paytime from dwd.dwd_trade_orders where dt='$do_date') T1
left join
(select orderid, productid, productnum, productprice from ods.ods_trade_order_product
where dt='$do_date') T2 on t1.orderid = t2.orderid left join
(select productid, shopid, categoryid from dim.dim_trade_product_info where start_dt <= '$do_date'
and end_dt >= '$do_date' ) T3 on t2.productid=t3.productid;

insert overwrite table dws.dws_trade_orders_w
partition(dt='$do_date')
select t1.orderid,t1.cat_3rd_id,t2.thirdname,t2.secondname,t2.firstname,t1.shopid,t3.shopname,t3.regionname,t3.cityname,
t1.paymethod,t1.productsnum,t1.paymoney,t1.paytime from (select orderid,cat_3rd_id,shopid,paymethod,productsnum,
paymoney,paytime from dws.dws_trade_orders where dt='$do_date') T1
join (select thirdid, thirdname, secondid, secondname,firstid, firstname from dim.dim_trade_product_cat
where dt='$do_date') T2 on T1.cat_3rd_id = T2.thirdid join
(select shopid, shopname, regionname, cityname from dim.dim_trade_shops_org where dt='$do_date') T3
on T1.shopid = T3.shopid
"
hive -e "$sql"
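To confirm the remark above, once the partition for your test date is loaded, a quick check (a sketch) lists orders that produced more than one detail row:

-- orders that span more than one product / category
select orderid, count(*) as detail_rows
from dws.dws_trade_orders
where dt = '2020-07-12'
group by orderid
having count(*) > 1
limit 10;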

Note: prepare the test data yourself and make sure the chosen test date has data in each of these tables:

  • dwd.dwd_trade_orders (zipper table, partition table)

  • ods.ods_trade_order_product (partition table)

  • dim.dim_trade_product_info (dimension table, zipper table)

  • dim.dim_trade_product_cat (partition table)

  • dim.dim_trade_shops_org (partition table)

    Building the test data (zipper + partition table):

insert overwrite table dwd.dwd_trade_orders
-- The date can be set at will
partition(dt='2020-07-12')
select orderid,orderno,userid,status,productmoney,totalmoney,paymethod,ispay,areaid,tradesrc,tradetype,isrefund,dataflag,
'2020-07-12',paytime,modifiedtime,start_date,end_date from dwd.dwd_trade_orders where end_date='9999-12-31';
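After building the test data, it is worth verifying that the partition actually exists and holds rows (a sketch; use the date you chose above):

show partitions dwd.dwd_trade_orders;
select count(*) from dwd.dwd_trade_orders where dt = '2020-07-12';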

9, ADS layer development

Requirement: for each day, compute order information at the following grains:

  • Nationwide: all orders

  • Nationwide, by first-level commodity category

  • Nationwide, by second-level commodity category

  • Per region: all orders

  • Per region, by first-level commodity category

  • Per region, by second-level commodity category

  • Per city: all orders

  • Per city, by first-level commodity category

  • Per city, by second-level commodity category

    Table used: dws.dws_trade_orders_w

9.1 ADS layer table creation
-- ADS Layer order analysis table
DROP TABLE IF EXISTS ads.ads_trade_order_analysis;
create table if not exists ads.ads_trade_order_analysis(
areatype string, -- Regional scope: regional type (national, regional and urban)
regionname string, -- Area name
cityname string, -- City name
categorytype string, -- Commodity classification type (primary and secondary)
category1 string, -- Commodity primary classification name
category2 string, -- Commodity secondary classification name
totalcount bigint, -- Order quantity
total_productnum bigint, -- Quantity of goods
totalmoney double -- Payment amount
)
partitioned by (dt string)
row format delimited fields terminated by ',';
9.2 ADS layer data loading

/root/data/lagoudw/script/trade/ads_load_trade_order_analysis.sh

Note: an order can contain multiple commodities, and those commodities can belong to different categories; one order may therefore appear under several categories, each of which is counted separately.

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
sql="
with mid_orders as (
select regionname,
cityname,
firstname category1,
secondname category2,
count(distinct orderid) as totalcount,
sum(productsnum) as total_productnum,
sum(paymoney) as totalmoney
from dws.dws_trade_orders_w
where dt='$do_date'
group by regionname, cityname, firstname, secondname
)
insert overwrite table ads.ads_trade_order_analysis
partition(dt='$do_date')
select 'whole country' as areatype,
'' as regionname,
'' as cityname,
'' as categorytype,
'' as category1,
'' as category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
union all
select 'whole country' as areatype,
'' as regionname,
'' as cityname,
'class a' as categorytype,
category1,
'' as category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by category1
union all
select 'whole country' as areatype,
'' as regionname,
'' as cityname,
'second level' as categorytype,
'' as category1,
category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by category2
union all
select 'Large area' as areatype,
regionname,
'' as cityname,
'' as categorytype,
'' as category1,
'' as category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by regionname
union all
select 'Large area' as areatype,
regionname,
'' as cityname,
'class a' as categorytype,
category1,
'' as category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by regionname, category1
union all
select 'Large area' as areatype,
regionname,
'' as cityname,
'second level' as categorytype,
'' as category1,
category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by regionname, category2
union all
select 'city' as areatype,
'' as regionname,
cityname,
'' as categorytype,
'' as category1,
'' as category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by cityname
union all
select 'city' as areatype,
'' as regionname,
cityname,
'class a' as categorytype,
category1,
'' as category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by cityname, category1
union all
select 'city' as areatype,
'' as regionname,
cityname,
'second level' as categorytype,
'' as category1,
category2,
sum(totalcount),
sum(total_productnum),
sum(totalmoney)
from mid_orders
group by cityname, category2;
"
hive -e "$sql"

Note: because an order may have multiple records in dws.dws_trade_orders_w, count(distinct orderid) must be used when counting the number of orders
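Because the nationwide row is simply the sum over all groups in mid_orders, a cheap consistency check (a sketch, using a placeholder date) is that it matches the sum of the regional rows:

-- nationwide totals for the day
select totalcount, totalmoney from ads.ads_trade_order_analysis
where dt = '2020-07-12' and areatype = 'whole country' and categorytype = '';
-- should add up to the same values
select sum(totalcount), sum(totalmoney) from ads.ads_trade_order_analysis
where dt = '2020-07-12' and areatype = 'Large area' and categorytype = '';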

10, Data export

The ads.ads_trade_order_analysis partition table is exported to MySQL with DataX.
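The DataX job file itself is not shown in this article. A wrapper in the same style as the other scripts might look like the sketch below; the DATAX_HOME variable, the job file name exp_ads_trade_order_analysis.json, and the use of -p to pass the partition date into the job are all assumptions to adapt to your environment.

#!/bin/bash
source /etc/profile
if [ -n "$1" ]
then
do_date=$1
else
do_date=`date -d "-1 day" +%F`
fi
# export the dt=$do_date partition to MySQL; the job file is a hypothetical
# hdfsreader -> mysqlwriter job that reads the date as ${dt}
python $DATAX_HOME/bin/datax.py -p "-Ddt=$do_date" \
/root/data/lagoudw/script/trade/exp_ads_trade_order_analysis.json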

11, Summary

Script call order:

# Loading ODS data (including DataX migration data)
/data/lagoudw/script/trade/ods_load_trade.sh
# Loading DIM layer data
/data/lagoudw/script/trade/dim_load_product_cat.sh
/data/lagoudw/script/trade/dim_load_shop_org.sh
/data/lagoudw/script/trade/dim_load_payment.sh
/data/lagoudw/script/trade/dim_load_product_info.sh
# Load DWD layer data
/data/lagoudw/script/trade/dwd_load_trade_orders.sh
# Load DWS layer data
/data/lagoudw/script/trade/dws_load_trade_orders.sh
# Load ADS layer data
/data/lagoudw/script/trade/ads_load_trade_order_analysis.sh

Main technical points:

  • Zipper table: creation, use, and rollback; commodity information table and order table (periodic fact table; partition table + zipper table)
  • Wide table (denormalization): commodity category table, shop region organization table, order details and wide order details (lightly aggregated fact tables)
