1. Establishment of cumulative measurement fact sheet
Execute the following script to create the month_end_balance_fact fact table to store the monthly cumulative value of the sales order amount and quantity.set search_path=tds; create table month_end_balance_fact ( year_month int, product_sk int, month_end_amount_balance numeric(10,2), month_end_quantity_balance int ); comment on table month_end_balance_fact is 'Cumulative Measure Fact Table'; comment on column month_end_balance_fact.year_month is 'Years'; comment on column month_end_balance_fact.product_sk is 'Product Agent Key'; comment on column month_end_balance_fact.month_end_amount_balance is 'Cumulative amount'; comment on column month_end_balance_fact.month_end_quantity_balance is 'Cumulative Quantity';
II. Initial Loading
Now load the data in the month_end_sales_order_fact table into the month_end_balance_facttable, and the script that initially loads the month_end_balance_facttable is shown below. This script loads cumulative monthly sales order summary data, which accumulates from January to that month every year, and does not span years.Subqueries retrieve the data of the month_end_sales_order_fact table and its year-month and maximum-month proxy keys. The outer query aggregates the cumulative sales data from January to that month, and the a. year_month <= B. max_year_month condition is used to limit the statistics to the maximum month in existence.insert into month_end_balance_fact select a.year_month, b.product_sk, sum(b.month_order_amount) month_order_amount, sum(b.month_order_quantity) month_order_quantity from (select distinct year_month, year_month/100 year1, year_month - year_month/100*100 month1 from month_end_sales_order_fact) a, (select *, year_month/100 year1, year_month - year_month/100*100 month1, max(year_month) over () max_year_month from month_end_sales_order_fact) b where a.year_month <= b.max_year_month and a.year1 = b.year1 and b.month1 <= a.month1 group by a.year_month, b.product_sk;
To confirm that the initial loading is correct, after executing the initial loading script, query the month_end_sales_order_fact s and month_end_balance_fact s tables, respectively.
Query cycle snapshots:
The query structure is shown in Figure 1.select year_month, product_sk psk, month_order_amount amt, month_order_quantity qty from month_end_sales_order_fact order by year_month, psk;
Figure 1
Query cumulative measures:
The query structure is shown in Figure 2.select year_month, product_sk psk, month_end_amount_balance amt, month_end_quantity_balance qty from month_end_balance_fact order by year_month, psk;
Figure 2
As can be seen, the sales amount of goods in March 2016 was accumulated to April 2016, and the sales amount of goods in March and April 2016 was accumulated to May 2016, etc. This kind of cumulative measure is sometimes called sliding order problem.
III. Periodic Loading
The month_balance_sum.sql script shown below is used to periodically load sales order cumulative metrics, which are executed once a month to load last month's data. The script can be executed after the monthly cycle snapshot table is loaded regularly. Note that the fact that you are late has a significant impact on cumulative measures. For example, the January 2016 data did not enter the data warehouse until January 2017, so the cumulative measures for each month after February 2016 would change.Subqueries combine cumulative metrics and monthly cycle snapshot tables to increase the cumulative data of last month. The outermost query executes grouping aggregation of sales data by month and product. The innermost case statement is used to revert to zero and accumulate every January. v_year_month is the year-month parameter.insert into month_end_balance_fact select year_month, product_sk, sum(month_order_amount), sum(month_order_quantity) from (select * from month_end_sales_order_fact where year_month = :v_year_month union all select :v_year_month, product_sk product_sk, month_end_amount_balance month_order_amount, month_end_quantity_balance month_order_quantity from month_end_balance_fact where year_month in (select max(case when :v_year_month - :v_year_month/100*100 = 1 then 0 else year_month end) from month_end_balance_fact)) t group by year_month, product_sk;
IV. Testing
Implement the monthly cycle snapshot function to load June 2017 data.Execute cumulative metrics to load scripts periodically, and pass the output of shell command `date +%Y%m'into the month_balance_sum.sql file as the year and month parameters.select fn_month_sum(201706);
Execute the same query as the previous initial load, and the query results of the periodic snapshot table and the cumulative metric table are shown in Figures 3 and 4, respectively.su - gpadmin -c 'export PGPASSWORD=123456;psql -U dwtest -d dw -h hdp3 -v v_year_month=''`date +%Y%m`'' -f ~/month_balance_sum.sql'
Figure 3
Figure 4
As you can see, the amount and quantity of goods sold in May 2017 were accumulated to June 2017. Product 1, 2 and 5 accumulated sales data for 5 and 6 months. Product 3 and 4 did not sell in June, so sales data for May was extended to June.
V. Query
Digital measures in fact tables can be divided into three categories: additive, semi-additive and non-additive. Additivity metrics can be aggregated according to any dimension associated with the fact table, that is to say, the sum of the metrics aggregated according to any dimension is the same, and most of the metrics in the fact table belong to this category. Semi-additive measures can aggregate some dimensions, but not all. Balance is a common semi-additive measure, which can be added across all dimensions except the time dimension. There are other measures that are totally unacceptable, such as proportions. For non-additive measures, it is better to store as many additive components as possible, such as molecule and denominator of proportion, and aggregate these components into the final result set. The calculation of non-additive measures usually takes place in BI or OLAP layers.Cumulative measures must be used carefully because they are "semi-additive". A semi-additive measure is not additive in some dimensions (usually time dimensions). For example, the accumulative sales amount at the end of the month can be accumulatively accumulated by the product.
The cumulative end-of-month amount by month:dw=> select year_month, sum(month_end_amount_balance) s dw-> from month_end_balance_fact dw-> group by year_month dw-> order by year_month; year_month | s ------------+----------- 201603 | 191158.00 201604 | 345600.00 201605 | 455772.00 201606 | 572190.00 201705 | 253400.00 201706 | 294876.00 (6 rows)
The above query results are incorrect. The correct result should be the same as the following query on the month_end_sales_order_fact table.dw=> select product_name, sum(month_end_amount_balance) s dw-> from month_end_balance_fact a, dw-> product_dim b dw-> where a.product_sk = b.product_sk dw-> group by product_name dw-> order by product_name; product_name | s -----------------+----------- flat panel | 99332.00 floppy drive | 940933.00 hard disk drive | 941337.00 keyboard | 125220.00 lcd panel | 6174.00 (5 rows)
dw=> select product_name, sum(month_order_amount) s dw-> from month_end_sales_order_fact a, dw-> product_dim b dw-> where a.product_sk = b.product_sk dw-> group by product_name dw-> order by product_name; product_name | s -----------------+----------- flat panel | 49666.00 floppy drive | 362393.00 hard disk drive | 384533.00 keyboard | 67387.00 lcd panel | 3087.00 (5 rows)