Net Programmers Learn Oracle Series (21): Grouped Query (GROUP BY)

Keywords: Oracle SQL

1. GROUP BY Standard Grouping

For the sake of narration, I will call the grouping of simple GROUP BY clause as standard grouping, and the column (or expression) appearing in GROUP BY clause as grouping column.

1.1. Overview of GROUP BY

In grouping queries, the GROUP BY clause is used to group selected row sets according to the value of a specified column or expression, and return a row of data collected from the group for each group. Basic grammar:

SELECT expression1, expression2, ... expression_n, 
       aggregate_function (aggregate_expression)
FROM tables
[WHERE conditions]
GROUP BY expression1, expression2, ... expression_n
[HAVING having_condition];

Simple examples:

SELECT t.dept_code,
  MAX(t.post_salary) max_salary,  -- Department's highest post salary
  MIN(t.post_salary) min_salary,  -- Minimum Wage for Departments
  AVG(t.post_salary) avg_salary,  -- Average post salary of Department
  SUM(t.post_salary) sum_salary,  -- The sum of department post wages
  COUNT(t.post_salary) cnt_salary -- Departmental Wages
FROM demo.t_staff t
GROUP BY t.dept_code
HAVING AVG(t.post_salary)>3500
ORDER BY t.dept_code;

Matters needing attention:

  • 1. Grouping columns or aggregation functions or constants can only appear in SELECT clauses.
  • 2. In the HAVING clause, only grouping columns or aggregation functions or constants can appear.
  • 3. In GROUP BY clause, only columns or constants of scalar primitive types (such as VARCHAR2, NUMBER, DATE, etc.) can appear, but columns of BLOB, CLOB and other types cannot appear.
  • 4. In fact, grouped columns may not appear in SELECT clauses, but this may make the meaning of query results unclear.
  • 5. In SELECT, HAVING, GROUP BY clauses, there are also some special functions, such as SYSDATE (feeling meaningless).

1.2. The difference between WHERE and HAVING?

Let me start by saying, "What's the difference between WHERE and HAVING?" This is definitely a problem! Because there is no comparability between the two, in fact, people who ask such questions often have a weak SQL foundation.

In the query sentence containing GROUP BY clause, the function of WHERE clause is to filter the row data before grouping the query results, and remove the unqualified rows; while the function of HAVING clause is to filter the group data after grouping the query results, and remove the unqualified groups. In other words, because aggregation functions provide information about groups, it is impossible to provide information about groups before grouping. That is to say, aggregation functions cannot be used in WHERE clauses, which is why HAVING clauses appear. With the HAVING clause, we can easily filter the group data after grouping. Generally speaking, what can be filtered by WHERE should not be filtered by HAVING!

2. GROUP BY Extended Grouping

In the actual business development, only standard grouping may not be enough, often need more dimensions of subtotal, total. For this kind of requirement, Oracle provides abundant extended grouping functions; although UNION ALL can generally achieve similar results, it is not flexible enough and has low performance.

2.1. ROLLUP grouping

ROLLUP is an extension of the GROUP BY clause, which allows the calculation of subtotals and totals of standard groupings and some dimensions. Syntax:

GROUP BY ROLLUP(grouping_column_reference_list)

The result of ROLLUP calculation is related to the order of specified columns in parentheses, because the grouping process of ROLLUP is directional. First the standard grouping is calculated, then the higher subtotal is calculated progressively from right to left until all columns are calculated, and finally the total is calculated. When n columns are specified in ROLLUP, there are n+1 grouping modes in the whole calculation process. For example, the grouping process of GROUP BY ROLLUP(A,B) is equivalent to that of GROUP BY(A,B) grouping in the first step, GROUP BY(A,NULL) grouping in the second step and GROUP BY (NULL) grouping in the third step.

Example 1:

SELECT t.dept_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY ROLLUP(t.dept_code);

Result:

DEPT_CODE                                          SUM_SALARY MAX_SALARY
-------------------------------------------------- ---------- ----------
010102                                                  13500       7500
010103                                                   7850       5050
                                                        21350       7500

Example 2:

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY ROLLUP(t.dept_code,t.post_code);

Result:

DEPT_CODE                                      POST_CODE                                    SUM_SALARY MAX_SALARY
---------------------------------------------- -------------------------------------------- ---------- ----------
010102                                         P40                                                7500       7500
010102                                         P50                                                6000       6000
010102                                                                                           13500       7500
010103                                         P40                                                5050       5050
010103                                         P50                                                2800       2800
010103                                                                                            7850       5050
                                                                                                 21350       7500

Example 3, Partial ROLLUP grouping (without some subtotals, which can be used for total timing):

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY t.dept_code,ROLLUP(t.post_code);

Result:

DEPT_CODE                                     POST_CODE                                    SUM_SALARY MAX_SALARY
--------------------------------------------- -------------------------------------------- ---------- ----------
010102                                        P40                                                7500       7500
010102                                        P50                                                6000       6000
010102                                                                                          13500       7500
010103                                        P40                                                5050       5050
010103                                        P50                                                2800       2800
010103                                                                                           7850       5050

2.2. CUBE grouping

CUBE is an extension of GROUP BY clause, which allows calculation of standard grouping and subtotals and totals of all dimensions. Syntax:

GROUP BY CUBE(grouping_column_reference_list)

CUBE counts all possible groupings and generates cross reports. CUBE has more groupings than ROLLUP, and contains ROLLUP statistical results, and the results are independent of the order of grouping columns, but if the order of columns is different, the default ranking of results will be different. When n columns are specified in CUBE, there are two n-th power grouping modes in the whole calculation process. For example, GROUP BY CUBE(A,B) is equivalent to grouping by GROUP BY(A,B), grouping by GROUP BY(A,NULL), grouping by GROUP BY(NULL,B) and grouping by GROUP BY(NULL,NULL).

Example 1:

SELECT t.dept_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY CUBE(t.dept_code);

Result:

DEPT_CODE                                          SUM_SALARY MAX_SALARY
-------------------------------------------------- ---------- ----------
                                                        21350       7500
010102                                                  13500       7500
010103                                                   7850       5050

Example 2:

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY CUBE(t.dept_code,t.post_code);

Result:

DEPT_CODE                                     POST_CODE                                  SUM_SALARY MAX_SALARY
--------------------------------------------- ------------------------------------------ ---------- ----------
                                                                                              21350       7500
                                              P40                                             12550       7500
                                              P50                                              8800       6000
010102                                                                                        13500       7500
010102                                        P40                                              7500       7500
010102                                        P50                                              6000       6000
010103                                                                                         7850       5050
010103                                        P40                                              5050       5050
010103                                        P50                                              2800       2800

Example 3, Partial CUBE grouping (without some subtotals, which can be used for total timing):

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY t.dept_code,CUBE(t.post_code);

Result:

DEPT_CODE                                    POST_CODE                                SUM_SALARY MAX_SALARY
-------------------------------------------- ---------------------------------------- ---------- ----------
010102                                                                                     13500       7500
010102                                       P40                                            7500       7500
010102                                       P50                                            6000       6000
010103                                                                                      7850       5050
010103                                       P40                                            5050       5050
010103                                       P50                                            2800       2800

GROUPING SETS grouping

GROUPING SETS is an extension of the GROUP BY clause, which allows the subtotal of multiple standard groups to be calculated at one time. Syntax:

GROUP BY GROUPING SETS(grouping_column_reference_list)

The calculation results of GROUPING SETS have nothing to do with the order of grouping columns, and the ranking of result sets has nothing to do with the order of grouping columns. When n columns are specified in GROUPING SETS, there are n grouping methods in the whole calculation process. For example, GROUPING SETS(A,B,C) is equivalent to the results of UNION ALL of GROUP BY A, GROUP BY B and GROUP BY C.

Example 1:

SELECT t.dept_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY GROUPING SETS(t.dept_code);

Result:

DEPT_CODE                                          SUM_SALARY MAX_SALARY
-------------------------------------------------- ---------- ----------
010102                                                  13500       7500
010103                                                   7850       5050

Example 2:

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY GROUPING SETS(t.dept_code,t.post_code);

Result:

DEPT_CODE                                     POST_CODE                                  SUM_SALARY MAX_SALARY
--------------------------------------------- ------------------------------------------ ---------- ----------
010102                                                                                        13500       7500
010103                                                                                         7850       5050
                                              P50                                              8800       6000
                                              P40                                             12550       7500

Example 3, partial GROUPING SETS grouping:

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY t.dept_code,GROUPING SETS(t.post_code);

Result:

DEPT_CODE                                    POST_CODE                                 SUM_SALARY MAX_SALARY
-------------------------------------------- ----------------------------------------- ---------- ----------
010103                                       P40                                             5050       5050
010102                                       P40                                             7500       7500
010102                                       P50                                             6000       6000
010103                                       P50                                             2800       2800

Example 4: GROUPING SETS can accept ROLLUP and CUBE as its parameters; GROUPING SETS only groups single columns, but does not provide aggregate functions. If GROUPING SETS is required to provide aggregate functions, ROLLUP or CUBE can be used as parameters. (Note that ROLLUP and CUBE do not accept GROUPING SETS as parameters, nor do ROLLUP and CUBE take each other as parameters.) Yes):

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,MAX(t.post_salary) max_salary
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY GROUPING SETS(ROLLUP(t.dept_code),ROLLUP(t.post_code));

Result:

DEPT_CODE                                    POST_CODE                                  SUM_SALARY MAX_SALARY
-------------------------------------------- ------------------------------------------ ---------- ----------
010102                                                                                       13500       7500
010103                                                                                        7850       5050
                                             P50                                              8800       6000
                                             P40                                             12550       7500
                                                                                             21350       7500
                                                                                             21350       7500

2.4. Complex grouping (combination column grouping, duplicate column grouping, connection grouping)

Simply put: Combining column grouping allows multiple columns or column combinations in ROLLUP, CUBE and GROUPING SETS; repeating column grouping allows group columns to repeat after GROUP BY; connecting grouping allows multiple ROLLUP, CUBE or GROUPING SETS after GROUP BY.

Combination column grouping has the function of filtering some subtotals or calculating some additional subtotals. The former part of ROLLUP and part of CUBE are not totaled. Using combination column grouping can not only realize the function of part of ROLLUP or part of CUBE, but also can totalize. For example, ROLLUP(A,(B,C)) can filter the subtotals of B and C and calculate the total of AB C.

The grouping level of connection grouping is a Cartesian product of the levels of all ROLLUP, CUBE or GROUPING SETS groupings. For example, ROLLUP(A,B),ROLLUP(C,D,E) has a grouping level of (2 + 1) * (3 + 1) = 12, CUBE(A,B),CUBE(C,D,E) has a grouping level of (4) * (8) = 32, CUBE(A,B),CUBE(C,D,E) has a grouping level of (4) * (8) = 32.

In my understanding, complex grouping is nothing more than the comprehensive application of standard extended grouping. In practical development, we may encounter some requirements that can not be achieved only by using standard extended grouping. At this time, we can consider flexible use of standard extended grouping, through complex grouping.

3. GROUP BY Extension Function

3.1. GROUPING Function

GROUPING Grammar: GROUPING(expr). Because there may be NULL in the original data, and the subtotal or aggregate value may also be NULL, the data will be confused. When the function appears in the SELECT clause, if the expr of the aggregated row is NULL, it returns 1; if the expr of the regular row is NULL, it returns 0. Usually, a grouping column is taken as a parameter of the function, and then aggregated rows and regular rows are distinguished by judging its return value, so as to further beautify or filter the result set. Example:

SELECT DECODE(GROUPING(t.dept_code),1,'Total',t.dept_code) dept_code,
  SUM(t.post_salary) sum_salary,GROUPING(t.dept_code) gd
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY ROLLUP(t.dept_code);

Result:

DEPT_CODE                                          SUM_SALARY         GD
-------------------------------------------------- ---------- ----------
010102                                                  13500          0
010103                                                   7850          0
//Total 213501

3.2. GROUPING_ID Function

GROUPING_ID Grammar: GROUPING_ID(expr [, expr]...). When the function appears in the SELECT clause, it returns the value corresponding to the GROUPING bit vector associated with the row. The GROUPING_ID function is calculated in the order from left to right. If this column is a grouped column, it is 0. If it is a subtotal or a total, it is 1. Then the result is composed into a binary sequence (bit vector) in the order of columns. Finally, the bit vector is converted into a decimal number. The GROUPING_ID function is functionally equivalent to the result of multiple GROUPING functions. With GROUPING_ID, it is no longer necessary to write multiple GROUPING functions, and the line filtering conditions are easier to express. This function is particularly useful when query results have multiple aggregation levels, and can sort and filter the result set by its return value. Example:

SELECT t.dept_code,t.post_code,SUM(t.post_salary) sum_salary,
  GROUPING_ID(t.dept_code) gd,
  GROUPING_ID(t.post_code) gp,
  GROUPING_ID(t.dept_code,t.post_code) gdp,
  GROUPING_ID(t.post_code,t.dept_code) gpd
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY CUBE(t.dept_code,t.post_code)
ORDER BY GROUPING_ID(t.dept_code,t.post_code);

Result (the result set corresponds to an ascending order of GDP):

DEPT_CODE                     POST_CODE                SUM_SALARY         GD         GP        GDP        GPD
----------------------------- ------------------------ ---------- ---------- ---------- ---------- ----------
010102                        P40                            7500          0          0          0          0
010103                        P40                            5050          0          0          0          0
010102                        P50                            6000          0          0          0          0
010103                        P50                            2800          0          0          0          0
010103                                                       7850          0          1          1          2
010102                                                      13500          0          1          1          2
                              P50                            8800          1          0          2          1
                              P40                           12550          1          0          2          1
                                                            21350          1          1          3          3

3.3. GROUP_ID Function

GROUP_ID Syntax: GROUP_ID(). When the function appears in the SELECT clause, if there are n duplicates in the result set, it will return numbers ranging from 0 to n-1, which is very useful for eliminating duplicate groupings from the query results. Example (typically eliminating all duplicate rows by HAVING GROUP_ID()<1):

SELECT t.dept_code,SUM(t.post_salary) sum_salary,GROUP_ID() group_id
FROM demo.t_staff t
WHERE t.dept_code IN('010102','010103')
GROUP BY ROLLUP(t.dept_code),CUBE(t.dept_code);

Result:

DEPT_CODE                                          SUM_SALARY   GROUP_ID
-------------------------------------------------- ---------- ----------
010102                                                  13500          0
010103                                                   7850          0
010102                                                  13500          2
010103                                                   7850          2
010102                                                  13500          1
010103                                                   7850          1
                                                        21350          0

4, summary

This paper mainly describes GROUP BY related knowledge points in Oracle, such as standard grouping, extended grouping, extended function, etc.

Links to this article: http://www.cnblogs.com/hanzongze/p/Oracle-Group-By.html
Copyright Statement: This article is a blogger of Blog Garden Han Chung se Originality, the author reserves the right of signature! You are welcome to use this article through reprinting, deduction or other means of dissemination, but you must give the author's signature and link to this article in a clear place! My first blog, level is limited, if there are inappropriate, please criticize and correct, thank you!

Posted by andy666 on Fri, 19 Apr 2019 17:06:32 -0700