Explanation of the Analytical Function OVER(PARTITION BY... ORDER BY...)

Instructions and Significance of OVER(PARTITION BY... ORDER BY...)

The function looks like this: OVER(PARTITION BY... ORDER BY...)

As the name implies, PARTITION means segmentation in Chinese and ORDER means sorting, so the translation is to first divide a set of data into groups according to the field formulated, and then sort the data according to a field within the group.

The use of this function is illustrated by a practical case.

First, let's look at a set of ordinarily sorted data:


Then add a fake and inferior ROWNUM to it to see the original order of data:


OVER(ORDER BY...) means sorting by a field, so it has the same effect as order by directly on a table:


The first function of OVER(ORDER BY...) is to renumber the data obtained from the new query, that is, the value of RNO. Because there is no PARTITION BY..., it can be compared to the whole column as a block, and then sort the contents of the block. At this time, add PARTITION BY..., the query results:


It will arrange the same values according to the field, and divide them into a block, that is, grouping, and then sorting the number in the group, so that you can take any number of values in different groups, similar to TOP-N analysis.

The function used before over here is ROW_NUMBER(), which is the number of the data.


Functions used to match OVER(PARTITION BY... ORDER BY...)

row_number() over(), rank() over() and dense_rank() over()

In the above example, data can be numbered by using ROW_NUMBER(), but there is a problem that MI_ID in the example can not be repeated. If it can be repeated, there will be parallelism, so it is impossible to extract the parallelism data, only the data of a single sort can be taken. So we can change it to rank() over() and dense_rank() over() for numbering: (rank() over() and dense_rank() over() as shown in the figure).



Use of sum() over(), first_value() over() and last_value() over()

sum() over() group summation

first_value() over() to find the first item of grouping

last_value() over() Finds the last item of the grouping

The effect of first_value() over() can also be achieved by using row_number() over() as the first item.

SELECT DISTINCT *
  FROM (SELECT T1.MI_ID,
               T3.I_IDENTITY_CARD,
               SUM(NVL(T2.IS_VISHEARTPROMEMBER, 0)) OVER(PARTITION BY T3.I_IDENTITY_CARD) PRO_NUM,
               SUM(NVL(T2.IS_VISHEARTEXPMEMBER, 0)) OVER(PARTITION BY T3.I_IDENTITY_CARD) EXP_NUM,
               SUM(NVL(T2.IS_VISHEARTPOLMEMBER, 0)) OVER(PARTITION BY T3.I_IDENTITY_CARD) POL_NUM,
							 T1.CREATE_TIME,
               FIRST_VALUE(T2.CREATE_DATE) OVER(PARTITION BY T3.I_IDENTITY_CARD ORDER BY T2.CREATE_DATE DESC) FIRST_CREATE_DATE
          FROM T_ZZ_PETITIONERS T1
          LEFT JOIN T_ZZ_VISIT_RECORD T2
            ON T1.CI_RS_ID = T2.CI_RS_ID
           AND T2.STATUS != '003'
          LEFT JOIN T_DC_CI_RS_TOP T3
            ON T1.CI_RS_ID = T3.CI_RS_ID
           AND T3.STATUS = '1'
         WHERE T1.IS_ADD_HEARTPROTECT = '1'
           AND T1.STATUS = '1') C
 ORDER BY C.CREATE_TIME DESC


SELECT *
  FROM (SELECT T1.MI_ID,
               T3.I_IDENTITY_CARD,
               SUM(NVL(T2.IS_VISHEARTPROMEMBER, 0)) OVER(PARTITION BY T3.I_IDENTITY_CARD) PRO_NUM,
               SUM(NVL(T2.IS_VISHEARTEXPMEMBER, 0)) OVER(PARTITION BY T3.I_IDENTITY_CARD) EXP_NUM,
               SUM(NVL(T2.IS_VISHEARTPOLMEMBER, 0)) OVER(PARTITION BY T3.I_IDENTITY_CARD) POL_NUM,
               T1.CREATE_TIME,
               ROW_NUMBER() OVER(PARTITION BY T3.I_IDENTITY_CARD ORDER BY T2.CREATE_DATE DESC) RNO
          FROM T_ZZ_PETITIONERS T1
          LEFT JOIN T_ZZ_VISIT_RECORD T2
            ON T1.CI_RS_ID = T2.CI_RS_ID
           AND T2.STATUS != '003'
          LEFT JOIN T_DC_CI_RS_TOP T3
            ON T1.CI_RS_ID = T3.CI_RS_ID
          LEFT JOIN T_DC_GRID T4
            ON T1.REGION_CODE = T4.INFO_ORG_CODE
         WHERE T1.IS_ADD_HEARTPROTECT = '1'
           AND T1.STATUS = '1'
           AND T3.STATUS = '1'
           AND T4.STATUS = '001') C
 WHERE C.RNO = 1
 ORDER BY C.CREATE_TIME DESC


ignore nulls: filter out null values


The commonly used analysis functions are listed as follows:

row_number() over(partition by ... order by ...)
rank() over(partition by ... order by ...)
dense_rank() over(partition by ... order by ...)
count() over(partition by ... order by ...)
max() over(partition by ... order by ...)
min() over(partition by ... order by ...)
sum() over(partition by ... order by ...)
avg() over(partition by ... order by ...)
first_value() over(partition by ... order by ...)
last_value() over(partition by ... order by ...)
lag() over(partition by ... order by ...)
lead() over(partition by ... order by ...)







Posted by jordan on Sat, 05 Jan 2019 15:27:09 -0800