Offset analysis function of MySQL advanced application window function

Keywords: Java MySQL Back-end Interview Programmer

  preface

This article will explain the offset analysis function + over() through an example

1, Concept of offset analysis function

  • lag(col,n,default): used to count the value of the nth row up in the group.

    • The first parameter is the column name
    • The second parameter is the nth line up (optional, not filled in, default to 1)
    • The third parameter is the default value (when the nth line up is NULL, the default value is taken; if it is not specified, it is NULL)
  • lead(col,n,default): Contrary to lag, it counts the value of the nth row down in the group.

    • The first parameter is the column name
    • The second parameter is the next n line (optional, not filled in, default to 1)
    • The third parameter is the default value (when the next nth line is NULL, the default value is taken; if it is not specified, it is NULL)
  • first_value(col): used to get the value of the first col as of the current row after sorting in the group.

  • last_value(col): used to get the value of the last col as of the current row after sorting in the group.

2, Example explanation

2.1 cases

Query the time offset of users [yantian] and [lisi] (the first N lines)

Analysis: implemented by lag() window function

SELECT
	user_name,
	pay_time,
	lag( pay_time, 1, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag1,
	lag( pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag2,
	lag( pay_time, 2, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lag3,
	lag( pay_time, 2 ) over ( PARTITION BY user_name ORDER BY pay_time ) lag4 
FROM
	user_order 
WHERE
	user_name IN ( 'lisi', 'yantian' )
Copy code

Operation results

2.2 cases

Query the time offset of users [yantian] and [lisi] (the last N lines)

Analysis: implemented through the lead() window function

SELECT
	user_name,
	pay_time,
	lead( pay_time, 1, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead1,
	lead( pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead2,
	lead( pay_time, 2, pay_time ) over ( PARTITION BY user_name ORDER BY pay_time ) lead3,
	lead( pay_time, 2 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead4 
FROM
	user_order 
WHERE
	user_name IN ( 'lisi', 'yantian' );
Copy code

Operation results

2.3 cases

Query the number of users whose payment interval exceeds 10 days

analysis:

  • For the same user, the ordering time of adjacent orders is compared. If the ordering time interval of adjacent orders exceeds 10 days, this user needs to be counted
  • In this process, the payment time of adjacent orders needs to be subtracted

Implementation steps

(1)adopt lead Window functions are grouped according to payment time,Move the payment time of the next line to the previous line
(2)Make the payment time difference through the previous step,Then the user is de duplicated, and then count operation
 Copy code

realization

Step 1: group according to the payment time through the lead window function, and move the payment time of the next line to the previous line

SELECT
	user_name,
	pay_time,
	lead( pay_time, 1 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead_time 
FROM
	user_order;
Copy code

Step 2: make a difference in the payment time through the previous step, then de duplicate the user, and then count

SELECT
	count( DISTINCT user_name ) 
FROM
	( SELECT user_name, pay_time, lead( pay_time, 1 ) over ( PARTITION BY user_name ORDER BY pay_time ) lead_time FROM user_order ) a 
WHERE
	datediff( a.lead_time, a.pay_time )> 10;
Copy code

2.4 cases

Query the users with the longest annual payment interval

Implementation steps

(1)Grouped by user and payment year lag()The window function merges the payment time of the previous line to the next line
(2)Calculate order interval,Rank according to the time interval of each order within the year
(3)Sub query the query results above as a table,Get a value with a ranking result of 1
 Copy code

realization

Step 1: group according to the user and payment year, and merge the payment time of the previous line to the next line through the lag() window function

SELECT YEAR
	( pay_time ) year_time,
	user_name,
	pay_time,
	lag( pay_time ) over ( PARTITION BY user_name, YEAR ( pay_time ) ORDER BY pay_time ASC ) lag_time 
FROM
	user_order;
Copy code

Step 2: calculate the order time interval and rank them according to their order time interval in the year

SELECT
	a.year_time,
	a.user_name,
	datediff( a.pay_time, a.lag_time ) interval_days,
	rank() over ( PARTITION BY a.year_time ORDER BY datediff( a.pay_time, a.lag_time ) DESC ) rank1 
FROM
	(
	SELECT YEAR
		( pay_time ) year_time,
		user_name,
		pay_time,
		lag( pay_time ) over ( PARTITION BY user_name, YEAR ( pay_time ) ORDER BY pay_time ASC ) lag_time 
	FROM
		user_order 
	) a;
Copy code

Step 3: use the query result above as a sub query table to obtain the value with the ranking result of 1

SELECT
	b.year_time,
	b.user_name,
	b.interval_days 
FROM
	(
	SELECT
		a.year_time,
		a.user_name,
		datediff( a.pay_time, a.lag_time ) interval_days,
		rank() over ( PARTITION BY a.year_time ORDER BY datediff( a.pay_time, a.lag_time ) DESC ) rank1 
	FROM
		(
		SELECT YEAR
			( pay_time ) year_time,
			user_name,
			pay_time,
			lag( pay_time ) over ( PARTITION BY user_name, YEAR ( pay_time ) ORDER BY pay_time ASC ) lag_time 
		FROM
			user_order 
		) a 
	) b
	where b.rank1 = 1;
Copy code

Operation results



Link: https://juejin.cn/post/7035169694529110024

Posted by rhosk on Sun, 28 Nov 2021 16:41:27 -0800