Pandas 13 - window functions (rolling, expanding)
Reprinted and adapted from:
https://www.yiibai.com/pandas/python_pandas_window_functions.html
1. About window functions
Window functions are mainly used to reveal trends in data by smoothing a curve, which makes the trend easier to see on a plot.
Pandas provides several variants: rolling, expanding, and exponentially weighted moving window statistics.
These statistics include sum, mean, median, variance, covariance, correlation, and so on.
Let's learn how to apply each of these methods to a DataFrame object.
2. Using window functions on DataFrame
1. .rolling()
Method structure:
DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None)
Parameter description
Parameter | Description |
---|---|
window | The size of the moving window, in one of two forms: 1) an int, meaning the number of observations in each window (the current row and the rows before it); 2) an offset (a time period), which is more complex and less commonly used, so it is not covered here; |
min_periods | The minimum number of observations a window must contain to produce a value; otherwise the result is NaN. The value is an int, default None. For an int window the default equals the window size; for an offset window the default is 1; |
center | Whether to set the window label at the center of the window. Boolean, default False, meaning the label is at the right edge of the window; |
win_type | The type of window, i.e. the weighting function applied to the points in the window. String, default None (all points are weighted equally); |
on | Optional. For a DataFrame, the name of the column to use for defining the rolling window instead of the index; |
axis | The default value is 0, which means the window moves down the rows and each column is calculated separately; |
closed | Defines which ends of the window interval are closed: 'right', 'left', 'both', or 'neither'. The default (None) behaves like 'right', i.e. left-open, right-closed. |
This function can be applied to a Series or a DataFrame. Specify the window=n parameter and then apply the appropriate statistical function to the result.
When the window size is 3 (window=3), the first two elements are NaN, and the value at position n is the mean of the elements at n, n-1, and n-2. The other functions mentioned above can be applied in the same way.
    import pandas as pd

    df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8], [11, 18, 13, 21], [12, 34, 28, 76]],
                      columns=['A', 'B', 'C', 'D'])

    # window=1: every window holds a single row, so the mean equals the row itself
    print(df.rolling(window=1).mean())
    '''
          A     B     C     D
    0   1.0   3.0   5.0   7.0
    1   2.0   4.0   6.0   8.0
    2  11.0  18.0  13.0  21.0
    3  12.0  34.0  28.0  76.0
    '''

    # window=2: the first row has no complete window, so it is NaN
    print(df.rolling(window=2).mean())
    '''
          A     B     C     D
    0   NaN   NaN   NaN   NaN
    1   1.5   3.5   5.5   7.5
    2   6.5  11.0   9.5  14.5
    3  11.5  26.0  20.5  48.5
    '''

    # window=3: the first two rows are NaN
    print(df.rolling(window=3).mean())
    '''
              A          B          C     D
    0       NaN        NaN        NaN   NaN
    1       NaN        NaN        NaN   NaN
    2  4.666667   8.333333   8.000000  12.0
    3  8.333333  18.666667  15.666667  35.0
    '''
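To make the other parameters more concrete, here is a minimal sketch of min_periods, center, and an offset window combined with on. The Series `s`, the frame `df_t`, and its date column `t` are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Illustrative data only: the values of column A from the example above.
s = pd.Series([1, 2, 11, 12])

# min_periods=1 lets the incomplete leading windows produce a value.
partial = s.rolling(window=3, min_periods=1).mean()

# center=True labels each window at its middle row instead of its right edge,
# so the first and last rows lack a full window and become NaN.
centered = s.rolling(window=3, center=True).mean()

# Offset window: "2D" covers the current day and the day before it.
# The hypothetical column "t" supplies the dates via the `on` parameter.
df_t = pd.DataFrame({
    "t": pd.date_range("2024-01-01", periods=4, freq="D"),
    "A": [1, 2, 11, 12],
})
two_day = df_t.rolling("2D", on="t").sum()["A"]

print(partial.tolist())
print(centered.tolist())
print(two_day.tolist())
```

With an offset window, min_periods defaults to 1, which is why the first row of `two_day` already has a value.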
Common usage
The rolling() function supports many functions besides mean(), such as:
- count(): number of non-null observations
- sum(): sum of values
- mean(): arithmetic mean of values
- median(): median of values
- min(): minimum
- max(): maximum
- std(): Bessel-corrected sample standard deviation
- var(): unbiased variance
- skew(): sample skewness (third moment)
- kurt(): sample kurtosis (fourth moment)
- quantile(): sample quantile (value at a given percentile)
- cov(): unbiased covariance (binary)
- corr(): correlation (binary)
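As a quick illustration of the list above, the sketch below applies a few of these functions to the same example DataFrame. The variable names are arbitrary, and the choice of columns A and B for the binary corr() call is only an example.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8], [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

# Reuse one rolling object for several statistics.
roll = df.rolling(window=2)
sums = roll.sum()   # rolling sum of each column
stds = roll.std()   # Bessel-corrected (ddof=1) rolling standard deviation

# Binary functions take a second object: rolling correlation of A with B.
corr_ab = df['A'].rolling(window=3).corr(df['B'])

print(sums)
print(stds)
print(corr_ab)
```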
With the agg() function, several aggregation functions can be applied in a single call, and the resulting columns can be renamed as well;
For reference: https://www.jianshu.com/p/b8c795345e93
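A minimal sketch of agg() on a rolling object follows. Renaming is done here with a separate rename() call, which is one possible approach and not necessarily the one used in the referenced article; the names `A_sum` and `B_max` are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8], [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

# A list of function names applies every function to every column;
# the result has two-level columns such as ('A', 'sum') and ('A', 'mean').
multi = df.rolling(window=2).agg(['sum', 'mean'])

# A dict applies a different function to each selected column.
per_col = df.rolling(window=2).agg({'A': 'sum', 'B': 'max'})

# Rename the result columns afterwards (hypothetical names).
renamed = per_col.rename(columns={'A': 'A_sum', 'B': 'B_max'})

print(multi)
print(renamed)
```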
2. .expanding()
Method structure:
DataFrame.expanding(min_periods=1, center=False, axis=0)
The parameters of expanding() have the same meaning as the corresponding parameters of rolling(); the difference is that there is no window parameter;
rolling() fixes the window size and slides it over the data. expanding() only sets the minimum number of observations and does not fix the window size, so the window keeps growing from the first row and the calculation is cumulative;
expanding().sum() is similar to the cumulative sum of cumsum(), with the advantage that many more aggregations are available;
In fact, when rolling() is called with window=len(df) and min_periods=1, the effect is the same as expanding(). Without min_periods=1, rolling() requires a full window and returns NaN for every row except the last.
    import pandas as pd

    df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8], [11, 18, 13, 21], [12, 34, 28, 76]],
                      columns=['A', 'B', 'C', 'D'])
    print(df)
    '''
        A   B   C   D
    0   1   3   5   7
    1   2   4   6   8
    2  11  18  13  21
    3  12  34  28  76
    '''

    # min_periods=1: every row gets the mean of all rows up to and including it
    print(df.expanding(min_periods=1).mean())
    '''
              A          B     C     D
    0  1.000000   3.000000   5.0   7.0
    1  1.500000   3.500000   5.5   7.5
    2  4.666667   8.333333   8.0  12.0
    3  6.500000  14.750000  13.0  28.0
    '''

    # min_periods=2: rows with fewer than 2 observations so far are NaN
    print(df.expanding(min_periods=2).mean())
    '''
              A          B     C     D
    0       NaN        NaN   NaN   NaN
    1  1.500000   3.500000   5.5   7.5
    2  4.666667   8.333333   8.0  12.0
    3  6.500000  14.750000  13.0  28.0
    '''

    # min_periods=3: the first two rows are NaN
    print(df.expanding(min_periods=3).mean())
    '''
              A          B     C     D
    0       NaN        NaN   NaN   NaN
    1       NaN        NaN   NaN   NaN
    2  4.666667   8.333333   8.0  12.0
    3  6.500000  14.750000  13.0  28.0
    '''
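The relationships between expanding(), cumsum(), and rolling() described above can be checked with a short sketch. Note the assumption that min_periods=1 is passed to rolling(); with the default, rolling(window=len(df)) needs a full window and leaves the earlier rows as NaN.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8], [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

# expanding().sum() accumulates like cumsum(), but returns floats.
exp_sum = df.expanding(min_periods=1).sum()
cum = df.cumsum()

# A rolling window as long as the frame behaves like expanding()
# only when min_periods=1 is set explicitly.
exp_mean = df.expanding(min_periods=1).mean()
roll_full = df.rolling(window=len(df), min_periods=1).mean()
roll_default = df.rolling(window=len(df)).mean()

print(((exp_sum - cum).abs() < 1e-9).all().all())
print(((roll_full - exp_mean).abs() < 1e-9).all().all())
print(roll_default['A'].isna().sum())  # number of NaN rows with the default min_periods
```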
3. .ewm()
ewm() can be applied to a Series or a DataFrame. It computes exponentially weighted moving statistics and is used less often than rolling() and expanding().
Specify one of the com, span, or halflife parameters, and then apply the appropriate statistical function to the result. The weights decay exponentially, so recent observations count more than older ones.
    import pandas as pd

    df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8], [11, 18, 13, 21], [12, 34, 28, 76]],
                      columns=['A', 'B', 'C', 'D'])
    print(df)
    '''
        A   B   C   D
    0   1   3   5   7
    1   2   4   6   8
    2  11  18  13  21
    3  12  34  28  76
    '''

    # com=0.5 corresponds to a smoothing factor alpha = 1 / (1 + com) = 2/3
    print(df.ewm(com=0.5).mean())
    '''
               A          B          C          D
    0   1.000000   3.000000   5.000000   7.000000
    1   1.750000   3.750000   5.750000   7.750000
    2   8.153846  13.615385  10.769231  16.923077
    3  10.750000  27.375000  22.400000  56.800000
    '''
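To show where these numbers come from, the sketch below recomputes column A by hand. It assumes the default adjust=True behaviour, where each value is a weighted average with weights (1 - alpha)^i for the observation i steps back, and it uses the relations alpha = 1 / (1 + com) and alpha = 2 / (span + 1). The Series `s` and the loop are illustrative only.

```python
import pandas as pd

# Column A from the example above.
s = pd.Series([1.0, 2.0, 11.0, 12.0])

alpha = 1 / (1 + 0.5)  # com=0.5  ->  alpha = 2/3

# Three equivalent ways to specify the same decay.
by_com = s.ewm(com=0.5).mean()
by_span = s.ewm(span=2).mean()       # alpha = 2 / (span + 1) = 2/3
by_alpha = s.ewm(alpha=alpha).mean()

# Manual weighted average (assuming adjust=True, the default).
manual = []
for t in range(len(s)):
    # Weight (1 - alpha)**i belongs to the value i steps before position t.
    weights = [(1 - alpha) ** i for i in range(t + 1)]
    values = [s.iloc[t - i] for i in range(t + 1)]
    manual.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))

print(by_com.tolist())
print(manual)
```

The manual values should agree with the A column printed above up to floating-point rounding.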