Pandas 13 - window functions (rolling, expanding)


Reprinted and adapted from:

https://www.yiibai.com/pandas/python_pandas_window_functions.html

1, About window functions

Window functions are mainly used to reveal trends in data by smoothing the curve.

Pandas provides several variations: rolling, expanding, and exponentially weighted moving window statistics.

The available statistics include sum, mean, median, variance, covariance, correlation, etc.

Let's learn how to apply each of these methods to a DataFrame object.

2, Using window functions on DataFrame

1, .rolling()

Method structure:

DataFrame.rolling(window, min_periods=None, center=False, win_type=None, 
                  on=None, axis=0, closed=None)

Parameter description

Parameter    Description
window       Size of the moving window, in one of two forms:
             1) an int: the fixed number of observations used for each window;
             2) an offset: a time-based window. This form is more complex and less commonly used, so it is not covered here.
min_periods  Minimum number of observations in the window required to produce a value; otherwise the result is NaN. int, default None. For an offset window the default is 1; for an int window it defaults to the window size.
center       Whether to set the window label at the center of the window. bool, default False (the label is at the right edge).
win_type     Window type, i.e. the weighting function applied within the window. str, default None (all points are weighted equally).
on           Optional. For a DataFrame, the column on which to calculate the rolling window instead of the index. The value is the column name.
axis         Default 0, which rolls across the rows (each column is calculated separately).
closed       Defines which endpoints of the window interval are closed. The default is 'right' (left-open, right-closed); 'left', 'both', or 'neither' can be specified as appropriate. Whether it applies to int windows depends on the pandas version.

This function can be applied to a Series or to the columns of a DataFrame. Specify the window=n parameter and apply the appropriate statistical function to the result.

With a window size of 3 (window=3), the first two elements are NaN, and from the third element on, the value at position n is the mean of elements n, n-1 and n-2. The other functions mentioned above can be applied in the same way.

import pandas as pd

df = pd.DataFrame([[1,3,5,7],[2,4,6,8],
                   [11,18,13,21],[12,34,28,76]],
                  columns = ['A', 'B', 'C', 'D'])

print (df.rolling(window=1).mean())
'''
      A     B     C     D
0   1.0   3.0   5.0   7.0
1   2.0   4.0   6.0   8.0
2  11.0  18.0  13.0  21.0
3  12.0  34.0  28.0  76.0
'''


print (df.rolling(window=2).mean())
'''
      A     B     C     D
0   NaN   NaN   NaN   NaN
1   1.5   3.5   5.5   7.5
2   6.5  11.0   9.5  14.5
3  11.5  26.0  20.5  48.5
'''

print (df.rolling(window=3).mean())
'''
          A          B          C     D
0       NaN        NaN        NaN   NaN
1       NaN        NaN        NaN   NaN
2  4.666667   8.333333   8.000000  12.0
3  8.333333  18.666667  15.666667  35.0
'''
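
The min_periods and center parameters from the parameter table are easier to follow with a concrete run. The following is a minimal sketch on the same df; the variable names partial and centered are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8],
                   [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

# min_periods=1: a value is produced as soon as one observation is available,
# so the leading rows are averages of shorter windows instead of NaN.
partial = df.rolling(window=3, min_periods=1).mean()
print(partial)

# center=True: the label sits in the middle of the window, so with window=3
# row i averages rows i-1, i and i+1. The first and last rows have only two
# observations in their window and therefore become NaN.
centered = df.rolling(window=3, center=True).mean()
print(centered)
```

With center=True the non-NaN values are the same as in the window=3 output above, only shifted up by one row.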


Common usage

  • The rolling() function supports many functions besides mean(), such as:
    count() number of non-null observations
    sum() sum of values
    median() arithmetic median of values
    min() minimum
    max() maximum
    std() Bessel-corrected sample standard deviation
    var() unbiased variance
    skew() sample skewness (third moment)
    kurt() sample kurtosis (fourth moment)
    quantile() sample quantile (value at a given percentile)
    cov() unbiased covariance (binary)
    corr() correlation (binary)

  • With the agg() function, several aggregation functions can be applied in one call and their results output together; the resulting columns can then be renamed.

For reference: https://www.jianshu.com/p/b8c795345e93
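
As a rough illustration of the list above, the sketch below applies a few of these functions, an agg() call with a list of function names, and the binary corr() to the same df. The names r, summary and ab_corr are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8],
                   [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

r = df.rolling(window=2)

print(r.sum())   # sum over each 2-row window
print(r.max())   # maximum over each 2-row window
print(r.std())   # sample standard deviation (ddof=1)

# agg() with a list applies several functions at once; each original
# column gets one sub-column per function (MultiIndex columns).
summary = r.agg(['sum', 'mean'])
print(summary)

# corr() is binary: column A is correlated with column B in each window.
ab_corr = df['A'].rolling(window=3).corr(df['B'])
print(ab_corr)
```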

2, .expanding()

Method structure:

DataFrame.expanding(min_periods=1, center=False, axis=0)

The parameters of expanding() have the same meaning as the corresponding parameters of rolling().

rolling() uses a fixed window size and slides it along the data. expanding() only sets the minimum number of observations and does not fix the window size: the window keeps growing from the first row, which gives a cumulative calculation.

expanding() is similar to the cumulative sum computed by cumsum(), with the advantage that other aggregations (mean, max, std, and so on) can be computed as well.

In fact, rolling() with window=len(df) and min_periods=1 gives the same result as expanding().
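
The relationship between expanding(), rolling() and cumsum() can be checked directly. This is a small sketch on the same df; note that rolling() with an int window needs min_periods=1 to match, because min_periods otherwise defaults to the window size.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8],
                   [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

expanding_mean = df.expanding(min_periods=1).mean()

# A window as long as the data never drops old rows, so with min_periods=1
# it behaves like an expanding window.
rolling_mean = df.rolling(window=len(df), min_periods=1).mean()
pd.testing.assert_frame_equal(expanding_mean, rolling_mean)

# Without min_periods, only the last row has a full window; the rest are NaN.
strict = df.rolling(window=len(df)).mean()
print(strict)

# expanding().sum() yields the same numbers as cumsum(); the dtypes differ
# (float vs. int), so cast before comparing.
pd.testing.assert_frame_equal(df.expanding().sum(), df.cumsum().astype(float))
```

Both assert_frame_equal calls raise an AssertionError if the frames differ, so the script running to the end is the check.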

df = pd.DataFrame([[1,3,5,7],[2,4,6,8],
                   [11,18,13,21],[12,34,28,76]],
                  columns = ['A', 'B', 'C', 'D'])

print (df)
'''
    A   B   C   D
0   1   3   5   7
1   2   4   6   8
2  11  18  13  21
3  12  34  28  76
'''

print (df.expanding(min_periods=1).mean())
'''
          A          B     C     D
0  1.000000   3.000000   5.0   7.0
1  1.500000   3.500000   5.5   7.5
2  4.666667   8.333333   8.0  12.0
3  6.500000  14.750000  13.0  28.0
'''

print (df.expanding(min_periods=2).mean())
'''
          A          B     C     D
0       NaN        NaN   NaN   NaN
1  1.500000   3.500000   5.5   7.5
2  4.666667   8.333333   8.0  12.0
3  6.500000  14.750000  13.0  28.0
'''

print (df.expanding(min_periods=3).mean())
'''
          A          B     C     D
0       NaN        NaN   NaN   NaN
1       NaN        NaN   NaN   NaN
2  4.666667   8.333333   8.0  12.0
3  6.500000  14.750000  13.0  28.0
'''

3, .ewm()

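ewm() can be applied to a Series or to the columns of a DataFrame. It computes exponentially weighted moving statistics and is used in fewer scenarios than the other two.

Specify one of the com, span, or halflife parameters (or alpha directly) and apply the appropriate statistical function to the result. The weights decay exponentially, so recent observations count more than older ones.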
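
com, span and halflife are different ways of specifying the same smoothing factor alpha. The sketch below converts com=0.5 into the other three forms using the relations given in the pandas documentation (alpha = 1/(1+com), alpha = 2/(span+1), alpha = 1 - exp(-ln 2 / halflife)) and checks that all four give the same result. The by_* variable names are only illustrative.

```python
import math
import pandas as pd

df = pd.DataFrame([[1, 3, 5, 7], [2, 4, 6, 8],
                   [11, 18, 13, 21], [12, 34, 28, 76]],
                  columns=['A', 'B', 'C', 'D'])

com = 0.5
alpha = 1 / (1 + com)                          # com -> alpha
span = 2 / alpha - 1                           # from alpha = 2 / (span + 1)
halflife = -math.log(2) / math.log(1 - alpha)  # from alpha = 1 - exp(-ln 2 / halflife)

by_com = df.ewm(com=com).mean()
by_alpha = df.ewm(alpha=alpha).mean()
by_span = df.ewm(span=span).mean()
by_halflife = df.ewm(halflife=halflife).mean()

# Each call raises an AssertionError if the two frames differ.
pd.testing.assert_frame_equal(by_com, by_alpha)
pd.testing.assert_frame_equal(by_com, by_span)
pd.testing.assert_frame_equal(by_com, by_halflife)
```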

df = pd.DataFrame([[1,3,5,7],[2,4,6,8],
                   [11,18,13,21],[12,34,28,76]],
                  columns = ['A', 'B', 'C', 'D'])

print (df)
'''
    A   B   C   D
0   1   3   5   7
1   2   4   6   8
2  11  18  13  21
3  12  34  28  76
'''

print (df.ewm(com=0.5).mean())
'''
           A          B          C          D
0   1.000000   3.000000   5.000000   7.000000
1   1.750000   3.750000   5.750000   7.750000
2   8.153846  13.615385  10.769231  16.923077
3  10.750000  27.375000  22.400000  56.800000
'''
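
To see where these numbers come from, column A can be recomputed by hand. This sketch assumes the default adjust=True, where the value at row t is a weighted mean of rows 0..t with weights (1 - alpha)**age. The helper ewm_mean_at is hypothetical and written only for this illustration; it is not a pandas function.

```python
import pandas as pd

s = pd.Series([1, 2, 11, 12])   # column A of the example df
alpha = 1 / (1 + 0.5)           # com=0.5

def ewm_mean_at(values, t, alpha):
    """Hypothetical helper: weighted mean of values[0..t] (adjust=True form)."""
    # The newest value has weight 1; each step back multiplies by (1 - alpha).
    weights = [(1 - alpha) ** (t - i) for i in range(t + 1)]
    weighted_sum = sum(w * v for w, v in zip(weights, values[:t + 1]))
    return weighted_sum / sum(weights)

manual = [ewm_mean_at(s.tolist(), t, alpha) for t in range(len(s))]
print(manual)
print(s.ewm(com=0.5).mean().tolist())
```

For row 1 this is (2 + 1/3 * 1) / (1 + 1/3) = 1.75, which matches column A in the output above.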


Posted by ingoruberg on Sun, 26 Jan 2020 23:32:24 -0800