Data Science | Indexing and slicing time series

Keywords: Python

Indexing and slicing time series
Indexing
The indexing methods shown here for time series also apply to DataFrame. Because a time series is usually already sorted chronologically, there is generally no need to worry about ordering.

Basic positional indexing works the same way as indexing a list:

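import numpy as np
import pandas as pd

# Daily timestamps from 2017-01-01 through 2017-03-01
rng = pd.date_range('2017/1','2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts.head())

# Positional access; .iloc avoids the deprecation warning that ts[0] emits in recent pandas
print(ts.iloc[0])
print(ts[:2])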
>>>
2017-01-01    0.107736
2017-01-02    0.887981
2017-01-03    0.712862
2017-01-04    0.920021
2017-01-05    0.317863
Freq: D, dtype: float64
0.107735945027
2017-01-01    0.107736
2017-01-02    0.887981
Freq: D, dtype: float64
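
As a minimal sketch of explicit positional access, the example below uses `.iloc`, which always selects by position regardless of the index type. The deterministic values from `np.arange` are an assumption made here so the results are reproducible; the original example uses random values.

```python
import numpy as np
import pandas as pd

# Daily index from 2017-01-01 through 2017-03-01 (60 timestamps)
rng = pd.date_range('2017-01-01', '2017-03-01')
# Illustrative values 0.0, 1.0, 2.0, ... instead of random numbers
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

first = ts.iloc[0]     # first element, like lst[0]
last = ts.iloc[-1]     # last element, like lst[-1]
head2 = ts.iloc[:2]    # first two elements; the end position is excluded

print(first, last)
print(head2)
```

Positional slices exclude the end position, just as list slices do.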

Besides positional indexing, a time series can also be indexed by time labels:

from datetime import datetime

rng = pd.date_range('2017/1','2017/3')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts['2017/1/2'])
print(ts['20170103'])
print(ts['1/10/2017'])
print(ts[datetime(2017,1,20)])
>>>
0.887980757812
0.712861778966
0.788336674948
0.93070380011
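
The label lookups above can be sketched with `.loc`, which makes it explicit that the key is a label rather than a position. This is only an illustration: the deterministic values are an assumption so that every lookup can be compared, and the date strings are parsed by pandas into the same `Timestamp`.

```python
from datetime import datetime

import numpy as np
import pandas as pd

rng = pd.date_range('2017-01-01', '2017-03-01')
# Illustrative values: the value equals the position of the date in the index
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Different spellings of the same date all resolve to 2017-01-10
a = ts.loc['2017/1/10']
b = ts.loc['20170110']
c = ts.loc['1/10/2017']                   # parsed month-first
d = ts.loc[datetime(2017, 1, 10)]
e = ts.loc[pd.Timestamp('2017-01-10')]

print(a, b, c, d, e)
```

Because the index has daily resolution and each key names a full day, every lookup returns a single scalar value.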

Slicing
Slicing was already used in the positional indexing example above. Slicing by time label follows the same rules as label slicing on a Series: the end label is included in the result.

rng = pd.date_range('2017/1','2017/3',freq = '12H')
ts = pd.Series(np.random.rand(len(rng)), index = rng)
print(ts['2017/1/5':'2017/1/10'])
>>>
2017-01-05 00:00:00    0.462085
2017-01-05 12:00:00    0.778637
2017-01-06 00:00:00    0.356306
2017-01-06 12:00:00    0.667964
2017-01-07 00:00:00    0.246857
2017-01-07 12:00:00    0.386956
2017-01-08 00:00:00    0.328203
2017-01-08 12:00:00    0.260853
2017-01-09 00:00:00    0.224920
2017-01-09 12:00:00    0.397457
2017-01-10 00:00:00    0.158729
2017-01-10 12:00:00    0.501266
Freq: 12H, dtype: float64
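
To make the inclusive end point concrete, here is a small sketch under assumed deterministic data. It compares a slice by date strings, a slice by exact `Timestamp` objects, and the equivalent positional slice. The frequency is passed as a `pd.Timedelta` purely for illustration.

```python
import numpy as np
import pandas as pd

# Two observations per day (00:00 and 12:00) for January 2017
rng = pd.date_range('2017-01-01', '2017-01-31', freq=pd.Timedelta(hours=12))
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

# Date strings at day resolution cover the whole of each end day
by_string = ts['2017-01-05':'2017-01-10']

# Exact timestamps are inclusive too, but stop at exactly 2017-01-10 00:00
by_stamp = ts.loc[pd.Timestamp('2017-01-05'):pd.Timestamp('2017-01-10')]

# Positional slices exclude the end position, so 20 is needed to reach position 19
by_position = ts.iloc[8:20]

print(len(by_string), len(by_stamp), len(by_position))
```

The string slice returns 12 rows, ending at 2017-01-10 12:00. The `Timestamp` slice returns 11 rows, ending at 2017-01-10 00:00.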


# Passing just a month returns the slice for that whole month
print(ts['2017/2'].head())
>>>
2017-02-01 00:00:00    0.243932
2017-02-01 12:00:00    0.220830
2017-02-02 00:00:00    0.896107
2017-02-02 12:00:00    0.476584
2017-02-03 00:00:00    0.515817
Freq: 12H, dtype: float64
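
Selecting by a partial date string can be sketched as follows, again with assumed deterministic values. When the string is less precise than the index (a month or a day against a 12-hour index), pandas returns every observation that falls inside that period.

```python
import numpy as np
import pandas as pd

# Two observations per day from 2017-01-01 00:00 through 2017-03-01 00:00
rng = pd.date_range('2017-01-01', '2017-03-01', freq=pd.Timedelta(hours=12))
ts = pd.Series(np.arange(len(rng), dtype=float), index=rng)

feb = ts.loc['2017-02']        # every observation in February 2017
jan5 = ts.loc['2017-01-05']    # both observations on 5 January
year = ts.loc['2017']          # every observation in 2017

print(len(feb), len(jan5), len(year))
```

Here `feb` holds 56 rows (28 days with two observations each), `jan5` holds 2 rows, and `year` is the whole series.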

Time series with duplicate indexes

dates = pd.DatetimeIndex(['1/1/2015','1/2/2015','1/3/2015','1/4/2015','1/1/2015','1/2/2015'])
ts = pd.Series(np.random.rand(6), index = dates)
print(ts)
# Use is_unique to check whether the values or the index labels are all unique
print(ts.is_unique,ts.index.is_unique)
>>>
2015-01-01    0.300286
2015-01-02    0.603865
2015-01-03    0.017949
2015-01-04    0.026621
2015-01-01    0.791441
2015-01-02    0.526622
dtype: float64
True False

The results show that in this series the index contains duplicates (ts.index.is_unique is False), while the values do not (ts.is_unique is True).
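
One practical consequence of a duplicated index is that a label lookup no longer has a single return type. The sketch below uses fixed illustrative values (an assumption, chosen so the results are easy to follow) and shows how to locate the duplicated labels with `Index.duplicated`.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03',
                          '2015-01-04', '2015-01-01', '2015-01-02'])
# Illustrative fixed values instead of random numbers
ts = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=dates)

dup = ts.loc['2015-01-01']     # label occurs twice: returns a Series
uniq = ts.loc['2015-01-03']    # label occurs once: returns a scalar

# True for every occurrence of a label that appears more than once
mask = ts.index.duplicated(keep=False)

print(type(dup).__name__, uniq)
print(ts[mask])
```

Code that expects a scalar from a label lookup should therefore check `ts.index.is_unique` first.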

We can resolve duplicate index labels by averaging the values that share the same label:

print(ts.groupby(level = 0).mean())
# Group by the index (level 0); values under a duplicated label are replaced by their mean
>>>
2015-01-01    0.545863
2015-01-02    0.565244
2015-01-03    0.017949
2015-01-04    0.026621
dtype: float64
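
Averaging is only one possible policy. As a hedged sketch with assumed fixed values, the example below compares it with keeping the first or the last observation for each label, using `Index.duplicated`. Which policy is appropriate depends on the data and is not something the original example prescribes.

```python
import pandas as pd

dates = pd.DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03',
                          '2015-01-04', '2015-01-01', '2015-01-02'])
# Illustrative fixed values instead of random numbers
ts = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0, 60.0], index=dates)

# Policy 1: average the values that share a label
averaged = ts.groupby(level=0).mean()

# Policy 2: keep only the first observation for each label
first_only = ts[~ts.index.duplicated(keep='first')]

# Policy 3: keep only the last observation, then restore chronological order
last_only = ts[~ts.index.duplicated(keep='last')].sort_index()

print(averaged)
print(first_only)
print(last_only)
```

All three results have a unique index. Only the `groupby` version sorts the index as a side effect, which is why the last variant calls `sort_index()` explicitly.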

Originally published: December 17, 2018
Author: the salt fish of Huangjin
This article comes from the Yunqi Community partner “Salted fish Plath”. For more, follow the WeChat public account xianyuplus1995.

Posted by damien@damosworld.com on Sun, 01 Dec 2019 14:56:09 -0800