Preface
A few days ago, I developed a data statistics applet for others. It needs to count the number of days returned according to the stock in and stock out time. Their statistical requirements do not count the time, just look at the date, but their original data is exported from the company's system. By default, it has time. The result of the statistics is that the number of days returned from the stock out on the same day is different! The problem is with pandas.date'range.
Introduction to panda.date/range
Pandas.date'range official document: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html
Syntax:
pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs) → pandas.core.indexes.datetimes.DatetimeIndex
Note: start=None, end=None, periods=None although these three parameters have default values, two of them must be filled in, otherwise an error will be reported!
Parameter Description:
start: str or datetime, date left border
end: str or datetime, date right boundary
periods: int, fixed period, integer or None
freq: Date offset, the value is str or DateOffset, and the default value is' D ', Value reference document ; values can be multiple, such as 5H, 1D10min/1D10T
tz: str or tzinfo; returns the time zone name of the localized time index object, such as' Asia/Hong_Kong '. By default, the generated time index object is time zone independent
normalize: bool, the default is False; if the value is True, the start / end date will be normalized to midnight before the date range is generated
Name: str, the default is None; the name of the time index object generated, the value is string or None
Closed: {None, 'left', 'right'}, three options, the default is None (both sides are closed), closed = 'left' left 'left close right, closed='right 'left open right
**kwargs: no effect on results for compatibility
For example:
import pandas as pd In [1]: pd.date_range('2011-01-03', periods=5, freq='B') Out[1]: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07'], dtype='datetime64[ns]', freq='B') In [2]: pd.date_range('2011-01-03', periods=5, freq=pd.offsets.BDay()) Out[2]: DatetimeIndex(['2011-01-03', '2011-01-04', '2011-01-05', '2011-01-06', '2011-01-07'], dtype='datetime64[ns]', freq='B') In [3]: pd.date_range('2011-01-01 00:00:00', periods=10, freq='2h20min') Out[3]: DatetimeIndex(['2011-01-01 00:00:00', '2011-01-01 02:20:00', '2011-01-01 04:40:00', '2011-01-01 07:00:00', '2011-01-01 09:20:00', '2011-01-01 11:40:00', '2011-01-01 14:00:00', '2011-01-01 16:20:00', '2011-01-01 18:40:00', '2011-01-01 21:00:00'], dtype='datetime64[ns]', freq='140T') In [4]: pd.date_range('2011-01-01 00:00:00', periods=10, freq='1D10U') Out[4]: DatetimeIndex(['2011-01-01 00:00:00', '2011-01-02 00:00:00.000010', '2011-01-03 00:00:00.000020', '2011-01-04 00:00:00.000030', '2011-01-05 00:00:00.000040', '2011-01-06 00:00:00.000050', '2011-01-07 00:00:00.000060', '2011-01-08 00:00:00.000070', '2011-01-09 00:00:00.000080', '2011-01-10 00:00:00.000090'], dtype='datetime64[ns]', freq='86400000010U') In [8]: pd.date_range(start='1/1/2018', periods=5, tz='Asia/Hong_Kong') Out[8]: DatetimeIndex(['2018-01-01 00:00:00+08:00', '2018-01-02 00:00:00+08:00', '2018-01-03 00:00:00+08:00', '2018-01-04 00:00:00+08:00', '2018-01-05 00:00:00+08:00'], dtype='datetime64[ns, Asia/Hong_Kong]', freq='D') In [9]: pd.date_range(start='2017-01-01', end='2017-01-04', closed=None) Out[9]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D') In [10]: pd.date_range(start='2017-01-01', end='2017-01-04', closed='left') Out[10]: DatetimeIndex(['2017-01-01', '2017-01-02', '2017-01-03'], dtype='datetime64[ns]', freq='D') In [11]: pd.date_range(start='2017-01-01', end='2017-01-04', closed='right') Out[11]: DatetimeIndex(['2017-01-02', '2017-01-03', '2017-01-04'], dtype='datetime64[ns]', freq='D')
Date closed pit
In the last three examples above, we use closed = 'right' closed = 'left' closed=None to demonstrate three closure situations. No pit is found, but the problem lies in the date format. The date format of the demonstration is "yyyy MM DD", but we often encounter "yyyy MM DD HH: mm: SS" in our application. OK, let's take another example:
In [12]: pd.date_range(start='2020-01-17 09:00:00', end='2020-01-19 08:59:59', closed=None) Out[12]: DatetimeIndex(['2020-01-17 09:00:00', '2020-01-18 09:00:00'], dtype='datetime64[ns]', freq='D')
Why? closed=None is closed on both sides. The return date should include 2020-01-19? Try closing right to see if it works better:
In [13]: pd.date_range(start='2020-01-17 09:00:00', end='2020-01-19 08:59:59', closed=None) Out[13]: DatetimeIndex(['2020-01-18 09:00:00'], dtype='datetime64[ns]', freq='D')
Why? Right closure 2020-01-17 is not normal, still not 2020-01-19? Right closing is not closed???
Actually, it's not that the right side is not closed. Let's try again:
In [14]: pd.date_range(start='2020-01-17 09:00:00', end='2020-01-19 09:00:00', closed=None) Out[14]: DatetimeIndex(['2020-01-18 09:00:00', '2020-01-19 09:00:00'], dtype='datetime64[ns]', freq='D')
It's closed normally!! Careful friends can see that the start time is 09:00:00, and the reason for not closing is that the end time is 08:59:59, and it will be closed if it is greater than 09:00:00! The premise of the original closure is to meet 24 hours!!!
Solution:
In fact, it's very simple. Only when you know the problem, you can use the parameter normalize = True when using pd.date_range, for example:
In [15]: pd.date_range(start='2020-01-17 09:00:00', end='2020-01-19 08:59:59', normalize=True,closed=None) Out [15]: DatetimeIndex(['2020-01-17', '2020-01-18', '2020-01-19'], dtype='datetime64[ns]', freq='D')