time                     data
2018-05-01 00:00:00.650  57
2018-05-01 00:00:01.990  54
2018-05-01 00:00:09.487  73
2018-05-01 00:00:14.607  95
2018-05-01 00:00:16.350  77
2018-05-01 00:00:16.397  28
2018-05-01 00:00:16.563  54
2018-05-01 00:00:25.457  19
2018-05-01 00:00:31.140  09
2018-05-01 00:00:54.427  18
2018-05-01 00:00:55.387  39
2018-05-01 00:01:02.193  97
2018-05-01 00:01:07.447  39
2018-05-01 00:01:09.020  41
2018-05-01 00:01:11.033  93
2018-05-01 00:01:25.693  42
2018-05-01 00:02:03.900  42
2018-05-01 00:02:04.190  84
2018-05-01 00:02:05.727  39
2018-05-01 00:02:10.910  40
Suppose we have a DataFrame df holding the data shown above.
We want to group by day and hour, so first make the time column the index. (This assumes the time column already holds datetime values; if it holds strings, convert it with pd.to_datetime first.)
df.set_index("time", inplace=True)
The inplace=True parameter modifies the original DataFrame directly instead of returning a new one. After this call the time column has become the index of df, which you can verify with df.index. I won't go into details here.
The time column is made the index because the data is grouped by day and hour next. Functions passed to groupby are applied to the index values, so having the timestamps in the index makes the day and hour easy to extract. This is used in the next step.
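As a minimal sketch of this step, the snippet below rebuilds a few rows of the sample and sets the time column as the index. It assumes the time values start out as strings and that the data values are integers; neither is stated in the original, so treat both as illustrative.

```python
import pandas as pd

# A few rows from the sample; "time" starts out as plain strings (assumption).
df = pd.DataFrame({
    "time": [
        "2018-05-01 00:00:00.650",
        "2018-05-01 00:00:01.990",
        "2018-05-01 00:00:09.487",
        "2018-05-01 00:00:16.563",
    ],
    "data": [57, 54, 73, 54],
})

# Parse the strings so that the index becomes a DatetimeIndex.
df["time"] = pd.to_datetime(df["time"])

# inplace=True modifies df itself instead of returning a new DataFrame.
df.set_index("time", inplace=True)

# Verify: the index is now a DatetimeIndex and "time" is no longer a column.
print(type(df.index).__name__)  # DatetimeIndex
print(list(df.columns))         # ['data']
```

Each index value is now a Timestamp, so attributes such as .day and .hour are available for grouping.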
The pandas groupby function does the grouping:
dp = df.groupby([lambda x:x.day, lambda x:x.hour, "data"])
At this point dp is a GroupBy object. Nothing has been computed yet, but it holds all the information needed to apply an operation to each group. An aggregation such as sum(), mean(), size(), or agg() can be called on it later.
For the next step, append the size() method to the line above:
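To illustrate the deferred evaluation, here is a small sketch that groups only by day and hour and then applies several aggregations to the same GroupBy object. The four sample rows and the variable name grouped are chosen for illustration and are not from the original.

```python
import pandas as pd

# Four sample rows with a DatetimeIndex (illustrative subset of the data).
df = pd.DataFrame(
    {"data": [57, 54, 73, 54]},
    index=pd.to_datetime([
        "2018-05-01 00:00:00.650",
        "2018-05-01 00:00:01.990",
        "2018-05-01 00:00:09.487",
        "2018-05-01 00:00:16.563",
    ]),
)

# The lambdas receive each index value (a Timestamp) and return the group key.
grouped = df.groupby([lambda x: x.day, lambda x: x.hour])

# No aggregation has run yet; grouped only describes how to split the rows.
print(type(grouped).__name__)

# Each call below computes a result per (day, hour) group.
total = grouped["data"].sum()
average = grouped["data"].mean()
row_count = grouped.size()

print(total.loc[(1, 0)])      # 238
print(average.loc[(1, 0)])    # 59.5
print(row_count.loc[(1, 0)])  # 4
```

The same grouped object is reused for all three aggregations, so the grouping keys only need to be written once.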
dp = df.groupby([lambda x:x.day, lambda x:x.hour, "data"]).size()
Now dp holds the computed result and can be displayed:
      data
1  0  18      1
      19      2
      28      1
      39      3
      40      1
      41      1
      42      2
      54      2
      57      1
      73      1
      77      1
      84      1
      93      1
      95      1
      97      1
dtype: int64
Next, give the last column (the counts) a name and save the result as a DataFrame:
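The result of size() is a Series whose index has three levels: day, hour, and data value. The sketch below reproduces this on a four-row subset (illustrative values, with the hypothetical variable name counts) and shows how a single count can be looked up by its three keys.

```python
import pandas as pd

# Four sample rows with a DatetimeIndex (illustrative subset of the data).
df = pd.DataFrame(
    {"data": [57, 54, 73, 54]},
    index=pd.to_datetime([
        "2018-05-01 00:00:00.650",
        "2018-05-01 00:00:01.990",
        "2018-05-01 00:00:09.487",
        "2018-05-01 00:00:16.563",
    ]),
)

# Group by day, hour, and the value in the "data" column, then count rows.
counts = df.groupby([lambda x: x.day, lambda x: x.hour, "data"]).size()
print(counts)

# Look up one group by its (day, hour, data) key.
print(counts.loc[(1, 0, 54)])  # 2

# The counts add up to the number of rows in df.
print(counts.sum())            # 4
```

Only the third index level has a name ("data"), because the first two come from anonymous functions.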
new_dp = dp.reset_index(name="times")
new_dp is a newly generated DataFrame:
    level_0  level_1  data  times
0         1        0    18      1
1         1        0    19      2
2         1        0    28      1
3         1        0    39      3
4         1        0    40      1
5         1        0    41      1
6         1        0    42      2
7         1        0    54      2
8         1        0    57      1
9         1        0    73      1
10        1        0    77      1
11        1        0    84      1
12        1        0    93      1
13        1        0    95      1
14        1        0    97      1
The level_0 column is the day of the month, here 1 (May 1).
The level_1 column is the hour. All of the sample rows fall in the hour starting at 00:00, so it is 0.
The data column holds the values from the original data column of df, with one row per distinct value in each group.
The times column is the number of times each data value occurs within that day and hour.
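Because the first two grouping keys are anonymous functions, pandas falls back to the default names level_0 and level_1. One possible follow-up, sketched here on a four-row subset, is to rename them after reset_index. The names day and hour are my own choice, not part of the original.

```python
import pandas as pd

# Four sample rows with a DatetimeIndex (illustrative subset of the data).
df = pd.DataFrame(
    {"data": [57, 54, 73, 54]},
    index=pd.to_datetime([
        "2018-05-01 00:00:00.650",
        "2018-05-01 00:00:01.990",
        "2018-05-01 00:00:09.487",
        "2018-05-01 00:00:16.563",
    ]),
)

dp = df.groupby([lambda x: x.day, lambda x: x.hour, "data"]).size()

# reset_index turns the index levels into columns; name= labels the counts.
new_dp = dp.reset_index(name="times")

# Replace the default level names with descriptive ones (hypothetical names).
new_dp = new_dp.rename(columns={"level_0": "day", "level_1": "hour"})

print(list(new_dp.columns))  # ['day', 'hour', 'data', 'times']
print(new_dp)
```

With named columns, later filtering reads more naturally, for example new_dp[new_dp["hour"] == 0].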