Definition:
DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary data structure in pandas.
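To illustrate what aligning on row and column labels means, here is a minimal sketch. The frames `left` and `right` and their labels are made up for illustration and are not part of the original example data.

```python
import pandas as pd

# Two small frames whose row and column labels only partly overlap.
left = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]}, index=["y", "z"])

# Addition matches cells by label, not by position.
# Labels that exist in only one operand produce NaN in the result.
total = left + right
print(total)
```

Only the cell at row "y", column "b" exists in both frames, so it is the only non-NaN value in the result (4 + 10).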
Signature:
class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)
Parameters:
data: NumPy ndarray (structured or homogeneous), dict, or DataFrame. If it is a dict, its values can be Series, arrays, constants, or list-like objects.
index: Index or array-like. The row labels to use for the resulting frame. If the input data carries no index information and no index is provided, it defaults to np.arange(n), that is, the integers 0, 1, ..., n-1.
columns: Index or array-like. The column labels to use for the resulting frame. If no column labels are provided, they also default to np.arange(n).
dtype: dtype, default None. The data type to force. Only a single dtype is allowed; if None, the type is inferred from the data.
copy: boolean, default False. Whether to copy data from the input. This only affects DataFrame or two-dimensional ndarray input.
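The default labels and the handling of a constant in a dict can be checked with a short sketch like the following. The variable names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# No index given: rows are labeled 0..n-1.
default_idx = pd.DataFrame({"col1": [1, 2, 3]})

# A 2-D ndarray without column labels gets 0..n-1 column labels as well.
from_array = pd.DataFrame(np.zeros((2, 3)))

# Explicit index; the constant True is broadcast to every row.
labeled = pd.DataFrame({"col1": [1, 2, 3], "flag": True}, index=["a", "b", "c"])

print(list(default_idx.index))   # [0, 1, 2]
print(list(from_array.columns))  # [0, 1, 2]
print(labeled)
```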
Other ways to build DataFrame types:
classmethod DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)
classmethod DataFrame.from_dict(data, orient='columns', dtype=None)
pandas.read_csv, pandas.read_table, pandas.read_clipboard, pandas.read_excel, etc.
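A brief sketch of these alternative constructors follows. The input values, and the in-memory CSV text standing in for a real file, are assumptions made for illustration.

```python
import io

import pandas as pd

# from_dict with orient="index": the dict keys become row labels.
by_row = pd.DataFrame.from_dict({"r1": [1, 2], "r2": [3, 4]}, orient="index")

# from_records: each tuple is one row; `columns` names the fields.
records = [(1, "a"), (2, "b")]
from_rec = pd.DataFrame.from_records(records, columns=["num", "letter"])

# read_csv accepts any file-like object, here a string buffer.
csv_text = "Date,Open\n2017-09-15,8.06\n2017-09-18,8.05\n"
from_csv = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

print(by_row)
print(from_rec)
print(from_csv)
```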
For example:
Constructing a DataFrame from a dictionary:
>>> import numpy as np
>>> import pandas as pd
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
The inferred dtype is int64:
>>> df.dtypes
col1    int64
col2    int64
dtype: object
Forcing a single dtype:
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object
Constructing a DataFrame from a NumPy ndarray:
>>> df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
...                    columns=['a', 'b', 'c', 'd', 'e'])
>>> df2
   a  b  c  d  e
0  2  8  8  3  4
1  4  2  9  0  9
2  1  0  7  8  0
3  5  1  7  1  3
4  6  0  2  4  2
Attributes:
Get and create the DataFrame
import pandas as pd
import numpy as np

# Load the sheet with the Date column as the index, then keep the first five rows
df = pd.read_excel('Bank of Nanjing.xlsx', index_col='Date')
df1 = df[:5]

In [38]: df1.head()
Out[38]:
            Open  High   Low  Close  Trunover    Volume
Date
2017-09-15  8.06  8.08  8.03   8.04    195.43  24272800
2017-09-18  8.05  8.13  8.03   8.06    200.76  24867600
2017-09-19  8.03  8.06  7.94   8.00    433.76  54253100
2017-09-20  7.97  8.06  7.95   8.03    319.94  39909700
2017-09-21  8.02  8.10  7.99   8.04    241.94  30056600
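The Excel file is not included with this text, so to follow along without it, an equivalent df1 can be built by hand from the five rows printed above. This is a sketch: the values and column names (including the spelling Trunover) are copied from that output, and the assumption is that the Date column parses to timestamps as it does when read from the file.

```python
import pandas as pd

# Dates and values copied from the printed output of df1.head().
dates = pd.to_datetime(
    ["2017-09-15", "2017-09-18", "2017-09-19", "2017-09-20", "2017-09-21"]
)
df1 = pd.DataFrame(
    {
        "Open": [8.06, 8.05, 8.03, 7.97, 8.02],
        "High": [8.08, 8.13, 8.06, 8.06, 8.10],
        "Low": [8.03, 8.03, 7.94, 7.95, 7.99],
        "Close": [8.04, 8.06, 8.00, 8.03, 8.04],
        "Trunover": [195.43, 200.76, 433.76, 319.94, 241.94],
        "Volume": [24272800, 24867600, 54253100, 39909700, 30056600],
    },
    index=pd.Index(dates, name="Date"),  # named DatetimeIndex, as in the original
)
print(df1)
```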
- Transpose (T): rows and columns are swapped. The original column labels become the index, which has no name (its name attribute is None), and the Date index becomes the column labels.
In [39]: df1.T
Out[39]:
Date       2017-09-15   2017-09-18   2017-09-19   2017-09-20   2017-09-21
Open             8.06         8.05         8.03         7.97         8.02
High             8.08         8.13         8.06         8.06         8.10
Low              8.03         8.03         7.94         7.95         7.99
Close            8.04         8.06         8.00         8.03         8.04
Trunover       195.43       200.76       433.76       319.94       241.94
Volume    24272800.00  24867600.00  54253100.00  39909700.00  30056600.00
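Note that Volume is printed as a float after transposing. A small sketch of why: each transposed column mixes values from a float column and an integer column, so pandas stores it as float64. The two-row frame `small` below is an illustrative subset of the data, not the original df1.

```python
import pandas as pd

small = pd.DataFrame(
    {"Close": [8.04, 8.06], "Volume": [24272800, 24867600]},
    index=["2017-09-15", "2017-09-18"],
)
print(small.dtypes)       # Close is float64, Volume is an integer dtype

transposed = small.T
print(transposed.dtypes)  # every column is float64 after the transpose
print(list(transposed.index))  # ['Close', 'Volume']
```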
Fast label-based scalar access (at)
In [35]: date = pd.to_datetime('2017-09-15')
In [36]: date
Out[36]: Timestamp('2017-09-15 00:00:00')
In [37]: df1.at[date, 'Open']
Out[37]: 8.0600000000000005
Get the label names for row and column axes
In [44]: df1.axes
Out[44]:
[DatetimeIndex(['2017-09-15', '2017-09-18', '2017-09-19', '2017-09-20',
                '2017-09-21'], dtype='datetime64[ns]', name='Date', freq=None),
 Index(['Open', 'High', 'Low', 'Close', 'Trunover', 'Volume'], dtype='object')]
-- Columns grouped by dtype (blocks, an internal property; it was deprecated in later pandas releases and is no longer available in pandas 1.0 and above)
In [45]: df1.blocks
Out[45]:
{'float64':             Open  High   Low  Close  Trunover
 Date
 2017-09-15  8.06  8.08  8.03   8.04    195.43
 2017-09-18  8.05  8.13  8.03   8.06    200.76
 2017-09-19  8.03  8.06  7.94   8.00    433.76
 2017-09-20  7.97  8.06  7.95   8.03    319.94
 2017-09-21  8.02  8.10  7.99   8.04    241.94,
 'int64':               Volume
 Date
 2017-09-15  24272800
 2017-09-18  24867600
 2017-09-19  54253100
 2017-09-20  39909700
 2017-09-21  30056600}
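On pandas versions where blocks is unavailable, a similar dict of per-dtype sub-frames can be approximated with the public select_dtypes method. This is only a sketch of one possible substitute, not how blocks itself was implemented; the frame `prices` is an illustrative subset of the data.

```python
import pandas as pd

prices = pd.DataFrame({"Open": [8.06, 8.05], "Volume": [24272800, 24867600]})

# Map each distinct dtype name to the sub-frame of columns having that dtype.
by_dtype = {
    str(dtype): prices.select_dtypes(include=[dtype])
    for dtype in prices.dtypes.unique()
}

for name, frame in by_dtype.items():
    print(name, list(frame.columns))
```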
-- Column data types
In [46]: df1.dtypes
Out[46]:
Open        float64
High        float64
Low         float64
Close       float64
Trunover    float64
Volume        int64
dtype: object
Determine whether the DataFrame is completely empty
In [47]: df1.empty
Out[47]: False
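One detail worth checking is that empty looks at the number of items, not at their values: a frame that contains only NaN is not empty. A minimal sketch with made-up frames:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=["Open", "Close"])  # columns but zero rows
only_nan = pd.DataFrame({"Open": [np.nan]})         # one row holding NaN

print(no_rows.empty)   # True
print(only_nan.empty)  # False

# Dropping the NaN rows first makes the second frame empty as well.
print(only_nan.dropna().empty)  # True
```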
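- Return the ftype of each column, that is, its dtype together with whether it is stored sparse or dense (ftypes; deprecated in later pandas releases and no longer available in pandas 1.0 and above)
In [48]: df1.ftypes
Out[48]:
Open        float64:dense
High        float64:dense
Low         float64:dense
Close       float64:dense
Trunover    float64:dense
Volume        int64:dense
dtype: object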
Fast integer-position scalar access (iat): selects a single element by its row and column positions, like giving coordinates
In [49]: df1.iat[0, 1]  # row 1, column 2
Out[49]: 8.0800000000000001

In [50]: df1.iat[1, 0]  # row 2, column 1
Out[50]: 8.0500000000000007
Integer-position-based indexing and slicing (iloc)
In [2]: df1.iloc[0:1]
Out[2]:
            Open  High   Low  Close  Trunover    Volume
Date
2017-09-15  8.06  8.08  8.03   8.04    195.43  24272800

In [3]: df1.iloc[0:1, 2:]
Out[3]:
             Low  Close  Trunover    Volume
Date
2017-09-15  8.03   8.04    195.43  24272800
- Mixed positioning (ix): accepts labels, integer positions, or a combination of the two. Here an integer row position is combined with a column label. Note that ix was deprecated in pandas 0.20 and removed in pandas 1.0; use loc or iloc instead.
In [6]: df1.ix[1, 'Open']
Out[6]: 8.0500000000000007

In [7]: df1.ix[1]
Out[7]:
Open               8.05
High               8.13
Low                8.03
Close              8.06
Trunover         200.76
Volume      24867600.00
Name: 2017-09-18 00:00:00, dtype: float64
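On pandas versions without ix, the same two lookups can be sketched with iloc and loc. The two-row frame below is an illustrative subset of the data; get_loc translates a column label into its integer position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Open": [8.06, 8.05], "High": [8.08, 8.13]},
    index=pd.to_datetime(["2017-09-15", "2017-09-18"]),
)

# Row by position, column by label: convert the label to a position for iloc...
by_position = df.iloc[1, df.columns.get_loc("Open")]

# ...or convert the row position to its label and use loc.
by_label = df.loc[df.index[1], "Open"]

# The whole second row, selected by position.
second_row = df.iloc[1]

print(by_position, by_label)  # 8.05 8.05
print(second_row)
```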
- Label-based indexing (loc)
In [7]: df1.loc[date, 'Low']
Out[7]: 8.0299999999999994

In [8]: df1.loc[df1.index[0], 'Low']
Out[8]: 8.0299999999999994
- Number of axes (ndim)
In [10]: df1.ndim
Out[10]: 2
-- The shape of the DataFrame (number of rows and columns)
In [11]: df1.shape
Out[11]: (5, 6)
-- The size of the DataFrame (number of elements)
In [12]: df1.size
Out[12]: 30
-- Return the Styler object for the DataFrame (style)
In [13]: df1.style
Out[13]: <pandas.io.formats.style.Styler at 0x1c410cf8eb8>
-- Return the values of the DataFrame as a two-dimensional array (values)
In [14]: df1.values
Out[14]:
array([[  8.06000000e+00,   8.08000000e+00,   8.03000000e+00,
          8.04000000e+00,   1.95430000e+02,   2.42728000e+07],
       [  8.05000000e+00,   8.13000000e+00,   8.03000000e+00,
          8.06000000e+00,   2.00760000e+02,   2.48676000e+07],
       [  8.03000000e+00,   8.06000000e+00,   7.94000000e+00,
          8.00000000e+00,   4.33760000e+02,   5.42531000e+07],
       [  7.97000000e+00,   8.06000000e+00,   7.95000000e+00,
          8.03000000e+00,   3.19940000e+02,   3.99097000e+07],
       [  8.02000000e+00,   8.10000000e+00,   7.99000000e+00,
          8.04000000e+00,   2.41940000e+02,   3.00566000e+07]])
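The array above shows Volume in floating-point notation because a NumPy array holds a single dtype, so a frame with both float and integer columns is converted to a common float64. A small sketch with an illustrative two-row subset follows; to_numpy() is the method form available in pandas 0.24 and later.

```python
import pandas as pd

mixed = pd.DataFrame({"Close": [8.04, 8.06], "Volume": [24272800, 24867600]})

# The whole frame: one common dtype for all columns.
arr = mixed.to_numpy()
print(arr.dtype)  # float64
print(arr.shape)  # (2, 2)

# A single column keeps its own integer dtype.
vol = mixed["Volume"].to_numpy()
print(vol.dtype.kind)  # i
```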
These are the main attributes of a DataFrame; the DataFrame methods are covered next.