pandas.DataFrame Learning Series 1 - Definitions and Properties

Keywords: Python Attribute

Definition:

DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It can be thought of as a dict-like container for Series objects, and it is the primary data structure in pandas.
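
As a rough illustration of the two ideas in this definition (a dict-like container of Series, and arithmetic aligned on labels), here is a minimal sketch. The names s1, s2, x and y are made up for this example and do not come from the data used later.

```python
import pandas as pd

# Two Series whose row labels only partly overlap
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([10, 20, 30], index=['b', 'c', 'd'])

# A DataFrame behaves like a dict of Series sharing one index
df = pd.DataFrame({'x': s1, 'y': s2})
col = df['x']                # dict-style access returns a Series

# Arithmetic aligns on labels, not positions; unmatched labels give NaN
total = df['x'] + df['y']
print(df)
print(total)
```

Only the labels b and c exist in both Series, so only those two rows of total hold a number.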

Signature:

class pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=False)

Parameters:

data: NumPy ndarray (structured or homogeneous), dict, or DataFrame. A dict may contain Series, arrays, constants, or list-like objects.
index: Index or array-like. The labels to use for the rows. If the input data carries no index information and no index is provided, it defaults to arange(n), that is, the integers 0, 1, ..., n-1.
columns: Index or array-like. The labels to use for the columns. Defaults to the integers 0, 1, ..., n-1 when no column labels are provided.
dtype: dtype, default None. The data type to force. Only a single dtype is allowed; if None, the type is inferred from the data.
copy: boolean, default False. Whether to copy the data from the input. Only affects DataFrame or two-dimensional ndarray input.
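
To make the parameters more concrete, here is a small sketch. The array arr and the labels r1, r2, a and b are invented for illustration; they are not part of the original example data.

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 2], [3, 4]])

# No labels supplied: rows and columns are both numbered from 0
default_df = pd.DataFrame(arr)

# Explicit row labels, column labels, and one forced dtype
labeled_df = pd.DataFrame(arr, index=['r1', 'r2'], columns=['a', 'b'],
                          dtype=np.float64)

# copy=True gives the frame its own data, so editing it leaves arr untouched
copied_df = pd.DataFrame(arr, copy=True)
copied_df.iloc[0, 0] = 99

print(default_df)
print(labeled_df.dtypes)
print(arr[0, 0])
```

Whether a frame built without copy=True shares memory with the input depends on the pandas version and its copy-on-write settings, so passing copy=True is the safer choice when the original array must stay unchanged.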

Other ways to build a DataFrame:

classmethod DataFrame.from_records(data, index=None, exclude=None, columns=None, coerce_float=False, nrows=None)

classmethod DataFrame.from_dict(data, orient='columns', dtype=None)

pandas.read_csv, pandas.read_table, pandas.read_clipboard, pandas.read_excel, etc.
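
The two class methods can be sketched as follows. The dictionaries and the records list are small made-up inputs, chosen so that the results match the col1/col2 example used below.

```python
import pandas as pd

# orient='columns' (the default): dict keys become column labels
by_cols = pd.DataFrame.from_dict({'col1': [1, 2], 'col2': [3, 4]})

# orient='index': dict keys become row labels instead
by_rows = pd.DataFrame.from_dict({'row1': [1, 3], 'row2': [2, 4]},
                                 orient='index')

# from_records: each tuple is one row; columns names the fields
records = [(1, 3), (2, 4)]
from_recs = pd.DataFrame.from_records(records, columns=['col1', 'col2'])

print(by_cols)
print(by_rows)
print(from_recs)
```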

For example:

Constructing a DataFrame from a dictionary
>>> import numpy as np
>>> import pandas as pd
>>> d = {'col1': [1, 2], 'col2': [3, 4]}
>>> df = pd.DataFrame(data=d)
>>> df
   col1  col2
0     1     3
1     2     4
The inferred dtype is int64
>>> df.dtypes
col1    int64
col2    int64
dtype: object
Forcing a single dtype
>>> df = pd.DataFrame(data=d, dtype=np.int8)
>>> df.dtypes
col1    int8
col2    int8
dtype: object
Constructing a DataFrame from a NumPy multidimensional array
>>> df2 = pd.DataFrame(np.random.randint(low=0, high=10, size=(5, 5)),
...                    columns=['a', 'b', 'c', 'd', 'e'])
>>> df2
    a   b   c   d   e
0   2   8   8   3   4
1   4   2   9   0   9
2   1   0   7   8   0
3   5   1   7   1   3
4   6   0   2   4   2

Properties:

Get and create the DataFrame

import pandas as pd
import numpy as np

# Read the spreadsheet and use the Date column as the row index
df = pd.read_excel('Bank of Nanjing.xlsx', index_col='Date')
# Keep the first five rows for the examples below
df1 = df[:5]

In [38]: df1.head()
Out[38]:
            Open  High   Low  Close  Trunover    Volume
Date
2017-09-15  8.06  8.08  8.03   8.04    195.43  24272800
2017-09-18  8.05  8.13  8.03   8.06    200.76  24867600
2017-09-19  8.03  8.06  7.94   8.00    433.76  54253100
2017-09-20  7.97  8.06  7.95   8.03    319.94  39909700
2017-09-21  8.02  8.10  7.99   8.04    241.94  30056600

Transpose (T). The row and column labels swap places, so the index of the result holds the original column labels and its name attribute is None

In [39]: df1.T
Out[39]:
Date       2017-09-15   2017-09-18   2017-09-19   2017-09-20   2017-09-21
Open             8.06         8.05         8.03         7.97         8.02
High             8.08         8.13         8.06         8.06         8.10
Low              8.03         8.03         7.94         7.95         7.99
Close            8.04         8.06         8.00         8.03         8.04
Trunover       195.43       200.76       433.76       319.94       241.94
Volume    24272800.00  24867600.00  54253100.00  39909700.00  30056600.00
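
Since the Excel file is not available here, the following sketch rebuilds two rows and two columns of the table by hand to look at what transposing does to the axis names and dtypes. The hand-built df1 is only an assumed stand-in for the real data.

```python
import pandas as pd

# Two rows of the table above, typed in by hand
idx = pd.DatetimeIndex(['2017-09-15', '2017-09-18'], name='Date')
df1 = pd.DataFrame({'Open': [8.06, 8.05], 'Volume': [24272800, 24867600]},
                   index=idx)

t = df1.T

print(t.index.name)     # the old column labels carried no name
print(t.columns.name)   # the name 'Date' moves to the columns
print(t.dtypes)         # mixed float and int columns are upcast to float64
```

The upcast explains why Volume is printed with decimals in the transposed output above.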

Fast label-based access to a single value (at)

In [35]:  date=pd.to_datetime('2017-09-15')
In [36]:  date
Out[36]: Timestamp('2017-09-15 00:00:00')
In [37]:  df1.at[date,'Open']
Out[37]: 8.0600000000000005
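
A short sketch of at for both reading and writing a single cell, again on a hand-built stand-in for df1. The new High value 8.10 is an arbitrary number used only to show assignment.

```python
import pandas as pd

idx = pd.DatetimeIndex(['2017-09-15', '2017-09-18'], name='Date')
df1 = pd.DataFrame({'Open': [8.06, 8.05], 'High': [8.08, 8.13]}, index=idx)

date = pd.to_datetime('2017-09-15')

# Read one cell by row label and column label
open_price = df1.at[date, 'Open']

# at can also assign to a single cell
df1.at[date, 'High'] = 8.10

print(open_price)
print(df1)
```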

Get the label names for row and column axes

In [44]: df1.axes
Out[44]:
[DatetimeIndex(['2017-09-15', '2017-09-18', '2017-09-19', '2017-09-20',
                '2017-09-21'], dtype='datetime64[ns]', name='Date', freq=None),
 Index(['Open', 'High', 'Low', 'Close', 'Trunover', 'Volume'], dtype='object')]

Internal data blocks, grouped by dtype (blocks). This attribute was deprecated in pandas 0.21 and removed in 1.0

In [45]: df1.blocks
Out[45]:
{'float64':     Open  High   Low  Close  Trunover
 Date
 2017-09-15  8.06  8.08  8.03   8.04    195.43
 2017-09-18  8.05  8.13  8.03   8.06    200.76
 2017-09-19  8.03  8.06  7.94   8.00    433.76
 2017-09-20  7.97  8.06  7.95   8.03    319.94
 2017-09-21  8.02  8.10  7.99   8.04    241.94,
 'int64':               Volume
 Date
 2017-09-15  24272800
 2017-09-18  24867600
 2017-09-19  54253100
 2017-09-20  39909700
 2017-09-21  30056600}
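
On pandas versions where blocks is not available (it was removed in 1.0), a similar grouping by dtype can be approximated with public API. This is only a sketch of one possible substitute, not an official replacement; the small df1 is a hand-built stand-in.

```python
import pandas as pd

df1 = pd.DataFrame({'Open': [8.06, 8.05],
                    'Close': [8.04, 8.06],
                    'Volume': [24272800, 24867600]})

# One sub-frame per dtype, keyed by the dtype name
by_dtype = {str(dtype): df1.loc[:, df1.dtypes == dtype]
            for dtype in df1.dtypes.unique()}

# select_dtypes picks the columns of one kind directly
floats = df1.select_dtypes(include='float64')

print(sorted(by_dtype))
print(floats)
```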

Column data types (dtypes)

In [46]: df1.dtypes
Out[46]:
Open        float64
High        float64
Low         float64
Close       float64
Trunover    float64
Volume        int64
dtype: object

Determine whether the DataFrame is completely empty

In [47]: df1.empty
Out[47]: False
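
One point that is easy to miss: empty looks at the axis lengths, not at whether the values are missing. A minimal sketch with two made-up frames:

```python
import numpy as np
import pandas as pd

no_rows = pd.DataFrame(columns=['Open', 'Close'])   # columns but zero rows
only_nan = pd.DataFrame({'Open': [np.nan]})         # one row holding NaN

a = no_rows.empty              # an axis of length 0 makes the frame empty
b = only_nan.empty             # NaN still counts as an element
c = only_nan.dropna().empty    # dropping the NaN row leaves zero rows

print(a, b, c)
```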

Returns the data type of each column together with a sparse or dense flag (ftypes). This attribute was deprecated in pandas 0.25 and removed in 1.0; use dtypes instead

In [48]: df1.ftypes
Out[48]:
Open        float64:dense
High        float64:dense
Low         float64:dense
Close       float64:dense
Trunover    float64:dense
Volume        int64:dense
dtype: object
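
Where ftypes is not available, the same sparse or dense information can be read from dtypes. The sketch below assumes a pandas version that provides pd.SparseDtype and pd.arrays.SparseArray; the column names dense and sparse are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'dense': [0.0, 0.0, 1.0],
    'sparse': pd.arrays.SparseArray([0.0, 0.0, 1.0]),
})

# A sparse column reports a SparseDtype; everything else is dense
kinds = {col: 'sparse' if isinstance(dt, pd.SparseDtype) else 'dense'
         for col, dt in df.dtypes.items()}

print(df.dtypes)
print(kinds)
```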

Fast integer-position access to a single value (iat), equivalent to giving the element's coordinates

In [49]: df1.iat[0,1]  # first row, second column
Out[49]: 8.0800000000000001

In [50]: df1.iat[1,0]  # second row, first column
Out[50]: 8.0500000000000007

Integer-position-based indexing and slicing (iloc)

In [2]: df1.iloc[0:1]
Out[2]:
            Open  High   Low  Close  Trunover    Volume
Date
2017-09-15  8.06  8.08  8.03   8.04    195.43  24272800

In [3]: df1.iloc[0:1,2:]
Out[3]:
             Low  Close  Trunover    Volume
Date
2017-09-15  8.03   8.04    195.43  24272800
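
A detail worth checking when slicing: iloc excludes the stop position, as Python slices do, while a label slice with loc includes both end labels. A minimal sketch on a hand-built three-row stand-in for df1:

```python
import pandas as pd

idx = pd.DatetimeIndex(['2017-09-15', '2017-09-18', '2017-09-19'], name='Date')
df1 = pd.DataFrame({'Open': [8.06, 8.05, 8.03],
                    'Low': [8.03, 8.03, 7.94]}, index=idx)

# Positions 0 and 1; position 2 is excluded
by_position = df1.iloc[0:2]

# Both end labels are included
by_label = df1.loc['2017-09-15':'2017-09-18']

# Row slice and column slice together
cell = df1.iloc[0:1, 1:]

print(by_position)
print(by_label)
print(cell)
```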

Mixed positioning (ix): accepts integer positions, label names, or a combination of the two. A single key selects a row, not a column. ix was deprecated in pandas 0.20 and removed in 1.0; use loc or iloc instead

In [6]: df1.ix[1,'Open']
Out[6]: 8.0500000000000007

In [7]: df1.ix[1]
Out[7]:
Open               8.05
High               8.13
Low                8.03
Close              8.06
Trunover         200.76
Volume      24867600.00
Name: 2017-09-18 00:00:00, dtype: float64
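
On pandas versions without ix, the two calls above can be written with loc and iloc. This is one possible translation, sketched on a hand-built stand-in for df1:

```python
import pandas as pd

idx = pd.DatetimeIndex(['2017-09-15', '2017-09-18'], name='Date')
df1 = pd.DataFrame({'Open': [8.06, 8.05], 'High': [8.08, 8.13]}, index=idx)

# df1.ix[1, 'Open']: integer row position combined with a column label
by_label = df1.loc[df1.index[1], 'Open']
by_position = df1.iloc[1, df1.columns.get_loc('Open')]

# df1.ix[1]: the whole second row, returned as a Series
row = df1.iloc[1]

print(by_label, by_position)
print(row)
```

Converting the position to a label (or the label to a position) first keeps each call purely label-based or purely position-based, which avoids the ambiguity that ix had with integer labels.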

Label-based indexing (loc)

In [7]: df1.loc[date,'Low']
Out[7]: 8.0299999999999994

In [8]: df1.loc[df1.index[0],'Low']
Out[8]: 8.0299999999999994

Number of axes (ndim)

In [10]: df1.ndim
Out[10]: 2

The shape of the DataFrame: number of rows and columns (shape)

In [11]: df1.shape
Out[11]: (5, 6)

The size of the DataFrame: number of elements (size)

In [12]: df1.size
Out[12]: 30
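
The three attributes ndim, shape and size are related in a simple way, which the sketch below checks on a placeholder frame of zeros with the same shape as df1. The zeros are not real data.

```python
import numpy as np
import pandas as pd

# Placeholder frame with the same shape as df1: 5 rows, 6 columns
frame = pd.DataFrame(np.zeros((5, 6)))

n_rows, n_cols = frame.shape

print(frame.ndim == len(frame.shape))    # one shape entry per axis
print(frame.size == n_rows * n_cols)     # size is rows times columns
print(len(frame) == n_rows)              # len() counts rows only
print(frame[0].ndim)                     # a single column is a 1-D Series
```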

Returns a Styler object for formatting the DataFrame (style)

In [13]: df1.style
Out[13]: <pandas.io.formats.style.Styler at 0x1c410cf8eb8>

Returns the values in the DataFrame as a two-dimensional array (values)

In [14]: df1.values
Out[14]:
array([[  8.06000000e+00,   8.08000000e+00,   8.03000000e+00,
          8.04000000e+00,   1.95430000e+02,   2.42728000e+07],
       [  8.05000000e+00,   8.13000000e+00,   8.03000000e+00,
          8.06000000e+00,   2.00760000e+02,   2.48676000e+07],
       [  8.03000000e+00,   8.06000000e+00,   7.94000000e+00,
          8.00000000e+00,   4.33760000e+02,   5.42531000e+07],
       [  7.97000000e+00,   8.06000000e+00,   7.95000000e+00,
          8.03000000e+00,   3.19940000e+02,   3.99097000e+07],
       [  8.02000000e+00,   8.10000000e+00,   7.99000000e+00,
          8.04000000e+00,   2.41940000e+02,   3.00566000e+07]])
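
The scientific notation above comes from the array having a single dtype: the int64 Volume column is upcast to float64 to fit alongside the float columns. A small sketch on a hand-built stand-in for df1, which also uses to_numpy(), the method available since pandas 0.24:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'Open': [8.06, 8.05], 'Volume': [24272800, 24867600]})

arr = df1.values                   # whole frame: one common dtype
same = df1.to_numpy()              # same result through the newer method
vol = df1['Volume'].to_numpy()     # a single column keeps its own dtype

print(arr.dtype)
print(vol.dtype)
```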

These are the main attributes of the DataFrame, and we will continue with the methods of the DataFrame.

Posted by daverules on Sun, 19 May 2019 14:01:52 -0700