python pandas dataframe row and column selection, slicing operation
In SQL, select is based on the column name; panda is more flexible. It can be selected not only based on the column name, but also based on the position of the column (number, in the row and column, note that the position of the panda column starts from 0). The correlation functions are as follows:
1) loc, based on the column label, select a specific row (based on the row index)
2) iloc, position based on row / column
3) at, quickly locate the elements of DataFrame according to the specified row index and column label
4) iat, similar to at, is located according to position
5) ix, a mixture of loc and iloc, supports both label and position;
Example
import pandas as pd import numpy as np df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59], 'tip': [1.01, 1.66, 3.50, 3.31, 3.61], 'sex': ['Female', 'Male', 'Male', 'Male', 'Female']}) # data type of columns print df.dtypes # indexes print df.index # return pandas.Index print df.columns # each row, return array[array] print df.values print df
sex object tip float64 total_bill float64 dtype: object RangeIndex(start=0, stop=5, step=1) Index([u'sex', u'tip', u'total_bill'], dtype='object') [['Female' 1.01 16.99] ['Male' 1.66 10.34] ['Male' 3.5 23.68] ['Male' 3.31 23.68] ['Female' 3.61 24.59]] sex tip total_bill 0 Female 1.01 16.99 1 Male 1.66 10.34 2 Male 3.50 23.68 3 Male 3.31 23.68 4 Female 3.61 24.59
print df.loc[1:3, ['total_bill', 'tip']] print df.loc[1:3, 'tip': 'total_bill'] print df.iloc[1:3, [1, 2]] print df.iloc[1:3, 1: 3]
total_bill tip 1 10.34 1.66 2 23.68 3.50 3 23.68 3.31 tip total_bill 1 1.66 10.34 2 3.50 23.68 3 3.31 23.68 tip total_bill 1 1.66 10.34 2 3.50 23.68 tip total_bill 1 1.66 10.34 2 3.50 23.68
Wrong representation:
print df.loc[1:3, [2, 3]]#. loc only supports column name operations
KeyError: 'None of [[2, 3]] are in the [columns]'
print df.loc[[2, 3]]#. loc can be row selection without column name
sex tip total_bill 2 Male 3.50 23.68 3 Male 3.31 23.68
print df.iloc[1:3]#. iloc can be row selection without adding the first column
sex tip total_bill 1 Male 1.66 10.34 2 Male 3.50 23.68
print df.iloc[1:3, 'tip': 'total_bill']
TypeError: cannot do slice indexing on <class 'pandas.indexes.base.Index'> with these indexers [tip] of <type 'str'>
print df.at[3, 'tip'] print df.iat[3, 1] print df.ix[1:3, [1, 2]] print df.ix[1:3, ['total_bill', 'tip']]
3.31 3.31 tip total_bill 1 1.66 10.34 2 3.50 23.68 3 3.31 23.68 total_bill tip 1 10.34 1.66 2 23.68 3.50 3 23.68 3.31
print df.ix[[1, 2]]#Row selection
sex tip total_bill 1 Male 1.66 10.34 2 Male 3.50 23.68
print df[1: 3] print df[['total_bill', 'tip']] # print df[1:2, ['total_bill', 'tip']] # TypeError: unhashable type
sex tip total_bill 1 Male 1.66 10.34 2 Male 3.50 23.68 total_bill tip 0 16.99 1.01 1 10.34 1.66 2 23.68 3.50 3 23.68 3.31 4 24.59 3.61
print df[1:3,1:2]
TypeError: unhashable type
summary
1) . loc,.iloc,.ix, only add the first parameter such as. loc([1,2]),.iloc([2:3]),.ix[2] Line selection
2) . loc,.at, column selection can only be column name, not position
3) . iloc,.iat, column selection can only be position, not column name
4) df [] can only select row or column, not column at the same time. Column selection can only be column name.