python pandas dataframe row and column selection, slicing operation

Keywords: Python SQL

python pandas dataframe row and column selection, slicing operation

In SQL, select is based on the column name; panda is more flexible. It can be selected not only based on the column name, but also based on the position of the column (number, in the row and column, note that the position of the panda column starts from 0). The correlation functions are as follows:
1) loc, based on the column label, select a specific row (based on the row index)
2) iloc, position based on row / column
3) at, quickly locate the elements of DataFrame according to the specified row index and column label
4) iat, similar to at, is located according to position
5) ix, a mixture of loc and iloc, supports both label and position;

Example

import pandas as pd
import numpy as np


df = pd.DataFrame({'total_bill': [16.99, 10.34, 23.68, 23.68, 24.59],
                   'tip': [1.01, 1.66, 3.50, 3.31, 3.61],
                   'sex': ['Female', 'Male', 'Male', 'Male', 'Female']})
# data type of columns
print df.dtypes
# indexes
print df.index
# return pandas.Index
print df.columns
# each row, return array[array]
print df.values
print df
sex            object
tip           float64
total_bill    float64
dtype: object
RangeIndex(start=0, stop=5, step=1)
Index([u'sex', u'tip', u'total_bill'], dtype='object')
[['Female' 1.01 16.99]
 ['Male' 1.66 10.34]
 ['Male' 3.5 23.68]
 ['Male' 3.31 23.68]
 ['Female' 3.61 24.59]]
      sex   tip  total_bill
0  Female  1.01       16.99
1    Male  1.66       10.34
2    Male  3.50       23.68
3    Male  3.31       23.68
4  Female  3.61       24.59
print df.loc[1:3, ['total_bill', 'tip']]
print df.loc[1:3, 'tip': 'total_bill']
print df.iloc[1:3, [1, 2]]
print df.iloc[1:3, 1: 3]
   total_bill   tip
1       10.34  1.66
2       23.68  3.50
3       23.68  3.31
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
3  3.31       23.68
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
    tip  total_bill
1  1.66       10.34
2  3.50       23.68

Wrong representation:

print df.loc[1:3, [2, 3]]#. loc only supports column name operations
KeyError: 'None of [[2, 3]] are in the [columns]'
print df.loc[[2, 3]]#. loc can be row selection without column name
    sex   tip  total_bill
2  Male  3.50       23.68
3  Male  3.31       23.68
print df.iloc[1:3]#. iloc can be row selection without adding the first column
    sex   tip  total_bill
1  Male  1.66       10.34
2  Male  3.50       23.68
print df.iloc[1:3, 'tip': 'total_bill']
TypeError: cannot do slice indexing on <class 'pandas.indexes.base.Index'> with these indexers [tip] of <type 'str'>
print df.at[3, 'tip']
print df.iat[3, 1]
print df.ix[1:3, [1, 2]]
print df.ix[1:3, ['total_bill', 'tip']]
3.31
3.31
    tip  total_bill
1  1.66       10.34
2  3.50       23.68
3  3.31       23.68
   total_bill   tip
1       10.34  1.66
2       23.68  3.50
3       23.68  3.31
print df.ix[[1, 2]]#Row selection
    sex   tip  total_bill
1  Male  1.66       10.34
2  Male  3.50       23.68
print df[1: 3]
print df[['total_bill', 'tip']]
# print df[1:2, ['total_bill', 'tip']]  # TypeError: unhashable type
    sex   tip  total_bill
1  Male  1.66       10.34
2  Male  3.50       23.68
   total_bill   tip
0       16.99  1.01
1       10.34  1.66
2       23.68  3.50
3       23.68  3.31
4       24.59  3.61
print df[1:3,1:2]
  •  
TypeError: unhashable type

summary

1) . loc,.iloc,.ix, only add the first parameter such as. loc([1,2]),.iloc([2:3]),.ix[2] Line selection
2) . loc,.at, column selection can only be column name, not position
3) . iloc,.iat, column selection can only be position, not column name
4) df [] can only select row or column, not column at the same time. Column selection can only be column name.

Posted by mikeylikesyou on Mon, 30 Dec 2019 20:28:11 -0800