The Index object and basic index operations in pandas

Keywords: Python pandas

pandas notes 004

4. The Index object and basic index operations

import pandas as pd
import numpy as np

1. The Index object

1.1 Series and DataFrame

The row labels of a Series, and the row and column labels of a DataFrame, are stored in Index objects.

Series:

pd1 = pd.Series(range(5),index = ['A','B','C','D','E']) #Create a Series from a range and specify the index labels
print(pd1)
print("="*20)
print(type(pd1.index))   #The index of a Series is an Index object
A    0
B    1
C    2
D    3
E    4
dtype: int64
====================
<class 'pandas.core.indexes.base.Index'>

DataFrame:

pd2 = pd.DataFrame(np.arange(9).reshape(3,3),index=['A','B','C'],columns=['M','N','Q'])
#Create a DataFrame from a two-dimensional array and specify the row and column labels
print(pd2)
print("="*20)
print(type(pd2.index))   #The row index of a DataFrame is an Index object
   M  N  Q
A  0  1  2
B  3  4  5
C  6  7  8
====================
<class 'pandas.core.indexes.base.Index'>  
1.2 Index objects are immutable
pd1.index[1] = 2  #Raises TypeError
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1226982f94cb> in <module>
----> 1 pd1.index[1] = 2  #Raises TypeError

F:\Anaconda_all\Anaconda\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   4275     @final
   4276     def __setitem__(self, key, value):
-> 4277         raise TypeError("Index does not support mutable operations")
   4278 
   4279     def __getitem__(self, key):

TypeError: Index does not support mutable operations
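
Because an Index cannot be modified element by element, relabelling means building a new Index. A minimal sketch of two common ways to do that, assuming a Series like pd1 above (the names s, renamed and s2 are only for illustration):

```python
import pandas as pd

s = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])

# rename returns a new Series whose index has 'B' relabelled; s is untouched
renamed = s.rename({'B': 'Z'})
print(list(renamed.index))

# Assigning a complete list of labels swaps in a whole new Index object
s2 = s.copy()
s2.index = ['a', 'b', 'c', 'd', 'e']
print(list(s2.index))
```

Neither call edits the old Index: rename builds a new one for the returned Series, and the assignment replaces the Index object as a whole.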
1.3 common Index types
  • Index, index
  • Int64Index, integer index (removed in pandas 2.0, where integer labels are held in a plain Index with dtype int64)
  • MultiIndex, hierarchical index
  • DatetimeIndex, timestamp type
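
As a rough illustration of where these types come from, the sketch below builds one of each. It leaves out Int64Index and adds RangeIndex, which the list above does not mention but which pandas creates when no index is given. The variable names are only for illustration.

```python
import pandas as pd

labels = pd.Index(['A', 'B', 'C'])            # generic label index
default = pd.Series([10, 20, 30]).index       # RangeIndex when no index is given
multi = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1)])           # hierarchical index with two levels
dates = pd.date_range('2021-01-01', periods=3, freq='D')  # DatetimeIndex

for idx in (labels, default, multi, dates):
    print(type(idx).__name__)
```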

2. Basic index operations

  • reindex
  • add
  • delete
  • modify
  • check (look up)
  • advanced indexing
2.1 reindex
2.1.1 Series index
ps1 = pd.Series(range(5),index = ['A','B','C','D','E'])
ps1
A    0
B    1
C    2
D    3
E    4
dtype: int64
ps2 = ps1.reindex(['b','A','C','d','E','F']) #Build a new Series that conforms to the given row labels
print(ps1)   #The original Series is unchanged
print("="*30)
print(ps2)   #Labels missing from the original get NaN; existing labels keep their values, in the order requested. The NaN values force the dtype to float64
A    0
B    1
C    2
D    3
E    4
dtype: int64
==============================
b    NaN
A    0.0
C    2.0
d    NaN
E    4.0
F    NaN
dtype: float64
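
If NaN is not wanted for the missing labels, reindex accepts a fill_value argument. A small sketch reusing the Series above (the name filled is only for illustration):

```python
import pandas as pd

ps1 = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])

# Labels that do not exist in ps1 ('b', 'd', 'F') get 0 instead of NaN
filled = ps1.reindex(['b', 'A', 'C', 'd', 'E', 'F'], fill_value=0)
print(filled)
```

Since no NaN is introduced, the values can stay integers rather than being converted to float64.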
2.1.2 DataFrame index
ps3 = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
ps3
	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Rebuild row index:

#Rebuild row index
ps4 = ps3.reindex(['e','B','A'])
print(ps3)    #The original DataFrame index has not changed
print("="*20)
print(ps4)
   a  b   c   d
A  0  1   2   3
B  4  5   6   7
C  8  9  10  11
====================
     a    b    c    d
e  NaN  NaN  NaN  NaN
B  4.0  5.0  6.0  7.0
A  0.0  1.0  2.0  3.0

Rebuild column index:

#Rebuild column index
ps5 = ps3.reindex(columns = ['b','c','q','v'])
print(ps3)     #The original DataFrame index has not changed
print("="*20)
print(ps5)
    b	c	q	v
A	1	2	NaN	NaN
B	5	6	NaN	NaN
C	9	10	NaN	NaN
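
Rows and columns can also be reindexed in one call by passing both index= and columns=. A minimal sketch on the same data (the name sub is only for illustration):

```python
import numpy as np
import pandas as pd

ps3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# Select and reorder rows and columns together; every label exists, so no NaN appears
sub = ps3.reindex(index=['C', 'A'], columns=['d', 'a'])
print(sub)
```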
2.2 add
2.2.1 Series index
p1 = pd.Series(range(5),index = ['A','B','C','D','E'])
p1
A    0
B    1
C    2
D    3
E    4
dtype: int64

Modify the original Series in place:

#Assigning to a new label adds it to the original Series
p1['F'] = 9
p1 
A    0
B    1
C    2
D    3
E    4
F    9
dtype: int64

Leave the original Series unchanged:

#Build a new Series and leave the original untouched
s1 = pd.Series({'g':666})
p2 = pd.concat([p1, s1])  #Series.append was removed in pandas 2.0; pd.concat gives the same result
print(p1)     #Original Series unchanged
print("="*20)
print(p2)
A    0
B    1
C    2
D    3
E    4
F    9
dtype: int64
====================
A      0
B      1
C      2
D      3
E      4
F      9
g    666
dtype: int64
2.2.2 DataFrame index

Add column

#DataFrame index
q = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
q
	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Assigning to a new column name adds the column at the right-hand end and modifies the original DataFrame in place.

q['t'] = 9      #The new t columns are all 9
print(q)
print("="*20)
q['y'] = [10,12,14]  #Specifies the value of the new column
print(q)
print("="*20)
q['m'] = ['19','32','24']  #The values here are strings, although they print like numbers
print(q)
   a  b   c   d  t
A  0  1   2   3  9
B  4  5   6   7  9
C  8  9  10  11  9
====================
   a  b   c   d  t   y
A  0  1   2   3  9  10
B  4  5   6   7  9  12
C  8  9  10  11  9  14
====================
   a  b   c   d  t   y   m
A  0  1   2   3  9  10  19
B  4  5   6   7  9  12  32
C  8  9  10  11  9  14  24

Add a new column to the specified location (insert)

#Adds a new column to the specified location
u = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
u
    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

insert modifies the original DataFrame in place

u.insert(0,'t',2) #Insert column t at position 0; every value is 2
print(u)
print("="*20)
u.insert(1,'r',[6,66,666])  #Insert column r at position 1
print(u)
print("="*20)
u.insert(2,'s',['7','77','777'])  #Insert column s at position 2 (string values)
print(u)
   t  a  b   c   d
A  2  0  1   2   3
B  2  4  5   6   7
C  2  8  9  10  11
====================
   t    r  a  b   c   d
A  2    6  0  1   2   3
B  2   66  4  5   6   7
C  2  666  8  9  10  11
====================
   t    r    s  a  b   c   d
A  2    6    7  0  1   2   3
B  2   66   77  4  5   6   7
C  2  666  777  8  9  10  11
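
Two details about adding columns are worth a quick sketch: assign is a way to add a column without touching the original DataFrame, and insert refuses a column name that already exists. The names u2 and inserted are only for illustration.

```python
import numpy as np
import pandas as pd

u = pd.DataFrame(np.arange(12).reshape(3, 4),
                 index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# assign returns a new DataFrame with the extra column on the right
u2 = u.assign(t=2)
print(list(u.columns))    # original columns are unchanged
print(list(u2.columns))

# insert raises ValueError when the column name is already present
try:
    u.insert(0, 'a', 0)
    inserted = True
except ValueError:
    inserted = False
print(inserted)
```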

Add row

#Add row
qt = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
qt
	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Use label index loc:

#Assigning through loc with a new label adds the row to the original DataFrame
qt.loc['D'] = [1,11,111,1111]  #Add row D
qt
    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	1	11	111	1111

Use pd.concat (DataFrame.append was removed in pandas 2.0)

row = {'a':6,'b':6,'c':6,'d':6}
qt1 = pd.concat([qt, pd.DataFrame([row])], ignore_index=True)  #ignore_index=True discards the original row labels and renumbers the rows from 0
print(qt)  #Original DataFrame unchanged
print("="*20)
print(qt1)
   a   b    c     d
A  0   1    2     3
B  4   5    6     7
C  8   9   10    11
D  1  11  111  1111
====================
   a   b    c     d
0  0   1    2     3
1  4   5    6     7
2  8   9   10    11
3  1  11  111  1111
4  6   6    6     6
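
The result above loses the row labels A to D. One way to keep them, sketched here under the assumption that the new row should be labelled 'E', is to wrap the row in a one-row DataFrame with its own index and combine the two with pd.concat (new_row and qt2 are illustrative names):

```python
import numpy as np
import pandas as pd

qt = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# A one-row DataFrame that carries its own row label
new_row = pd.DataFrame([{'a': 6, 'b': 6, 'c': 6, 'd': 6}], index=['E'])

# concat returns a new DataFrame; qt itself is not modified
qt2 = pd.concat([qt, new_row])
print(qt2)
```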
2.3 delete
2.3.1 del

del modifies the original object in place.

Series

k1 = pd.Series(range(5),index = ['A','B','C','D','E'])
k1
A    0
B    1
C    2
D    3
E    4
dtype: int64
del k1['A'] #Delete row
k1
B    1
C    2
D    3
E    4
dtype: int64

DataFrame

k2 = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
k2
    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
del k2['b']   #Delete column b
k2
    a	c	d
A	0	2	3
B	4	6	7
C	8	10	11
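
When the deleted column is still needed afterwards, pop is a possible alternative to del: it also removes the column in place, but hands it back as a Series. A short sketch (the name removed is only for illustration):

```python
import numpy as np
import pandas as pd

k2 = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# pop removes column b from k2 and returns it
removed = k2.pop('b')
print(removed)
print(list(k2.columns))
```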
2.3.2 drop

drop leaves the original object unchanged and returns a new object with the labels removed.

Series

kt1 = pd.Series(range(4),index = ['A','B','C','D'])
kt1
A    0
B    1
C    2
D    3
dtype: int64

Delete a single label:

#Delete a single label
kt2 = kt1.drop('A')
print(kt1)  #The original Series is unchanged
print("="*20)
print(kt2)
A    0
B    1
C    2
D    3
dtype: int64
====================
B    1
C    2
D    3
dtype: int64

Delete several labels:

#Delete several labels by passing a list
kt3 = kt1.drop(['A','C'])
print(kt1)  #The original Series is unchanged
print("="*20)
print(kt3)
A    0
B    1
C    2
D    3
dtype: int64
====================
B    1
D    3
dtype: int64

DataFrame

tj1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
tj1
    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Delete rows by default (axis=0)

#Delete rows by default (axis=0)
tj2 = tj1.drop('B') #Delete a row
print(tj1)   #The original index object has not changed
print("="*20)
print(tj2)
print("="*20)
tj3 = tj1.drop(['A','C']) #Delete multiple rows
print(tj1)    #The original index object has not changed
print("="*20)
print(tj3)
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
A   0   1   2   3
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
B   4   5   6   7
D  12  13  14  15

Delete columns (axis=1 or axis = 'columns')

#Delete column (axis=1 or axis='columns')
tj4 = tj1.drop('m',axis=1) #Delete a column
print(tj1)
print("="*20)
print(tj4)
print("="*20)
tj5 = tj1.drop(['m','o'],axis='columns') #Delete multiple columns
print(tj1)
print("="*20)
print(tj5)
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    n   o   p
A   1   2   3
B   5   6   7
C   9  10  11
D  13  14  15
====================
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    n   p
A   1   3
B   5   7
C   9  11
D  13  15

inplace parameter of drop()

With inplace=True the deletion happens on the original object and drop() returns None instead of a new object.

#inplace=True deletes on the original object; no new object is returned
bt = pd.Series(range(4),index = ['A','B','C','D'])
bt
A    0
B    1
C    2
D    3
dtype: int64
bt.drop('A',inplace=True)
bt
B    1
C    2
D    3
dtype: int64
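
A few other drop options may be useful. The sketch below uses the index= and columns= keywords instead of axis, errors='ignore' for labels that may not exist, and checks the return value with inplace=True. The names by_kw, safe and result are only for illustration.

```python
import numpy as np
import pandas as pd

tj1 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# Drop a row and two columns in one call, without using axis
by_kw = tj1.drop(index=['A'], columns=['m', 'o'])
print(by_kw)

# A missing label normally raises KeyError; errors='ignore' skips it
safe = tj1.drop('Z', errors='ignore')
print(safe.shape)

# With inplace=True the original is modified and None is returned
result = tj1.drop('D', inplace=True)
print(result)
```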
2.4 modification
2.4.1 Series index
bpr = pd.Series(range(4),index = ['A','B','C','D'])
bpr
A    0
B    1
C    2
D    3
dtype: int64

Label index

bpr['A'] = 666  #Label index
bpr
A    666
B      1
C      2
D      3
dtype: int64

Position index

bpr.iloc[1] = 777  #Position index; plain bpr[1] relies on a positional fallback that recent pandas versions deprecate
bpr
A    666
B    777
C      2
D      3
dtype: int64
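
Several values can also be modified in one statement, either by a list of labels or by a condition. A minimal sketch starting from a fresh Series with the same data:

```python
import pandas as pd

bpr = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

# Set two labels at once; values are matched to the labels in order
bpr.loc[['A', 'C']] = [10, 30]

# Set every value that satisfies a condition
bpr[bpr < 5] = 0
print(bpr)
```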
2.4.2 DataFrame index
tu1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
tu1
    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Plain [] indexing modifies columns

Object['column']

tu1['p'] = 4   #Set every value in column p to 4
tu1
    m	n	o	p
A	0	1	2	4
B	4	5	6	4
C	8	9	10	4
D	12	13	14	4

Object['column'] with a list of values

tu1['n'] = ['2','22','222','2222']  #One value per row; these are strings
tu1
    m	n	o	p
A	0	2	2	4
B	4	22	6	4
C	8	222	10	4
D	12	2222	14	4

Object.column

#Object.column: for an existing column, the effect is the same as Object['column'] above
tu1.m = [1,2,3,4]
tu1
    m	n	o	p
A	1	2	2	4
B	2	22	6	4
C	3	222	10	4
D	4	2222	14	4

Modify rows using label index loc

#Use label index loc
td1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
td1
    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

loc ['row name']

td1.loc['A'] = 666  #Modify row A, all values are 666
td1
    m	n	o	p
A	666	666	666	666
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Modify exact value

#Modify a value
td1.loc['B','p'] = 100  #Modify the value of row B and column p to 100
td1
    m	n	o	p
A	666	666	666	666
B	4	5	6	100
C	8	9	10	11
D	12	13	14	15
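
loc also accepts a boolean condition or lists of labels on the left-hand side, and at is a scalar-only accessor for a single cell. A sketch of these forms on a fresh copy of the same data:

```python
import numpy as np
import pandas as pd

td1 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# Set column p to -1 in the rows where column m is greater than 4 (rows C and D)
td1.loc[td1['m'] > 4, 'p'] = -1

# at reads or writes exactly one cell by row and column label
td1.at['B', 'p'] = 100

# Set a block of cells given by lists of row and column labels
td1.loc[['A', 'B'], ['m', 'n']] = 0
print(td1)
```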
2.5 check
2.5.1 Series index
cc = pd.Series(range(4),index = ['A','B','C','D'])
cc
A    0
B    1
C    2
D    3
dtype: int64

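Single label or position

cc['A']  #Label index
0
cc.iloc[0]   #Position index (plain cc[0] relies on a positional fallback that recent pandas versions deprecate)
0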

Slice index

#Position slice
cc[1:4]  #Start is included, stop is excluded
B    1
C    2
D    3
dtype: int64
#Label slice
cc['B':'D']    #Both endpoints are included
B    1
C    2
D    3
dtype: int64

Non-contiguous selection (pass a list, hence the double brackets)

cc[['A','B']] #Select by a list of labels
A    0
B    1
dtype: int64
cc.iloc[[0,1]]   #Select by a list of positions
A    0
B    1
dtype: int64

Boolean index

#The comparison returns a boolean Series: True where the condition holds, False elsewhere
cc > 2
A    False
B    False
C    False
D     True
dtype: bool

Returns the value corresponding to the index that meets the condition (True)

cc[cc>2]   #Returns the value corresponding to the index that meets the condition (True)
D    3
dtype: int64
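
Conditions can be combined with & (and), | (or) and ~ (not); each comparison needs its own parentheses. The sketch below also uses isin for membership tests. The result names are only for illustration.

```python
import pandas as pd

cc = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

between = cc[(cc > 0) & (cc < 3)]   # values strictly between 0 and 3
ends = cc[cc.isin([0, 3])]          # values equal to 0 or 3
not_big = cc[~(cc > 2)]             # negation of the condition cc > 2

print(between)
print(ends)
print(not_big)
```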
2.5.2 DataFrame index
red = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
red
	m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Column index

Note: 1. Plain [] with a single label looks the label up among the columns, so a row label such as red['A'] raises KeyError. 2. Columns are selected by name, not by position: red[0] raises KeyError because no column is named 0.

#1. Column index (a single label in [] is looked up among the columns)
red['n']  #Select by column name, not by position
A     1
B     5
C     9
D    13
Name: n, dtype: int32

Take multiple columns (discontinuous)

#Take multiple columns (discontinuous)
red[['m','p']]
	m	p
A	0	3
B	4	7
C	8	11
D	12	15

Take a value

#Take a value
red['m']['B']  #The first bracket represents a column and the second bracket represents a row
4

Slice

#Slice
red[1:3]  #A slice inside [] selects rows by position; use loc or iloc to slice columns
    m	n	o	p
B	4	5	6	7
C	8	9	10	11
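
Because plain [] means columns for a label and rows for a slice, it can be clearer to spell out the axis with loc or iloc. A minimal sketch of the explicit equivalents of the lookups above (value, rows and cols are illustrative names):

```python
import numpy as np
import pandas as pd

red = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

value = red.loc['B', 'm']      # one cell: row label first, then column label
rows = red.iloc[1:3]           # rows at positions 1 and 2
cols = red.loc[:, 'n':'o']     # all rows, columns n through o (both included)

print(value)
print(rows)
print(cols)
```

Compared with red['m']['B'], the single loc call avoids chained indexing and puts the row before the column.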
2.6 advanced indexing
  • loc label index
  • iloc position index
  • ix mixed label and position index (deprecated in pandas 0.20 and removed in 1.0; use loc or iloc instead)
2.6.1 loc label index

loc selects by index label.

Series

ts = pd.Series(range(4),index = ['A','B','C','D'])
ts
A    0
B    1
C    2
D    3
dtype: int64
ts.loc['A':'C']   #Same result as the plain label slice ts['A':'C']: both endpoints are included
A    0
B    1
C    2
dtype: int64

DataFrame

green = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
green
    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15
green.loc['A','m']  #Value at row A, column m
0
green.loc['A':'C','m':'n']  #The first argument selects rows and the second selects columns; each can be a single label or a label slice
    m	n
A	0	1
B	4	5
C	8	9
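
Besides single labels and slices, loc accepts lists of labels and boolean conditions. A short sketch on the same data (corners and big_n are illustrative names):

```python
import numpy as np
import pandas as pd

green = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# Lists pick non-adjacent rows and columns
corners = green.loc[['A', 'D'], ['m', 'p']]

# A boolean Series selects rows; here the rows where column n is greater than 5
big_n = green.loc[green['n'] > 5, 'm':'n']

print(corners)
print(big_n)
```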
2.6.2 iloc position index

iloc works like loc, but selects by integer position instead of by label, and its slices exclude the stop position.

Series

lol = pd.Series(range(4),index = ['A','B','C','D'])
lol
A    0
B    1
C    2
D    3
dtype: int64
lol.iloc[1]
1
lol.iloc[1:3] #Start is included, stop is excluded
B    1
C    2
dtype: int64

DataFrame

gto = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
gto
	m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15
gto.iloc[0,1]  #The first argument is the row position and the second is the column position: here, first row, second column
1

Position slices include the start and exclude the stop

gto.iloc[1:3,0:3]  #Rows at positions 1-2 and columns at positions 0-2
	m	n	o
B	4	5	6
C	8	9	10
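
iloc also takes lists and negative positions. Mixing a label with a position in one lookup can be done by converting one side first, for example with Index.get_loc. The sketch below is one possible approach, not the only one; corners, by_pos and by_label are illustrative names.

```python
import numpy as np
import pandas as pd

gto = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# First and last row, first and last column; -1 counts from the end
corners = gto.iloc[[0, -1], [0, -1]]
print(corners)

# Row by label, column by position: turn the label into a position...
by_pos = gto.iloc[gto.index.get_loc('B'), 2]
# ...or turn the position into a label
by_label = gto.loc['B', gto.columns[2]]
print(by_pos, by_label)
```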

Posted by kaze on Tue, 26 Oct 2021 06:47:50 -0700