Index objects and basic index operations

pandas notes 004

4. Index objects and basic index operations

import pandas as pd
import numpy as np

1. The Index object

1.1 Series and DataFrame

Indexes in Series and DataFrame are Index objects.

Series:

pd1 = pd.Series(range(5),index = ['A','B','C','D','E']) #Create a Series from a range and specify the index labels
print(pd1)
print("="*20)
print(type(pd1.index))   #The index of a Series is an Index object

A    0
B    1
C    2
D    3
E    4
dtype: int64
====================
<class 'pandas.core.indexes.base.Index'>

DataFrame:

pd2 = pd.DataFrame(np.arange(9).reshape(3,3),index=['A','B','C'],columns=['M','N','Q'])
#Create a DataFrame from a two-dimensional array and specify the row and column labels
print(pd2)
print("="*20)
print(type(pd2.index))   #The row index of a DataFrame is an Index object

   M  N  Q
A  0  1  2
B  3  4  5
C  6  7  8
====================
<class 'pandas.core.indexes.base.Index'>

1.2 Index objects are immutable

pd1.index[1] = 2  #report errors

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-1226982f94cb> in <module>
----> 1 pd1.index[1] = 2  #report errors

F:\Anaconda_all\Anaconda\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   4275     @final
   4276     def __setitem__(self, key, value):
-> 4277         raise TypeError("Index does not support mutable operations")
   4278 
   4279     def __getitem__(self, key):

TypeError: Index does not support mutable operations
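Individual entries cannot be assigned, but the labels can still be changed by replacing the index as a whole. A minimal sketch, assuming the same five-label Series as above (the variable names s and renamed are illustrative only):

```python
import pandas as pd

s = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])

# rename returns a new Series with a relabeled index; s itself is untouched
renamed = s.rename(index={'B': 'b'})
print(list(renamed.index))

# Assigning a complete list of labels replaces the Index object as a whole
s.index = ['A', 'Z', 'C', 'D', 'E']
print(list(s.index))
```

The values stay in place in both cases; only the labels attached to them change.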

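1.3 Common Index types

Index: the generic index type
Int64Index: integer index (removed in pandas 2.0, where an Index with an integer dtype takes its place)
MultiIndex: hierarchical (multi-level) index
DatetimeIndex: timestamp index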
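The index types listed above can be built directly. The following is only a sketch; the sample labels and the start date are made up for illustration:

```python
import pandas as pd

generic = pd.Index(['A', 'B', 'C'])
integers = pd.Index([1, 2, 3])   # integer dtype; displayed as Int64Index before pandas 2.0
multi = pd.MultiIndex.from_tuples([('x', 1), ('x', 2), ('y', 1)])
dates = pd.date_range('2021-10-26', periods=3, freq='D')

# Print the class name of each index object
for idx in (generic, integers, multi, dates):
    print(type(idx).__name__)
```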

2. Basic index operations

Reindex
Add
Delete
Modify
Query
Advanced index

2.1 reindex

2.1.1 Series index

ps1 = pd.Series(range(5),index = ['A','B','C','D','E'])
ps1

A    0
B    1
C    2
D    3
E    4
dtype: int64

ps2 = ps1.reindex(['b','A','C','d','E','F']) #Build a new Series that follows the given labels
print(ps1)   #The original Series is unchanged
print("="*30)
print(ps2)   #Labels missing from the original index get NaN; existing labels keep their values, in the new order. Labels are case-sensitive ('b' is not 'B'), and NaN forces the dtype to float64

A    0
B    1
C    2
D    3
E    4
dtype: int64
==============================
b    NaN
A    0.0
C    2.0
d    NaN
E    4.0
F    NaN
dtype: float64
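When NaN is not wanted for the missing labels, reindex accepts a fill_value argument. A small sketch with the same labels as above; choosing 0 as the filler is an arbitrary assumption for the example:

```python
import pandas as pd

ps1 = pd.Series(range(5), index=['A', 'B', 'C', 'D', 'E'])

# Labels that do not exist in ps1 receive fill_value instead of NaN
filled = ps1.reindex(['b', 'A', 'C', 'd', 'E', 'F'], fill_value=0)
print(filled)
```

Because no NaN is introduced, the values do not have to be converted to floating point.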

2.1.2 DataFrame index

ps3 = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
ps3

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Rebuild row index:

#Rebuild row index
ps4 = ps3.reindex(['e','B','A'])
print(ps3)    #The original DataFrame index has not changed
print("="*20)
print(ps4)

   a  b   c   d
A  0  1   2   3
B  4  5   6   7
C  8  9  10  11
====================
     a    b    c    d
e  NaN  NaN  NaN  NaN
B  4.0  5.0  6.0  7.0
A  0.0  1.0  2.0  3.0

Rebuild column index:

#Rebuild column index
ps5 = ps3.reindex(columns = ['b','c','q','v'])
print(ps3)     #The original DataFrame index has not changed
print("="*20)
print(ps5)

    b	c	q	v
A	1	2	NaN	NaN
B	5	6	NaN	NaN
C	9	10	NaN	NaN
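Rows and columns can also be reindexed in a single call by passing both index and columns. A minimal sketch, assuming the same 3x4 frame as ps3 (the name sub is illustrative):

```python
import numpy as np
import pandas as pd

ps3 = pd.DataFrame(np.arange(12).reshape(3, 4),
                   index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# Pick and reorder existing rows and columns in one step
sub = ps3.reindex(index=['C', 'A'], columns=['d', 'a'])
print(sub)
```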

2.2 add

2.2.1 Series index

p1 = pd.Series(range(5),index = ['A','B','C','D','E'])
p1

A    0
B    1
C    2
D    3
E    4
dtype: int64

Modify the original Series in place:

#Assigning to a new label adds it to the original Series
p1['F'] = 9
p1

A    0
B    1
C    2
D    3
E    4
F    9
dtype: int64

Leave the original Series unchanged:

#Build a new Series; the original is not modified
s1 = pd.Series({'g':666})
p2 = pd.concat([p1, s1])  #Series.append was removed in pandas 2.0; pd.concat gives the same result
print(p1)     #Original Series unchanged
print("="*20)
print(p2)

A    0
B    1
C    2
D    3
E    4
F    9
dtype: int64
====================
A      0
B      1
C      2
D      3
E      4
F      9
g    666
dtype: int64

2.2.2 DataFrame index

Add column

#DataFrame index
q = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
q

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

By default, assignment with [] adds the new column at the far right and modifies the original DataFrame in place

q['t'] = 9      #The new column t is 9 in every row
print(q)
print("="*20)
q['y'] = [10,12,14]  #Specify the values of the new column
print(q)
print("="*20)
q['m'] = ['19','32','24']  #The values of the new column are strings
print(q)

   a  b   c   d  t
A  0  1   2   3  9
B  4  5   6   7  9
C  8  9  10  11  9
====================
   a  b   c   d  t   y
A  0  1   2   3  9  10
B  4  5   6   7  9  12
C  8  9  10  11  9  14
====================
   a  b   c   d  t   y   m
A  0  1   2   3  9  10  19
B  4  5   6   7  9  12  32
C  8  9  10  11  9  14  24

Add a new column to the specified location (insert)

#Adds a new column to the specified location
u = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
u

    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

insert modifies the original DataFrame in place

u.insert(0,'t',2) #Insert column t at position 0; every value is 2
print(u)
print("="*20)
u.insert(1,'r',[6,66,666])  #Insert column r at position 1
print(u)
print("="*20)
u.insert(2,'s',['7','77','777'])  #Insert column s at position 2 (string values)
print(u)

   t  a  b   c   d
A  2  0  1   2   3
B  2  4  5   6   7
C  2  8  9  10  11
====================
   t    r  a  b   c   d
A  2    6  0  1   2   3
B  2   66  4  5   6   7
C  2  666  8  9  10  11
====================
   t    r    s  a  b   c   d
A  2    6    7  0  1   2   3
B  2   66   77  4  5   6   7
C  2  666  777  8  9  10  11
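Both [] assignment and insert change the DataFrame they are called on. When the original should stay as it is, DataFrame.assign returns a copy with the extra columns. A sketch under that assumption; the column name total is made up for the example:

```python
import numpy as np
import pandas as pd

u = pd.DataFrame(np.arange(12).reshape(3, 4),
                 index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# assign returns a new DataFrame; u keeps its four original columns
u2 = u.assign(t=2, total=u['a'] + u['b'])
print(list(u.columns))
print(list(u2.columns))
```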

Add row

#Add row
qt = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
qt

	a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

Use label index loc:

#Assigning through loc with a new label adds a row to the original DataFrame
qt.loc['D'] = [1,11,111,1111]  #Add row D
qt

    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	1	11	111	1111

Use pd.concat (DataFrame.append was removed in pandas 2.0)

row = {'a':6,'b':6,'c':6,'d':6}
qt1 = pd.concat([qt, pd.DataFrame([row])], ignore_index=True)  #ignore_index=True discards the row labels and renumbers the rows from 0
print(qt)  #Original DataFrame unchanged
print("="*20)
print(qt1)

   a   b    c     d
A  0   1    2     3
B  4   5    6     7
C  8   9   10    11
D  1  11  111  1111
====================
   a   b    c     d
0  0   1    2     3
1  4   5    6     7
2  8   9   10    11
3  1  11  111  1111
4  6   6    6     6
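If the existing row labels should be kept, the new row can be given its own label and concatenated without ignore_index. A minimal sketch, assuming a fresh 3x4 frame and a hypothetical new label 'E':

```python
import numpy as np
import pandas as pd

qt = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# Wrap the new row in a one-row DataFrame that carries its own label
new_row = pd.DataFrame([{'a': 6, 'b': 6, 'c': 6, 'd': 6}], index=['E'])
qt2 = pd.concat([qt, new_row])   # returns a new DataFrame; qt is unchanged
print(qt2)
```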

2.3 delete

2.3.1 del

del modifies the original object in place.

Series

k1 = pd.Series(range(5),index = ['A','B','C','D','E'])
k1

A    0
B    1
C    2
D    3
E    4
dtype: int64

del k1['A'] #Delete row
k1

B    1
C    2
D    3
E    4
dtype: int64

DataFrame

k2 = pd.DataFrame(np.arange(12).reshape(3,4),index=['A','B','C'],columns=['a','b','c','d'])
k2

    a	b	c	d
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11

del k2['b']   #Delete column b
k2
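When the deleted column is still needed afterwards, DataFrame.pop removes it in place like del but also returns it. A short sketch on the same kind of frame (the name removed is illustrative):

```python
import numpy as np
import pandas as pd

k2 = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=['A', 'B', 'C'], columns=['a', 'b', 'c', 'd'])

# pop deletes column b from k2 and hands it back as a Series
removed = k2.pop('b')
print(removed)
print(list(k2.columns))
```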

2.3.2 drop

drop leaves the original object unchanged and returns a new object with the given labels removed.

Series

kt1 = pd.Series(range(4),index = ['A','B','C','D'])
kt1

A    0
B    1
C    2
D    3
dtype: int64

Delete a single label:

#Delete a single label
kt2 = kt1.drop('A')
print(kt1)  #The original Series is unchanged
print("="*20)
print(kt2)

A    0
B    1
C    2
D    3
dtype: int64
====================
B    1
C    2
D    3
dtype: int64

Delete multiple labels:

#Delete multiple labels
kt3 = kt1.drop(['A','C'])
print(kt1)  #The original Series is unchanged
print("="*20)
print(kt3)

A    0
B    1
C    2
D    3
dtype: int64
====================
B    1
D    3
dtype: int64

DataFrame

tj1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
tj1

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Delete rows by default (axis=0)

#Delete rows by default (axis=0)
tj2 = tj1.drop('B') #Delete a row
print(tj1)   #The original index object has not changed
print("="*20)
print(tj2)
print("="*20)
tj3 = tj1.drop(['A','C']) #Delete multiple rows
print(tj1)    #The original index object has not changed
print("="*20)
print(tj3)

    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
A   0   1   2   3
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    m   n   o   p
B   4   5   6   7
D  12  13  14  15

Delete columns (axis=1 or axis = 'columns')

#Delete column (axis=1 or axis='columns')
tj4 = tj1.drop('m',axis=1) #Delete a column
print(tj1)
print("="*20)
print(tj4)
print("="*20)
tj5 = tj1.drop(['m','o'],axis='columns') #Delete multiple columns
print(tj1)
print("="*20)
print(tj5)

    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    n   o   p
A   1   2   3
B   5   6   7
C   9  10  11
D  13  14  15
====================
    m   n   o   p
A   0   1   2   3
B   4   5   6   7
C   8   9  10  11
D  12  13  14  15
====================
    n   p
A   1   3
B   5   7
C   9  11
D  13  15

inplace parameter of drop()

With inplace=True, drop deletes from the original object and does not return a new object.

#inplace=True deletes from the original object; no new object is returned
bt = pd.Series(range(4),index = ['A','B','C','D'])
bt

A    0
B    1
C    2
D    3
dtype: int64

bt.drop('A',inplace=True)
bt

B    1
C    2
D    3
dtype: int64
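A common slip with inplace=True is assigning the return value back to the variable. The sketch below, using the same four-label Series, shows that the call returns None:

```python
import pandas as pd

bt = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

# With inplace=True the Series is modified directly and the call returns None
result = bt.drop('A', inplace=True)
print(result)
print(list(bt.index))
```

Writing bt = bt.drop('A', inplace=True) would therefore replace the Series with None.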

2.4 modification

2.4.1 Series index

bpr = pd.Series(range(4),index = ['A','B','C','D'])
bpr

A    0
B    1
C    2
D    3
dtype: int64

Label index

bpr['A'] = 666  #Label index
bpr

A    666
B      1
C      2
D      3
dtype: int64

Position index

bpr.iloc[1] = 777  #Position index; with a label index, plain bpr[1] is deprecated in pandas 2.x, so iloc is used
bpr

A    666
B    777
C      2
D      3
dtype: int64
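For changing one value at a time, pandas also offers the scalar accessors at (by label) and iat (by position). A minimal sketch that reproduces the two modifications above with them:

```python
import pandas as pd

bpr = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

bpr.at['A'] = 666   # set a single value by label
bpr.iat[1] = 777    # set a single value by integer position
print(bpr)
```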

2.4.2 DataFrame index

tu1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
tu1

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Plain [] modifies columns

object['column']

tu1['p'] = 4   #Set every value in column p to 4
tu1

    m	n	o	p
A	0	1	2	4
B	4	5	6	4
C	8	9	10	4
D	12	13	14	4

object['column'] with a list of values

tu1['n'] = ['2','22','222','2222']
tu1

    m	n	o	p
A	0	2	2	4
B	4	22	6	4
C	8	222	10	4
D	12	2222	14	4

object.column

# object.column: same effect as object['column'] above when the column already exists
tu1.m = [1,2,3,4]
tu1

    m	n	o	p
A	1	2	2	4
B	2	22	6	4
C	3	222	10	4
D	4	2222	14	4
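Attribute-style assignment only works for columns that already exist. The sketch below illustrates the difference; the column name z is hypothetical, and the warning pandas emits for this pattern is silenced only to keep the output short:

```python
import warnings

import numpy as np
import pandas as pd

tu = pd.DataFrame(np.arange(16).reshape(4, 4),
                  index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

with warnings.catch_warnings():
    warnings.simplefilter('ignore')   # pandas warns about this pattern
    tu.z = [1, 2, 3, 4]               # sets a plain Python attribute, not a column

created_by_attribute = 'z' in tu.columns
print(created_by_attribute)

tu['z'] = [1, 2, 3, 4]                # bracket assignment does create the column
created_by_brackets = 'z' in tu.columns
print(created_by_brackets)
```

For that reason the bracket form is the safer habit whenever a column might be new.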

Modify rows using label index loc

#Use label index loc
td1 = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
td1

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

loc['row label']

td1.loc['A'] = 666  #Modify row A, all values are 666
td1

    m	n	o	p
A	666	666	666	666
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Modify exact value

#Modify a value
td1.loc['B','p'] = 100  #Modify the value of row B and column p to 100
td1

    m	n	o	p
A	666	666	666	666
B	4	5	6	100
C	8	9	10	11
D	12	13	14	15
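loc also accepts a boolean condition in the row position, which allows changing values only in the rows that match. A sketch on the same frame; the condition and the replacement value 0 are arbitrary choices for the example:

```python
import numpy as np
import pandas as pd

td1 = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# In every row where column m is greater than 4, set column p to 0
td1.loc[td1['m'] > 4, 'p'] = 0
print(td1)
```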

2.5 query

2.5.1 Series index

cc = pd.Series(range(4),index = ['A','B','C','D'])
cc

A    0
B    1
C    2
D    3
dtype: int64

Row index

cc['A']  #Label index

cc.iloc[0]   #Position index; with a label index, plain cc[0] is deprecated in pandas 2.x

Slice index

#Position slice index
cc[1:4]  #Start position is included, stop position is excluded

B    1
C    2
D    3
dtype: int64

#Label slice index
cc['B':'D']    #Both endpoints are included

B    1
C    2
D    3
dtype: int64

Non-contiguous selection (double brackets)

cc[['A','B']] #Select several labels

A    0
B    1
dtype: int64

cc.iloc[[0,1]]   #Select several positions (iloc avoids the deprecated positional lookup through plain [])

A    0
B    1
dtype: int64

Boolean index

#Returns True where the condition holds and False elsewhere
cc > 2

A    False
B    False
C    False
D     True
dtype: bool

Select the values where the condition is True

cc[cc>2]   #Keeps only the entries where the condition is True

D    3
dtype: int64
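Several conditions can be combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses because these operators bind more tightly than the comparisons. A short sketch on the same Series; the result names are illustrative:

```python
import pandas as pd

cc = pd.Series(range(4), index=['A', 'B', 'C', 'D'])

middle = cc[(cc > 0) & (cc < 3)]    # both conditions must hold
ends = cc[(cc == 0) | (cc == 3)]    # either condition may hold
not_big = cc[~(cc > 2)]             # negation of a condition
print(middle)
print(ends)
print(not_big)
```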

2.5.2 DataFrame index

red = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
red

	m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

Column index

Note: 1. Plain [] with a single label selects a column; passing a row label raises KeyError. 2. Columns are selected by name, not by position (red[0] raises KeyError here because no column is named 0)

#1. Column index (plain [] looks up column names, so a row label raises KeyError)
red['n']  #Selected by column name, not by position

A     1
B     5
C     9
D    13
Name: n, dtype: int32

Take multiple columns (non-contiguous)

#Take multiple columns (non-contiguous)
red[['m','p']]

Take a value

#Take a value
red['m']['B']  #The first bracket represents a column and the second bracket represents a row

Slice

#Slice
red[1:3]  #A slice inside [] selects rows by position; use loc or iloc to slice columns

    m	n	o	p
B	4	5	6	7
C	8	9	10	11
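The chained lookup red['m']['B'] works for reading, but the same value can be fetched in one step with loc or at, and loc can also slice columns. A minimal sketch on the same frame (variable names are illustrative):

```python
import numpy as np
import pandas as pd

red = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

chained = red['m']['B']       # column first, then row
single = red.loc['B', 'm']    # row label, column label in one call
scalar = red.at['B', 'm']     # scalar accessor for a single cell
cols = red.loc[:, 'n':'o']    # all rows, columns n through o (both included)
print(chained, single, scalar)
print(cols)
```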

2.6 advanced index

loc: label-based index
iloc: position-based index
ix: mixed label and position index (deprecated in pandas 0.20 and removed in pandas 1.0; use loc or iloc instead)

2.6.1 loc label index

Selects by index label

Series

ts = pd.Series(range(4),index = ['A','B','C','D'])
ts

A    0
B    1
C    2
D    3
dtype: int64

ts.loc['A':'C']   #For a Series, ts.loc['A':'C'] and the plain label slice ts['A':'C'] give the same result (both endpoints are included)

A    0
B    1
C    2
dtype: int64

DataFrame

green = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
green

    m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

green.loc['A','m']  #First row first column

green.loc['A':'C','m':'n']  #The first parameter is the range of rows (which can be a single row), and the second parameter is the range of columns (which can be a single column)
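Besides single labels and slices, loc accepts lists of labels for rows and columns that are not adjacent. A sketch on the same frame (the name corners is illustrative):

```python
import numpy as np
import pandas as pd

green = pd.DataFrame(np.arange(16).reshape(4, 4),
                     index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

# Rows A and D, columns m and p
corners = green.loc[['A', 'D'], ['m', 'p']]
print(corners)
```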

2.6.2 iloc position index

Works like loc, but selects by integer position instead of by label

Series

lol = pd.Series(range(4),index = ['A','B','C','D'])
lol

A    0
B    1
C    2
D    3
dtype: int64

lol.iloc[1]

lol.iloc[1:3] #Start position is included, stop position is excluded

B    1
C    2
dtype: int64

DataFrame

gto = pd.DataFrame(np.arange(16).reshape(4,4),index=['A','B','C','D'],columns=['m','n','o','p'])
gto

	m	n	o	p
A	0	1	2	3
B	4	5	6	7
C	8	9	10	11
D	12	13	14	15

gto.iloc[0,1]  #The first parameter is the row and the second is the column; this takes the value in the first row, second column

Position slices include the start and exclude the stop

gto.iloc[1:3,0:3]  #The first parameter is the row range and the second is the column range (start included, stop excluded)

	m	n	o
B	4	5	6
C	8	9	10
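To mix a label on one axis with a position on the other, the label can be converted to a position with Index.get_loc, or the position to a label by indexing columns. A sketch on the same frame; row B and the third column are arbitrary choices:

```python
import numpy as np
import pandas as pd

gto = pd.DataFrame(np.arange(16).reshape(4, 4),
                   index=['A', 'B', 'C', 'D'], columns=['m', 'n', 'o', 'p'])

row_pos = gto.index.get_loc('B')     # position of row label B
by_position = gto.iloc[row_pos, 2]   # row B, third column, purely positional

col_label = gto.columns[2]           # label of the third column
by_label = gto.loc['B', col_label]   # same cell, purely label-based
print(row_pos, col_label, by_position, by_label)
```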

Posted by kaze on Tue, 26 Oct 2021 06:47:50 -0700
