pandas notes 007
7. Hierarchical index
import pandas as pd
import numpy as np
1. Hierarchical indexing
1.1 Series
When creating a Series, pass a list of two lists as index to build a two-level index. The first inner list supplies the outer level, and the second inner list supplies the inner level.
data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],  # outer index
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])                   # inner index
data

a  1   -1.745267
   2    0.749512
   3    0.891167
b  1    0.894595
   3   -0.978024
c  1   -0.365535
   2   -0.445463
d  2   -0.041103
   3    0.460878
dtype: float64
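The same two-level index can also be built explicitly, which makes it possible to name the levels. The following is a minimal sketch, assuming fixed values in place of the random ones and the illustrative level names outer and inner (neither appears in the original notes).

```python
import numpy as np
import pandas as pd

# Same outer/inner labels as the Series above.
outer = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
inner = [1, 2, 3, 1, 3, 1, 2, 2, 3]

# Build the MultiIndex explicitly; the level names are illustrative.
index = pd.MultiIndex.from_arrays([outer, inner], names=['outer', 'inner'])

# Fixed values (0.0 .. 8.0) instead of random ones, so results are reproducible.
named = pd.Series(np.arange(9.0), index=index)

print(named.index.names)    # the level names
print(named.index.nlevels)  # the number of levels
print(named['a'])           # selecting by outer label works as before
```

Named levels can later be referred to by name instead of by position, for example in swaplevel() or sort_index().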
View the type of a hierarchical index and the index object itself
print(type(data.index))  # the hierarchical index type is MultiIndex
print("=" * 50)
print(data.index)        # each tuple is (outer label, inner label)

<class 'pandas.core.indexes.multi.MultiIndex'>
==================================================
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
Select by outer index: this returns every value whose outer index is a, indexed by the remaining inner labels.
data['a']
1   -1.745267
2    0.749512
3    0.891167
dtype: float64
Slice on the outer index: this returns the values whose outer index runs from b to c, with both b and c included, together with their inner labels.
data['b':'c']
b  1    0.894595
   3   -0.978024
c  1   -0.365535
   2   -0.445463
dtype: float64
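Label slices include both endpoints, unlike positional slices. The sketch below contrasts the two, assuming a Series s with the same index as data but fixed values 0.0 to 8.0 (an illustrative stand-in for the random data).

```python
import numpy as np
import pandas as pd

# Same index layout as `data`, with fixed values for reproducibility.
s = pd.Series(
    np.arange(9.0),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# Label-based slice on the outer level: 'b' and 'c' are both included.
label_slice = s.loc['b':'c']

# Position-based slice: the stop position (5) is excluded.
pos_slice = s.iloc[3:5]

print(label_slice)
print(pos_slice)
```

Label slicing on a hierarchical index generally expects the index to be sorted, which is the case for this data.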
Select with the outer and inner labels together, separated by a comma.
data['b',3] # Get the value with outer index b and inner index 3
-0.9780239352546295
Select with loc (label-based indexing), here passing a list of outer labels
data.loc[['b','c']]
b  1    0.894595
   3   -0.978024
c  1   -0.365535
   2   -0.445463
dtype: float64
Get the values whose inner index is 2, across all outer labels
data.loc[:,2]
a    0.749512
c   -0.445463
d   -0.041103
dtype: float64
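Selecting on the inner level can also be written with xs(), which names the level explicitly. This is a minimal sketch, again assuming a Series s with the same index as data and fixed values 0.0 to 8.0 for reproducibility.

```python
import numpy as np
import pandas as pd

# Same index layout as `data`, with fixed values for reproducibility.
s = pd.Series(
    np.arange(9.0),
    index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
           [1, 2, 3, 1, 3, 1, 2, 2, 3]],
)

# All outer labels, inner label 2, via loc.
by_loc = s.loc[:, 2]

# The same cross-section via xs; level=1 is the inner level.
by_xs = s.xs(2, level=1)

print(by_loc)
print(by_xs)
```

Only the outer labels a, c and d have an inner label 2, so b does not appear in either result.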
1.2 DataFrame
Create DataFrame
frame = pd.DataFrame({'a': range(7),
                      'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})
frame

   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
Create a hierarchical index:
Use two columns as the outer and inner index levels and remove them from the columns
frame2 = frame.set_index(['c', 'd'])  # c becomes the outer level and d the inner level; both are dropped from the columns
frame2

       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
Convert the hierarchical index back into ordinary columns with a default integer index
frame2.reset_index()
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
Use columns c and d as the outer and inner index levels while keeping both columns
frame.set_index(['c','d'],drop=False)
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
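set_index() and reset_index() undo each other up to column order. The sketch below checks this on the same frame and also shows one way to drop the index when the columns were kept with drop=False. The names indexed, restored and kept are illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'a': range(7),
    'b': range(7, 0, -1),
    'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
    'd': [0, 1, 2, 0, 1, 2, 3],
})

# Move c and d into the index, then back into columns.
indexed = frame.set_index(['c', 'd'])
restored = indexed.reset_index()
print(list(restored.columns))  # c and d come back as the leading columns

# Reordering the columns gives a frame to compare with the original.
print(restored[list(frame.columns)].equals(frame))

# With drop=False the columns are still present, so discard the index
# levels with drop=True instead of re-inserting them as columns.
kept = frame.set_index(['c', 'd'], drop=False)
print(kept.reset_index(drop=True).equals(frame))

# A full (outer, inner) tuple selects a single row.
print(indexed.loc[('two', 1)])
```

Calling reset_index() without drop=True on the drop=False frame would try to insert columns named c and d that already exist, which is why the sketch discards the index levels instead.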
2. Swapping and sorting levels
2.1 Swapping the level order
a1 = pd.Series(np.random.randn(12),
               index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],  # outer index
                      [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])                         # inner index
a1

a  0   -0.864526
   1   -1.198998
   2   -0.208866
b  0    1.130544
   1    0.958730
   2    0.548593
c  0    0.283399
   1    0.289012
   2   -1.575808
d  0   -0.094977
   1    0.639439
   2   -0.692990
dtype: float64
swaplevel() swaps the inner and outer index levels. The row order does not change.
a1.swaplevel()  # swap the inner and outer index levels
0  a   -0.864526
1  a   -1.198998
2  a   -0.208866
0  b    1.130544
1  b    0.958730
2  b    0.548593
0  c    0.283399
1  c    0.289012
2  c   -1.575808
0  d   -0.094977
1  d    0.639439
2  d   -0.692990
dtype: float64
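swaplevel() only relabels the index. The values stay in the same positions, and swapping twice restores the original index. A minimal sketch, assuming a small Series with the illustrative level names outer and inner:

```python
import numpy as np
import pandas as pd

# Small two-level Series; the level names are illustrative.
index = pd.MultiIndex.from_product([['a', 'b', 'c'], [0, 1]],
                                   names=['outer', 'inner'])
s = pd.Series(np.arange(6.0), index=index)

# Levels can be referred to by name as well as by position.
swapped = s.swaplevel('outer', 'inner')

print(swapped.index.names)               # the names are swapped too
print(swapped.tolist() == s.tolist())    # the values keep their positions
print(swapped.swaplevel().index.equals(s.index))  # swapping twice round-trips
```

Because the rows are not reordered, the swapped index is usually no longer sorted, so a sort_index() call often follows.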
2.2 Sorting by level
- sortlevel() sorted the outer index first and then the inner index, ascending by default. It has been deprecated and is no longer available in current pandas; sort_index() does the same job.
- sort_index() sorts in ascending order by default:
  With no level argument (the same result as level=0), it sorts by the outer index first and then, where outer labels are equal, by the inner index.
  With level=1, it sorts by the inner index first and then, where inner labels are equal, by the outer index.
a2 = pd.Series(np.random.randn(12),
               index=[['b', 'b', 'b', 'a', 'a', 'a', 'd', 'd', 'd', 'c', 'c', 'c'],  # outer index
                      [4, 1, 2, 7, 3, 2, 1, 3, 2, 7, 3, 9]])                         # inner index
a2

b  4    0.784921
   1    0.560649
   2    0.047067
a  7    0.576054
   3    0.808312
   2   -0.570005
d  1    0.237384
   3   -1.013164
   2    0.497827
c  7    0.960867
   3   -0.362849
   9    0.084907
dtype: float64
Default ordering, the same result as level=0 (sort by the outer index first, then by the inner index where outer labels are equal)
a2.sort_index()
a  2   -0.570005
   3    0.808312
   7    0.576054
b  1    0.560649
   2    0.047067
   4    0.784921
c  3   -0.362849
   7    0.960867
   9    0.084907
d  1    0.237384
   2    0.497827
   3   -1.013164
dtype: float64
With level=1 (sort by the inner index first, then by the outer index where inner labels are equal)
a2.sort_index(level=1)
b  1    0.560649
d  1    0.237384
a  2   -0.570005
b  2    0.047067
d  2    0.497827
a  3    0.808312
c  3   -0.362849
d  3   -1.013164
b  4    0.784921
a  7    0.576054
c  7    0.960867
   9    0.084907
dtype: float64
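The level argument can be combined with ascending to control the direction of each level. The sketch below uses a deliberately small Series (illustrative data, not taken from the notes) so the resulting order is easy to follow.

```python
import pandas as pd

# Tiny unsorted two-level Series; values record the original positions.
s = pd.Series(range(4), index=[['b', 'a', 'b', 'a'],
                               [1, 1, 0, 0]])

# Inner level first, then the outer level for ties (both ascending).
by_inner = s.sort_index(level=1)

# Outer level ascending, inner level descending.
mixed = s.sort_index(level=[0, 1], ascending=[True, False])

print(by_inner)
print(mixed)
```

In by_inner the rows come out as (a, 0), (b, 0), (a, 1), (b, 1); in mixed they come out as (a, 1), (a, 0), (b, 1), (b, 0).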
2.3 Swapping and sorting levels together
After swapping the inner and outer levels, sort by the new outer level first and then by the new inner level
a2.swaplevel().sort_index()  # swap the levels, then sort by the new outer level and then the new inner level
1  b    0.560649
   d    0.237384
2  a   -0.570005
   b    0.047067
   d    0.497827
3  a    0.808312
   c   -0.362849
   d   -1.013164
4  b    0.784921
7  a    0.576054
   c    0.960867
9  c    0.084907
dtype: float64
After swapping the inner and outer levels, sort by the new inner level first and then by the new outer level
a2.swaplevel().sort_index(level=1)  # swap the levels, then sort by the new inner level and then the new outer level
2  a   -0.570005
3  a    0.808312
7  a    0.576054
1  b    0.560649
2  b    0.047067
4  b    0.784921
3  c   -0.362849
7  c    0.960867
9  c    0.084907
1  d    0.237384
2  d    0.497827
3  d   -1.013164
dtype: float64
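One way to check whether a swapped index still needs sorting is the is_monotonic_increasing attribute of the index. The sketch below uses a small illustrative Series (not the a2 data) to show the check before and after sort_index().

```python
import numpy as np
import pandas as pd

# Small unsorted two-level Series; values record the original positions.
s = pd.Series(np.arange(4.0), index=[['b', 'b', 'a', 'a'],
                                     [2, 1, 2, 1]])

swapped = s.swaplevel()
print(swapped.index.is_monotonic_increasing)  # the swapped index is unsorted here

ordered = swapped.sort_index()
print(ordered.index.is_monotonic_increasing)  # sorted on both levels

# With the numbers now on the outer level, select one outer label.
print(ordered.loc[1])
```

After sorting, the outer label 1 groups the entries that originally had inner label 1, here under a and b.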