pandas hierarchical index

Keywords: Python pandas

pandas notes 007

7. Hierarchical index

import pandas as pd
import numpy as np

1. Hierarchical indexing

1.1 Series

When creating a Series, pass a list of two lists as the index argument to build a two-level (hierarchical) index. The first list supplies the outer index and the second list supplies the inner index.

data = pd.Series(np.random.randn(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],  # Outer index
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])                   # Inner index
data
a  1   -1.745267
   2    0.749512
   3    0.891167
b  1    0.894595
   3   -0.978024
c  1   -0.365535
   2   -0.445463
d  2   -0.041103
   3    0.460878
dtype: float64
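The same kind of index can also be built explicitly, which makes it possible to name the levels. The sketch below uses pd.MultiIndex.from_arrays; the level names 'outer' and 'inner' and the fixed values are illustrative choices, not part of the original notes.

```python
import numpy as np
import pandas as pd

outer = ['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd']
inner = [1, 2, 3, 1, 3, 1, 2, 2, 3]

# Build the two-level index explicitly; the level names are illustrative
idx = pd.MultiIndex.from_arrays([outer, inner], names=['outer', 'inner'])

# Fixed values (0.0 .. 8.0) instead of random ones, so results are reproducible
named = pd.Series(np.arange(9.0), index=idx)

print(list(named.index.names))   # ['outer', 'inner']
print(named.index.nlevels)       # 2
```

Named levels can later be referred to by name instead of by position, for example in swaplevel or sort_index.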

Inspect the type of the hierarchical index and the index object itself:

print(type(data.index))  # The hierarchical index type is MultiIndex
print("="*50)
print(data.index)  # Each tuple holds (outer label, inner label)
<class 'pandas.core.indexes.multi.MultiIndex'>
==================================================
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
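Beyond printing the whole index, the labels of each level can be pulled out separately. A minimal sketch on a small Series with made-up values, using get_level_values and the levels attribute:

```python
import pandas as pd

s = pd.Series(range(4), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 3]])

outer_labels = s.index.get_level_values(0)   # one outer label per row
inner_labels = s.index.get_level_values(1)   # one inner label per row
unique_per_level = s.index.levels            # unique labels of each level

print(list(outer_labels))                     # ['a', 'a', 'b', 'b']
print(list(inner_labels))                     # [1, 2, 1, 3]
print([list(lv) for lv in unique_per_level])  # [['a', 'b'], [1, 2, 3]]
```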

Selecting by an outer index label returns every value under that label, indexed by the inner index.

data['a']
1   -1.745267
2    0.749512
3    0.891167
dtype: float64

Slicing on the outer index returns all values whose outer label lies between b and c (both endpoints included), together with their inner index.

data['b':'c']
b  1    0.894595
   3   -0.978024
c  1   -0.365535
   2   -0.445463
dtype: float64

To select a single value, give the outer and inner labels together, separated by a comma.

data['b',3]    # Get the value with outer index b and inner index 3
-0.9780239352546295
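The comma-separated form is just a tuple key, so it can be written in several equivalent ways. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40], index=[['a', 'a', 'b', 'b'], [1, 2, 1, 3]])

v1 = s['b', 3]         # comma-separated labels
v2 = s[('b', 3)]       # the same key written as an explicit tuple
v3 = s.loc['b', 3]     # label-based indexing with loc
v4 = s.loc[('b', 3)]   # loc with an explicit tuple

print(v1, v2, v3, v4)  # 40 40 40 40
```

Using loc makes it explicit that the lookup is by label rather than by position.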

Use loc (label-based indexing) to select several outer labels at once:

data.loc[['b','c']]
b  1    0.894595
   3   -0.978024
c  1   -0.365535
   2   -0.445463
dtype: float64

Get the values whose inner index is 2, across all outer labels:

data.loc[:,2]
a    0.749512
c   -0.445463
d   -0.041103
dtype: float64
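Selecting on the inner level can also be written with xs, which takes the level as an argument. A minimal sketch with made-up values, comparing it with the loc form used above:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50],
              index=[['a', 'a', 'b', 'c', 'c'], [1, 2, 1, 2, 3]])

by_loc = s.loc[:, 2]       # every outer label, inner label 2
by_xs = s.xs(2, level=1)   # cross-section on the inner level

print(by_xs.tolist())       # [20, 40]
print(list(by_xs.index))    # ['a', 'c']
```

Outer label b has no inner label 2, so it does not appear in the result.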

1.2 DataFrame

Create a DataFrame:

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                    'c': ['one', 'one', 'one', 'two', 'two','two', 'two'],
                    'd': [0, 1, 2, 0, 1, 2, 3]})
frame
	a	b	c	d
0	0	7	one	0
1	1	6	one	1
2	2	5	one	2
3	3	4	two	0
4	4	3	two	1
5	5	2	two	2
6	6	1	two	3

Create a hierarchical index:

Use set_index to turn two columns into the outer and inner index. By default those columns are removed from the data.

frame2 = frame.set_index(['c', 'd'])   # c becomes the outer index, d the inner index; both are dropped from the columns
frame2
		a	b
c	d		
one	0	0	7
	1	1	6
	2	2	5
two	0	3	4
	1	4	3
	2	5	2
	3	6	1
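Once c and d form the index, rows can be selected by label much as with the Series in 1.1. The sketch below rebuilds the same frame under the hypothetical names df and indexed; it is an illustration of loc on a two-level row index, not something the original notes cover.

```python
import pandas as pd

df = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                   'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                   'd': [0, 1, 2, 0, 1, 2, 3]})
indexed = df.set_index(['c', 'd'])

block = indexed.loc['one']            # all rows under outer label 'one'
row = indexed.loc[('two', 1)]         # a single row, returned as a Series
cell = indexed.loc[('two', 1), 'a']   # a single value

print(list(block.index))   # [0, 1, 2]
print(row['a'], row['b'])  # 4 3
print(cell)                # 4
```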

reset_index turns the hierarchical index back into ordinary columns and restores the default integer index:

frame2.reset_index()
	c	d	a	b
0	one	0	0	7
1	one	1	1	6
2	one	2	2	5
3	two	0	3	4
4	two	1	4	3
5	two	2	5	2
6	two	3	6	1
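set_index and reset_index are inverses apart from column order. A minimal sketch of the round trip, on a smaller frame with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'a': range(4),
                   'c': ['one', 'one', 'two', 'two'],
                   'd': [0, 1, 0, 1]})

indexed = df.set_index(['c', 'd'])   # c and d become the index
restored = indexed.reset_index()     # c and d come back as the leading columns

print(list(indexed.columns))    # ['a']
print(list(restored.columns))   # ['c', 'd', 'a']

# Same data once the columns are put back in the original order
same = restored[df.columns].equals(df)
print(same)                     # True
```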

Use columns c and d as the outer and inner index and, with drop=False, keep them as ordinary columns too:

frame.set_index(['c','d'],drop=False)
		a	b	c	d
c	d				
one	0	0	7	one	0
	1	1	6	one	1
	2	2	5	one	2
two	0	3	4	two	0
	1	4	3	two	1
	2	5	2	two	2
	3	6	1	two	3

2. Swapping and sorting

2.1 Swapping levels

a1 = pd.Series(np.random.randn(12), index=[
                ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],  # Outer index
                [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]                           # Inner index
            ])
a1
a  0   -0.864526
   1   -1.198998
   2   -0.208866
b  0    1.130544
   1    0.958730
   2    0.548593
c  0    0.283399
   1    0.289012
   2   -1.575808
d  0   -0.094977
   1    0.639439
   2   -0.692990
dtype: float64

swaplevel() exchanges the inner and outer index. The values and their order stay the same.

a1.swaplevel()  # Swap the inner and outer index
0  a   -0.864526
1  a   -1.198998
2  a   -0.208866
0  b    1.130544
1  b    0.958730
2  b    0.548593
0  c    0.283399
1  c    0.289012
2  c   -1.575808
0  d   -0.094977
1  d    0.639439
2  d   -0.692990
dtype: float64
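A small check that swaplevel only rearranges the index and leaves the data alone. The level names 'outer' and 'inner' are illustrative and are there only to show that levels can be given by name.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [0, 1, 0, 1]],
                                names=['outer', 'inner'])
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=idx)

swapped = s.swaplevel()                   # swaps the two innermost levels by default
by_name = s.swaplevel('outer', 'inner')   # the same swap, with levels given by name

print(list(swapped.index.names))       # ['inner', 'outer']
print(swapped.tolist() == s.tolist())  # True: values and row order are unchanged
print(swapped.swaplevel().equals(s))   # True: swapping twice restores the original
```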

2.2 Sorting by level

  • sortlevel() sorted by the outer index first and then by the inner index, ascending by default. It is no longer supported; sort_index() does the same job.

  • sort_index()

    By default (level=None) it sorts by the outer index first and, where outer labels are equal, by the inner index. Passing level=0 gives the same order.

    With level=1 it sorts by the inner index first and, where inner labels are equal, by the outer index.
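The ordering rules above can be checked on a tiny Series with fixed values, so the result does not depend on random data:

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4],
              index=[['b', 'b', 'a', 'a'], [2, 1, 2, 1]])

default_sorted = s.sort_index()              # outer index, then inner index
by_inner = s.sort_index(level=1)             # inner index, ties broken by outer index
descending = s.sort_index(ascending=False)   # reverse order on every level

print(list(default_sorted.index))  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
print(list(by_inner.index))        # [('a', 1), ('b', 1), ('a', 2), ('b', 2)]
print(list(descending.index))      # [('b', 2), ('b', 1), ('a', 2), ('a', 1)]
```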

a2 = pd.Series(np.random.randn(12), index=[
                ['b', 'b', 'b', 'a', 'a', 'a', 'd', 'd', 'd', 'c', 'c', 'c'],  # Outer index
                [4, 1, 2, 7, 3, 2, 1, 3, 2, 7, 3, 9]                           # Inner index
            ])
a2
b  4    0.784921
   1    0.560649
   2    0.047067
a  7    0.576054
   3    0.808312
   2   -0.570005
d  1    0.237384
   3   -1.013164
   2    0.497827
c  7    0.960867
   3   -0.362849
   9    0.084907
dtype: float64

Default (level=None): sort by the outer index first, then by the inner index where the outer labels are equal. This is the same order as level=0.

a2.sort_index() 
a  2   -0.570005
   3    0.808312
   7    0.576054
b  1    0.560649
   2    0.047067
   4    0.784921
c  3   -0.362849
   7    0.960867
   9    0.084907
d  1    0.237384
   2    0.497827
   3   -1.013164
dtype: float64

With level=1: sort by the inner index first, then by the outer index where the inner labels are equal.

a2.sort_index(level=1) 
b  1    0.560649
d  1    0.237384
a  2   -0.570005
b  2    0.047067
d  2    0.497827
a  3    0.808312
c  3   -0.362849
d  3   -1.013164
b  4    0.784921
a  7    0.576054
c  7    0.960867
   9    0.084907
dtype: float64

2.3 Swapping, then sorting

After swapping the levels, sort by the new outer index (the original inner index) first, then by the new inner index:

a2.swaplevel().sort_index()  # Swap the levels, then sort by the new outer index first
1  b    0.560649
   d    0.237384
2  a   -0.570005
   b    0.047067
   d    0.497827
3  a    0.808312
   c   -0.362849
   d   -1.013164
4  b    0.784921
7  a    0.576054
   c    0.960867
9  c    0.084907
dtype: float64

After swapping the levels, sort by the new inner index (the original outer index) first, then by the new outer index:

a2.swaplevel().sort_index(level=1)  # Swap the levels, then sort by the new inner index first
2  a   -0.570005
3  a    0.808312
7  a    0.576054
1  b    0.560649
2  b    0.047067
4  b    0.784921
3  c   -0.362849
7  c    0.960867
9  c    0.084907
1  d    0.237384
2  d    0.497827
3  d   -1.013164
dtype: float64
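One practical reason to swap and then sort is that the result has a fully sorted index, with the former inner index in front. A minimal sketch with fixed values; it also shows that swaplevel().sort_index() puts the rows in the same order as sort_index(level=1) on the original Series, only with the levels displayed the other way round.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4],
              index=[['b', 'b', 'a', 'a'], [2, 1, 2, 1]])

swapped_sorted = s.swaplevel().sort_index()

print(s.index.is_monotonic_increasing)               # False: original index is unsorted
print(swapped_sorted.index.is_monotonic_increasing)  # True
print(list(swapped_sorted.index))  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# Same row order as sorting the original by its inner index
print(swapped_sorted.tolist() == s.sort_index(level=1).tolist())  # True

# The former inner index is now the outer one, so it can be selected directly
first = swapped_sorted.loc[1]
print(first.tolist())  # [4, 2]
```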

Posted by Poomerio on Sun, 31 Oct 2021 11:07:46 -0700