pandas hierarchical index


hierarchical indexing

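Start by creating a Series. For its index, pass a list made up of two sublists.

The first sublist is the outer index, and the second sublist is the inner index. The values come from np.random.randn, so they differ on every run, which is why the numbers in the outputs below do not all match each other.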

Example code:

import pandas as pd
import numpy as np

ser_obj = pd.Series(np.random.randn(12),index=[
                ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
                [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]
            ])
print(ser_obj)

Operation result:

a  0    0.099174
   1   -0.310414
   2   -0.558047
b  0    1.742445
   1    1.152924
   2   -0.725332
c  0   -0.150638
   1    0.251660
   2    0.063387
d  0    1.080605
   1    0.567547
   2   -0.154148
dtype: float64
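Passing nested lists is a shorthand: pandas turns the two sublists into a MultiIndex. As a sketch, assuming pandas' MultiIndex.from_arrays and MultiIndex.from_product constructors, the same index can also be built explicitly. The level names letter and number are hypothetical, added only for illustration, and a fixed seed makes the random values reproducible:

```python
import numpy as np
import pandas as pd

outer = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd']
inner = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]

# Explicit constructor; the level names are hypothetical, for illustration only
index = pd.MultiIndex.from_arrays([outer, inner], names=['letter', 'number'])

# Every outer label is paired with every inner label, so the Cartesian
# product of the distinct labels yields the same entries in the same order
product_index = pd.MultiIndex.from_product([['a', 'b', 'c', 'd'], [0, 1, 2]])

np.random.seed(0)  # fixed seed so the random values repeat between runs
ser_named = pd.Series(np.random.randn(12), index=index)
ser_nested = pd.Series(np.random.randn(12), index=[outer, inner])

# equals() compares the index entries and ignores the level names
print(ser_named.index.equals(ser_nested.index))  # True
print(index.equals(product_index))  # True
```

from_product is convenient when every combination of labels is present, as it is here; from_arrays also handles irregular combinations.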

MultiIndex index object

  • Printing the type of this Series' index shows that it is a MultiIndex.
  • Printing the index itself shows its levels and labels. levels lists the distinct labels available at each of the two levels, and labels gives, for each position, the integer position of that entry's label within the corresponding level. (In pandas 0.24 and later, labels is renamed codes.)

Example code:

print(type(ser_obj.index))
print(ser_obj.index)

Operation result:

<class 'pandas.indexes.multi.MultiIndex'>
MultiIndex(levels=[['a', 'b', 'c', 'd'], [0, 1, 2]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]])
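To make the relationship between levels and labels concrete, here is a small sketch that reads both attributes and decodes the integer codes back into the outer labels. It assumes pandas 0.24 or later, where the attribute is named codes rather than labels; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser_obj = pd.Series(np.random.randn(12), index=[
    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
])
idx = ser_obj.index

# levels: the distinct labels available at each level
print(idx.levels[0].tolist())  # ['a', 'b', 'c', 'd']
print(idx.levels[1].tolist())  # [0, 1, 2]

# codes (called `labels` before pandas 0.24): for each position,
# the integer position of that entry's label within its level
outer_codes = idx.codes[0]
print(outer_codes.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3]

# Decoding: look each code up in its level to recover the outer labels
decoded = idx.levels[0][outer_codes].tolist()
print(decoded == idx.get_level_values(0).tolist())  # True
```

Storing each distinct label once and referring to it by a small integer is what keeps a MultiIndex compact when labels repeat.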

Selecting a subset

  • Data can be selected by index label. Since the index now has two levels, selecting by the outer level works like ordinary indexing: pass an outer label directly.
  • To select by the inner level, pass two keys separated by a comma inside the square brackets: the first key selects from the outer level and the second from the inner level. Use : as the first key to keep every outer label.

1. Outer layer selection:

ser_obj['outer_label']

Example code:

# Outer selection
print(ser_obj['c'])

Operation result:

0   -1.362096
1    1.558091
2   -0.452313
dtype: float64
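Outer selection also works through .loc, and passing both an outer and an inner label returns a single value. The sketch below uses np.arange instead of random numbers, an assumption made only so the results are predictable; it relies on standard Series indexing with a MultiIndex.

```python
import numpy as np
import pandas as pd

# Deterministic values 0.0 .. 11.0 instead of random ones
ser_obj = pd.Series(np.arange(12.0), index=[
    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
])

# .loc with an outer label returns the same values as plain ser_obj['c']
print(ser_obj.loc['c'].tolist() == ser_obj['c'].tolist())  # True

# Outer label 'c' plus inner label 1 selects a single value
print(ser_obj['c', 1])  # 7.0

# A list of outer labels keeps all inner entries for each of them
print(ser_obj.loc[['a', 'd']].tolist())  # [0.0, 1.0, 2.0, 9.0, 10.0, 11.0]
```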

2. Inner layer selection:

ser_obj[:, 'inner_label']

Example code:

# Selection of inner layer
print(ser_obj[:, 2])

Operation result:

a    0.826662
b    0.015426
c   -0.452313
d   -0.051063
dtype: float64
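An alternative for inner-level selection is xs(), which takes a cross-section at a chosen level. This sketch again uses np.arange values (an assumption for predictable output) and checks that xs(2, level=1) returns the same values and outer labels as ser_obj[:, 2]:

```python
import numpy as np
import pandas as pd

ser_obj = pd.Series(np.arange(12.0), index=[
    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
])

# Cross-section at inner label 2; level=1 refers to the inner level,
# which is dropped from the result by default
inner_2 = ser_obj.xs(2, level=1)
print(inner_2.tolist())  # [2.0, 5.0, 8.0, 11.0]
print(inner_2.index.tolist())  # ['a', 'b', 'c', 'd']

# Same values and outer labels as the slice form ser_obj[:, 2]
slice_form = ser_obj[:, 2]
print(inner_2.tolist() == slice_form.tolist())  # True
print(inner_2.index.tolist() == slice_form.index.tolist())  # True
```

xs() reads more explicitly when the level is identified by position or name rather than by where it sits in the key.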

Hierarchical indexes are commonly used in grouping operations, pivot table generation, and similar tasks.
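As a sketch of those two uses, assuming the same np.arange values as a stand-in for the random data: groupby(level=0) aggregates within each outer label, and unstack() pivots the inner level into columns, producing a table with one row per outer label.

```python
import numpy as np
import pandas as pd

ser_obj = pd.Series(np.arange(12.0), index=[
    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
])

# Grouping: sum the values under each outer label
sums = ser_obj.groupby(level=0).sum()
print(sums.tolist())  # [3.0, 12.0, 21.0, 30.0]

# Pivot-table style: move the inner (last) level into the columns
table = ser_obj.unstack()
print(table.shape)  # (4, 3)
print(table.loc['b', 2])  # 5.0
print(table)
```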

Switching the level order

1. swaplevel()

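swaplevel() exchanges the inner and outer index levels. Called with no arguments, it swaps the last two levels, which for a two-level index are the outer and inner levels.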

Example code:

print(ser_obj.swaplevel())

Operation result:

0  a    0.099174
1  a   -0.310414
2  a   -0.558047
0  b    1.742445
1  b    1.152924
2  b   -0.725332
0  c   -0.150638
1  c    0.251660
2  c    0.063387
0  d    1.080605
1  d    0.567547
2  d   -0.154148
dtype: float64
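swaplevel() also accepts the two levels to exchange, either as positions or as level names. A sketch assuming hypothetical level names letter and number, and np.arange values for predictability:

```python
import numpy as np
import pandas as pd

outer = ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd']
inner = [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2]

# Level names are hypothetical, added so levels can be referred to by name
index = pd.MultiIndex.from_arrays([outer, inner], names=['letter', 'number'])
ser = pd.Series(np.arange(12.0), index=index)

# No arguments: swap the last two levels (the defaults are i=-2, j=-1)
swapped = ser.swaplevel()
print(list(swapped.index.names))  # ['number', 'letter']

# Explicit positions or level names give the same index here
print(swapped.index.equals(ser.swaplevel(0, 1).index))  # True
print(swapped.index.equals(ser.swaplevel('letter', 'number').index))  # True

# Only the index changes; the values stay in their original order
print(swapped.tolist() == ser.tolist())  # True
```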

Swapping and sorting levels

sort_index()

sort_index() sorts by the outer level first and then by the inner level; the default order is ascending. Applied after swaplevel(), it groups the rows by the former inner labels 0, 1 and 2.

Example code:

# Swap the levels, then sort
print(ser_obj.swaplevel().sort_index())

Operation result:

0  a    0.099174
   b    1.742445
   c   -0.150638
   d    1.080605
1  a   -0.310414
   b    1.152924
   c    0.251660
   d    0.567547
2  a   -0.558047
   b   -0.725332
   c    0.063387
   d   -0.154148
dtype: float64
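The sketch below (np.arange values assumed for predictable output) checks the ordering with the index's is_monotonic_increasing property and shows two sort_index() options, ascending=False and level=:

```python
import numpy as np
import pandas as pd

ser_obj = pd.Series(np.arange(12.0), index=[
    ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'd'],
    [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
])
swapped = ser_obj.swaplevel()

# The swapped index is not in lexicographic order yet
print(swapped.index.is_monotonic_increasing)  # False

# Default: ascending, by the outer level first and then the inner level
sorted_ser = swapped.sort_index()
print(sorted_ser.index.is_monotonic_increasing)  # True
print(sorted_ser.tolist())
# [0.0, 3.0, 6.0, 9.0, 1.0, 4.0, 7.0, 10.0, 2.0, 5.0, 8.0, 11.0]

# Descending: the largest key (2, 'd') comes first
print(swapped.sort_index(ascending=False).index[0] == (2, 'd'))  # True

# level=1 sorts by the letter level (the remaining level breaks ties),
# which restores the original order of the values
print(swapped.sort_index(level=1).tolist() == ser_obj.tolist())  # True
```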

Posted by nikifi on Thu, 17 Oct 2019 07:11:38 -0700