Hierarchical index
MultiIndex is the pandas hierarchical index object: it lets a single axis carry two or more index levels.
import pandas as pd

tup = [('beijing', 2000), ('beijing', 2019),
       ('shanghai', 2000), ('shanghai', 2019),
       ('guangzhou', 2000), ('guangzhou', 2019)]
values = [10000, 100000, 6000, 60000, 4000, 40000]
index = pd.MultiIndex.from_tuples(tup)   # Build a MultiIndex from tuples
sss = pd.Series(values, index=index)     # Use the MultiIndex as the Series index
sss
>>>
beijing    2000     10000
           2019    100000
shanghai   2000      6000
           2019     60000
guangzhou  2000      4000
           2019     40000
Other ways to create a MultiIndex include:
From arrays: pd.MultiIndex.from_arrays([['a','a','b','b'],[1,2,1,2]])
From tuples: pd.MultiIndex.from_tuples([('a',1),('a',2),('b',1),('b',2)])
Cartesian product: pd.MultiIndex.from_product([['a','b'],[1,2]])
Direct construction: pd.MultiIndex(levels=[['a','b'],[1,2]], codes=[[0,0,1,1],[0,1,0,1]]) (the codes argument was called labels before pandas 0.24)
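As a quick check that the four constructors above describe the same index, here is a minimal sketch. The variable names are illustrative only, and it assumes a pandas release that accepts the codes keyword (0.24 or later).

```python
import pandas as pd

# Four constructions that should describe the same two-level index.
from_arrays = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
from_tuples = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
from_product = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
# `codes` holds integer positions into `levels`, one list per level.
direct = pd.MultiIndex(levels=[['a', 'b'], [1, 2]],
                       codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

all_equal = (from_arrays.equals(from_tuples)
             and from_tuples.equals(from_product)
             and from_product.equals(direct))
print(all_equal)
```

from_product is usually the most compact choice when every combination of labels is present; from_tuples and from_arrays suit indexes with missing combinations.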
#=========================
Hierarchical indexing plays an important role in reshaping data and in building pivot tables. For example, the unstack method pivots the inner index level of a Series into columns, producing a DataFrame:
s.unstack()
          1         2         3
a  0.283490  0.295529  0.277676
b  0.487573       NaN  0.091161
c  0.285157 -0.806851       NaN
d       NaN -0.287969 -0.696511
#--------------------------------------------------------------------------------------
s.unstack().stack()   # stack is the inverse of unstack
a  1    0.283490
   2    0.295529
   3    0.277676
b  1    0.487573
   3    0.091161
c  1    0.285157
   2   -0.806851
d  2   -0.287969
   3   -0.696511
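The output above does not show how s was built. The sketch below uses a hypothetical Series with the same index shape (the values are placeholders, not the originals) to illustrate the round trip. The explicit dropna() is an assumption made for portability, since recent pandas releases differ on whether stack discards NaN cells by default.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a two-level Series in which some (outer, inner) pairs are absent.
idx = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 3),
     ('c', 1), ('c', 2), ('d', 2), ('d', 3)])
s = pd.Series(np.arange(9, dtype=float), index=idx)

wide = s.unstack()                 # inner level becomes columns; absent pairs become NaN
restored = wide.stack().dropna()   # back to long form, without the NaN placeholders

print(wide)
print(restored)
```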
#==================
In a DataFrame, either axis can carry a hierarchical index: pass a list of arrays as index or columns, and each array becomes one level.
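A minimal sketch of a DataFrame with two levels on each axis follows. The level names key1, key2, state and color are taken from the table shown later in this section; the variable name frame is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Each inner list passed to index/columns becomes one level of that axis.
frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

print(frame.index.nlevels, frame.columns.nlevels)
ohio = frame['Ohio']   # an outer column label selects the sub-frame beneath it
print(ohio)
```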
Advanced Hierarchical Index
DataFrame object
sort_index(level=1) sorts the rows by the values of the second index level (levels are numbered from 0).
swaplevel(0, 1) exchanges row index levels 0 and 1; the data itself is not reordered.
Original:
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
#--------------------------------------
df.swaplevel(0, 1).sort_index(level=0)
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
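The two methods can be tried on a small frame. This is a sketch only: it assumes plain integer column labels for brevity, where the table above has hierarchical columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                                    names=['key1', 'key2']))

swapped = df.swaplevel(0, 1)            # levels become (key2, key1); rows stay where they were
by_key2 = swapped.sort_index(level=0)   # rows now ordered by key2, then key1
by_inner = df.sort_index(level=1)       # order by key2 without swapping the levels

print(by_key2)
print(by_inner)
```

Swapping first and sorting on level 0 gives the same row order as sorting the original frame on level 1; the only difference is which level is displayed as the outer one.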
Indexing with columns in the DataFrame
df = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                   'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                   'd': [0, 1, 2, 0, 1, 2, 3]})
df
>>>
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
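set_index(['c','d']) turns columns c and d into a hierarchical row index; by default they are removed from the columns.
drop=False keeps c and d as ordinary columns as well.
reset_index is the reverse operation of set_index: it moves the index levels back into columns.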
df2 = df.set_index(['c','d'])
df2
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
#--------------------------------------
df.set_index(['c','d'],drop=False)
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
#------------------------------------
df2.reset_index()
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
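The round trip can be checked with a short sketch. The names round_trip and row are illustrative, and the column reordering is needed only because reset_index puts the former index levels first.

```python
import pandas as pd

df = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                   'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                   'd': [0, 1, 2, 0, 1, 2, 3]})

df2 = df.set_index(['c', 'd'])               # c and d become the row index levels
round_trip = df2.reset_index()[df.columns]   # move them back and restore the column order
row = df2.loc[('two', 1)]                    # look up a single row by its (c, d) pair

print(round_trip)
print(row)
```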
DataFrame Index Slice
If a MultiIndex is not sorted, most label-based slicing operations fail. In that case, call the sort_index method described earlier to sort the index before slicing.
In [19]: df = pd.DataFrame(np.arange(12).reshape((4, 3)),
    ...:                   index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    ...:                   columns=[['Ohio', 'Ohio', 'Colorado'],
    ...:                            ['Green', 'Red', 'Green']])
    ...:

In [20]: df
Out[20]:
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11

In [23]: df['Ohio','Colorado']   # Fails: the pair is read as one label per level, and 'Ohio' has no 'Colorado' sub-column
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)

In [24]: df[['Ohio','Colorado']]   # Pass a list to select several top-level columns
Out[24]:
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
#----------------------------------------
In [25]: df['Ohio','Green']   # One label per level selects a single column
Out[25]:
a  1    0
   2    3
b  1    6
   2    9
Name: (Ohio, Green), dtype: int32
#----------------------------------------
In [26]: df.iloc[:2,:2]   # Position-based (implicit) indexing
Out[26]:
     Ohio
    Green Red
a 1     0   1
  2     3   4
#----------------------------------------
In [28]: df.loc[:,('Ohio','Red')]   # Label-based indexing: all rows, column ('Ohio', 'Red') given as a tuple
Out[28]:
a  1     1
   2     4
b  1     7
   2    10
Name: (Ohio, Red), dtype: int32
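The frame in the session above happens to be sorted already, so it does not show the failure mentioned before it. The sketch below builds a deliberately unsorted index (hypothetical data and column names) and assumes that pandas reports the problem as UnsortedIndexError, which is a subclass of KeyError.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose row index is deliberately out of order.
unsorted = pd.DataFrame(
    np.arange(8).reshape((4, 2)),
    index=pd.MultiIndex.from_tuples([('b', 2), ('a', 1), ('b', 1), ('a', 2)]),
    columns=['x', 'y'])

try:
    unsorted.loc[('a', 1):('b', 1)]   # label slice on an unsorted MultiIndex
    slice_failed = False
except KeyError:                      # UnsortedIndexError derives from KeyError
    slice_failed = True

ordered = unsorted.sort_index()          # sort all levels lexicographically
part = ordered.loc[('a', 2):('b', 1)]    # the same kind of slice now works

print(slice_failed)
print(part)
```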