## Hierarchical index

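A MultiIndex is the pandas hierarchical index object: it lets a single axis carry two or more index levels. The example below builds one from a list of tuples and uses it as the index of a Series: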

```
import pandas as pd

tup = [('beijing', 2000), ('beijing', 2019),
       ('shanghai', 2000), ('shanghai', 2019),
       ('guangzhou', 2000), ('guangzhou', 2019)]

values = [10000, 100000, 6000, 60000, 4000, 40000]

index = pd.MultiIndex.from_tuples(tup)  # Build a MultiIndex from a list of tuples

sss = pd.Series(values, index=index)  # Use the MultiIndex as the Series index
sss
>>>
beijing    2000     10000
           2019    100000
shanghai   2000      6000
           2019     60000
guangzhou  2000      4000
           2019     40000
```

More ways to create MultiIndex include:

- From a list of arrays: `pd.MultiIndex.from_arrays([['a','a','b','b'],[1,2,1,2]])`
- From tuples: `pd.MultiIndex.from_tuples([('a',1),('a',2),('b',1),('b',2)])`
- From a Cartesian product: `pd.MultiIndex.from_product([['a','b'],[1,2]])`
- Direct construction: `pd.MultiIndex(levels=[['a','b'],[1,2]], codes=[[0,0,1,1],[0,1,0,1]])` (the `codes` argument was called `labels` before pandas 0.24)
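
All four constructors describe the same index here. A minimal sketch to check that, assuming pandas 0.24 or later, where the direct constructor takes `codes`; the variable names are only for illustration:

```python
import pandas as pd

# Four ways to build the same two-level index: (a,1), (a,2), (b,1), (b,2)
from_arrays = pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]])
from_tuples = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('b', 1), ('b', 2)])
from_product = pd.MultiIndex.from_product([['a', 'b'], [1, 2]])
# levels holds the unique labels per level; codes holds positions into levels
direct = pd.MultiIndex(levels=[['a', 'b'], [1, 2]],
                       codes=[[0, 0, 1, 1], [0, 1, 0, 1]])

# equals() compares the index entries, so this checks all four against each other
all_equal = all(from_arrays.equals(other)
                for other in (from_tuples, from_product, direct))
print(all_equal)
print(direct.tolist())
```

from_product is the shortest form when every combination of labels is present; from_tuples or from_arrays is the better fit when some combinations are missing.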

#=========================

Hierarchical indexing plays an important role in reshaping data and in building pivot tables. For example, the unstack method pivots the inner index level of a Series into columns, which produces a DataFrame; stack is the inverse operation:

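```
import pandas as pd

# s rebuilt from the output shown below (two-level index, float values)
s = pd.Series([0.283490, 0.295529, 0.277676, 0.487573, 0.091161,
               0.285157, -0.806851, -0.287969, -0.696511],
              index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                     [1, 2, 3, 1, 3, 1, 2, 2, 3]])

s.unstack()  # Inner level becomes the columns; missing combinations become NaN

          1         2         3
a  0.283490  0.295529  0.277676
b  0.487573       NaN  0.091161
c  0.285157 -0.806851       NaN
d       NaN -0.287969 -0.696511
#--------------------------------------------------------------------------------------
s.unstack().stack()  # Inverse of unstack; the NaN cells are dropped in this output

a  1    0.283490
   2    0.295529
   3    0.277676
b  1    0.487573
   3    0.091161
c  1    0.285157
   2   -0.806851
d  2   -0.287969
   3   -0.696511
```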
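
The round trip can be checked with a minimal sketch on a small Series that has no missing combinations, so the result does not depend on how a given pandas version treats NaN in stack. The name `sales` and its values are made up for illustration:

```python
import pandas as pd

sales = pd.Series(
    [10, 20, 30, 40],
    index=pd.MultiIndex.from_product([['a', 'b'], [1, 2]]),
)

wide = sales.unstack()    # rows: a, b; columns: 1, 2
restored = wide.stack()   # back to a Series with a two-level index

print(wide)
print(restored.equals(sales))
```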

#==================

For a DataFrame, either axis can carry a hierarchical index: pass a list of arrays (or a MultiIndex) as index or columns.

Two methods are useful for rearranging the levels of a DataFrame's row index:

- sort_index(level=1) sorts the rows by the second index level.
- swaplevel(0, 1) exchanges level 0 and level 1 of the row index.

```
# Original df
           Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
#--------------------------------------
df.swaplevel(0, 1).sort_index(level=0)
           Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
```
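
A minimal sketch of both methods on a frame like the one shown above. The level names key1, key2 and color come from that output; the Ohio/Colorado top-level column labels are an assumption borrowed from the slicing example later in this post, and the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                                    names=['key1', 'key2']),
    # Assumed column layout: two levels, only the inner one is named
    columns=pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                                       ['Green', 'Red', 'Green']],
                                      names=[None, 'color']),
)

by_key2 = frame.sort_index(level=1)                  # order rows by key2 first
swapped = frame.swaplevel(0, 1).sort_index(level=0)  # key2 becomes the outer level

print(by_key2.index.tolist())
print(swapped.index.tolist())
```

swaplevel only relabels the rows and leaves their order untouched, which is why it is usually followed by sort_index.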

Using DataFrame columns as the index

```
df = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                   'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                   'd': [0, 1, 2, 0, 1, 2, 3]})
df
>>>
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
```

- set_index(['c','d']) turns the c and d columns into a hierarchical row index.
- Passing drop=False keeps c and d as ordinary columns as well.
- reset_index is the inverse of set_index: it moves the index levels back into columns.

```
df2 = df.set_index(['c','d'])

df2
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
#--------------------------------------
df.set_index(['c','d'],drop=False)

       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
#------------------------------------
df2.reset_index()

     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
```
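
The three calls can be checked end to end with a short sketch. It rebuilds the df defined earlier in this section; the names `kept` and `restored` are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                   'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                   'd': [0, 1, 2, 0, 1, 2, 3]})

df2 = df.set_index(['c', 'd'])               # c and d move into the row index
kept = df.set_index(['c', 'd'], drop=False)  # c and d stay as columns too
restored = df2.reset_index()                 # index levels become columns again

print(list(df2.columns))         # the remaining data columns
print(list(kept.columns))
print(list(restored.columns))    # former index levels come first
print(df2.loc[('two', 3), 'a'])  # one label per index level, then the column
```

Note that reset_index puts the former index levels in front, so the column order differs from the original df even though the data is the same.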

Slicing a DataFrame with a MultiIndex

If a MultiIndex is not sorted, most label-based slicing operations fail, typically with an UnsortedIndexError. In that case, sort the index first with the sort_index method described earlier.
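
A minimal sketch of that failure and the fix, using a small made-up Series whose index is deliberately out of order. The code catches KeyError because UnsortedIndexError is a subclass of it:

```python
import pandas as pd

# Index entries deliberately out of order
idx = pd.MultiIndex.from_tuples([('b', 2), ('a', 1), ('b', 1), ('a', 2)])
s = pd.Series([40, 10, 30, 20], index=idx)

try:
    s.loc['a':'b']          # label slice on an unsorted MultiIndex
    failed = False
except KeyError as exc:     # pandas raises UnsortedIndexError here
    failed = True
    print(type(exc).__name__)

sorted_s = s.sort_index()   # sort all levels so label slicing can work
print(sorted_s.loc['a':'a'])
```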

```
# Assumes: import numpy as np; import pandas as pd
In [19]: df = pd.DataFrame(np.arange(12).reshape((4, 3)),
    ...:                   index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    ...:                   columns=[['Ohio', 'Ohio', 'Colorado'],
    ...:                            ['Green', 'Red', 'Green']])
    ...:

In [20]: df
Out[20]:
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11

In [23]: df['Ohio','Colorado']  # Fails: read as one label per level, and level 1 has no 'Colorado'
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
#----------------------------------------
In [24]: df[['Ohio','Colorado']]  # A list selects several top-level columns
Out[24]:
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
#----------------------------------------
In [25]: df['Ohio','Green']  # One label per column level
Out[25]:
a  1    0
   2    3
b  1    6
   2    9
Name: (Ohio, Green), dtype: int32
#----------------------------------------
In [26]: df.iloc[:2,:2]  # Positional (integer) indexing
Out[26]:
     Ohio
    Green Red
a 1     0   1
  2     3   4
#----------------------------------------
In [28]: df.loc[:,('Ohio','Red')]  # All rows; the tuple names one column, one label per level
Out[28]:
a  1     1
   2     4
b  1     7
   2    10
Name: (Ohio, Red), dtype: int32
```
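
The column selections from the session above, collected into one runnable sketch. The frame is built with Ohio/Colorado as the top column level, as the lookups in the session imply; the variable names are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)

ohio = df['Ohio']                      # top-level label -> DataFrame (Green, Red)
ohio_green = df['Ohio', 'Green']       # one label per level -> Series
ohio_red = df.loc[:, ('Ohio', 'Red')]  # same idea through .loc
both = df[['Ohio', 'Colorado']]        # list of top-level labels -> DataFrame
top_left = df.iloc[:2, :2]             # purely positional selection

try:
    df['Ohio', 'Colorado']             # no ('Ohio', 'Colorado') column exists
    raised = False
except KeyError:
    raised = True

print(ohio_green.tolist(), ohio_red.tolist(), raised)
```

The key distinction is tuple versus list: a tuple is read as one label per level, while a list is read as several labels from the top level.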

Posted by lordzardeck on Thu, 10 Oct 2019 01:08:12 -0700