1, Index
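The Index object in pandas stores axis labels and other metadata, such as the axis name. An Index is immutable: its elements cannot be modified in place by the user. The examples below assume that numpy and pandas have been imported as np and pd.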
In [73]: obj = pd.Series(range(3),index = ['a','b','c'])

In [74]: index = obj.index

In [75]: index
Out[75]: Index(['a', 'b', 'c'], dtype='object')

In [76]: index[1:]
Out[76]: Index(['b', 'c'], dtype='object')

In [77]: index[1] = 'f'  # TypeError

In [8]: index.size
Out[8]: 3

In [9]: index.shape
Out[9]: (3,)

In [10]: index.ndim
Out[10]: 1

In [11]: index.dtype
Out[11]: dtype('O')
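Because an Index cannot be edited element by element, changing labels means building a new Index and assigning it as a whole. The following is a minimal sketch of that pattern; the names assignment_failed and new_index are introduced here only for illustration.

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
index = obj.index

# Item assignment on an Index raises TypeError because an Index is immutable.
try:
    index[1] = 'f'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# To change a label, build a new Index and replace the old one as a whole.
new_index = pd.Index(['a', 'f', 'c'])
obj.index = new_index

print(assignment_failed)     # True
print(obj.index.tolist())    # ['a', 'f', 'c']
```

The original Index object bound to the name index is left untouched; only the Series now points at a different Index.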
Because an Index is immutable, a single Index object can be shared safely among several data structures:
In [78]: labels = pd.Index(np.arange(3))

In [79]: labels
Out[79]: Int64Index([0, 1, 2], dtype='int64')

In [80]: obj2 = pd.Series([2,3.5,0], index=labels)

In [81]: obj2
Out[81]:
0    2.0
1    3.5
2    0.0
dtype: float64

In [82]: obj2.index is labels
Out[82]: True
An Index behaves like a container, so you can test membership with Python's in operator:
In [84]: f2
Out[84]:
key    year     state  pop  debt
order
a      2000   beijing  1.5   NaN
b      2001   beijing  1.7   NaN
c      2002   beijing  3.6   1.0
d      2001  shanghai  2.4   2.0
e      2002  shanghai  2.9   NaN
f      2003  shanghai  3.2   3.0

In [86]: 'c' in f2.index
Out[86]: True

In [88]: 'pop' in f2.columns
Out[88]: True
Most importantly, unlike a Python set, a pandas Index can contain duplicate labels:
In [89]: dup_labels = pd.Index(['foo','foo','bar','bar'])

In [90]: dup_labels
Out[90]: Index(['foo', 'foo', 'bar', 'bar'], dtype='object')
So think about it: can a DataFrame have duplicate column or index labels?
Yes, it is allowed, but try to avoid it:
In [91]: f2.index = ['a']*6

In [92]: f2
Out[92]:
key  year     state  pop  debt
a    2000   beijing  1.5   NaN
a    2001   beijing  1.7   NaN
a    2002   beijing  3.6   1.0
a    2001  shanghai  2.4   2.0
a    2002  shanghai  2.9   NaN
a    2003  shanghai  3.2   3.0

In [93]: f2.loc['a']
Out[93]:
key  year     state  pop  debt
a    2000   beijing  1.5   NaN
a    2001   beijing  1.7   NaN
a    2002   beijing  3.6   1.0
a    2001  shanghai  2.4   2.0
a    2002  shanghai  2.9   NaN
a    2003  shanghai  3.2   3.0

In [94]: f2.columns = ['year']*4

In [95]: f2
Out[95]:
   year      year  year  year
a  2000   beijing   1.5   NaN
a  2001   beijing   1.7   NaN
a  2002   beijing   3.6   1.0
a  2001  shanghai   2.4   2.0
a  2002  shanghai   2.9   NaN
a  2003  shanghai   3.2   3.0

In [96]: f2.index.is_unique  # You can use this property to determine whether the index is unique
Out[96]: False
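The practical cost of duplicate labels is that a label lookup no longer returns a single value. The sketch below uses a small Series made up for illustration (s, dup_mask and deduped are not from the original example) to show how duplicates can be detected and, if desired, dropped.

```python
import pandas as pd

# A small Series with a repeated label, invented for this illustration.
s = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

print(s.index.is_unique)             # False

# duplicated() marks the second and later occurrences of each label.
dup_mask = s.index.duplicated()
print(dup_mask.tolist())             # [False, True, False]

# A duplicated label selects a Series; a unique label selects a scalar.
print(type(s.loc['a']).__name__)     # Series
print(s.loc['b'])                    # 30

# One way to remove repeated labels, keeping the first occurrence of each.
deduped = s[~s.index.duplicated(keep='first')]
print(deduped.index.tolist())        # ['a', 'b']
```

Keeping the first occurrence is only one possible policy; which rows to keep depends on the data.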
Index objects also support set operations such as intersection, union, difference, and symmetric difference (XOR), similar to Python's built-in set.
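As a short illustration of these set operations, the sketch below applies the corresponding Index methods to two small example indexes (left and right are names chosen here, not taken from the text above).

```python
import pandas as pd

left = pd.Index(['a', 'b', 'c'])
right = pd.Index(['b', 'c', 'd'])

# Labels present in both indexes.
print(left.intersection(right).tolist())          # ['b', 'c']

# Labels present in either index.
print(left.union(right).tolist())                 # ['a', 'b', 'c', 'd']

# Labels in left that are not in right.
print(left.difference(right).tolist())            # ['a']

# Labels in exactly one of the two indexes (XOR).
print(left.symmetric_difference(right).tolist())  # ['a', 'd']
```

Each method returns a new Index, which is consistent with Index objects being immutable.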
2, Reindex
The reindex method conforms a pandas object to a new index. It does not modify the object in place; it returns a new object in which the original data is rearranged to match the new index.
In [96]: obj=pd.Series([4.5,7.2,-5.3,3.6],index = ['d','b','a','c'])

In [97]: obj
Out[97]:
d    4.5
b    7.2
a   -5.3
c    3.6
dtype: float64
reindex arranges the data according to the new index. Labels that do not exist in the original index introduce missing values:
In [99]: obj2 = obj.reindex(list('abcde'))

In [100]: obj2
Out[100]:
a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN
dtype: float64
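The sketch below repeats this example as a script to confirm two points: the original object is left unchanged, and only the new label is missing. It also uses the fill_value parameter of reindex, which is not covered in the text above, as one way to substitute a constant for the missing entries; the name filled is chosen here for illustration.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; 'e' has no data, so it becomes NaN.
obj2 = obj.reindex(list('abcde'))
print(obj2.isna().tolist())    # [False, False, False, False, True]

# The original Series keeps its own index and order.
print(obj.index.tolist())      # ['d', 'b', 'a', 'c']

# fill_value replaces the missing entries with a constant instead of NaN.
filled = obj.reindex(list('abcde'), fill_value=0.0)
print(filled['e'])             # 0.0
```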
You can also use the method parameter to specify how missing values are filled. For example, ffill fills forward (carrying the previous value forward) and bfill fills backward (using the next value):
In [101]: obj3 = pd.Series(['blue','purple','yellow'],index = [0,2,4])

In [102]: obj3
Out[102]:
0      blue
2    purple
4    yellow
dtype: object

In [103]: obj3.reindex(range(6),method='ffill')
Out[103]:
0      blue
1      blue
2    purple
3    purple
4    yellow
5    yellow
dtype: object
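For comparison, here is a sketch of the same reindex with a backward fill. Each new label takes the value of the next existing label, so the last label, which has nothing after it, stays missing. The name backward is introduced for illustration; note that the method parameter only applies when the original index is monotonically increasing or decreasing.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Backward fill: labels 1 and 3 take the values of labels 2 and 4.
backward = obj3.reindex(range(6), method='bfill')
print(backward.loc[1])             # purple
print(backward.loc[3])             # yellow

# Label 5 has no later value to copy from, so it remains missing.
print(pd.isna(backward.loc[5]))    # True
```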
For a two-dimensional object such as a DataFrame, reindex modifies the row index by default when it is given a single list argument. Use the columns keyword argument to reindex the columns instead:
In [104]: f = pd.DataFrame(np.arange(9).reshape((3,3)),index=list('acd'),columns=['beijing','shanghai','guangzhou'])

In [105]: f
Out[105]:
   beijing  shanghai  guangzhou
a        0         1          2
c        3         4          5
d        6         7          8

In [106]: f2 = f.reindex(list('abcd'))

In [107]: f2
Out[107]:
   beijing  shanghai  guangzhou
a      0.0       1.0        2.0
b      NaN       NaN        NaN
c      3.0       4.0        5.0
d      6.0       7.0        8.0

In [112]: f3 = f.reindex(columns=['beijing','shanghai','xian','guangzhou'])

In [113]: f3
Out[113]:
   beijing  shanghai  xian  guangzhou
a        0         1   NaN          2
c        3         4   NaN          5
d        6         7   NaN          8
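Rows and columns can also be reindexed in one call by passing both the index and columns keyword arguments. The sketch below combines the two examples above; the name both is chosen here for illustration.

```python
import numpy as np
import pandas as pd

f = pd.DataFrame(np.arange(9).reshape((3, 3)),
                 index=list('acd'),
                 columns=['beijing', 'shanghai', 'guangzhou'])

# Reindex rows and columns together; new labels are filled with NaN.
both = f.reindex(index=list('abcd'),
                 columns=['beijing', 'shanghai', 'xian', 'guangzhou'])

print(both.shape)                   # (4, 4)
print(both.loc['b'].isna().all())   # True: row 'b' is entirely new
print(both['xian'].isna().all())    # True: column 'xian' is entirely new

# Existing values survive, converted to float because NaN was introduced.
print(both.loc['a', 'guangzhou'])   # 2.0
```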