1. Series
Create a one-dimensional array. When displayed, the index is on the left and the values are on the right:
In [1]: import numpy as np
   ...: import pandas as pd

In [2]: from pandas import Series, DataFrame

In [3]: obj = Series([1,2,3,4,5])

In [4]: obj
Out[4]:
0    1
1    2
2    3
3    4
4    5
dtype: int64

In [5]: obj.values
Out[5]: array([1, 2, 3, 4, 5], dtype=int64)

In [6]: obj.index
Out[6]: RangeIndex(start=0, stop=5, step=1)
Pass an explicit index to label each data point:
In [7]: obj2 = Series([4,1,9,7], index=["a","c","e","ff"])

In [8]: obj2
Out[8]:
a     4
c     1
e     9
ff    7
dtype: int64

In [9]: obj2.index
Out[9]: Index(['a', 'c', 'e', 'ff'], dtype='object')
Select a single value or a set of values by index label:
In [10]: obj2["c"]
Out[10]: 1

In [11]: obj2[["c","e"]]
Out[11]:
c    1
e    9
dtype: int64
Array operations, such as filtering with a boolean condition, keep the index attached to the values:
In [12]: obj2[obj2>3]
Out[12]:
a     4
e     9
ff    7
dtype: int64
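The filtering above works in two steps, which the following sketch makes explicit: the comparison produces a boolean Series, and indexing with it keeps only the labeled values where the mask is True. The variable names `mask`, `filtered`, and `doubled` are illustrative and not part of the original session.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

# Comparing a Series to a scalar yields a boolean Series with the same index
mask = obj2 > 3

# Indexing with the mask keeps only the entries where the mask is True,
# together with their labels
filtered = obj2[mask]

# Scalar arithmetic is applied element-wise and also keeps the labels
doubled = obj2 * 2

print(filtered.index.tolist())  # ['a', 'e', 'ff']
print(doubled["ff"])            # 14
```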
A Series can also be regarded as an ordered dictionary that maps index labels to values, so many dictionary operations work on it:
In [13]: "c" in obj2
Out[13]: True
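A few more dictionary-style operations, as a minimal sketch. Note that `in` tests the index labels, not the values. The label "zz" and the default value -1 are made up for illustration.

```python
import pandas as pd

obj2 = pd.Series([4, 1, 9, 7], index=["a", "c", "e", "ff"])

# Membership is checked against the index labels, not the values
has_label_c = "c" in obj2   # True
has_label_4 = 4 in obj2     # False: 4 is a value, not a label

# Like dict.get, Series.get returns a default for a missing label
fallback = obj2.get("zz", -1)

# Convert back to a plain dictionary
as_dict = obj2.to_dict()

print(has_label_c, has_label_4, fallback)
print(as_dict)
```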
To create a Series directly from a dictionary:
In [14]: data = {"name":"liu","year":18,"sex":"man"}

In [15]: obj3 = Series(data)

In [16]: obj3
Out[16]:
name    liu
year     18
sex     man
dtype: object
To create a Series from a dictionary and an explicit list of index labels:
In [17]: list1 = ["name","year","mobile"]

In [18]: obj4 = Series(data,index=list1)

In [19]: obj4
Out[19]:
name      liu
year       18
mobile    NaN
dtype: object
Note: because the data dictionary has no "mobile" key, its value is NaN. The "sex" key is dropped because it is not in the index list.
Check for missing data with the top-level functions pd.isnull and pd.notnull, or with the equivalent Series methods:
In [20]: pd.isnull(obj4)
Out[20]:
name      False
year      False
mobile     True
dtype: bool

In [21]: pd.notnull(obj4)
Out[21]:
name       True
year       True
mobile    False
dtype: bool

In [22]: obj4.isnull()
Out[22]:
name      False
year      False
mobile     True
dtype: bool

In [23]: obj4.notnull()
Out[23]:
name       True
year       True
mobile    False
dtype: bool
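Because these checks return boolean Series, they can be combined with the filtering and arithmetic shown earlier. The sketch below counts the missing entries and keeps only the present ones. The names `missing_count` and `present` are illustrative.

```python
import pandas as pd

data = {"name": "liu", "year": 18, "sex": "man"}
obj4 = pd.Series(data, index=["name", "year", "mobile"])

# Summing a boolean Series counts the True entries
missing_count = int(obj4.isnull().sum())

# Use the notnull mask to keep only the entries that have a value
present = obj4[obj4.notnull()]

print(missing_count)           # 1
print(present.index.tolist())  # ['name', 'year']
```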
Both the Series itself and its index have a name attribute:
In [7]: obj4.name = "hahaha"

In [8]: obj4.index.name = "state"

In [9]: obj4
Out[9]:
state
name      liu
year       18
mobile    NaN
Name: hahaha, dtype: object
2. DataFrame
Build a DataFrame from a dictionary of equal-length lists:
In [13]: data = {
    ...:     "state":[1,1,2,1,1],
    ...:     "year":[2000,2001,2002,2004,2005],
    ...:     "pop":[1.5,1.7,3.6,2.4,2.9]
    ...: }

In [14]: frame = DataFrame(data)

In [15]: frame
Out[15]:
   state  year  pop
0      1  2000  1.5
1      1  2001  1.7
2      2  2002  3.6
3      1  2004  2.4
4      1  2005  2.9
Specify the column order and the row labels. A requested column that is not found in the data is filled with NA values:
In [18]: frame2 = DataFrame(
    ...:     data,
    ...:     columns=["year","state","pop","debt"],
    ...:     index=["one","two","three","four","five"]
    ...: )

In [19]: frame2
Out[19]:
       year  state  pop debt
one    2000      1  1.5  NaN
two    2001      1  1.7  NaN
three  2002      2  3.6  NaN
four   2004      1  2.4  NaN
five   2005      1  2.9  NaN
Retrieve a column of a DataFrame as a Series:
In [7]: frame2.year
Out[7]:
one      2000
two      2001
three    2002
four     2004
five     2005
Name: year, dtype: int64
Note: the returned Series has the same index as the DataFrame, and its name attribute is set to the column name.
Retrieve a row by its label with loc:
In [11]: frame2.loc["three"]
Out[11]:
year     2002
state       2
pop       3.6
debt      NaN
Name: three, dtype: object
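The sketch below compares the two ways of retrieving a column and the two ways of retrieving a row. One caveat worth showing: attribute access only works when the column name does not collide with an existing DataFrame attribute. With the data above, `frame2.pop` is the `DataFrame.pop` method, so the "pop" column has to be read with brackets. The variable names are illustrative.

```python
import pandas as pd

data = {
    "state": [1, 1, 2, 1, 1],
    "year": [2000, 2001, 2002, 2004, 2005],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9],
}
frame2 = pd.DataFrame(
    data,
    columns=["year", "state", "pop", "debt"],
    index=["one", "two", "three", "four", "five"],
)

# Attribute access and bracket access return the same column
year_attr = frame2.year
year_item = frame2["year"]

# "pop" collides with the DataFrame.pop method, so use brackets for it
pop_is_method = callable(frame2.pop)
pop_col = frame2["pop"]

# loc selects a row by label, iloc by integer position
row_by_label = frame2.loc["three"]
row_by_pos = frame2.iloc[2]

print(year_item.name)     # year
print(row_by_label.name)  # three
```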
Assign a value to a column:
In [12]: frame2['debt'] = 16.5

In [13]: frame2
Out[13]:
       year  state  pop  debt
one    2000      1  1.5  16.5
two    2001      1  1.7  16.5
three  2002      2  3.6  16.5
four   2004      1  2.4  16.5
five   2005      1  2.9  16.5
If you assign a list or an array, its length must match the length of the DataFrame. If you assign a Series, it is aligned to the DataFrame's index by label, and rows without a matching label get NaN:
In [17]: val = Series([1.2,1.5,1.7], index=["two","four","five"])

In [18]: frame2['debt'] = val

In [19]: frame2
Out[19]:
       year  state  pop  debt
one    2000      1  1.5   NaN
two    2001      1  1.7   1.2
three  2002      2  3.6   NaN
four   2004      1  2.4   1.5
five   2005      1  2.9   1.7
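The two assignment rules can be checked side by side. This is a minimal sketch on a smaller, made-up frame: the label "zzz" and the column name "bad" are hypothetical and chosen only to trigger the mismatch cases.

```python
import pandas as pd

frame = pd.DataFrame({"year": [2000, 2001, 2002]}, index=["one", "two", "three"])

# A Series is aligned by label: "two" matches a row, "zzz" matches nothing
# and is ignored, and rows without a matching label become NaN
frame["debt"] = pd.Series([1.2, 9.9], index=["two", "zzz"])

# A list is assigned by position, so its length must equal the number of rows
try:
    frame["bad"] = [1, 2]
    length_error = False
except ValueError:
    length_error = True

print(frame)
print(length_error)  # True
```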
Assigning to a column that does not exist creates it:
In [21]: frame2["eastern"] = frame2.state == 1

In [22]: frame2
Out[22]:
       year  state  pop  debt  eastern
one    2000      1  1.5   NaN     True
two    2001      1  1.7   1.2     True
three  2002      2  3.6   NaN    False
four   2004      1  2.4   1.5     True
five   2005      1  2.9   1.7     True
For a nested dictionary, DataFrame interprets the outer keys as columns and the inner keys as the row index:
In [23]: dic = {"name":{"one":"liu","two":"rui"},"year":{"one":"23","two":"22"}}

In [24]: frame3 = DataFrame(dic)

In [25]: frame3
Out[25]:
    name year
one  liu   23
two  rui   22
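When the inner dictionaries do not share the same keys, the row index is the union of all inner keys and the gaps are filled with NaN. The sketch below illustrates this with a modified dictionary. The "three" entry is made up for the example and is not part of the original data. Transposing with `.T` swaps the roles of the outer and inner keys.

```python
import pandas as pd

# Hypothetical variation of the dictionary above: "year" has a key "three"
# that "name" lacks, and "name" has a key "two" that "year" lacks
dic = {
    "name": {"one": "liu", "two": "rui"},
    "year": {"one": "23", "three": "21"},
}

frame3 = pd.DataFrame(dic)

# Outer keys become columns; the union of the inner keys becomes the row index
print(sorted(frame3.index))     # ['one', 'three', 'two']
print(frame3.columns.tolist())  # ['name', 'year']

# Transposing makes the outer keys the row index instead
transposed = frame3.T
print(transposed.index.tolist())  # ['name', 'year']
```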
Set the name attributes of the index and the columns. They are shown when the DataFrame is displayed:
In [26]: frame3.index.name = "index"

In [27]: frame3.columns.name = "state"

In [28]: frame3
Out[28]:
state name year
index
one    liu   23
two    rui   22
The values attribute returns the data as a 2D ndarray:
In [29]: frame3.values
Out[29]:
array([['liu', '23'],
       ['rui', '22']], dtype=object)
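The dtype of the returned array is chosen so that it can hold every column. The sketch below uses two small made-up frames, `numeric` and `mixed`, to show the difference: integer and float columns are combined into a float array, while a mix of text and numbers falls back to object.

```python
import numpy as np
import pandas as pd

# All columns numeric: integers are upcast so one float array can hold everything
numeric = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})

# Text and numbers together: the array falls back to dtype object
mixed = pd.DataFrame({"name": ["liu", "rui"], "year": [23, 22]})

print(numeric.values.dtype)  # float64
print(mixed.values.dtype)    # object
print(numeric.values.shape)  # (2, 2)
```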
3. Index object
In [30]: obj = Series(range(3),index=["a","b","c"])

In [31]: index = obj.index

In [32]: index
Out[32]: Index(['a', 'b', 'c'], dtype='object')
Index objects are immutable, so one index can be safely shared among multiple data structures:
In [35]: index = pd.Index(np.arange(3))

In [36]: obj2 = Series([1.5,0.5,2],index=index)

In [37]: obj2.index is index
Out[37]: True
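Immutability can be seen directly: assigning to an element of an Index raises a TypeError. The sketch below attempts that and then builds two Series on the same index. The names `s1`, `s2`, and `mutated` are illustrative.

```python
import pandas as pd

index = pd.Index(["a", "b", "c"])

# Item assignment on an Index is not allowed
try:
    index[1] = "d"
    mutated = True
except TypeError:
    mutated = False

# The same index can label several Series; since it cannot be modified
# in place, neither Series can change the labels of the other
s1 = pd.Series([1.5, 0.5, 2.0], index=index)
s2 = pd.Series([10, 20, 30], index=index)

print(mutated)         # False
print(index.tolist())  # ['a', 'b', 'c']
```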