Pandas Learning Notes
Several conventions
Headings marked with one to three hash signs (#) correspond to the first-, second-, and third-level headings of the book.
To keep the levels distinct, the notes themselves are recorded under headings with four hash signs.
Getting Started
Intro to data structures
Fundamental principle: data alignment is intrinsic
The link between labels and data is never broken unless you break it explicitly.
Series
Creating a Series
s = pd.Series(data, index = index)
Here data can be a dict, an ndarray, or a scalar value.
import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(5), index=list('abcde'))  # from an ndarray
d = dict(b=1, a=0, c=2)
c = pd.Series(d)  # from a dict
e = pd.Series(5., index=list('abcde'))  # from a scalar
A Series behaves much like an ndarray.
Note the following usage:
# From experiment: indexing with a list of positions works on a pandas Series and on a NumPy array, but not on a plain Python list.
# In recent pandas versions, s.iloc[[4, 3, 1]] is the explicit positional form.
s[[4, 3, 1]]
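The note above can be checked with a small sketch. The data and variable names here are illustrative assumptions, not part of the original notes, and `.iloc` is used because it selects by position explicitly rather than relying on the positional fallback of `s[[...]]` on a labelled Series.

```python
import pandas as pd

# Illustrative data: a labelled Series, the same values as an ndarray, and as a list
s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))
arr = s.to_numpy()
lst = arr.tolist()

picked_series = s.iloc[[4, 3, 1]]  # positional selection; the labels come along
picked_array = arr[[4, 3, 1]]      # NumPy fancy indexing

# A plain Python list does not accept a list of positions
try:
    lst[[4, 3, 1]]
    list_supported = True
except TypeError:
    list_supported = False

print(picked_series.index.tolist(), picked_series.tolist())
print(picked_array.tolist())
print(list_supported)
```

The Series result keeps each value tied to its label, which the ndarray result cannot do.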
Converting a pandas Series to a NumPy ndarray
>>> e.to_numpy()  # the to_numpy method
array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ])
>>> e.values  # the values attribute
array([0. , 0.2, 0.4, 0.6, 0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. ])
Checking whether a label exists
>>> 'e' in e
True
>>> 'l' in e
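It may help to see that `in` on a Series tests the index labels and not the stored values. The sketch below assumes the scalar-filled Series `e` from the creation example; the remaining variable names are illustrative.

```python
import pandas as pd

e = pd.Series(5., index=list('abcde'))

label_present = 'e' in e             # 'e' is one of the labels
value_as_label = 5.0 in e            # 5.0 is a stored value, not a label
value_present = 5.0 in e.to_numpy()  # test the values via the underlying array

print(label_present, value_as_label, value_present)
```

To test for a value, check the underlying array as above (or use `e.isin([5.0]).any()`).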
Accessing Series values by label
s.get('f')  # returns None if there is no 'f'
s.get('f', np.nan)  # a default such as np.nan can also be returned, but it must be passed explicitly
s['f']  # this form raises KeyError if 'f' does not exist
Vectorization and label alignment
s + s
s ** 2
np.exp(s)
s * 2
Unlike an ndarray, a Series automatically aligns on labels during arithmetic; a label that is missing from either operand yields NaN in the result.
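A minimal sketch of label alignment, using fixed illustrative values instead of random ones so the result is easy to follow. Two overlapping slices of the same Series are added; the labels that appear in only one slice come out as NaN, while the equivalent ndarray addition simply pairs elements by position.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list('abcde'))

left = s.iloc[1:]    # labels b, c, d, e
right = s.iloc[:-1]  # labels a, b, c, d

aligned = left + right                          # matched by label
positional = left.to_numpy() + right.to_numpy() # matched by position

print(aligned)
print(positional)
```

In `aligned`, labels `a` and `e` are NaN because each exists in only one operand; `positional` has no such notion and adds b to a, c to b, and so on.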
The name attribute of Series
c.name = 'somename'  # set the name on an existing Series
s = pd.Series(np.random.randn(5), name='another_name')  # or set it at creation
DataFrame
Acceptable inputs include: a dict of 1-D ndarrays, lists, dicts, or Series; a 2-D ndarray; a structured or record ndarray; and another DataFrame.
A DataFrame has two kinds of labels: index (row labels) and columns (column labels).
Creating a DataFrame from a dict of Series
# Build a dict of Series, that is, a dict whose values are all Series
# Note that the index given to each Series here is what actually defines the row labels.
>>> d = dict(one=pd.Series([1., 2., 3.], index=list('abc')),
...          two=pd.Series([1., 2., 3., 4.], index=list('abcd')))
>>> d
{'one': a    1.0
b    2.0
c    3.0
dtype: float64, 'two': a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64}
>>> df = pd.DataFrame(d)  # create the DataFrame
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
>>> pd.DataFrame(d, index=list('bda'))  # index here selects rows from the data in d
   one  two
b  2.0  2.0
d  NaN  4.0
a  1.0  1.0
>>> pd.DataFrame(d, index=list('dba'), columns=['two', 'three'])  # columns is likewise a selection, not a definition
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN
As the example above shows, once the row and column labels have been defined by the data, the index and columns arguments select from those labels rather than redefine them; a requested label with no matching data is filled with NaN.
>>> df.index  # row labels
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> df.columns  # column labels
Index(['one', 'two'], dtype='object')
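One way to check that the index argument acts as a selection when the data is a dict of Series is to compare it with an explicit `reindex` on the full DataFrame. This is only a sketch; the variable names are illustrative, and the comparison covers the index argument only.

```python
import numpy as np
import pandas as pd

d = {
    'one': pd.Series([1., 2., 3.], index=list('abc')),
    'two': pd.Series([1., 2., 3., 4.], index=list('abcd')),
}

via_constructor = pd.DataFrame(d, index=list('bda'))  # labels passed at creation
via_reindex = pd.DataFrame(d).reindex(list('bda'))    # labels selected afterwards

same_labels = via_constructor.index.tolist() == via_reindex.index.tolist()
# equal_nan=True treats the NaN at ('d', 'one') as equal on both sides
same_values = np.array_equal(
    via_constructor.to_numpy(), via_reindex.to_numpy(), equal_nan=True
)

print(same_labels, same_values)
```

Both routes give the rows in the order b, d, a, with NaN where column `one` has no entry for `d`.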
Creating a DataFrame from a dict of ndarrays/lists
# The ndarrays/lists used must all have the same length
>>> d = dict(one=[1., 2., 3., 4.], two=[4, 3, 2, 1])
>>> d
{'one': [1.0, 2.0, 3.0, 4.0], 'two': [4, 3, 2, 1]}
>>> pd.DataFrame(d)
   one  two
0  1.0    4
1  2.0    3
2  3.0    2
3  4.0    1
# If an index is given, it must also have the same length as the lists/ndarrays
>>> pd.DataFrame(d, index=list('abcd'))
   one  two
a  1.0    4
b  2.0    3
c  3.0    2
d  4.0    1
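The length requirement can be seen by deliberately breaking it. The sketch below uses made-up data (`ragged` and `even` are illustrative names) and only records whether pandas rejects the input, without depending on the exact wording of the error message.

```python
import pandas as pd

# Lists of different lengths (4 and 3) cannot form a DataFrame
ragged = {'one': [1., 2., 3., 4.], 'two': [4, 3, 2]}
try:
    pd.DataFrame(ragged)
    ragged_rejected = False
except ValueError:
    ragged_rejected = True

# Equal-length lists with an index of the wrong length are rejected as well
even = {'one': [1., 2., 3., 4.], 'two': [4, 3, 2, 1]}
try:
    pd.DataFrame(even, index=list('abc'))
    index_rejected = False
except ValueError:
    index_rejected = True

print(ragged_rejected, index_rejected)
```

This differs from the dict-of-Series case, where Series of different lengths are aligned on their labels and padded with NaN.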
Creating a DataFrame from a structured or record array
# 'S10' is a 10-byte string field (the older alias 'a10' is deprecated in recent NumPy)
>>> data = np.zeros((2, ), dtype=[('A', 'i4'), ('B', 'f4'), ('C', 'S10')])
>>> data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]
>>> pd.DataFrame(data)
   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
>>> pd.DataFrame(data, index=['first', 'second'])
        A    B         C
first   1  2.0  b'Hello'
second  2  3.0  b'World'
>>> pd.DataFrame(data, columns=list('CAB'))
          C  A    B
0  b'Hello'  1  2.0
1  b'World'  2  3.0