Pandas Learning Notes

Keywords: Attribute

Several conventions

The notes use one to three hash marks (#) for headings, matching the first-, second- and third-level titles of the book.
Four hash marks (####) mark the notes themselves, so that they sit at a different level from the book's headings.

Getting Started

Intro to data structure

Fundamental principle: data alignment is intrinsic

The link between labels and data is never broken unless you break it explicitly.

Series

Creating a Series

s = pd.Series(data, index = index)

Here data can be a dict, an ndarray, or a scalar value.

import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(5), index = list('abcde'))

d = dict(b = 1, a = 0, c = 2)
c  = pd.Series(d)

e = pd.Series(5. , index = list('abcde'))
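The three kinds of input behave a little differently once an index is passed as well. The sketch below is only an illustration (the variable names from_dict, picked and filled are made up for this example): a dict is looked up by label, and a scalar is repeated for every label.

```python
import numpy as np
import pandas as pd

d = dict(b=1, a=0, c=2)

# Without an index, the dict keys become the labels, in insertion order.
from_dict = pd.Series(d)

# With an index, values are pulled out of the dict by label.
# 'x' is not a key of d, so its value becomes NaN.
picked = pd.Series(d, index=["a", "b", "x"])

# A scalar is repeated once for every label in the index.
filled = pd.Series(5.0, index=list("abc"))

print(from_dict.index.tolist())
print(picked)
print(filled.tolist())
```

Because NaN is a float, picked ends up with a float dtype even though the dict holds integers.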

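A Series behaves much like an ndarray.

Note this usage: indexing with a list of positions

# Indexing with a list of positions works for a pandas Series and a NumPy ndarray, but not for a plain Python list.
# On a labelled Series, use .iloc for positions; plain s[[4, 3, 1]] may warn or fail in recent pandas versions.
s.iloc[[4, 3, 1]]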
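To make the comparison concrete, here is a small sketch of the same selection on a Series, an ndarray and a plain list. The data values and variable names are made up for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=list("abcde"))

# Series: select by position with .iloc, or by label with .loc.
by_position = s.iloc[[4, 3, 1]]
by_label = s.loc[["e", "d", "b"]]

# ndarray: a list of positions works directly.
arr = s.to_numpy()
from_array = arr[[4, 3, 1]]

# Plain list: a list is not accepted as an index.
try:
    s.tolist()[[4, 3, 1]]
    list_error = None
except TypeError as exc:
    list_error = type(exc).__name__

print(by_position)
print(from_array)
print(list_error)
```

Note that the Series result keeps its labels (e, d, b), while the ndarray result carries only the values.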

Converting a Series to a NumPy ndarray

>>> e.to_numpy()      # Method to_numpy()
array([5., 5., 5., 5., 5.])
>>> e.values              # Attribute values
array([5., 5., 5., 5., 5.])

Checking whether a label exists

>>> 'e' in e
True
>>> 'l' in e
False
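The in operator tests the labels of a Series, not its values, much like a dict tests its keys. A minimal sketch, with variable names chosen only for this example:

```python
import pandas as pd

e = pd.Series(5.0, index=list("abcde"))

# 'in' looks at the index (labels), like key lookup on a dict.
label_found = "e" in e

# 5.0 is a value, not a label, so this is False.
value_as_label = 5.0 in e

# To test the values, look in the underlying array instead.
value_found = 5.0 in e.to_numpy()

print(label_found, value_as_label, value_found)
```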

Accessing Series values by label

s.get('f')            # Returns None if the label 'f' does not exist
s.get('f', np.nan)    # A default such as np.nan can be supplied explicitly
s['f']                # Raises KeyError if the label 'f' does not exist
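The three access styles can be compared side by side. This is a small sketch with made-up data; the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=list("abc"))

present = s.get("a")               # label exists: the value is returned
missing_default = s.get("f")       # label missing: None
missing_nan = s.get("f", np.nan)   # label missing: the supplied default

# Bracket access on a missing label raises KeyError instead.
try:
    s["f"]
    raised_key_error = False
except KeyError:
    raised_key_error = True

print(present, missing_default, missing_nan, raised_key_error)
```

get is the convenient choice when a missing label is expected; bracket access is better when a missing label should be treated as a bug.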

Vectorization and label alignment

s + s
s ** 2
np.exp(s)
s * 2

Unlike an ndarray, a Series aligns labels automatically when operating with another Series; a label that is missing from either operand yields NaN in the result.
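A short sketch of the difference, using made-up values: adding two slices of the same Series matches elements by label, while adding the same slices of the underlying ndarray matches them by position.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=list("abcde"))

# Series: 'a' is missing from the first operand and 'e' from the second,
# so both come out as NaN; the other labels are added to themselves.
aligned = s.iloc[1:] + s.iloc[:-1]

# ndarray: elements are paired purely by position.
arr = s.to_numpy()
positional = arr[1:] + arr[:-1]

print(aligned)
print(positional)
```

The Series result has five entries (the union of the labels), whereas the ndarray result has four.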

The name attribute of Series

c.name = 'somename'
s = pd.Series(np.random.randn(5), name = 'another_name')
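Besides assigning to name directly, a Series can be renamed with rename, which returns a new Series and leaves the original untouched. A minimal sketch (the names used here are arbitrary):

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="another_name")

# rename with a scalar returns a new Series with a different name.
s2 = s.rename("different")

# The name becomes the column label when the Series is turned into a DataFrame.
frame_columns = s.to_frame().columns.tolist()

print(s.name, s2.name, frame_columns)
```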

DataFrame

Acceptable inputs include: a dict of one-dimensional ndarrays, lists, dicts or Series; a two-dimensional ndarray; a structured or record ndarray; and another DataFrame.

A DataFrame has two kinds of labels: index (row labels) and columns (column labels).

Creating a DataFrame from a dict of Series

# Build a dict whose values are all Series
# Note that index here defines the row labels of each Series.
>>> d = dict(one = pd.Series([1., 2., 3.], index = list('abc')),
...     two = pd.Series([1., 2., 3., 4.], index = list('abcd')))
>>> d
{'one': a    1.0
b    2.0
c    3.0
dtype: float64, 'two': a    1.0
b    2.0
c    3.0
d    4.0
dtype: float64}
>>> df = pd.DataFrame(d)    # Create DataFrame
>>> df
   one  two
a  1.0  1.0
b  2.0  2.0
c  3.0  3.0
d  NaN  4.0
>>> pd.DataFrame(d, index = list('bda'))   # Note: index here selects rows from the data in d; it does not define new labels.
   one  two
b  2.0  2.0
d  NaN  4.0
a  1.0  1.0
>>> pd.DataFrame(d, index = list('dba'), columns = ['two', 'three']) # Note: columns here is also a selection, not a definition.
   two three
d  4.0   NaN
b  2.0   NaN
a  1.0   NaN

As the examples above show, when the data already carries labels (as a dict of Series does), the index and columns arguments select from and reorder those labels rather than redefine them. Labels that do not exist in the data are filled with NaN.
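One way to see that index acts as a selection is to compare it with reindex on the full DataFrame. This is a sketch of that comparison, not a statement about how pandas implements the constructor; the variable names built and reindexed are made up for this example.

```python
import numpy as np
import pandas as pd

d = dict(one=pd.Series([1.0, 2.0, 3.0], index=list("abc")),
         two=pd.Series([1.0, 2.0, 3.0, 4.0], index=list("abcd")))

# Passing index to the constructor picks rows b, d, a out of the data.
built = pd.DataFrame(d, index=list("bda"))

# Building the full frame first and reindexing gives the same rows.
reindexed = pd.DataFrame(d).reindex(list("bda"))

print(built)
print(built.equals(reindexed))
```

Row d has no entry in the Series 'one', so that cell is NaN in both results.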

>>> df.index   # Row labels
Index(['a', 'b', 'c', 'd'], dtype='object')
>>> df.columns  # Column labels
Index(['one', 'two'], dtype='object')

Creating a DataFrame from a dict of ndarrays/lists

# All the ndarrays/lists must have the same length
>>> d = dict(one = [1., 2., 3., 4.], two = [4, 3, 2, 1])
>>> d
{'one': [1.0, 2.0, 3.0, 4.0], 'two': [4, 3, 2, 1]}
>>> pd.DataFrame(d)
   one  two
0  1.0    4
1  2.0    3
2  3.0    2
3  4.0    1
# If an index is given, it must also have the same length as the lists/ndarrays
>>> pd.DataFrame(d, index = list('abcd'))
   one  two
a  1.0    4
b  2.0    3
c  3.0    2
d  4.0    1
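The length rule can be checked directly. The sketch below uses made-up data and assumes only that pandas raises ValueError in both cases; the exact error message may differ between versions, so it is not shown.

```python
import pandas as pd

# Lists of different lengths cannot be combined into columns.
try:
    pd.DataFrame(dict(one=[1.0, 2.0, 3.0], two=[4, 3]))
    unequal_lists_rejected = False
except ValueError:
    unequal_lists_rejected = True

# An index whose length differs from the lists is rejected as well.
d = dict(one=[1.0, 2.0, 3.0, 4.0], two=[4, 3, 2, 1])
try:
    pd.DataFrame(d, index=list("abc"))
    short_index_rejected = False
except ValueError:
    short_index_rejected = True

print(unequal_lists_rejected, short_index_rejected)
```

This differs from a dict of Series, where lengths may differ because the Series are aligned by label and gaps are filled with NaN.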

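Creating a DataFrame from a structured or record array

>>> data = np.zeros((2, ), dtype = [('A', 'i4'), ('B', 'f4'), ('C', 'S10')])   # 'S10' is a 10-byte string field; the older alias 'a10' is deprecated in recent NumPy
>>> data[:] = [(1, 2., 'Hello'), (2, 3., 'World')]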
>>> pd.DataFrame(data)
   A    B         C
0  1  2.0  b'Hello'
1  2  3.0  b'World'
>>> pd.DataFrame(data, index = ['first', 'second'])
        A    B         C
first   1  2.0  b'Hello'
second  2  3.0  b'World'
>>> pd.DataFrame(data, columns = list('CAB'))
          C  A    B
0  b'Hello'  1  2.0
1  b'World'  2  3.0
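The b'...' values appear because the C field is a byte-string field. If ordinary text is wanted, two options are sketched below: decode the column afterwards, or declare the field as Unicode ('U10') in the first place. The variable names are made up for this example, and UTF-8 is an assumed encoding.

```python
import numpy as np
import pandas as pd

# Option 1: keep the byte-string field and decode the column afterwards.
data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")])
data[:] = [(1, 2.0, b"Hello"), (2, 3.0, b"World")]
df = pd.DataFrame(data)
df["C"] = df["C"].str.decode("utf-8")

# Option 2: declare the field as a Unicode string of up to 10 characters.
data_u = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "U10")])
data_u[:] = [(1, 2.0, "Hello"), (2, 3.0, "World")]
df_u = pd.DataFrame(data_u)

print(df)
print(df_u)
```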

Posted by kontesto on Fri, 13 Sep 2019 07:00:22 -0700