Data processing and visualization (I) -- Fundamentals of NumPy

1 Background

Although a Python list can be used for array-style operations, it is not a true array: it stores references to arbitrary Python objects, so numeric work on large amounts of data is slow. The NumPy extension library provides real arrays for this purpose, and many higher-level libraries such as SciPy, Pandas and Matplotlib rely on it. NumPy provides two basic objects: ndarray (N-dimensional array object) and ufunc (universal function object). An ndarray (referred to simply as an array below) is a multidimensional array that stores elements of a single data type, and a ufunc is a function that operates on arrays element by element.

2 Examples

The number of dimensions of a NumPy array is called its rank. The rank is the number of axes: a one-dimensional array has rank 1, a two-dimensional array has rank 2, and so on.

In NumPy, each dimension is called an axis. A two-dimensional array can be viewed as a one-dimensional array whose elements are themselves one-dimensional arrays. The first axis (axis 0) indexes the outer array, and the second axis (axis 1) indexes the elements inside each inner array. The number of axes, the rank, is available as the ndim attribute.

Many functions accept an axis argument. axis=0 means the operation runs along axis 0, that is, down the rows, producing one result per column; axis=1 means the operation runs along axis 1, that is, across the columns, producing one result per row.
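
As a minimal sketch of the axis convention described above (the array m and the variable names are illustrative, not from the original text), summing a 2x3 array along each axis gives:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

col_sums = m.sum(axis=0)   # collapse axis 0 (the rows): one result per column
row_sums = m.sum(axis=1)   # collapse axis 1 (the columns): one result per row

print(m.ndim)               # 2, the rank of the array
print(col_sums.tolist())    # [5, 7, 9]
print(row_sums.tolist())    # [6, 15]
```

The axis that is named is the one that disappears from the result, which is why axis=0 on a 2x3 array leaves three values.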

2.1 Creating arrays

First, import the numpy module. If the import fails, install the library with pip install numpy. The following examples show several ways to create an array:

import numpy as np
a=np.array(([1,2],[2,3])) #np.array() converts any Python sequence (list, tuple, nested sequences) into an ndarray
b1=np.arange(1,10,1)#Parameters: start, stop (excluded), step, i.e. the half-open interval [start, stop)
b2=np.linspace(1,10,5)#One-dimensional array of evenly spaced values. Parameters: start, stop (included), number of points
c=np.zeros((2,2))#All-zeros matrix
d=np.ones((3,2),dtype="int")#All-ones matrix
e=np.eye(3)#Identity matrix
f=np.empty((2,4))#Uninitialized array of the given shape and data type; its contents are arbitrary
g=np.random.rand(2,2)#Random samples from the uniform distribution on [0,1)
                     #Pseudo-random and uniformly (not normally) distributed; 1 itself is never returned
                     #Some functions take the shape as a tuple, others (such as rand) take one argument per dimension
                     #When the dimensions are separate arguments, no other parameters can be passed
                     #When the shape is a tuple, other parameters (e.g. dtype) can follow
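
To make the difference between the excluded and included end points concrete, here is a small sketch. np.random.random is used only as an example of a function that takes the shape as a tuple; it does not appear in the original code.

```python
import numpy as np

b1 = np.arange(1, 10, 1)      # stop value 10 is excluded
b2 = np.linspace(1, 10, 5)    # stop value 10 is included, 5 points in total

print(b1.tolist())   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(b2.tolist())   # [1.0, 3.25, 5.5, 7.75, 10.0]

g_args = np.random.rand(2, 3)         # one argument per dimension
g_tuple = np.random.random((2, 3))    # the shape passed as a single tuple
print(g_args.shape, g_tuple.shape)    # (2, 3) (2, 3)
```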

Difference between range and np.arange

range() and np.arange() return different types: range() returns a range object, while np.arange() returns an ndarray. Both take three parameters: the first is the start, the second is the stop value (which is excluded from the result), and the third is the step. range() only accepts integer steps, while np.arange() also accepts fractional steps. Both results can be iterated over, but the result of np.arange() is also an array, so it can be used as a vector in arithmetic.
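
The comparison can be checked with a short sketch (the variable names are illustrative):

```python
import numpy as np

r = range(1, 10, 2)          # lazy range object, integer steps only
a = np.arange(1, 10, 2)      # ndarray
f = np.arange(0, 1, 0.25)    # a fractional step is allowed

print(type(r).__name__, type(a).__name__)   # range ndarray
print(list(r))             # [1, 3, 5, 7, 9]
print(f.tolist())          # [0.0, 0.25, 0.5, 0.75]
print((a * 2).tolist())    # [2, 6, 10, 14, 18], vectorized arithmetic

try:
    range(0, 1, 0.25)        # the built-in range rejects float arguments
except TypeError as exc:
    print("range raised TypeError:", exc)
```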

2.2 Setting the random seed

np.random.seed() sets the seed of the random number generator. This is very useful in practice because it makes experimental results reproducible.

np.random.seed(0) #Set the random seed to 0
np.random.rand() #After seeding, the same sequence of random numbers is generated on every run
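
A quick way to convince yourself of the reproducibility is to seed twice and compare. The second half of this sketch uses np.random.default_rng, the generator-based interface in newer NumPy versions, which the original text does not cover; it is shown only as an alternative.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(3)

np.random.seed(0)            # resetting the seed restarts the sequence
second = np.random.rand(3)

print(np.array_equal(first, second))   # True

# Alternative (not in the original text): independent generator objects
rng_a = np.random.default_rng(0)
rng_b = np.random.default_rng(0)
same = np.array_equal(rng_a.random(3), rng_b.random(3))
print(same)                            # True
```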

2.3 Rounding operations

The following describes three common rounding operations: np.ceil, np.floor, and np.round.

g1=np.ceil(np.random.rand()*5)#Round up
                              #Yields 1 to 5 with equal probability (0 only if rand() returns exactly 0)
                              #Do not use round here: rand() lies in [0,1), so round would give 0 and 5 only half as often as 1 to 4
g2=np.floor(np.random.rand(2,2)*3)#Round down
g3=np.round(0.5)#Syntax: np.round(x, decimals=0). x is the value or array to round and decimals is the number of decimal places to keep. For floating-point input the result is a floating-point number even when decimals is 0 or omitted (unlike Python's built-in round(), which returns an int when the second argument is omitted). decimals can be negative, which rounds to the left of the decimal point; the result is still a floating-point number.
g4=np.round(1.5)#Values exactly halfway between two integers are rounded to the nearest even integer: if the integer part is even, .5 is rounded down; if it is odd, .5 is rounded up. For example, 2.5 becomes 2 and 3.5 becomes 4
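
The round-half-to-even behaviour is easy to verify on a few halfway values. This sketch uses an illustrative array vals and also shows a negative decimals argument:

```python
import numpy as np

vals = np.array([0.5, 1.5, 2.5, 3.5])

rounded = np.round(vals)    # halfway cases go to the nearest even integer
ceiled = np.ceil(vals)
floored = np.floor(vals)

print(rounded.tolist())     # [0.0, 2.0, 2.0, 4.0]
print(ceiled.tolist())      # [1.0, 2.0, 3.0, 4.0]
print(floored.tolist())     # [0.0, 1.0, 2.0, 3.0]

hundreds = np.round(1234.567, -2)   # negative decimals round left of the point
print(hundreds)                     # 1200.0
```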

2.4 Array operations

Arithmetic on NumPy arrays is element-wise by default, as the following examples show:

#Common operations
h=np.array(([2,3],[1,3]))
h1=h*3#Multiply every element by 3
h2=h*h#Multiply elements at corresponding positions, so the shapes must match (or be broadcastable). This is the element-wise product, not matrix multiplication
h3=np.dot(h,h)#Matrix product (for two-dimensional arrays np.dot performs matrix multiplication)
h4=h.dot(h)#The same matrix product, written as a method
h5=np.sin(h)#Apply the function to every element of the array
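
The difference between the element-wise product and the matrix product shows up clearly when both are printed for the same array. The @ operator in this sketch is equivalent to np.dot for two-dimensional arrays and is not used in the original code.

```python
import numpy as np

h = np.array([[2, 3],
              [1, 3]])

elementwise = h * h       # each entry multiplied by itself
matrix = np.dot(h, h)     # rows times columns
matrix_op = h @ h         # same result via the @ operator

print(elementwise.tolist())   # [[4, 9], [1, 9]]
print(matrix.tolist())        # [[7, 15], [5, 12]]
print(np.array_equal(matrix, matrix_op))   # True
```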

2.5 Shallow copy and deep copy

h6=np.array((1,2))
h7=h6#Plain assignment (called a shallow copy here) copies no data: h7 is just another name for the same object, much like a pointer to it
h8=h6.copy()#Can also be written as np.copy(h6)
            #Deep copy: the two arrays are independent
h6+=1#h7 also sees the +1 because it is the same object; the deep copy h8 is unchanged
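
The following sketch checks the claims above directly. np.shares_memory is used here only to make the aliasing visible; it is not part of the original example.

```python
import numpy as np

h6 = np.array([1, 2])
h7 = h6            # second name for the same array
h8 = h6.copy()     # independent copy of the data

h6 += 1            # in-place update

print(h7 is h6)                   # True
print(h7.tolist())                # [2, 3]
print(h8.tolist())                # [1, 2]
print(np.shares_memory(h6, h8))   # False
```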

2.6 Common attributes

The following shows how to read common array attributes:

#Common attributes (d is the 3x2 array of ones created in section 2.1)
i=np.size(d)#Size: total number of elements
it=d.size #The same value, read as an attribute
i1=np.size(d,0)#Length of axis 0, i.e. the number of rows
i2=np.size(d,1)#Length of axis 1, i.e. the number of columns
i3=np.shape(d)#Shape of the array: a tuple with the length of each axis
i4=d.dtype#Data type of the elements. Most of this information is available both as an attribute of the array and as a function in the numpy module
i5=np.ndim(d)#Number of dimensions (ndim is short for number of dimensions); it equals the length of the shape tuple
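
For reference, a self-contained sketch that prints these attributes for a 3x2 array of ones. The exact integer dtype depends on the platform, so no output is given for it.

```python
import numpy as np

d = np.ones((3, 2), dtype=int)

print(d.size)                          # 6
print(d.shape)                         # (3, 2)
print(d.ndim)                          # 2
print(np.size(d, 0), np.size(d, 1))    # 3 2
print(len(d.shape) == d.ndim)          # True
print(d.dtype)                         # an integer dtype (platform dependent)
```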

2.7 Reshaping arrays

Common reshaping operations include reshape and resize, which behave differently, as the following example shows:

#Reshaping
j=np.linspace(1,10,10)#Generate an arithmetic sequence
j1=j.reshape(2,5)#Returns the reshaped array and leaves the shape of j unchanged. One dimension may be given as -1 to have it inferred, but the total size must divide evenly
j2=np.linspace(1,6,6)
j_empty=j2.resize(3,2)#Changes j2 in place and returns None
j3=np.resize(j,(5,2))#Function form: returns a new array and leaves j unchanged
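
A short sketch of the three behaviours, including the -1 shorthand. The last example, where np.resize repeats the data to fill a larger shape, goes slightly beyond the original text and is included as an illustration.

```python
import numpy as np

j = np.linspace(1, 10, 10)
j1 = j.reshape(2, -1)       # -1 lets NumPy infer the missing dimension (5)
print(j1.shape, j.shape)    # (2, 5) (10,)

try:
    j.reshape(3, -1)        # 10 elements cannot be split into 3 equal rows
except ValueError as exc:
    print("reshape failed:", exc)

j2 = np.linspace(1, 6, 6)
ret = j2.resize(3, 2)       # method form: modifies j2 in place
print(ret, j2.shape)        # None (3, 2)

j3 = np.resize(np.arange(3), (2, 4))   # function form: repeats data as needed
print(j3.tolist())          # [[0, 1, 2, 0], [1, 2, 0, 1]]
```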

2.8 Adding, deleting, modifying and querying

The following describes how to add, delete, modify and query array elements.

#Add and delete
k=np.array([[1,3,2,4,5],[1,1,1,1,1]])#None of the following operations changes k
k1=np.append(k,99) #Without axis, append flattens the data to one dimension first

k2=np.insert(k,3,99)
#Inserts values before the given index. Syntax: numpy.insert(arr, obj, values, axis=None)
#If axis is not specified, the array is flattened first and the values are then inserted, even when arr is multidimensional and only one number is inserted
#With axis specified, a single number is broadcast to fill the whole inserted row or column. A sequence of the wrong length (e.g. 3 values where 2 are needed) cannot be broadcast and raises an error

k3=np.insert(k,0,99,axis=0)#Insert a row. With a single value, every element of the new row equals that value (broadcasting)
k4=np.insert(k,4,[99,66],axis=1)#Insert a column, giving all inserted values explicitly as a tuple or list

k5=np.delete(k,3)#Likewise, without axis the array is flattened first. k itself is unchanged; a new array is returned
k6=np.delete(k,1,0)#With axis specified, an entire row or column is deleted
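
Printing the results of these calls makes the flattening and broadcasting rules easier to see. This sketch repeats the same calls on the same array and shows what each one returns:

```python
import numpy as np

k = np.array([[1, 3, 2, 4, 5],
              [1, 1, 1, 1, 1]])

k1 = np.append(k, 99)                    # flattened, 99 added at the end
k2 = np.insert(k, 3, 99)                 # flattened, 99 inserted before index 3
k3 = np.insert(k, 0, 99, axis=0)         # new first row filled with 99
k4 = np.insert(k, 4, [99, 66], axis=1)   # new column before column 4
k5 = np.delete(k, 3)                     # flattened, element at index 3 removed
k6 = np.delete(k, 1, 0)                  # row 1 removed

print(k1.tolist())   # [1, 3, 2, 4, 5, 1, 1, 1, 1, 1, 99]
print(k2.tolist())   # [1, 3, 2, 99, 4, 5, 1, 1, 1, 1, 1]
print(k3.tolist())   # [[99, 99, 99, 99, 99], [1, 3, 2, 4, 5], [1, 1, 1, 1, 1]]
print(k4.tolist())   # [[1, 3, 2, 4, 99, 5], [1, 1, 1, 1, 66, 1]]
print(k5.tolist())   # [1, 3, 2, 5, 1, 1, 1, 1, 1]
print(k6.tolist())   # [[1, 3, 2, 4, 5]]
print(k.shape)       # (2, 5), the original array is untouched
```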

2.9 Array indexing

The following examples illustrate array indexing:

#ka is one-dimensional and kb is multidimensional
ka=np.arange(1,10,1)
kb=np.linspace(-np.pi,np.pi,12).reshape(3,4)
ka1=ka[1:3]# Index 1 is included, index 3 is not
ka2=ka[5:] # From index 5 to the end
ka3=ka[:-3] # Everything except the last three elements (-3 is the third element from the end)
ka4=ka[1:10:2]#start:stop:step. start and stop default to the two ends of the array and step defaults to 1. The interval is closed on the left and open on the right, and the step can be negative
ka5=ka[::-1]#Reverse
ka6=ka[[1,2,4,7]]#Select the elements 2, 3, 5 and 8. Note: the indices must be passed as a list, otherwise they are interpreted as one index per dimension and an error is raised
kb1=kb[1]#For a multidimensional array, the first index selects a row
kb2=kb[:,2]#Select a column
kb3=kb[1,2]#Select a single element
kb4=kb[1:3,2:4]#Select a rectangular region; remember that slices are closed on the left and open on the right
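
The results of these indexing forms are easier to read with integer data. This sketch keeps ka as above but swaps kb for an illustrative 3x4 array of the integers 0 to 11:

```python
import numpy as np

ka = np.arange(1, 10, 1)             # [1 2 3 4 5 6 7 8 9]
kb = np.arange(12).reshape(3, 4)     # rows: [0..3], [4..7], [8..11]

print(ka[1:3].tolist())            # [2, 3]
print(ka[5:].tolist())             # [6, 7, 8, 9]
print(ka[:-3].tolist())            # [1, 2, 3, 4, 5, 6]
print(ka[1:10:2].tolist())         # [2, 4, 6, 8]
print(ka[::-1].tolist())           # [9, 8, 7, 6, 5, 4, 3, 2, 1]
print(ka[[1, 2, 4, 7]].tolist())   # [2, 3, 5, 8]

print(kb[1].tolist())              # [4, 5, 6, 7]
print(kb[:, 2].tolist())           # [2, 6, 10]
print(int(kb[1, 2]))               # 6
print(kb[1:3, 2:4].tolist())       # [[6, 7], [10, 11]]
```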

2.10 Data filtering

In practical data processing it is very important to filter data, for example to select the values in a column that exceed some threshold. One way is to loop over all the data. However, NumPy offers a more convenient way that fits in one line of code, illustrated by the following example.

kc=np.linspace(-10,10,10).reshape(2,5)
print(f'kc={kc}')
kc1=kc[kc>3] # Select the values greater than 3
kc2=kc[kc*2>5] # Equivalent to selecting the values greater than 2.5
print(f'kc1 = {kc1}')
print(f'kc2 = {kc2}')

kc=[[-10.          -7.77777778  -5.55555556  -3.33333333  -1.11111111]
 [  1.11111111   3.33333333   5.55555556   7.77777778  10.        ]]
kc1 = [ 3.33333333  5.55555556  7.77777778 10.        ]
kc2 = [ 3.33333333  5.55555556  7.77777778 10.        ]

Now let's look at how NumPy does this.

kc3=kc/3<-1 #The comparison yields a Boolean array, which can be used to index kc or any other array of the same shape
print(f'kc3 = {kc3}') # kc3 has the same shape as kc, and every element is True or False
kc4=kc[kc3] #Take the elements where kc3 is True

kd=kc**2 #** is the power operator (Python does not use ^ for powers)
kd1=kd[kc3]

kc3 = [[ True  True  True  True False]
 [False False False False False]]
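
Boolean masks can also be combined and used for assignment. This sketch goes slightly beyond the original example: the array x, the use of & to combine conditions, and np.where are illustrative additions.

```python
import numpy as np

x = np.arange(-4, 6).reshape(2, 5)   # [[-4 -3 -2 -1 0], [1 2 3 4 5]]

mask = (x > -2) & (x < 3)            # parentheses are required around each test
print(x[mask].tolist())              # [-1, 0, 1, 2]
print(int(mask.sum()))               # 4, True counts as 1

clipped = np.where(x > 0, x, 0)      # keep positive values, replace the rest by 0
print(clipped.tolist())              # [[0, 0, 0, 0, 0], [1, 2, 3, 4, 5]]

y = x.copy()
y[y < 0] = 0                         # a mask can also select the targets of an assignment
print(np.array_equal(y, clipped))    # True
```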

2.11 Splicing and splitting

It is also very important to splice and split arrays. NumPy has many functions for this. Here we briefly introduce two: np.concatenate and np.split.

la=np.arange(1,7,1).reshape(2,3)
lb=-la
l1=np.concatenate((la,lb),0)#Joins multiple arrays along the given axis. It takes a sequence of arrays and concatenates them into one array.
                            #axis=0: stack rows (vertically); axis=1: stack columns (horizontally)
                            #Syntax: numpy.concatenate((a1, a2, ...), axis=0, out=None)
l2=np.concatenate((la,lb),axis=1)#axis=1: join side by side
l3=np.concatenate((la,la,la,lb),axis=1)#concatenate is versatile: it can join several arrays at once, so it is the one function worth remembering

l4=np.split(la,3,axis=1)#Split la into 3 equal parts along the columns. Syntax: np.split(ary, indices_or_sections, axis=0)
#The array is cut in order from left to right and a list of sub-arrays is returned
print(f'la = {la}')
print(f'l4[0] = {l4[0]}')
print(f'l4[1] = {l4[1]}')
print(f'l4[2] = {l4[2]}')

la = [[1 2 3]
 [4 5 6]]
l4[0] = [[1]
 [4]]
l4[1] = [[2]
 [5]]
l4[2] = [[3]
 [6]]
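
The second argument of np.split can also be a list of cut positions, and the pieces can be joined back with np.concatenate. The sketch below illustrates both; np.array_split, which tolerates unequal parts, is not mentioned in the original text and is shown only for contrast.

```python
import numpy as np

la = np.arange(1, 7, 1).reshape(2, 3)    # [[1 2 3], [4 5 6]]

left, right = np.split(la, [1], axis=1)  # cut before column 1
print(left.tolist())     # [[1], [4]]
print(right.tolist())    # [[2, 3], [5, 6]]

parts = np.split(la, 3, axis=1)          # three equal parts
restored = np.concatenate(parts, axis=1)
print(np.array_equal(restored, la))      # True

try:
    np.split(la, 2, axis=1)              # 3 columns cannot be divided into 2 equal parts
except ValueError as exc:
    print("split failed:", exc)

uneven = np.array_split(la, 2, axis=1)   # allows parts of different widths
print([p.shape for p in uneven])         # [(2, 2), (2, 1)]
```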

3 Summary

This section only introduces some basic data processing operations in NumPy. The library is far more powerful and supports many other operations. It is more important to know where to look things up when you need them than to memorize everything.

If you want to know more about NumPy, consult the official documentation and the Chinese tutorial.

Posted by itarun on Tue, 12 Oct 2021 16:15:04 -0700

Programmer Group