Guide map of this section: https://www.processon.com/view/link/5fcc5e81f346fb3fc8776929
1. ndarray object
1.1 why ndarray
list problem
A Python list can hold any kind of object, so the list stores pointers to the objects rather than the values themselves. Storing [1, 2, 3], for example, requires three pointers and three integer objects, which wastes memory and CPU time in numerical work.
array module
Python's standard library also provides an array module, which stores values directly (rather than objects), but it does not support multidimensional arrays and offers few numerical operations.
ndarray(N-dimensional array)
ndarray is an N-dimensional array that makes up for the shortcomings above. NumPy provides two basic objects:
- ndarray object: stores a multidimensional array whose elements all have the same type
- ufunc function: a function that operates on arrays element by element
ndarray is the core object of NumPy, and almost every function in NumPy is built around it.
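The memory argument above can be checked with a rough sketch. The variable names are only illustrative, and the list figure is an approximation: CPython shares small integer objects, so the real cost of a list can be lower than the sum computed here.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# A list stores one pointer per element; the int objects live elsewhere.
list_pointer_bytes = sys.getsizeof(values)
list_object_bytes = sum(sys.getsizeof(v) for v in values)

# An ndarray stores the raw values contiguously: 8 bytes per int64 element.
array_data_bytes = arr.nbytes

print(list_pointer_bytes + list_object_bytes, array_data_bytes)
```

The exact list numbers depend on the Python build, but the array data is always element count times element size.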
1.2 creating an ndarray object
Let's import numpy first
import numpy as np
You can create arrays from lists
a1 = np.array([1, 2, 3])
a1
array([1, 2, 3])
a2 = np.array([[4, 5, 6], [7, 8, 9]])
a2
array([[4, 5, 6],
       [7, 8, 9]])
The shape attribute gives the length of the array along each dimension (for a 2D array, the number of rows and columns)
print(a1.shape)
print(a2.shape)
(3,)
(2, 3)
You can use the reshape() method to obtain an array with a different shape
a3 = a2.reshape((3, 2))
a3
array([[4, 5],
       [6, 7],
       [8, 9]])
The array returned by reshape() shares memory with the original array, so modifying one also modifies the other
a3[0, 0] = 0  # Both a3 and a2 are modified
print(a3)
print(a2)
[[0 5]
 [6 7]
 [8 9]]
[[0 5 6]
 [7 8 9]]
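If the sharing is not wanted, one option is to copy explicitly. The sketch below uses np.shares_memory() to check whether two arrays overlap in memory; the name a3_copy is only illustrative.

```python
import numpy as np

a2 = np.array([[4, 5, 6], [7, 8, 9]])
a3 = a2.reshape((3, 2))              # shares the data of a2
a3_copy = a2.reshape((3, 2)).copy()  # independent copy of the data

print(np.shares_memory(a2, a3))       # True
print(np.shares_memory(a2, a3_copy))  # False

a3_copy[0, 0] = 0  # does not affect a2
print(a2[0, 0])    # 4
```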
You can view the element type of the array through the dtype attribute
a1.dtype
dtype('int32')
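For an array created from a list of Python integers, the default dtype depends on the platform: typically int64 on 64-bit Linux and macOS, and int32 on 32-bit systems and on Windows with NumPy versions before 2.0 (which is why the output above shows int32)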
You can declare dtype when you create an array
a4 = np.array([1, 2, 3], dtype=np.int32)
a4
array([1, 2, 3])
You can use the astype() method for type conversion
a4.astype(np.float32) # Convert dtype int32 to float32
array([1., 2., 3.], dtype=float32)
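A short sketch of how dtype and astype() behave, under the assumption of a standard NumPy install. The integer default is platform dependent, so the example only prints it; the float-to-integer conversion drops the fractional part.

```python
import numpy as np

ints = np.array([1, 2, 3])          # default integer dtype is platform dependent
floats = np.array([1.0, 2.0, 3.0])  # Python floats become float64

print(ints.dtype, ints.itemsize)    # itemsize is the number of bytes per element
print(floats.dtype)                 # float64

# astype() returns a new array; the original array is left unchanged.
f = np.array([1.7, -1.7, 2.2])
i = f.astype(np.int32)              # fractional part is dropped (truncation toward zero)
print(i)                            # [ 1 -1  2]
print(f.dtype)                      # float64
```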
1.3 creating ndarray objects through some functions
Building arrays from Python lists is inefficient for anything but small data, so NumPy provides many functions dedicated to creating arrays.
Create an arithmetic sequence using arange()
Like the built-in function range(), arange() takes a start value, an end value and a step size, and returns a one-dimensional arithmetic sequence.
np.arange(0, 10, 1)
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Note that the end value is not included
You can chain the reshape() method to obtain a specific shape
np.arange(0, 10, 1).reshape((5, 2))
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
Create an arithmetic sequence using linspace()
The linspace() function takes a start value, an end value and the number of elements, and returns an arithmetic sequence.
np.linspace(1, 10, 10) # 10 evenly spaced values from 1 to 10, both ends included
array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.])
Note that the end value is included here. Set the parameter endpoint=False to exclude it
np.linspace(1, 10, 10, endpoint=False)
array([1. , 1.9, 2.8, 3.7, 4.6, 5.5, 6.4, 7.3, 8.2, 9.1])
Create a geometric sequence using logspace()
Like linspace(), the logspace() function takes a start value, an end value and the number of elements, but it returns a geometric sequence. The base parameter sets the base of the powers and defaults to 10
np.logspace(0, 5, 5, base=2, endpoint=False)
array([ 1., 2., 4., 8., 16.])
Note that the start and end values are exponents of the base: here the sequence runs from 2**0 towards 2**5 (excluded because endpoint=False)
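One way to see the relationship between the two functions is to raise the base to a linspace() result by hand. This is a small sketch for illustration; the names exponents, by_hand and by_logspace are not from the original text.

```python
import numpy as np

exponents = np.linspace(0, 5, 5, endpoint=False)  # [0. 1. 2. 3. 4.]
by_hand = 2.0 ** exponents                        # powers of the base
by_logspace = np.logspace(0, 5, 5, base=2, endpoint=False)

print(by_hand)                            # [ 1.  2.  4.  8. 16.]
print(np.allclose(by_hand, by_logspace))  # True
```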
Use zeros(), ones() and full() to create arrays filled with a given value
np.zeros((2, 3), dtype=np.int32) # Array of shape (2,3) filled with int32 zeros
array([[0, 0, 0],
       [0, 0, 0]])
np.ones((3, 2), dtype=np.float32) # Array of shape (3,2) filled with float32 ones
array([[1., 1.],
       [1., 1.],
       [1., 1.]], dtype=float32)
np.full(4, 10, dtype=np.float64) # Array of shape (4,) filled with the float64 value 10
array([10., 10., 10., 10.])
1.4 access elements
Use index access
Arrays are indexed and sliced the same way as lists
a = np.arange(10) # Define an array a
a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
print(a[5]) # Get the element at index 5 (indices start from 0)
5
print(a[:5]) # Get the first 5 elements (indices 0 to 4; index 5 is excluded)
print(a[2:5]) # Get the elements at indices 2 to 4 (index 5 is excluded)
print(a[:-1]) # Get everything except the last element; -1 refers to the last element, which is excluded
[0 1 2 3 4]
[2 3 4]
[0 1 2 3 4 5 6 7 8]
print(a[1:9:2]) # From index 1 up to (not including) index 9, take every 2nd element
[1 3 5 7]
a[1] = 11 # Replace the element at index 1
print(a)
[ 0 11  2  3  4  5  6  7  8  9]
a[0:2] = 100, 101 # Replace the first 2 elements
print(a)
[100 101   2   3   4   5   6   7   8   9]
The array obtained by slicing is a view of the original array, that is, it shares memory with the original array
b = np.arange(5) # Define b
print(b)
c = b[:3] # Slice b to obtain c
print(c)
c[0] = 100 # Modify c
print(c)
print(b) # b has also been modified
[0 1 2 3 4]
[0 1 2]
[100   1   2]
[100   1   2   3   4]
Lists behave differently: slicing a list returns a new list
b = list(range(5)) # Define b
print(b)
c = b[:3] # Slice b to obtain c
print(c)
c[0] = 100 # Modify c
print(c)
print(b) # b has not been modified
[0, 1, 2, 3, 4]
[0, 1, 2]
[100, 1, 2]
[0, 1, 2, 3, 4]
Elements of a multidimensional array are accessed with one index per dimension
# Define a 2D array a
a = np.arange(10).reshape((-1, 2)) # -1 lets NumPy calculate shape[0] automatically; the shape here is (5,2)
a
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
print(a[1,:]) # Get row 1
print(a[:,1]) # Get column 1
print(a[1:3, 1]) # Rows 1 and 2 (row 3 is excluded), column 1
[2 3]
[1 3 5 7 9]
[3 5]
Use arrays to access elements
In addition to indexes and slices, we can also use lists or arrays to access elements
a = np.arange(100, 105) # Definition a a
array([100, 101, 102, 103, 104])
a[[0,2,4]] # Index with a list: returns a new array containing the elements at the listed indices
array([100, 102, 104])
a[np.array([0,2,4])] # Use an array to get elements by index
array([100, 102, 104])
a[np.array([True, False, False, True, True])] # Use bool array to return the array whose corresponding position is True
array([100, 103, 104])
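In practice the Boolean array is rarely typed out by hand; it usually comes from a comparison on the array itself. A minimal sketch (the names mask and selected are only illustrative):

```python
import numpy as np

a = np.arange(100, 105)  # [100 101 102 103 104]
mask = a > 102           # Boolean array: [False False False  True  True]
selected = a[mask]       # elements where the mask is True
print(selected)          # [103 104]

a[a % 2 == 0] = 0        # assigning through a mask modifies a in place
print(a)                 # [  0 101   0 103   0]
```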
An array obtained by indexing with an array is a copy of the original array
b = a[np.array([0,2,4])] # Index a with an array to obtain b
print(b)
b[0] = 0 # Modify an element of b
print(b) # b has been modified
print(a) # a has not been modified
[100 102 104]
[  0 102 104]
[100 101 102 103 104]
Comparison of slice access and array access
- An array obtained by slicing is a view of the original array and shares its memory
- An array obtained by indexing with an (integer or bool) array is a copy of the original array
- The difference comes from how the ndarray object lays out its data in memory, which is not covered here.
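When it is unclear whether a result is a view or a copy, it can be checked directly. The sketch below uses the base attribute and np.shares_memory(); the names sliced and picked are only illustrative.

```python
import numpy as np

a = np.arange(100, 105)
sliced = a[1:4]        # obtained by slicing
picked = a[[1, 2, 3]]  # obtained by indexing with a list of integers

print(sliced.base is a)             # True: sliced is a view of a
print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, picked))  # False: picked has its own data
```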
1.5 ufunc operation
ufunc is the abbreviation of universal function. It is a function that can operate on each element of an array.
Arithmetic operations
Addition, subtraction, multiplication, division and other operations between arrays can be written with Python's operators + - * / and so on, or with functions such as add() and subtract(); the two forms are equivalent.
Refer to the following figure for the complete list of operators:
[Figure: array operators and their equivalent ufunc functions]
# Define two arrays
a = np.arange(10)
b = np.full(10, -1)
print(a)
print(b)
[0 1 2 3 4 5 6 7 8 9]
[-1 -1 -1 -1 -1 -1 -1 -1 -1 -1]
a + b # Use operator+
array([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
np.multiply(a, b) # Use the function multiply()
array([ 0, -1, -2, -3, -4, -5, -6, -7, -8, -9])
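The equivalence of the operator and function forms can be verified with a short sketch. It also shows the out argument that ufuncs accept, which writes the result into an existing array; the names same and result are only illustrative.

```python
import numpy as np

a = np.arange(10)
b = np.full(10, -1)

same = np.array_equal(a + b, np.add(a, b))  # operator and function give the same result
print(same)                                 # True

# ufuncs accept an `out` argument to store the result in an existing array
result = np.empty_like(a)
np.multiply(a, b, out=result)
print(result)
```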
Comparison operation
Similarly, arrays can be compared with operators such as >, == and <, which are equivalent to functions such as greater(), equal() and less(). The result is a Boolean array.
The complete operator reference is as follows:
[Figure: comparison operators and their equivalent ufunc functions]
# Define two random arrays
a = np.random.rand(5)
b = np.random.rand(5)
print(a)
print(b)
[0.45512267 0.54405151 0.24847556 0.90709983 0.21287093]
[0.07250248 0.85389154 0.0598896  0.58928252 0.47854526]
a > b # Use operator >
array([ True, False, True, True, False])
np.less_equal(a, b) # Using the function less_equal
array([False, True, False, False, True])
Boolean operation
Python implements Boolean operations with the keywords and, or and not, which cannot be overloaded. NumPy therefore provides functions instead: logical_and(), logical_or(), logical_not() and logical_xor().
What is operator overloading? It lets a class define (or redefine) what operators such as + - * / do for its objects. Each operator is essentially a method call: if your class implements the special method __add__(), its objects support + addition. The details are not covered here. Interested readers can refer to: https://zhuanlan.zhihu.com/p/162931696
# Define two bool arrays
a = np.array([True, False, False])
b = np.array([True, True, False])
print(a)
print(b)
[ True False False]
[ True  True False]
print(np.logical_and(a, b)) # Logical AND
print(np.logical_or(a, b)) # Logical OR
print(np.logical_not(a)) # Logical NOT
print(np.logical_xor(a, b)) # Logical XOR
[ True False False]
[ True  True False]
[False  True  True]
[False  True False]
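Although the keywords and, or and not cannot be overloaded, the bitwise operators & | ~ can, and for Boolean arrays they give the same results as the logical functions. A small sketch, with illustrative variable names:

```python
import numpy as np

a = np.array([True, False, False])
b = np.array([True, True, False])

and_op = a & b  # same result as np.logical_and(a, b) for bool arrays
or_op = a | b   # same result as np.logical_or(a, b)
not_op = ~a     # same result as np.logical_not(a)

# Parentheses are required around comparisons, because & binds tighter than < and >
x = np.arange(10)
in_range = x[(x > 2) & (x < 7)]
print(in_range)  # [3 4 5 6]
```

This equivalence holds for Boolean arrays only; on integer arrays the operators work bit by bit.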
1.6 broadcasting
The ufunc examples above all operate on arrays of the same shape. If the shapes of the input arrays differ, the following broadcasting rules are applied:
- All input arrays are aligned to the array with the most dimensions; shapes that are too short are padded with 1s at the front;
- The shape of the output array is the maximum length along each dimension of the input shapes;
- An input array can be used in the calculation if, along every dimension, its length either equals that of the output array or is 1; otherwise an error is raised;
- Where an input array has length 1 along a dimension, its single set of values along that dimension is reused for every position along it.
This looks abstract, so let's work through the four rules with two examples.
Example 1
arr0 = np.array([ # 2-dimensional array, shape is (2,3)
    [1, 2, 3],
    [4, 5, 6]
])
arr2 = np.array([10, 20, 30]) # 1-dimensional array, shape is (3,)
print(arr0.shape)
print(arr2.shape)
(2, 3)
(3,)
arr_sum = arr0 + arr2 # Broadcasting occurs and the sum of the two arrays is calculated
print(arr_sum)
# Output shape: according to rule 1, the shape of arr2 is padded to (1,3); according to rule 2, the output shape is (max(2,1), max(3,3)) = (2,3)
print(arr_sum.shape)
[[11 22 33]
 [14 25 36]]
(2, 3)
We already know that the output shape is (2,3). How is each element calculated?
After padding, the shape of arr0 is (2,3) and the shape of arr2 is (1,3).
- According to rule 3, the two arrays can be combined: the lengths along the second dimension are equal (3 and 3); the lengths along the first dimension differ (2 and 1), but that of arr2 is 1
- According to rule 4, arr2 is broadcast along the first dimension, where its length is 1: its single row is reused for both rows of arr0
Refer to the following figure for the process:
[Figure: broadcast 1]
Example 2
arr2 = np.array([ # 2-dimensional array, shape is (1,3)
    [10, 20, 30]
])
arr3 = np.array([ # 2-dimensional array, shape is (3,1)
    [10],
    [20],
    [30]
])
print(arr2.shape)
print(arr3.shape)
(1, 3)
(3, 1)
arr_sum = arr2 + arr3 # Broadcasting occurs and the sum of the two arrays is calculated
print(arr_sum)
# Output shape: rule 1 is already satisfied; according to rule 2, the output shape is (max(1,3), max(3,1)) = (3,3)
print(arr_sum.shape)
[[20 30 40]
 [30 40 50]
 [40 50 60]]
(3, 3)
How is each element calculated?
- According to rule 3, the two arrays can be combined: the lengths differ along both dimensions, but along each dimension one of the two arrays has length 1
- According to rule 4, arr2 is broadcast along the first dimension and arr3 along the second dimension (the length of arr2 along the first dimension is 1, and the length of arr3 along the second dimension is 1)
Refer to the following figure for the process:
[Figure: broadcast 2]
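The shape rules can also be written down as a few lines of code. The function broadcast_shape() below is a hypothetical helper written for this illustration, not part of NumPy; it applies rules 1 to 3 to two shapes, and the last line cross-checks one result against the shape NumPy itself reports through np.broadcast().

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply broadcasting rules 1 to 3 to two shapes."""
    ndim = max(len(shape_a), len(shape_b))
    # Rule 1: pad the shorter shape with 1s at the front
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for la, lb in zip(a, b):
        # Rule 3: the lengths must be equal, or one of them must be 1
        if la != lb and la != 1 and lb != 1:
            raise ValueError(f"shapes {shape_a} and {shape_b} cannot be broadcast")
        # Rule 2: a length of 1 gives way to the other length
        out.append(lb if la == 1 else la)
    return tuple(out)

print(broadcast_shape((2, 3), (3,)))    # (2, 3), as in Example 1
print(broadcast_shape((1, 3), (3, 1)))  # (3, 3), as in Example 2

# Cross-check against the shape NumPy computes
print(np.broadcast(np.empty((2, 3)), np.empty((3,))).shape)  # (2, 3)
```

A call such as broadcast_shape((2, 3), (2,)) raises ValueError, because after padding the last dimensions are 3 and 2 and neither is 1.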
2. Huge function library
NumPy also provides a large number of functions for processing arrays. Making full use of them simplifies program logic and speeds up execution.
2.1 random number function
The numpy.random module provides a large number of functions related to random numbers. The following is a list of key functions:
[Figure: random number functions, part 1]
[Figure: random number functions, part 2]
Look at a few examples
print(np.random.rand(2,3)) # Random 2D array of shape (2,3); each element is uniformly distributed in [0, 1)
print(np.random.randn(2,3)) # Random 2D array of shape (2,3); each element follows the standard normal distribution (mean 0, standard deviation 1)
print(np.random.randint(1,10,size=(2,3))) # Random 2D array of shape (2,3); each element is an integer from 1 to 9 (10 is excluded)
[[0.71599632 0.77172853 0.28996579]
 [0.72852298 0.60934017 0.84835497]]
[[-0.42632997  0.85967941 -1.09201344]
 [-0.23171307 -0.17257021  0.90615894]]
[[2 3 6]
 [1 2 3]]
print(np.random.uniform(1,10,(2,3))) # Random 2D array of shape (2,3); each element is uniformly distributed between 1 and 10
[[3.06191826 6.84866151 4.98272061]
 [7.88468112 9.32084376 6.50330689]]
a = np.arange(10)
np.random.shuffle(a) # shuffle() shuffles array a in place and returns nothing
print(a)
[4 3 6 8 7 0 1 5 9 2]
a = np.arange(10)
print(np.random.permutation(a)) # permutation() leaves array a unchanged and returns a new shuffled array
print(a)
[1 6 9 4 0 2 3 5 8 7]
[0 1 2 3 4 5 6 7 8 9]
np.random.seed(0) # Set the random seed so that every run produces the same random numbers (reproducible results)
print(np.random.rand(2,3))
[[0.5488135  0.71518937 0.60276338]
 [0.54488318 0.4236548  0.64589411]]
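NumPy 1.17 and later also offer a Generator interface through np.random.default_rng(), which keeps the seed in an object instead of in global state. A minimal sketch, assuming such a NumPy version is installed; the values it produces differ from those of np.random.seed() with the same seed.

```python
import numpy as np

rng1 = np.random.default_rng(0)  # Generator seeded with 0
rng2 = np.random.default_rng(0)  # a second Generator with the same seed

x1 = rng1.random((2, 3))         # uniform floats in [0, 1)
x2 = rng2.random((2, 3))
print(np.array_equal(x1, x2))    # True: same seed, same sequence

ints = rng1.integers(1, 10, size=(2, 3))  # integers from 1 to 9 (10 is excluded)
print(ints)
```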
2.2 summation, mean value and variance function
The functions described in this section are as follows:
[Figure: summation, mean and variance functions]
Look at a few examples
a = np.random.randint(0,5,(2,3))
print(a)
[[4 0 0]
 [4 2 1]]
print(np.sum(a, axis=0)) # Sum along dimension 0
print(np.sum(a, axis=1)) # Sum along dimension 1
print(np.sum(a, axis=0, keepdims=True)) # Keep the summed dimension in the result (with length 1)
[8 2 1]
[4 7]
[[8 2 1]]
The other functions are used in a similar way and are not covered in detail here
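As an illustration of the similar usage, here is a sketch with mean(), var() and std() on the same values as the array above, written out as a fixed array so that the results are reproducible. Note that var() and std() divide by the number of elements by default.

```python
import numpy as np

a = np.array([[4, 0, 0],
              [4, 2, 1]])

col_mean = np.mean(a, axis=0)  # mean along dimension 0: [4.  1.  0.5]
col_var = np.var(a, axis=0)    # variance along dimension 0: [0.   1.   0.25]
col_std = np.std(a, axis=0)    # standard deviation along dimension 0: [0.  1.  0.5]
total_mean = np.mean(a)        # mean over all elements (11 / 6)

print(col_mean, col_var, col_std, total_mean)
```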
2.3 size and sorting function
Common size and sorting functions are shown as follows:
[Figure: size and sorting functions]
Note the difference between min() and minimum() and between max() and maximum()
# Define two arrays
a = np.random.randint(0,5,(2,3))
b = np.random.randint(0,5,(2,3))
print(a)
print(b)
[[0 1 1]
 [0 1 4]]
[[3 0 3]
 [0 2 3]]
print(np.min(a, axis=0)) # Minimum of a along dimension 0 (single array)
print(np.minimum(a, b)) # Compare two arrays and take the smaller value at each position
[0 1 1]
[[0 0 1]
 [0 1 3]]
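For the sorting side of this group, a brief sketch with sort(), argsort() and argmax(). The variable names are only illustrative.

```python
import numpy as np

a = np.array([3, 1, 2])

sorted_copy = np.sort(a)   # returns a sorted copy: [1 2 3]; a is unchanged
order = np.argsort(a)      # indices that would sort a: [1 2 0]
largest_at = np.argmax(a)  # index of the maximum value: 0

before = a.copy()
a.sort()                   # the sort() method sorts the array in place
print(before, a)           # [3 1 2] [1 2 3]
```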
2.4 array operation function
Common array operation functions are as follows:
[Figure: multidimensional array operation functions]
Let's look at some examples
a = np.full((2,2), 1) # shape is (2,2) and every element is 1
b = np.full((2,2), 2) # shape is (2,2) and every element is 2
print(a)
print(b)
[[1 1]
 [1 1]]
[[2 2]
 [2 2]]
print(np.concatenate((a,b))) # Join along dimension 0
print(np.concatenate((a,b), axis=1)) # Join along dimension 1
[[1 1]
 [1 1]
 [2 2]
 [2 2]]
[[1 1 2 2]
 [1 1 2 2]]
print(np.vstack((a,b))) # Join along dimension 0
print(np.hstack((a,b))) # Join along dimension 1
[[1 1]
 [1 1]
 [2 2]
 [2 2]]
[[1 1 2 2]
 [1 1 2 2]]
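Two related functions are worth a quick sketch: stack() joins arrays along a new dimension instead of an existing one, and split() reverses a concatenation. The variable names are only illustrative.

```python
import numpy as np

a = np.full((2, 2), 1)
b = np.full((2, 2), 2)

stacked = np.stack((a, b))         # joins along a NEW dimension 0: shape (2, 2, 2)
joined = np.concatenate((a, b))    # joins along the existing dimension 0: shape (4, 2)
top, bottom = np.split(joined, 2)  # splits into 2 equal parts along dimension 0

print(stacked.shape, joined.shape)  # (2, 2, 2) (4, 2)
```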
2.5 product operation function
Common product operation functions are as follows:
[Figure: product operation functions]
Let's look at some examples
a = np.full((3,1), 1) # shape is (3,1) and every element is 1
b = np.full((1,3), 2) # shape is (1,3) and every element is 2
print(a)
print(b)
[[1]
 [1]
 [1]]
[[2 2 2]]
print(np.dot(a,b)) # Matrix product, shape is (3,3)
[[2 2 2]
 [2 2 2]
 [2 2 2]]
print(np.dot(b,a)) # Matrix product, shape is (1,1)
[[6]]
a = np.arange(0,5)
b = np.arange(0,-5,-1)
print(a)
print(b)
[0 1 2 3 4]
[ 0 -1 -2 -3 -4]
print(np.inner(a,b)) # Inner product, return scalar
-30
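The matrix product can also be written with the @ operator, which for 1D and 2D arrays gives the same result as dot(). A short sketch that also shows outer(), with illustrative variable names:

```python
import numpy as np

a = np.full((3, 1), 1)
b = np.full((1, 3), 2)
same_2d = np.array_equal(a @ b, np.dot(a, b))  # @ is the matrix product operator
print(same_2d)                                 # True

u = np.arange(0, 5)
v = np.arange(0, -5, -1)
inner_uv = u @ v         # for 1D arrays this equals np.inner(u, v)
outer_uv = np.outer(u, v)  # outer product, shape is (5, 5)
print(inner_uv, outer_uv.shape)  # -30 (5, 5)
```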
As shown above, NumPy has many functions that operate on arrays. Only five common categories are listed here, each with a few usage examples. As a reader, you should do the following:
- Read through the function names and descriptions above several times so that you know what NumPy provides. When a need arises later, you will know which functions are available and which one to use;
- Pick up the details of each function gradually through practice.
3. Summary
This section covered two topics: the ndarray object and NumPy's large function library.
In practice, Python is considerably slower than languages such as Java and C. However, for calculations on multidimensional arrays, making good use of NumPy's ndarray and its supporting functions (instead of lists and for loops) lets a Python program run very efficiently.
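The difference can be measured with a rough timing sketch such as the one below, which computes a sum of squares once with a list and a for loop and once with ndarray operations. The problem size and variable names are arbitrary choices for this illustration, and the measured times depend on the machine, so no specific speed-up is claimed here.

```python
import time
import numpy as np

n = 1_000_000
values = list(range(n))
arr = np.arange(n, dtype=np.int64)

# Sum of squares with a list and a for loop
start = time.perf_counter()
loop_total = 0
for v in values:
    loop_total += v * v
loop_seconds = time.perf_counter() - start

# The same calculation with ndarray operations
start = time.perf_counter()
vector_total = int(np.sum(arr * arr))
vector_seconds = time.perf_counter() - start

print(loop_total == vector_total)    # True: both approaches give the same result
print(loop_seconds, vector_seconds)  # timings in seconds
```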