Fundamentals of machine learning: using NumPy

1, Advantages of NumPy

1. Introduction to ndarray

NumPy provides an N-dimensional array type ndarray, which describes a collection of "items" of the same type.

Storing data in an ndarray:

import numpy as np

# Create ndarray
score = np.array(
[[80, 89, 86, 67, 79],
[78, 97, 89, 67, 81],
[90, 94, 78, 67, 74],
[91, 91, 90, 67, 69],
[76, 87, 75, 67, 86],
[70, 79, 84, 67, 84],
[94, 92, 93, 67, 64],
[86, 85, 83, 67, 80]])

score

Return result:

array([[80, 89, 86, 67, 79],
       [78, 97, 89, 67, 81],
       [90, 94, 78, 67, 74],
       [91, 91, 90, 67, 69],
       [76, 87, 75, 67, 86],
       [70, 79, 84, 67, 84],
       [94, 92, 93, 67, 64],
       [86, 85, 83, 67, 80]])

Question: one-dimensional arrays can be stored in Python lists, and multi-dimensional arrays can be built by nesting lists. So why do we need NumPy's ndarray?

2. Comparing the efficiency of ndarray and the native Python list

We can see the benefit of ndarray by running a piece of code:

import random
import time
import numpy as np
a = []
for i in range(100000000):
    a.append(random.random())

# Use the %time magic command (IPython/Jupyter) to see how long the current line takes to run once
%time sum1=sum(a)

b=np.array(a)

%time sum2=np.sum(b)

The first timing is for the native Python sum, and the second is for the NumPy sum:

CPU times: user 852 ms, sys: 262 ms, total: 1.11 s
Wall time: 1.13 s
CPU times: user 133 ms, sys: 653 µs, total: 133 ms
Wall time: 134 ms

From this we can see that the ndarray computation is much faster: in this run, about 134 ms against 1.13 s of wall time.

Machine learning involves a very large number of numerical operations on data. Without a fast way to perform them, Python would not be a practical choice for machine learning.

NumPy is designed specifically for array operations and numerical computation, so its arrays are stored and processed far more efficiently than nested Python lists. The larger the array, the more obvious NumPy's advantage.
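The %time magic used above only works in IPython or Jupyter. As a rough sketch, the same comparison can be run as a plain script with the standard timeit module. The array size (one million elements) and the variable names here are illustrative assumptions, chosen so the script finishes quickly; the measured times will differ from machine to machine.

```python
import random
import timeit

import numpy as np

# A smaller input than the 100 million elements used above, so it runs quickly.
n = 1_000_000
values = [random.random() for _ in range(n)]
array = np.array(values)

# Time each summation: 5 calls per measurement, best of 3 measurements.
list_time = min(timeit.repeat(lambda: sum(values), number=5, repeat=3))
numpy_time = min(timeit.repeat(lambda: np.sum(array), number=5, repeat=3))

print(f"built-in sum: {list_time:.4f} s for 5 calls")
print(f"np.sum:       {numpy_time:.4f} s for 5 calls")

# Both approaches compute the same total, up to floating-point rounding.
same_total = bool(np.isclose(sum(values), np.sum(array)))
print(same_total)
```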

Think about it: why is ndarray so fast?

3. Advantages of ndarray

3.1 Memory layout

How is ndarray different from the native Python list? Consider the following figure:

We can see from the figure that an ndarray stores its data in contiguous memory, which makes batch operations on array elements faster.

This is because all elements of an ndarray have the same type, whereas the elements of a Python list can be of any type. An ndarray can therefore store its elements in one contiguous block of memory, while a native Python list stores references and must follow each one to reach the element it points to. This makes ndarray less general-purpose than the native Python list, but in scientific computing it eliminates many explicit loops, and the resulting code is much simpler than the equivalent written with native lists.
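The contiguous layout can be inspected directly. The following is a small sketch using a hypothetical 3 x 4 array of int64 values (not an array from this article); it relies only on the standard ndarray attributes flags, itemsize, strides and nbytes.

```python
import numpy as np

# A hypothetical 3 x 4 array of 8-byte integers.
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

# All 12 values live in one contiguous, row-major (C-order) buffer.
print(arr.flags["C_CONTIGUOUS"])  # True

# Every element has the same fixed size in bytes.
print(arr.itemsize)  # 8

# Bytes to step to the next row and to the next column.
print(arr.strides)  # (32, 8)

# Total size of the data buffer: 12 elements * 8 bytes.
print(arr.nbytes)  # 96

# A nested list holds references to separate Python int objects instead.
nested = arr.tolist()
print(type(nested[0][0]))  # <class 'int'>
```

Because every element has the same size, the position of any element follows from the strides alone, with no per-element lookup.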

3.2 ndarray supports parallelization (vectorization)

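NumPy operations are vectorized: a single call applies an operation to a whole array in compiled code instead of an element-by-element Python loop. Some operations can also run in parallel on multi-core systems, for example linear algebra routines when NumPy is linked against a multithreaded BLAS library; most element-wise operations run on a single core.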
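As an illustration of vectorization, the sketch below computes the same element-wise product twice: once with an explicit Python loop and once with a single array expression. The prices and quantities arrays are made-up example data.

```python
import numpy as np

# Made-up example data.
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Loop version: one Python-level multiplication per element.
totals_loop = []
for price, quantity in zip(prices.tolist(), quantities.tolist()):
    totals_loop.append(price * quantity)

# Vectorized version: one expression, the loop runs inside NumPy.
totals_vec = prices * quantities

print(totals_loop)  # [10.0, 40.0, 90.0]
print(totals_vec)   # [10. 40. 90.]
```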

3.3 Much more efficient than pure Python code

NumPy's core is written in C, and it releases the GIL (global interpreter lock) internally for many operations. Its array operations are therefore not limited by the speed of the Python interpreter, which makes them much more efficient than pure Python code.

2, N-dimensional array - ndarray

1. Properties of ndarray

Array properties reflect the information inherent in the array itself.

Attribute name	Description
ndarray.shape	Tuple of array dimensions
ndarray.ndim	Number of array dimensions
ndarray.size	Number of elements in the array
ndarray.itemsize	Size of one array element (bytes)
ndarray.dtype	Type of the array elements
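These attributes can be read directly from any array. The sketch below uses a shortened, two-row version of the score array from earlier, and fixes dtype=np.int64 explicitly (an assumption made here so that itemsize is the same on every platform).

```python
import numpy as np

# A shortened, two-row version of the score array, with an explicit dtype.
score = np.array(
    [[80, 89, 86, 67, 79],
     [78, 97, 89, 67, 81]],
    dtype=np.int64,
)

print(score.shape)     # (2, 5)
print(score.ndim)      # 2
print(score.size)      # 10
print(score.itemsize)  # 8
print(score.dtype)     # int64
```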

2. Shape of ndarray

First create some arrays.

# Create arrays of different shapes
>>> a = np.array([[1,2,3],[4,5,6]])
>>> b = np.array([1,2,3,4])
>>> c = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])

Print the shape of each array:

>>> a.shape
>>> b.shape
>>> c.shape

(2, 3)     # two-dimensional array
(4,)       # one-dimensional array
(2, 2, 3)  # three-dimensional array

How to understand the shape of an array?

2D array:

3D array:
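One way to read a shape is from the outside in: each number is the length along one axis, and indexing along the first axis removes the first number. The sketch below illustrates this with the three-dimensional array c defined above.

```python
import numpy as np

c = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

# The outermost level contains 2 blocks.
print(c.shape)        # (2, 2, 3)
print(len(c))         # 2

# Each block is a two-dimensional array with 2 rows and 3 columns.
print(c[0].shape)     # (2, 3)

# Each row is a one-dimensional array with 3 elements.
print(c[0, 1].shape)  # (3,)

# Three indices select a single element.
print(c[0, 1, 2])     # 6
```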

3. Type of ndarray

>>> type(score.dtype)

<type 'numpy.dtype'>

The dtype attribute is a numpy.dtype object. The following types are available for arrays:

Name	Description	Abbreviation
np.bool_	Boolean type (True or False) stored in one byte	'?'
np.int8	Integer, -128 to 127 (one byte)	'i1'
np.int16	Integer, -32768 to 32767	'i2'
np.int32	Integer, -2^31 to 2^31 - 1	'i4'
np.int64	Integer, -2^63 to 2^63 - 1	'i8'
np.uint8	Unsigned integer, 0 to 255	'u1'
np.uint16	Unsigned integer, 0 to 65535	'u2'
np.uint32	Unsigned integer, 0 to 2^32 - 1	'u4'
np.uint64	Unsigned integer, 0 to 2^64 - 1	'u8'
np.float16	Half-precision floating-point number: 16 bits, sign 1 bit, exponent 5 bits, mantissa 10 bits	'f2'
np.float32	Single-precision floating-point number: 32 bits, sign 1 bit, exponent 8 bits, mantissa 23 bits	'f4'
np.float64	Double-precision floating-point number: 64 bits, sign 1 bit, exponent 11 bits, mantissa 52 bits	'f8'
np.complex64	Complex number, with the real and imaginary parts each stored as a 32-bit floating-point number	'c8'
np.complex128	Complex number, with the real and imaginary parts each stored as a 64-bit floating-point number	'c16'
np.object_	Python object	'O'
np.string_	Byte string (np.bytes_ in NumPy 2.0 and later)	'S'
np.unicode_	Unicode string (np.str_ in NumPy 2.0 and later)	'U'
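The abbreviations and value ranges can be checked from Python itself. The following sketch uses np.dtype to resolve a few abbreviations, and np.iinfo and np.finfo to query integer ranges and floating-point bit layouts; the particular types checked here are an arbitrary selection.

```python
import numpy as np

# Abbreviation strings resolve to the corresponding types.
print(np.dtype("i2") == np.int16)    # True
print(np.dtype("i4") == np.int32)    # True
print(np.dtype("u2") == np.uint16)   # True
print(np.dtype("f8") == np.float64)  # True

# Integer ranges.
int8_info = np.iinfo(np.int8)
int32_info = np.iinfo(np.int32)
print(int8_info.min, int8_info.max)    # -128 127
print(int32_info.min, int32_info.max)  # -2147483648 2147483647

# Exponent and mantissa widths of floating-point types.
float32_info = np.finfo(np.float32)
float64_info = np.finfo(np.float64)
print(float32_info.nexp, float32_info.nmant)  # 8 23
print(float64_info.nexp, float64_info.nmant)  # 11 52
```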

Specify the type when creating an array

>>> a = np.array([[1, 2, 3],[4, 5, 6]], dtype=np.float32)
>>> a.dtype
dtype('float32')

>>> arr = np.array(['python', 'tensorflow', 'scikit-learn', 'numpy'], dtype=np.bytes_)  # np.string_ was removed in NumPy 2.0
>>> arr
array([b'python', b'tensorflow', b'scikit-learn', b'numpy'], dtype='|S12')

Note: if no dtype is specified, integers default to int64 on most 64-bit platforms (int32 on Windows with NumPy versions before 2.0) and floating-point numbers default to float64.
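The defaults can be observed by creating arrays without a dtype argument, and an existing array can be converted afterwards with astype. This is a minimal sketch; the variable names are illustrative, and the integer default printed on the first line depends on the platform and NumPy version.

```python
import numpy as np

ints = np.array([1, 2, 3])
floats = np.array([1.0, 2.5])

# The default integer type is platform-dependent (usually int64).
print(ints.dtype)
# The default floating-point type is float64.
print(floats.dtype)  # float64

# astype returns a converted copy; the original array is unchanged.
as_float32 = ints.astype(np.float32)
print(as_float32.dtype)  # float32
print(as_float32)        # [1. 2. 3.]
```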
