[deep learning] forward propagation with complete python code

Keywords: Python Machine Learning neural networks TensorFlow Deep Learning

Hello, everyone. Today I'd like to walk through forward propagation for a simple fully connected network in TensorFlow 2.0, written out by hand with tensors and using the MNIST dataset that ships with Keras.

1. Data acquisition

First, we import the required libraries and the dataset. The loaded x and y are NumPy arrays, so we convert them to tensors with tf.convert_to_tensor() and then check that the data was read correctly.

import os  # Used to control how much TensorFlow logs
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2' # Set before importing TensorFlow: '2' hides INFO and WARNING messages, so only errors are printed
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets # Dataset tools

#(1) Get the MNIST dataset (only the training split is used here)
(x,y),_ = datasets.mnist.load_data() 
#(2) x is converted to float32; y stores the class label of each image as an integer
x = tf.convert_to_tensor(x,dtype=tf.float32)
y = tf.convert_to_tensor(y,dtype=tf.int32)
#(3) Inspect the data
print('shape: ',x.shape,y.shape,'\ndtype: ',x.dtype,y.dtype)  # View shapes and data types
print('Minimum value of x:',tf.reduce_min(x),'\nMaximum value of x:',tf.reduce_max(x))  # View the range of x
print('Minimum value of y:',tf.reduce_min(y),'\nMaximum value of y:',tf.reduce_max(y))  # View the range of y
# The printed results are as follows
shape:  (60000, 28, 28) (60000,) 
dtype:  <dtype: 'float32'> <dtype: 'int32'>
Minimum value of x: tf.Tensor(0.0, shape=(), dtype=float32) 
Maximum value of x: tf.Tensor(255.0, shape=(), dtype=float32)
Minimum value of y: tf.Tensor(0, shape=(), dtype=int32) 
Maximum value of y: tf.Tensor(9, shape=(), dtype=int32)

2. Data preprocessing

First, the x data is normalized: the original pixel values lie in [0, 255], and after dividing by 255 they lie in [0, 1]. The freshly loaded y has shape [60000]; it is one-dimensional and stores the class label (0 to 9) of each image. So that it can be compared with the network's output, y is one-hot encoded, which changes its shape to [60000, 10]; each row then holds the target probability of that image belonging to each class. For example, y.numpy()[0] shows that the 0th image belongs to class 5 with probability 1 and to every other class with probability 0. We also set a learning rate lr, which scales each weight update during training. A typical starting value is between 0.01 and 0.001.

#(4) Preprocessing
x = x/255.  # Normalize x from [0, 255] to [0, 1]
y = tf.one_hot(y,depth=10) # y holds class labels; after one-hot encoding its shape becomes [b,10]
y.numpy()[0]  # View the first label after encoding
# array([0., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)

lr = 1e-3 # If the learning rate is too large, the loss oscillates and training struggles to converge; if it is too small, training takes much longer
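To make the two preprocessing steps concrete, here is a minimal NumPy-only sketch of what the division by 255 and tf.one_hot() do. The tiny 2x2 "images" and their labels are made-up stand-ins rather than real MNIST data, and NumPy is used only so the example runs without TensorFlow.

```python
import numpy as np

# Toy stand-ins for MNIST: two 2x2 "images" with pixel values in [0, 255]
# and their integer class labels (hypothetical values, not real data).
x = np.array([[[0, 255], [51, 102]],
              [[255, 0], [204, 153]]], dtype=np.float32)
y = np.array([5, 0], dtype=np.int32)

# Normalization: dividing by 255 maps [0, 255] onto [0, 1].
x_norm = x / 255.0

# One-hot encoding: row i of the 10x10 identity matrix is the code for class i.
y_onehot = np.eye(10, dtype=np.float32)[y]

print(x_norm.min(), x_norm.max())  # 0.0 1.0
print(y_onehot.shape)              # (2, 10)
print(y_onehot[0])                 # 1 at index 5, 0 elsewhere
```

Indexing the identity matrix with the label array is just one convenient way to build the encoding; tf.one_hot() yields the same rows for labels in the range 0 to 9.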

The dataset is wrapped with tf.data.Dataset.from_tensor_slices(); iter() creates an iterator over it, and next() returns the iterator's next item.

#(5) Split the data into batches of 128 samples
train_db = tf.data.Dataset.from_tensor_slices((x,y)).batch(128)
train_iter = iter(train_db) # Create an iterator over the batches
sample = next(train_iter) # Fetch the first batch
# sample[0] holds the x data and sample[1] holds the y data; each batch contains 128 images
print('batch:', sample[0].shape, sample[1].shape)
# Print result: batch: (128, 28, 28) (128, 10)
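As a rough illustration of what batch(128) means for 60000 samples, the pure-Python sketch below counts the batches produced by slicing the data in order. The helper batch_sizes is a hypothetical name introduced here, and it assumes the final partial batch is kept, which is the default behaviour of batch().

```python
def batch_sizes(num_samples, batch_size):
    """Return the size of every batch produced by slicing in order."""
    sizes = []
    for start in range(0, num_samples, batch_size):
        end = min(start + batch_size, num_samples)
        sizes.append(end - start)
    return sizes

sizes = batch_sizes(60000, 128)
print(len(sizes))   # 469 batches per pass over the data
print(sizes[0])     # 128
print(sizes[-1])    # 96, the final partial batch
```

So one pass over the training set takes 469 steps, and the last batch holds only 96 images.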

3. Build a network

Each batch of input features x has shape [128,28,28], so after flattening the input layer has 28 * 28 = 784 neurons. We choose 256 neurons for hidden layer 1 and 128 neurons for hidden layer 2. The output layer has 10 neurons, one per class.

The shape of each layer's weight matrix follows from the number of neurons in the two layers it connects. Each weight is initialized from a truncated Gaussian distribution, and every parameter is wrapped in tf.Variable so that it is a trainable variable rather than a plain tensor.

#(6) Build network
# The input layer is determined by the number of input feature points, and the output layer is determined by the number of classifications
# Input layer shape[b,784], output layer shape[b,10]
# Build a network and customize the number of neurons in the middle layer
# [b,784] => [b,256] => [b,128] => [b,10]

# Weights and bias of the first layer are wrapped in tf.Variable so that tf.GradientTape can record gradient information
w1 = tf.Variable(tf.random.truncated_normal([784,256], stddev=0.1)) # Truncated normal distribution, dimension is [dim_in, dim_out]; the small standard deviation helps prevent exploding gradients
b1 = tf.Variable(tf.zeros([256])) # Dimension is [dim_out]
# Weights and bias of the second layer
w2 = tf.Variable(tf.random.truncated_normal([256,128], stddev=0.1)) # Truncated normal distribution, dimension is [dim_in, dim_out]
b2 = tf.Variable(tf.zeros([128])) # Dimension is [dim_out]
# Weights and bias of the third layer
w3 = tf.Variable(tf.random.truncated_normal([128,10], stddev=0.1)) # Truncated normal distribution, dimension is [dim_in, dim_out]
b3 = tf.Variable(tf.zeros([10])) # Dimension is [dim_out]
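For readers wondering what "truncated" means here: tf.random.truncated_normal() redraws any sample that lies more than two standard deviations from the mean. The NumPy sketch below imitates that behaviour. The function truncated_normal and its seed argument are illustrative assumptions written for this example, not TensorFlow's implementation.

```python
import numpy as np

def truncated_normal(shape, stddev=0.1, seed=0):
    """Draw from N(0, stddev^2), redrawing any value beyond 2 * stddev."""
    rng = np.random.default_rng(seed)
    values = rng.normal(0.0, stddev, size=shape)
    outside = np.abs(values) > 2 * stddev
    while outside.any():
        # Redraw only the entries that fell outside the allowed range
        values[outside] = rng.normal(0.0, stddev, size=int(outside.sum()))
        outside = np.abs(values) > 2 * stddev
    return values

w1 = truncated_normal((784, 256))   # same shape as the first weight matrix
b1 = np.zeros(256)                  # biases start at zero
print(w1.shape, b1.shape)           # (784, 256) (256,)
print(np.abs(w1).max() <= 0.2)      # True
```

With stddev=0.1, every initial weight therefore lies within [-0.2, 0.2], so no single weight starts out unusually large.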

4. Forward propagation operation

In each iteration, a batch of 128 samples is drawn from train_db. The extracted x has shape [128,28,28], so it must be reshaped to [128,28*28] with tf.reshape() before it can be passed to the input layer. Each layer computes h = x @ w + b: the layer's input is matrix-multiplied by the weights, the bias is added, and the result goes through the activation function tf.nn.relu() to give the input of the next layer. The final output out holds one score per class for each image; no softmax is applied, so these scores are not normalized probabilities.
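The shape bookkeeping can be checked without TensorFlow. The following is a minimal NumPy sketch of the same three-layer forward pass on a random stand-in batch; the random inputs and the plain (untruncated) normal weights are assumptions made only to keep the example short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-in for one batch of 128 grayscale images scaled to [0, 1]
x = rng.random((128, 28, 28))

# Small random weights and zero biases with the article's layer sizes
w1, b1 = rng.normal(0.0, 0.1, (784, 256)), np.zeros(256)
w2, b2 = rng.normal(0.0, 0.1, (256, 128)), np.zeros(128)
w3, b3 = rng.normal(0.0, 0.1, (128, 10)), np.zeros(10)

def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)

x_flat = x.reshape(-1, 28 * 28)   # [128, 28, 28] -> [128, 784]
h1 = relu(x_flat @ w1 + b1)       # [128, 784] @ [784, 256] -> [128, 256]
h2 = relu(h1 @ w2 + b2)           # [128, 256] @ [256, 128] -> [128, 128]
out = h2 @ w3 + b3                # [128, 128] @ [128, 10]  -> [128, 10]

print(x_flat.shape, h1.shape, h2.shape, out.shape)
```

Each bias is a one-dimensional vector that is broadcast across the 128 rows of the batch, which mirrors what happens in the TensorFlow version.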

#(7) Forward propagation operation
for i in range(10):  # Iterate over the entire dataset 10 times
    # Iterate over all batches of the dataset once
    # x is the input features with shape [128,28,28]; y is the one-hot label with shape [128,10]
    for step,(x,y) in enumerate(train_db): # Returns the batch index and the corresponding batch
        # Flatten x from [b,w,h] to [b,w*h], i.e. [b,28*28]
        x = tf.reshape(x,[-1,28*28]) # Reshape the input features; -1 lets TensorFlow infer b
        with tf.GradientTape() as tape: # Records operations for automatic differentiation; only tf.Variable objects are tracked by default
            # == 1 == From the input layer to hidden layer 1: h1 = x @ w1 + b1
            # [b,784] @ [784,256] + [256] = [b,256]
            h1 = x @ w1 + b1  # b1 is broadcast automatically during the addition, equivalent to tf.broadcast_to(b1,[x.shape[0],256])
            # Activation function
            h1 = tf.nn.relu(h1)
            # == 2 == From hidden layer 1 to hidden layer 2: [b,256] @ [256,128] + [128] = [b,128]
            h2 = h1 @ w2 + b2
            h2 = tf.nn.relu(h2)
            # == 3 == From hidden layer 2 to the output layer: [b,128] @ [128,10] + [10] = [b,10]
            out = h2 @ w3 + b3 # shape is [b,10]
            #(8) Compute the error: the output out has shape [b,10], and the one-hot encoded true value y has shape [b,10]
            # Mean squared error: mse = mean((y-out)^2)
            loss_square = tf.square(y-out)  # shape is [b,10]
            loss = tf.reduce_mean(loss_square) # Reduce to a scalar
        # Gradient calculation
        grads = tape.gradient(loss,[w1,b1,w2,b2,w3,b3])
        # Note: the form below returns a plain tf.Tensor and rebinds w1 to it, so w1 is no longer tracked by the tape, which causes an error in the next iteration
        # w1 = w1 - lr * grads[0] # grads[0] is the gradient for w1, the 0th element of grads

        # Weight update: lr is the learning rate, which scales how far each step moves along the negative gradient
        # Use the in-place update assign_sub() so that the parameters remain tf.Variable
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])    
        w2.assign_sub(lr * grads[2])  
        b2.assign_sub(lr * grads[3]) 
        w3.assign_sub(lr * grads[4])    
        b3.assign_sub(lr * grads[5]) 
        if step % 100 == 0: # Print the loss every 100 steps
            print(f"Iteration {step+1}, loss is {float(loss)}") # loss is a scalar tensor; float() converts it to a Python number (np.float was removed in NumPy 1.24)

The mean squared error between the output and the true labels is used as the model loss. tape.gradient(), the gradient method of tf.GradientTape(), computes the gradient of the loss with respect to each parameter, and these gradients update the weights for the next iteration. The update rule is w1 = w1 - lr * grads[0], but writing it that way returns a plain tensor, and tf.GradientTape() only tracks tf.Variable objects by default. Therefore the assign_sub() function is used to update the weights in place without changing their type.
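To see the loss and the update rule in isolation, here is a small NumPy sketch of one gradient-descent step on a single linear layer with the same mean squared error. The gradients are worked out by hand instead of with tf.GradientTape(), and the toy sizes and the larger learning rate are assumptions chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 4))                 # toy batch: 8 samples, 4 features
y = np.eye(3)[rng.integers(0, 3, 8)]   # one-hot targets for 3 classes
w = rng.normal(0.0, 0.1, (4, 3))
b = np.zeros(3)
lr = 0.1  # larger than the article's 1e-3 so the change is visible on toy data

def mse(w, b):
    """Mean of the squared error over every element of the output."""
    return np.mean((x @ w + b - y) ** 2)

loss_before = mse(w, b)

# Analytic gradients of the mean squared error
d_out = 2.0 * (x @ w + b - y) / y.size   # dLoss/dOut, shape [8, 3]
grad_w = x.T @ d_out                     # shape [4, 3]
grad_b = d_out.sum(axis=0)               # shape [3]

# In-place updates, the NumPy counterpart of assign_sub()
w -= lr * grad_w
b -= lr * grad_b

loss_after = mse(w, b)
print(loss_before, loss_after)           # the second value is smaller
```

The in-place `-=` keeps w and b as the same array objects, in the same spirit as assign_sub() keeping the parameters as tf.Variable.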

# In the last epoch, the printed loss values are:
Iteration 1, loss is 0.08258605003356934
Iteration 101, loss is 0.09005936980247498
Iteration 201, loss is 0.0828738585114479
Iteration 301, loss is 0.0822446346282959
Iteration 401, loss is 0.08802710473537445

Posted by djw821 on Sun, 05 Dec 2021 01:44:16 -0800