Introduction to Python 1: convolutional neural network for MNIST handwritten digit recognition

Python is developed by the Torch 7 team. The difference between Python and Torch is that Python is used as the development language. It also shows that it is a deep learning framework with Python priority. It can not only achieve powerful GPU acceleration, but also support dynamic neural network, which is not supported by many mainstream frameworks such as TensorFlow. Besides, python can easily expand, and has the momentum to catch up with TensorFlow in the past two years

There are two types of variables in Python: Tensor and Variable. And there is a free conversion between Tensor and Numpy.

2. Convolutional neural network (CNN)

Convolutional Neural Network (CNN) is a kind of feedforward neural network. It has long been one of the core algorithms in the field of image recognition

This is a schematic diagram of roll and neural network mechanism. Let's understand the following parts:

Input Layer: for us, the input is a picture, but for the computer, the input is a matrix type data, which represents the pixel value of the picture
Convolution Layer: feature extraction is performed on the input data. Each convolution kernel extracts a feature value from a small area each time. All the feature values are combined to get a feature map. When multiple convolution checks are used for feature extraction, multiple feature maps are obtained. This convolution kernel is called (kernel)
Activation Function: a function running on the neuron of the artificial neural network, which is responsible for mapping the input of the neuron to the output. The purpose of introducing Activation Function is to increase the nonlinearity of the neural network model
Pooling Layer: compress images without affecting feature quality to reduce parameters. There are two main types of pooling: MaxPooling and AvePooling. Suppose that the pooled kernel is a 2 * 2 matrix, and MaxPooling is the maximum output value, and AvePooling is the average output value of all data
Full connected layer (FC for short): it plays the role of "Classifier" in the whole convolutional neural network. If the operations of volume layer, pool layer and activation function layer are to map the original data to the hidden layer feature space, the full connection layer is to map the learned "distributed feature representation" to the sample marker space
Output Layer: the upstream of the Output Layer in the convolutional neural network is usually the full connection layer, so its structure and working principle are the same as the Output Layer in the traditional feedforward neural network. For image classification, the Output Layer uses the logic function or the softmax function to output the classification label [. In object detection, the Output Layer can be designed as the center coordinate, size and classification of the output object. In image semantic segmentation, the Output Layer directly outputs the classification results of each pixel

After understanding the convolutional neural network, let's look at the convolutional neural network in PyTorch

3. Convolution neural network in PyTorch

3.1 convolution layer: nn.Conv2d()

Its parameters are as follows:

parameter	Meaning
in_channels	The number of channels for the input signal
out_channels	The number of output channels after convolution
kernel_size	The shape of convolution kernel. For example, kernel_size=(3, 2) represents the convolution kernel of 3X2. If the width and height are the same, it can only be represented by one number
stride	Convolution steps per move, default is 1
padding	The number of padding 0 when processing the boundary. The default is 0 (no padding)
dilation	Number of sampling intervals, default is 1, no interval sampling
groups	The number of packets for input and output channels. When it is not 1, it defaults to 1 (full connection
bias	When True, add the offset

3.2 pool layer: nn.MaxPool2d()

Its parameters are as follows:

parameter	Meaning
kernel_size	Window size at maximum pooling
stride	The step size of window movement during the maximum pooling operation. The default value is kernel_size
padding	Number of implicit zeros per edge entered
dilation	Parameters used to control the step size of elements in the window
return_indices	If it is equal to True, return the maximum index while returning the max pooling result, which is very useful in the subsequent Unpooling
ceil_mode	If it is equal to True, when calculating the output size, the up rounding will be used instead of the default down rounding

4. MNIST handwritten digit recognition

Now let's use Python to build a convolutional neural network to train and predict handwritten numbers

4.1 import and stock in function

import torch.nn as nn  # The most important module in Python encapsulates the functions related to neural network
import torch.nn.functional as F  # Some common functions are provided, such as softmax
import torch.optim as optim  # The optimization module encapsulates some optimizers for solving the model, such as Adam SGD
from torch.optim import lr_scheduler  # The learning rate adjuster can change the learning rate reasonably in the training process
from torchvision import transforms  # Some interfaces for data transformation are provided in the python visual Library
from torchvision import datasets  # The python visual library provides an interface for loading datasets

4.2 setting super parameters

# Preset network super parameters (so-called super parameters are parameters that can be set artificially

BATCH_SIZE = 64  # Due to the use of batch training method, it is necessary to define the number of samples for each batch of training

EPOCHS = 3  # Total number of training iterations

# Let torch judge whether to use GPU. It is recommended to use GPU environment, because it will be much faster
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")

learning_rate = 0.001  # Set the initial learning rate

4.3 loading data sets

Here we use the dataloader iterator to load the dataset

# Load training set
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize(mean=(0.5,), std=(0.5,))  # Data normalization to normal distribution
                   ])),
    batch_size=BATCH_SIZE, shuffle=True)  # Indicating the batch size and disruption is the need for follow-up training.

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, transform=transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.5,), (0.5,))
    ])),
    batch_size=BATCH_SIZE, shuffle=True)

The purpose of the above codes is to:

Download the mnist training set and test set to the folder data
Change the data, including changing to the sensor data type, and normalizing the data to the normal distribution to ensure the independent and identical distribution of the training set test set
The data is stored in batches, and the sequence is disordered for the convenience of subsequent training.

4.4 design CNN

The structure of a simple convolution neural network generally includes:

Convolution layer Conv: extract image features through convolution kernel to get feature map
Pool layer: use convolution check feature map to reduce sampling and size. The maximum pooling layer is the largest pixel in the sliding window, and the average pooling layer is the average pixel result in the sliding window.
Full connection layer: multiple linear model s + activation function eg: ReLU.
This constitutes a basic CNN, but out of a learning mentality, we'd better understand some tricks in the process of in-depth learning and training to improve the generalization ability of the model. Such as:

batch normalization: it is simply to normalize all the output results of the weighted sum of the previous layer in batches (standard normal distribution), then input a linear model, and then connect to the activation function.
DropOut: in the full connection layer, we set the probability randomly to make some weights of this layer 0, which is equivalent to the ineffectiveness of neurons.

# design a model
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # Extract feature layer
        self.features = nn.Sequential(
            # Convolution layer
            # The input image channel is 1, because we use black-and-white image, single channel
            # The output channel is 32 (representing the use of 32 convolution kernels). A convolution kernel generates a single channel characteristic graph
            # The size of convolution kernel_is 3 * 3, and the number of moving pixels represented by stripe is 1
            # padding: 1 means that there are two more pixels in the length and width of the image
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, stride=1, padding=1),# ((28-3+2*1)/1)+1=28  28*28*1  > 28*28*32

            # For batch normalization, the out channels of the next layer are the same size, and the following channel rules must be well matched
            nn.BatchNorm2d(num_features=32),   #28*28*32  > 28*28*32

            # Activation function, inplace=true means direct operation
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),   #  ((28-3+2*1)/1)+1=28     28*28*32 > 28*28*32
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),

            # Maximum pool layer
            # A sliding window with a kernel size of 2 * 2
            # A stripe of 2 indicates that each sliding distance is 2 pixels
            # After this step, the size of the image becomes 1 / 4, that is, 28 * 28 - '7 * 7
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        # Taxonomy
        self.classifier = nn.Sequential(
            # Dropout level
            # p = 0.5 means that there is a possibility of 0.5 for each weight of the layer
            nn.Dropout(p=0.5),
            # Here is the number of channels 64 * image size 7 * 7, and then input to 512 neurons
            nn.Linear(64 * 7 * 7, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5),
            nn.Linear(512, 10),
        )
    # Forward transfer function    
    def forward(self, x):
        # Through feature extraction layer
        x = self.features(x)
        # The output must be flattened to a one-dimensional vector
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

4.5 preparation before training

# Initialization model moves network operation to GPU or CPU

ConvModel = ConvNet().to(DEVICE)
# Define cross entropy loss function
criterion = nn.CrossEntropyLoss().to(DEVICE)
# Define model optimizer: input model parameters and define initial learning rate
optimizer = torch.optim.Adam(ConvModel.parameters(), lr=learning_rate)
# Define learning rate scheduler: input the model of packing, define learning rate decay period step_size, gamma is the multiplication factor of decay
exp_lr_scheduler = lr_scheduler.StepLR(optimizer, step_size=6, gamma=0.1)
# Explanation on the official website. If the initial learning rate lr = 0.05, the decay period step_size is 30, and the decay multiplication factor gamma=0.01
# Assuming optimizer uses lr = 0.05 for all groups
# >>> # lr = 0.05     if epoch < 30
# >>> # lr = 0.005    if 30 <= epoch < 60
# >>> # lr = 0.0005   if 60 <= epoch < 90

4.6 training

def train(num_epochs, _model, _device, _train_loader, _optimizer, _lr_scheduler):
    _model.train()  # Set the model to training mode
    _lr_scheduler.step() # Set learning rate scheduler to prepare for update
    for epoch in range(num_epochs):
        # Extract images and tags from iterators
        for i, (images, labels) in enumerate(_train_loader):
            samples = images.to(_device)
            labels = labels.to(_device)
            # At this time, the sample is a batch of pictures. In CNN's input, we need to change it into four dimensions,
            # reshape the first - 1 represents the number of automatically calculated batch pictures n
            # The final result of reshape is n pictures, each picture is 28 * 28 of a single channel, and four-dimensional tensor is obtained
            output = _model(samples.reshape(-1, 1, 28, 28))
            # Calculate loss function value
            loss = criterion(output, labels)
            # Optimizer internal parameter gradient must be 0
            optimizer.zero_grad()
            # Back propagation of loss value
            loss.backward()
            # Update model parameters
            optimizer.step()
            if (i + 1) % 100 == 0:
                print("Epoch:{}/{}, step:{}, loss:{:.4f}".format(epoch + 1, num_epochs, i + 1, loss.item()))

4.7 forecast

def test(_test_loader, _model, _device):
    _model.eval()  # Set the model to enter the prediction mode evaluation
    loss = 0
    correct = 0

    with torch.no_grad():  # If you don't need backward to update the gradient, you need to disable gradient calculation to reduce the waste of memory and computing resources.
        for data, target in _test_loader:
            data, target = data.to(_device), target.to(_device)
            output = ConvModel(data.reshape(-1, 1, 28, 28))
            loss += criterion(output, target).item()  # Add loss value
            pred = output.data.max(1, keepdim=True)[1]  # Find the subscript with the highest probability, which is the output value
            correct += pred.eq(target.data.view_as(pred)).cpu().sum()  # . cpu() is to migrate parameters to the cpu.

    loss /= len(_test_loader.dataset)

    print('\nAverage loss: {:.4f}, Accuracy: {}/{} ({:.3f}%)\n'.format(
        loss, correct, len(_test_loader.dataset),
        100. * correct / len(_test_loader.dataset)))

4.8 operation

for epoch in range(1, EPOCHS + 1):
    train(epoch, ConvModel, DEVICE, train_loader, optimizer, exp_lr_scheduler)
    test(test_loader,ConvModel, DEVICE)
    test(train_loader,ConvModel, DEVICE)

5 Results

pytorch =0.4 run with CPU

Epoch:1/1, step:100, loss:0.1579
Epoch:1/1, step:200, loss:0.0809
Epoch:1/1, step:300, loss:0.0673
Epoch:1/1, step:400, loss:0.1391
Epoch:1/1, step:500, loss:0.0323
Epoch:1/1, step:600, loss:0.0870
Epoch:1/1, step:700, loss:0.0441
Epoch:1/1, step:800, loss:0.0705
Epoch:1/1, step:900, loss:0.0396

Average loss: 0.0006, Accuracy: 9881/10000 (98.000%)

Epoch:1/2, step:100, loss:0.0487
Epoch:1/2, step:200, loss:0.1519
Epoch:1/2, step:300, loss:0.0262
Epoch:1/2, step:400, loss:0.2133
Epoch:1/2, step:500, loss:0.0161
Epoch:1/2, step:600, loss:0.0805
Epoch:1/2, step:700, loss:0.0927
Epoch:1/2, step:800, loss:0.0663
Epoch:1/2, step:900, loss:0.0669
Epoch:2/2, step:100, loss:0.0124
Epoch:2/2, step:200, loss:0.0527
Epoch:2/2, step:300, loss:0.0256
Epoch:2/2, step:400, loss:0.0138
Epoch:2/2, step:500, loss:0.0894
Epoch:2/2, step:600, loss:0.1030
Epoch:2/2, step:700, loss:0.0528
Epoch:2/2, step:800, loss:0.1106
Epoch:2/2, step:900, loss:0.0044

Average loss: 0.0003, Accuracy: 9930/10000 (99.000%)

Epidemic at home, a small project a day come on! Tomorrow we will use visdom to visualize the network!

Reference resources:

https://blog.csdn.net/out_of_memory_error/article/details/81434907

https://blog.csdn.net/think_three/article/details/79949710

https://blog.csdn.net/CVSvsvsvsvs/article/details/88564895