Recognition of the FashionMNIST dataset with a convolutional neural network (DenseNet) (PyTorch version)

Keywords: neural networks, PyTorch, deep learning

1. Preface

1.1 Case introduction

In this case, PyTorch is used to build a DenseNet network for image classification on the FashionMNIST dataset. The work is divided into four steps: data preparation, model construction, training on the training set, and evaluating the model on the test set.

1.2 Environment configuration

(1) Operating system: Windows 10
(2) Development environment: PyCharm Community Edition 2021.2
(3) Library versions: PyTorch 1.7.1 + torchvision 0.8.2 + CUDA 11.3
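
A quick way to confirm the local installation is a version check like the following (a minimal sketch; the versions printed on your machine may differ):

import torch
import torchvision

print(torch.__version__)          # e.g. 1.7.1
print(torchvision.__version__)    # e.g. 0.8.2
print(torch.cuda.is_available())  # True if the CUDA build and driver can be used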

1.3 Module imports

This case requires importing the following libraries and modules:

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
import copy
import time
import torch
import torch.nn as nn
from torch.optim import Adam
import torch.utils.data as Data
from torchvision import transforms
from torchvision.datasets import FashionMNIST

2. Image data preparation

Before building and training the model, the FashionMNIST dataset must be prepared. It can be read directly with the FashionMNIST() class from the torchvision.datasets module; if the data is not present in the specified folder, it can be downloaded automatically from the network by passing download=True.

2.1 Preparation of the training/validation set

The loading logic for the training/validation set is wrapped in the train_data_process() function below. It imports the training dataset and then wraps it in a data loader created with Data.DataLoader(), where each batch contains 64 samples. The len() function gives the number of batches in the loader; train_loader contains 938 batches. Note that shuffle=False means the samples in each batch are fixed, which makes it easy to split the loader into a training part and a validation part by batch index later on. To get a feel for the data, one batch of images is also retrieved and visualized.

# Processing training set data
def train_data_process():
    # Load the FashionMNIST dataset
    train_data = FashionMNIST(root="./data/FashionMNIST",  # Data path
                              train=True,  # Use only training datasets
                              transform=transforms.Compose([transforms.Resize(size=96), transforms.ToTensor()]),  # Change the PIL.Image or numpy.array data type to torch.FloatTensor type
                                                                                                                   # The size is Channel * Height * Width, and the value range is reduced to [0.0, 1.0]
                              download=False,  # If the corresponding dataset is not downloaded, select True
                              )
    train_loader = Data.DataLoader(dataset=train_data,  # Incoming dataset
                                   batch_size=64,  # Number of samples per Batch
                                   shuffle=False,  # Do not reorder the dataset
                                   num_workers=0,  # Number of processes started to load data
                                   )
    print("The number of batch in train_loader:", len(train_loader))  # There are 938 batches in total, and each batch contains 64 training samples

    # Get the data of a Batch
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    batch_x = b_x.squeeze().numpy()  # Remove the size-1 channel dimension of the 4-D tensor and convert it to a NumPy array of shape (64, 96, 96)
    batch_y = b_y.numpy()  # Convert tensor to Numpy array
    class_label = train_data.classes  # Label of training set
    class_label[0] = "T-shirt"
    print("the size of batch in train data:", batch_x.shape)
    
    # Visualize an image of a Batch
    plt.figure(figsize=(12, 5))
    for ii in np.arange(len(batch_y)):
        plt.subplot(4, 16, ii+1)
        plt.imshow(batch_x[ii, :, :], cmap=plt.cm.gray)
        plt.title(class_label[batch_y[ii]], size=9)
        plt.axis("off")
        plt.subplots_adjust(wspace=0.05)
    plt.show()
    
    return train_loader, class_label

The resulting visualization of one batch of images is shown below:

Note: since the input size expected by this DenseNet model is 96 × 96, the FashionMNIST images are resized from 28 × 28 to 96 × 96, and each batch contains 64 samples, so each mini-batch (after squeezing the channel dimension) has size 64 × 96 × 96.
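
As a quick sanity check, the shape of one mini-batch can be inspected directly from the loader (a sketch, assuming the dataset has already been downloaded to ./data/FashionMNIST; calling the function will also display the sample grid above):

train_loader, class_label = train_data_process()
b_x, b_y = next(iter(train_loader))
print(b_x.shape)  # torch.Size([64, 1, 96, 96]): batch, channel, height, width
print(b_y.shape)  # torch.Size([64])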

2.2 Preparation of the test set

The loading logic for the test set is wrapped in the test_data_process() function below. It imports the test dataset, resizes the images to 96 × 96, and wraps them in a data loader with a batch size of 1, so samples are fed to the model one at a time during testing.

# Processing test set data
def test_data_process():
    test_data = FashionMNIST(root="./data/FashionMNIST",  # Data path
                             train=False,  # Do not use training dataset
                             transform=transforms.Compose([transforms.Resize(size=96), transforms.ToTensor()]),  # Change the PIL.Image or numpy.array data type to torch.FloatTensor type
                                                                                                                  # The size is Channel * Height * Width, and the value range is reduced to [0.0, 1.0]
                             download=False,  # If the previous data has been downloaded, there is no need to download it again
                             )
    test_loader = Data.DataLoader(dataset=test_data,  # Incoming dataset
                                  batch_size=1,  # Number of samples per Batch
                                  shuffle=True,  # Shuffle the test samples; the order does not affect accuracy
                                  num_workers=0,  # Number of processes started to load data
                                   )

    # Get the data of a Batch
    for step, (b_x, b_y) in enumerate(test_loader):
        if step > 0:
            break
    batch_x = b_x.squeeze().numpy()  # Remove the size-1 batch and channel dimensions and convert to a NumPy array of shape (96, 96)
    batch_y = b_y.numpy()  # Convert tensor to Numpy array
    print("The size of batch in test data:", batch_x.shape)

    return test_loader

3. Construction of convolutional neural network

3.1 Creation of dense blocks

The main difference from ResNet is that in DenseNet the output of module B is not added element-wise to the output of module A (as in ResNet, left part of the figure below) but concatenated with it in the channel dimension (right part of the figure below), so that the output of module A is passed directly to the layers behind module B. In this design, the output of module A is directly connected to all subsequent layers, which is why the architecture is called densely connected.
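
The difference can be illustrated with a small sketch (the tensors here are made up for illustration and are not part of the model code): a residual connection adds the two outputs element-wise, while a dense connection concatenates them along the channel dimension, so the number of channels grows.

A = torch.rand(4, 3, 8, 8)   # output of module A: batch, channels, height, width
B = torch.rand(4, 3, 8, 8)   # output of module B with the same shape

res_out = A + B                       # ResNet-style residual connection: still 3 channels
dense_out = torch.cat((A, B), dim=1)  # DenseNet-style dense connection: now 6 channels
print(res_out.shape)    # torch.Size([4, 3, 8, 8])
print(dense_out.shape)  # torch.Size([4, 6, 8, 8])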

DenseNet uses the improved "batch normalization, activation, convolution" structure of ResNet. This structure is implemented first in the conv_block() function.

# Define a convolution block
def conv_block(in_channels, out_channels):
    blk = nn.Sequential(nn.BatchNorm2d(in_channels),
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))

    return blk

A dense block consists of multiple conv_block units, each producing the same number of output channels. In the forward computation, however, the input and output of every conv_block are concatenated in the channel dimension.

# Define a dense block
class DenseBlock(nn.Module):
    def __init__(self, num_convs, in_channels, out_channels):
        super(DenseBlock, self).__init__()
        net = []
        for i in range(num_convs):
            in_c = in_channels + i * out_channels
            net.append(conv_block(in_c, out_channels))
        self.net = nn.ModuleList(net)
        self.out_channels = in_channels + num_convs * out_channels  # Calculate the number of output channels

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = torch.cat((X, Y), dim=1)  # Link input and output in channel dimension

        return X
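
A quick shape check of the channel bookkeeping (a sketch with made-up sizes): a dense block with 2 convolutions, 3 input channels and 10 output channels per convolution should yield 3 + 2 × 10 = 23 output channels, with the height and width unchanged.

blk = DenseBlock(2, 3, 10)
X = torch.rand(4, 3, 8, 8)
Y = blk(X)
print(Y.shape)  # torch.Size([4, 23, 8, 8])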

3.2 Creation of the transition layer

Because every dense block increases the number of channels, stacking too many of them would make the model overly complex, so a transition layer is used to control the model complexity. It uses a 1 × 1 convolutional layer to reduce the number of channels and an average pooling layer with stride 2 to halve the height and width, further reducing the complexity of the model.

# Define a transition layer
def transition_block(in_channels, out_channels):
    blk = nn.Sequential(nn.BatchNorm2d(in_channels),
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=1),
                        nn.AvgPool2d(kernel_size=2, stride=2)
                        )

    return blk
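
Continuing the sketch above, applying a transition layer with 10 output channels to the 23-channel output Y halves the height and width and reduces the number of channels to 10.

trans = transition_block(23, 10)
print(trans(Y).shape)  # torch.Size([4, 10, 4, 4])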

3.3 Establishment of the DenseNet network

DenseNet first uses the same single convolutional layer and max pooling layer as ResNet. It then uses four dense blocks. As with ResNet, the number of convolutional layers in each dense block can be chosen; here it is set to 4, consistent with the ResNet-18 configuration used previously. The number of output channels of each convolutional layer in a dense block (i.e. the growth rate) is set to 32, so each dense block adds 4 × 32 = 128 channels. A transition layer between dense blocks halves the height and width and halves the number of channels. Finally, a global average pooling layer and a fully connected layer produce the output.

# Define a global average pooling layer
class GlobalAvgPool2d(nn.Module):
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()

    def forward(self, x):
        return nn.functional.avg_pool2d(x, kernel_size=x.size()[2:])  # The pooled window shape is equal to the shape of the input image


# Define DenseNet network structure
def DenseNet(num_channels, growth_rate, num_convs_in_dense_blocks):
    net = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
                        nn.BatchNorm2d(64),
                        nn.ReLU(),
                        nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
                        )

    for i, num_convs in enumerate(num_convs_in_dense_blocks):
        DB = DenseBlock(num_convs, num_channels, growth_rate)
        net.add_module("DenseBlosk_%d" % i, DB)
        num_channels = DB.out_channels  # Number of output channels of the last dense block
        # A transition layer with half the number of channels is added between dense blocks
        if i != len(num_convs_in_dense_blocks) - 1:
            net.add_module("transition_block_%d" % i, transition_block(num_channels, num_channels // 2))
            num_channels = num_channels // 2

    net.add_module("BN", nn.BatchNorm2d(num_channels))
    net.add_module("relu", nn.ReLU())
    net.add_module("global_avg_pool", GlobalAvgPool2d())  # Output of GlobalAvgPool2d: (Batch, num_channels, 1, 1)
    net.add_module("fc", nn.Sequential(nn.Flatten(), nn.Linear(num_channels, 10)))

    return net
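
To check how the channel count and spatial size evolve through the network, the output shape of every top-level module can be traced with a dummy single-channel 96 × 96 input (a sketch; the module names come from the add_module() calls above):

net = DenseNet(64, 32, [4, 4, 4, 4])
X = torch.rand(1, 1, 96, 96)
for name, layer in net.named_children():
    X = layer(X)
    print(name, "output shape:", X.shape)
# The trace ends with "fc output shape: torch.Size([1, 10])",
# i.e. one score for each of the 10 FashionMNIST classes.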

4. Convolutional neural network training and prediction

To train the DenseNet network, a train_model() function is defined, which trains the network on the training dataset. The training dataset contains 60,000 images divided into 938 batches, of which 80% are used for model training and 20% for model validation. The train_model() function therefore contains two stages: model training and model validation.

# Define the training process of the network
def train_model(model, traindataloader, train_rate, criterion, device, optimizer, num_epochs=25):
    '''
    :param model: network model
    :param traindataloader: training data loader; its batches are split into a training part and a validation part
    :param train_rate: proportion of batches used for training
    :param criterion: loss function
    :param device: device to run on (CPU or GPU)
    :param optimizer: optimization method
    :param num_epochs: number of training epochs
    '''

    batch_num = len(traindataloader)  # batch quantity
    train_batch_num = round(batch_num * train_rate)  # 80% of the batch is used for training, and the round() function is rounded
    best_model_wts = copy.deepcopy(model.state_dict())  # Copy parameters of the current model
    # Initialization parameters
    best_acc = 0.0  # Highest accuracy
    train_loss_all = []  # Training set loss function list
    train_acc_all = []  # Training set accuracy list
    val_loss_all = []  # Validation set loss function list
    val_acc_all = []  # Validation set accuracy list
    since = time.time()  # current time 
    # Iterative training model
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Initialization parameters
        train_loss = 0.0  # Training set loss function
        train_corrects = 0  # Training set accuracy
        train_num = 0  # Number of training set samples
        val_loss = 0.0  # Verification set loss function
        val_corrects = 0  # Verification set accuracy
        val_num = 0  # Number of validation set samples
        # Train and calculate each mini batch
        for step, (b_x, b_y) in enumerate(traindataloader):
            b_x = b_x.to(device)
            b_y = b_y.to(device)
            if step < train_batch_num:  # 80% of the data set was used for training
                model.train()  # Set the model to training mode and enable Batch Normalization and Dropout
                output = model(b_x)  # In the forward propagation process, the input is a batch and the output is the corresponding prediction in a batch
                pre_lab = torch.argmax(output, 1)  # Find the row mark corresponding to the maximum value in each row
                loss = criterion(output, b_y)  # Calculate the loss function of each batch
                optimizer.zero_grad()  # Initialize gradient to 0
                loss.backward()  # Back propagation calculation
                optimizer.step()  # The network parameters are updated according to the gradient information of network back propagation to reduce the calculated value of loss function
                train_loss += loss.item() * b_x.size(0)  # Accumulate the loss function
                train_corrects += torch.sum(pre_lab == b_y.data)  # Count the correct predictions in this batch and add them to train_corrects
                train_num += b_x.size(0)  # Number of samples currently used for training
            else:  # Use 20% of the dataset for validation
                model.eval()  # Set the model to evaluation mode and do not enable Batch Normalization and Dropout
                output = model(b_x)  # In the forward propagation process, the input is a batch and the output is the corresponding prediction in a batch
                pre_lab = torch.argmax(output, 1)  # Find the row mark corresponding to the maximum value in each row
                loss = criterion(output, b_y)  # Calculate the average loss function of 64 samples in each batch
                val_loss += loss.item() * b_x.size(0)  # Accumulate the loss function of each batch in the validation set
                val_corrects += torch.sum(pre_lab == b_y.data)  # Count the correct predictions in this batch and add them to val_corrects
                val_num += b_x.size(0)  # Number of samples currently used for validation

        # Calculate and save the cost function and accuracy of each iteration
        train_loss_all.append(train_loss / train_num)  # Calculate and save the cost function of the training set
        train_acc_all.append(train_corrects.double().item() / train_num)  # Calculate and save the accuracy of the training set
        val_loss_all.append(val_loss / val_num)  # Calculate and save the cost function of the validation set
        val_acc_all.append(val_corrects.double().item() / val_num)  # Calculate and save the accuracy of the validation set
        print('{} Train Loss: {:.4f} Train Acc: {:.4f}'.format(epoch, train_loss_all[-1], train_acc_all[-1]))
        print('{} Val Loss: {:.4f} Val Acc: {:.4f}'.format(epoch, val_loss_all[-1], val_acc_all[-1]))

        # Find the highest accuracy
        if val_acc_all[-1] > best_acc:
            best_acc = val_acc_all[-1]  # Save current maximum accuracy
            best_model_wts = copy.deepcopy(model.state_dict())  # Save the model parameters at the current highest accuracy
        time_use = time.time() - since  # Time elapsed since the start of training
        print("Train and val complete in {:.0f}m {:.0f}s".format(time_use // 60, time_use % 60))

    # Select the optimal parameters
    model.load_state_dict(best_model_wts)  # Load the model parameters at the highest accuracy
    train_process = pd.DataFrame(data={"epoch": range(num_epochs),
                                       "train_loss_all": train_loss_all,
                                       "val_loss_all": val_loss_all,
                                       "train_acc_all": train_acc_all,
                                       "val_acc_all": val_acc_all}
                                 )  # Save the loss and accuracy of each epoch in DataFrame format

    # The loss function and accuracy of the training set and verification set after each iteration are displayed
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(train_process['epoch'], train_process.train_loss_all, "ro-", label="Train loss")
    plt.plot(train_process['epoch'], train_process.val_loss_all, "bs-", label="Val loss")
    plt.legend()
    plt.xlabel("epoch")
    plt.ylabel("Loss")
    plt.subplot(1, 2, 2)
    plt.plot(train_process['epoch'], train_process.train_acc_all, "ro-", label="Train acc")
    plt.plot(train_process['epoch'], train_process.val_acc_all, "bs-", label="Val acc")
    plt.xlabel("epoch")
    plt.ylabel("acc")
    plt.legend()
    plt.show()

    return model, train_process

Next, a test_model() function is defined, which evaluates the trained model with the best parameters on the test set in order to verify its performance.

# test model 
def test_model(model, testdataloader, device):
    '''
    :param model: network model 
    :param testdataloader: Test data set
    :param device: Operating equipment
    '''

    # Initialization parameters
    test_corrects = 0.0
    test_num = 0
    test_acc = 0.0
    # Only forward propagation calculation is performed without gradient calculation, so as to save memory and speed up operation
    with torch.no_grad():
        for test_data_x, test_data_y in testdataloader:
            test_data_x = test_data_x.to(device)
            test_data_y = test_data_y.to(device)
            model.eval()  # Set the model to evaluation mode and do not enable Batch Normalization and Dropout
            output = model(test_data_x)  # In the forward propagation process, the input is the test data set and the output is the prediction of each sample
            pre_lab = torch.argmax(output, 1)  # Find the row mark corresponding to the maximum value in each row
            test_corrects += torch.sum(pre_lab == test_data_y.data)  # Count the correct predictions in this batch and add them to test_corrects
            test_num += test_data_x.size(0)  # Number of samples used for testing so far

    test_acc = test_corrects.double().item() / test_num  # Calculate the classification accuracy on the test set
    print("test accuracy:", test_acc)

Finally, the model is trained and tested. The Adam optimizer is used with a learning rate of 0.001, and the loss function is the cross-entropy loss. The train_model() function is then called: 80% of the batches in train_loader are used for training and 20% for validation, for a total of 25 training epochs.

# Model training and testing
def train_model_process(myconvnet):
    optimizer = torch.optim.Adam(myconvnet.parameters(), lr=0.001)  # Using Adam optimizer, the learning rate is 0.001
    criterion = nn.CrossEntropyLoss()  # The loss function is the cross entropy function
    device = 'cuda' if torch.cuda.is_available() else 'cpu'  # GPU acceleration
    train_loader, class_label = train_data_process()  # Load training set
    test_loader = test_data_process()  # Load test set

    myconvnet = myconvnet.to(device)
    myconvnet, train_process = train_model(myconvnet, train_loader, 0.8, criterion, device, optimizer, num_epochs=25)  # Start training model
    test_model(myconvnet, test_loader, device)  # Evaluation using test sets

During training, the curves of the loss function and classification accuracy are as follows. The training loss decreases rapidly; the validation loss also decreases rapidly at first and then fluctuates and rises slightly. The training accuracy keeps increasing, while the validation accuracy rises gradually and then begins to fluctuate slightly.

To assess the generalization ability of the model, the trained model is used to make predictions on the test set, which gives the prediction accuracy on the test set (shown in the figure below).

Note: for complex neural networks and large-scale data, computing on the CPU may be too slow, so the model and data are moved to the GPU to accelerate computation.

5. Operation procedure

The following is the main entry point of the program: it configures the parameters, builds a DenseNet network, and then trains and tests the convolutional neural network.

if __name__ == '__main__':
    num_channels, growth_rate = 64, 32  # num_channels is the current number of channels
    num_convs_in_dense_blocks = [4, 4, 4, 4]
    model = DenseNet(num_channels, growth_rate, num_convs_in_dense_blocks)
    train_model_process(model)

Note: the data loaders above use num_workers=0, i.e. the data is loaded in the main process. If multiple worker processes are used instead (num_workers > 0), the code that creates and iterates the loaders must run under the if __name__ == '__main__': guard, otherwise an error will be raised on Windows when the worker processes are spawned.
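
For example (a sketch; num_workers=4 and download=True are assumed values, not taken from the code above), a loader with several worker processes would be created like this:

if __name__ == '__main__':
    # On Windows, DataLoader worker processes re-import this module,
    # so loaders with num_workers > 0 must only be created and used under this guard.
    train_data = FashionMNIST(root="./data/FashionMNIST", train=True,
                              transform=transforms.ToTensor(), download=True)
    loader = Data.DataLoader(train_data, batch_size=64, shuffle=False, num_workers=4)
    for b_x, b_y in loader:
        print(b_x.shape)  # torch.Size([64, 1, 28, 28]) -- no Resize transform here
        break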
