PyTorch learning notes (10) -- train and test a CNN network

Keywords: PyTorch Deep Learning CNN

This chapter trains a simple CNN on the CIFAR-10 dataset:

  • Train a simple CNN on the CIFAR-10 dataset.
  • Save the trained model and test it.
  • Train with a GPU.

CIFAR dataset

The CIFAR datasets come in two variants, CIFAR-10 and CIFAR-100. CIFAR-10 has 10 categories and CIFAR-100 has 100.

Features of CIFAR-10: 32x32 color images; 10 categories; 60000 images in total (50000 training samples + 10000 test samples); 6000 images per category, 10 x 6000 = 60000.

The 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

Tip: you don't need to download it manually. The Dataset API in torchvision can download it automatically.

Experimental process

Prepare dataset

This step is very convenient in PyTorch, which ships the common datasets ready to use. We only need to import them.

The dataset is in the torchvision.datasets package:

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np

torchvision.datasets.CIFAR10 is a class. You work with the dataset by instantiating an object of this class. Its main parameters:
root ---- the directory where the downloaded dataset is stored
train ---- True for the training set, False for the test set
download ---- whether to download the dataset automatically if it is not already in root
transform ---- the transformation applied to each image; typically ToTensor() followed by Normalize()
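
As a rough illustration of what ToTensor() followed by Normalize() with mean 0.5 and std 0.5 does to a single channel value, the sketch below reproduces the arithmetic in plain Python. The helper name normalize_pixel is made up for this example and is not part of torchvision.

```python
def normalize_pixel(value_uint8, mean=0.5, std=0.5):
    """Mimic ToTensor() + Normalize() for one channel value (hypothetical helper)."""
    scaled = value_uint8 / 255.0      # ToTensor() scales 0..255 to 0.0..1.0
    return (scaled - mean) / std      # Normalize() computes (x - mean) / std

for v in (0, 128, 255):
    print(v, '->', normalize_pixel(v))
```

With these settings the darkest value maps to -1 and the brightest to +1, so the network sees inputs roughly centered on zero.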

Then wrap the dataset in the DataLoader class, which makes it easy to read the data in mini-batches, shuffle it, and load it with multiple worker processes.
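
To make the idea of mini-batch reading concrete, here is a minimal plain-Python sketch of how sample indices might be shuffled and cut into batches. It only imitates the behavior conceptually; iterate_minibatches is a hypothetical helper, not how DataLoader is actually implemented.

```python
import random

def iterate_minibatches(num_samples, batch_size, shuffle=True, seed=0):
    """Yield lists of sample indices, one list per mini-batch (hypothetical helper)."""
    indices = list(range(num_samples))
    if shuffle:
        random.Random(seed).shuffle(indices)   # fixed seed only to keep the sketch repeatable
    for start in range(0, num_samples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_minibatches(num_samples=10, batch_size=4))
print(len(batches))                # number of mini-batches
print([len(b) for b in batches])   # the last batch may be smaller
```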

# --------------------Prepare dataset------------------
# Dataset, DataLoader
transform = transforms.Compose(
    [transforms.ToTensor(),    # PIL image -> float tensor in [0, 1]
     transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)

testset = torchvision.datasets.CIFAR10(root='./data',train=False,
                                       transform=transform, download=True)

trainloader = DataLoader(dataset=trainset, batch_size=4, shuffle=True, num_workers=4)
testloader = DataLoader(dataset=testset, batch_size=4, shuffle=True, num_workers=4)
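# Class names in CIFAR-10 label order
classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

dataiter = iter(trainloader)
images, labels = next(dataiter)   # fetch one mini-batch

# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))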

Define CNN network

For simplicity, we use the LeNet network and change the number of input channels of the first convolution layer to 3, because CIFAR-10 images are 3-channel color images.

# Define a simple network
# LeNet-5
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16 * 5 * 5,out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool1(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)              # reshape tensor
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
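
The value 16 * 5 * 5 used for fc1 follows from the layer sizes. The sketch below traces the spatial size of a 32x32 input through the two convolution and pooling stages, using the standard output-size formula for layers without padding or dilation. conv_out is a helper written for this note.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer (floor division)."""
    return (size + 2 * padding - kernel) // stride + 1

side = 32                                  # CIFAR-10 images are 32x32
side = conv_out(side, kernel=5)            # conv1: 32 -> 28
side = conv_out(side, kernel=2, stride=2)  # pool1: 28 -> 14
side = conv_out(side, kernel=5)            # conv2: 14 -> 10
side = conv_out(side, kernel=2, stride=2)  # pool1: 10 -> 5
print(16 * side * side)                    # flattened feature count fed to fc1
```

If the input resolution or kernel sizes change, the in_features of fc1 has to be recomputed in the same way.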

Set up the optimization and iteration methods of the network and train the network

Training a CNN is essentially the problem of minimizing an objective function (the loss function). In mathematics, optimization methods for general convex functions include gradient descent, Newton's method and so on (there are also heuristic search methods such as genetic algorithms). For training neural networks, the most commonly used optimization method is stochastic gradient descent (SGD).
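
As a toy illustration of gradient descent with a momentum term, the sketch below minimizes a one-dimensional quadratic. The update rule (accumulate a velocity, then step against it) is the commonly described momentum form; the function and parameter names are chosen for this example only.

```python
def sgd_momentum(grad_fn, x0, lr=0.1, momentum=0.9, steps=500):
    """Minimize a 1-D function given its gradient (illustrative helper)."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        velocity = momentum * velocity + grad_fn(x)  # accumulate past gradients
        x = x - lr * velocity                        # step against the velocity
    return x

# Toy objective f(x) = (x - 3)^2 with gradient 2 * (x - 3); its minimum is at x = 3.
x_min = sgd_momentum(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

In real training the gradient comes from one mini-batch at a time, which is what makes the method stochastic.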

  • Definition of loss function and optimization method
    Cross-entropy loss function
    Stochastic gradient descent (SGD) with a momentum term is used for optimization

# Definition of loss function and optimization method
# Cross-entropy loss, SGD with momentum
net = Net()   # instantiate the network defined above
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
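
For intuition about the loss values printed during training, here is a small plain-Python sketch of the cross-entropy of a single sample computed from raw scores (logits). It is only meant to show the formula; nn.CrossEntropyLoss works on tensors and, by default, averages over the mini-batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy of one sample: -log(softmax(logits)[target])."""
    m = max(logits)                                   # subtract max for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# All-equal scores over 10 classes, roughly what an untrained network produces
uniform = cross_entropy([0.0] * 10, target=3)
# A confident, correct prediction
confident = cross_entropy([0.0, 8.0, 0.0], target=1)
print(round(uniform, 3), confident)
```

The uniform case equals ln(10), about 2.303, which is a useful reference point: a 10-class loss near that value means the network is still guessing.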

  • Iterative optimization and training

Iteration ----- one forward + backward pass over one mini-batch
Epoch ----- one full pass over all the training data

We run 20 epochs here.
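
The relation between iterations and epochs for this setup can be checked with a few lines of arithmetic. The numbers are the ones used in these notes (50000 training images, batch size 4, 20 epochs, one printout per 2000 iterations).

```python
import math

train_samples = 50000   # CIFAR-10 training set size
batch_size = 4          # value passed to the DataLoader in these notes
epochs = 20

iters_per_epoch = math.ceil(train_samples / batch_size)
print(iters_per_epoch)              # iterations (mini-batches) in one epoch
print(iters_per_epoch * epochs)     # parameter updates over the whole run
print(iters_per_epoch // 2000)      # loss printouts per epoch
```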

# Training network
# Iterative epoch
for epoch in range(20):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the input
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()     # Backpropagate: compute gradients of the loss
        optimizer.step()    # Update parameters

        # print statistics
        running_loss += loss.item()  # tensor.item() gets the value of tensor
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))  # The average value of loss is output every 2000 iterations
            running_loss = 0.0

print('Finished Training')

Save model

# --------Save model-----------
torch.save(net, './model/model_cfair10_2.pth')    # Save the whole model; the file is relatively large
# torch.save(net.state_dict(), './model/model_cfair10.pth')    # Save only the parameters
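
A minimal sketch of the parameters-only route (state_dict), using a tiny nn.Linear as a stand-in for the trained network and a temporary file whose name is made up for the example. Unlike saving the whole model, this route requires rebuilding the architecture before loading.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)                       # tiny stand-in for the trained network
path = os.path.join(tempfile.mkdtemp(), 'demo_state.pth')   # hypothetical file name

torch.save(model.state_dict(), path)          # store only the parameter tensors

restored = nn.Linear(4, 2)                    # rebuild the architecture first
restored.load_state_dict(torch.load(path))    # then copy the saved parameters in
restored.eval()                               # switch to inference mode

same = all(torch.equal(a, b) for a, b in
           zip(model.state_dict().values(), restored.state_dict().values()))
print(same)
```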

Test model

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

CIFAR-10 contains a total of 10 categories:

CFAIR10_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

Load an RGB image. It must belong to one of the above categories, otherwise it cannot be recognized correctly.

# load an image
image = Image.open('/xxxx/image/dog.jpg')

Apply the same transformation to the image as was used in training:

transform = transforms.Compose(
    [transforms.Resize((32, 32)),
     transforms.ToTensor(),
     transforms.Normalize(
         mean=(0.5, 0.5, 0.5),
         std=(0.5, 0.5, 0.5)
     )])

image_transformed = transform(image)

Points needing attention
The input of a CNN is a 4D tensor (NxCxHxW), so the transformed image must be converted to 4D.
tensor.unsqueeze(0) adds a leading dimension, so the input tensor becomes 1x3x32x32.
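
A quick shape check, using a zero tensor as a stand-in for one transformed image:

```python
import torch

image_tensor = torch.zeros(3, 32, 32)     # stand-in for one transformed image (C x H x W)
batch = image_tensor.unsqueeze(0)         # insert a batch dimension at position 0
print(tuple(image_tensor.shape), '->', tuple(batch.shape))
```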

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16 * 5 * 5,out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool1(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)              # reshape tensor
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

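# Expects a file holding the whole model, i.e. one saved with torch.save(net, path).
# Recent PyTorch versions may need weights_only=False to load such a file.
net = torch.load('./model/model_cfair10.pth')
# print(net)
net.eval()   # inference mode

image_transformed = image_transformed.unsqueeze(0)   # 3x32x32 -> 1x3x32x32
output = net(image_transformed)
predict_value, predict_idx = torch.max(output, 1)  # Maximum value along dimension 1 and its index
print(CFAIR10_names[predict_idx.item()])   # name of the predicted class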



GPU training and model learning rate adjustment


  • After training for 20 epochs on the CPU, the loss dropped to about 0.6. Training for another 20 epochs starting from that model left the loss between 0.5 and 0.6.
  • Training on the CPU is really slow. Running 20 epochs took more than 1 h (I don't remember the exact time).

Training the model on a GPU

  • Computer configuration: GPU 1080

First, install the GPU version of PyTorch. The installation steps are on the official PyTorch website. Training on a GPU requires a few minor changes to the code.

Step 1: in the code, first use a PyTorch function to check whether a GPU is available

is_support = torch.cuda.is_available()
if is_support:
    device = torch.device('cuda:0')
    # device = torch.device('cuda:1')
else:
    device = torch.device('cpu')

Step 2: move the computation from the CPU to the GPU

net = Net()
net.to(device)   # GPU mode needs to be added: move the model parameters to the device

# Training network
# Iterative epoch
for epoch in range(20):

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the input
        inputs, labels = data

        inputs = inputs.to(device)   # move the batch to the GPU
        labels = labels.to(device)   # labels must be on the same device as the outputs

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)  # Calculate loss
        loss.backward()     # Backpropagate: compute gradients of the loss
        optimizer.step()    # Update parameters

        # print statistics
        running_loss += loss.item()  # tensor.item() gets the value of tensor
        if i % 2000 == 1999:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))  # The average value of loss is output every 2000 iterations
            running_loss = 0.0

print('Finished Training')

Run it and you will find that iteration is much faster: 20 epochs complete in about 10 minutes.

Learning rate adjustment

  • An important parameter of stochastic gradient descent (SGD) is the learning rate (lr).

The code above uses a fixed learning rate, lr=0.001. At the beginning of training the learning rate can be larger so that convergence is fast. As the number of iterations grows, the learning rate should be reduced to keep the loss from oscillating.
For simplicity, I lowered the learning rate to lr=0.0001 and trained for another 20 epochs starting from the previous model. The loss dropped noticeably, to 0.3, 0.2 and 0.1.
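
The manual change from 0.001 to 0.0001 after 20 epochs is one instance of a step decay schedule. The sketch below expresses that rule as a plain function; step_size=20 and gamma=0.1 are chosen here only to mirror the numbers above. PyTorch offers schedulers of this kind in torch.optim.lr_scheduler (for example StepLR), which could replace the manual restart.

```python
def step_decay(initial_lr, epoch, step_size=20, gamma=0.1):
    """Learning rate after `epoch` epochs, multiplied by gamma every step_size epochs."""
    return initial_lr * gamma ** (epoch // step_size)

for epoch in (0, 19, 20, 39, 40):
    print(epoch, step_decay(0.001, epoch))
```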

With GPU training and lr reduced to 0.0001, the training-set loss went down, but in testing a horse was still identified as a deer and a bird as a cat. Training a good model therefore needs other strategies as well, including other network architectures.

Evaluate the model performance on the whole test set

  • Calculate Acc
import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image

CFAIR10_names = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']

# --------------Test data set------------------------------
transform = transforms.Compose(
    [transforms.Resize((32, 32)),
     transforms.ToTensor(),
     transforms.Normalize(
         mean=(0.5, 0.5, 0.5),
         std=(0.5, 0.5, 0.5)
     )])

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=4)

# -----------------Network model-------------------------------

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5)
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5)
        self.fc1 = nn.Linear(in_features=16 * 5 * 5,out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))
        x = self.pool1(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)              # reshape tensor
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = torch.load('./model/model_cfair10_20.pth',map_location='cpu')

# ------------Test on the entire test set-------------------------------------------

correct = 0
total = 0
count = 0
with torch.no_grad():
    for sample_batch in testloader:
        images = sample_batch[0]
        labels = sample_batch[1]
        # forward
        out = net(images)
        _, pred = torch.max(out, 1)
        correct += (pred == labels).sum().item()
        total += labels.size(0)
        print('batch:{}'.format(count + 1))
        count += 1

# Acc
accuracy = float(correct) / total
print('Acc = {:.5f}'.format(accuracy))
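
Overall accuracy hides which classes get confused, such as the horse identified as a deer mentioned earlier. A per-class breakdown can be sketched in plain Python as follows. The labels and predictions here are made-up toy data, not results from the trained model, and per_class_accuracy is a helper written for this note.

```python
from collections import Counter

def per_class_accuracy(labels, preds, class_names):
    """Return {class name: fraction of its samples predicted correctly}."""
    total, correct = Counter(), Counter()
    for y, p in zip(labels, preds):
        total[y] += 1
        correct[y] += int(y == p)
    return {class_names[c]: correct[c] / total[c] for c in sorted(total)}

names = ['deer', 'horse']             # toy subset of the class list
labels = [0, 0, 1, 1, 1, 1]           # made-up ground truth
preds = [0, 0, 1, 0, 1, 0]            # made-up predictions: two horses called deer
print(per_class_accuracy(labels, preds, names))
```

In the evaluation loop above, the same counters could be filled from labels.tolist() and pred.tolist() for each batch.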

