PyTorch series: 2. CIFAR-10 classification

To classify CIFAR-10, the steps are as follows:

Load and preprocess the CIFAR-10 dataset using torchvision
Define the network
Define the loss function and optimizer
Train the network and update its parameters
Test the network

1. CIFAR-10 data loading and preprocessing

import torchvision as tv
import torch as t
import torchvision.transforms as transforms
from torchvision.transforms import ToPILImage
show = ToPILImage() # Converts a Tensor to a PIL Image for easy visualization
# The first time the program runs, torchvision downloads the CIFAR-10 dataset automatically.
# If CIFAR-10 has already been downloaded, point to its location with the root parameter

# Define the preprocessing of data. The class Compose is used to manage each transform
transform = transforms.Compose([transforms.ToTensor(), # Turn to Tensor
                                transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))]) # normalization

# Training set
trainset = tv.datasets.CIFAR10(root='D:\\Workspace\\Python\\CIFAR-10\\', train = True, download=True, transform=transform)
trainloader = t.utils.data.DataLoader(trainset, batch_size=4, shuffle=True,num_workers=2)

#Test set
testset = tv.datasets.CIFAR10(root='D:\\Workspace\\Python\\CIFAR-10\\', train = False, download=True, transform=transform)
testloader = t.utils.data.DataLoader(testset, batch_size=4, shuffle=False,num_workers=2)
classes = ('plane','car','bird','cat','deer','dog','frog','horse','ship','truck')

Files already downloaded and verified
Files already downloaded and verified

1.1 The ToTensor class converts a PIL Image or numpy.ndarray to a tensor. In PyTorch, the PIL library is often used to read image data, so this transform acts as a bridge between PIL Image and Tensor. Note that a PIL Image must be converted to a tensor before Normalize is applied, whereas other operations such as resize or crop do not require this conversion.

transform1 = transforms.Compose([
    transforms.ToTensor(),  # range [0, 255] -> [0.0,1.0]
])
# Converts a PIL Image or numpy.ndarray (H x W x C) in the range
# [0, 255] to a torch.FloatTensor of shape (C x H x W) in the range [0.0, 1.0].
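
To make the layout and range change concrete, here is a minimal plain-Python sketch of what the conversion does to a tiny 2 x 2 RGB image. It only mimics the arithmetic; `to_chw_float` is a hypothetical helper written for this illustration, not torchvision's implementation.

```python
# Hypothetical helper for illustration; it mimics the effect of ToTensor on
# a uint8 image stored as nested lists in H x W x C order.
def to_chw_float(image_hwc):
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    # Reorder H x W x C -> C x H x W and scale [0, 255] -> [0.0, 1.0].
    return [
        [[image_hwc[y][x][c] / 255.0 for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]

# A 2 x 2 RGB image: each pixel is [R, G, B].
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

tensor_like = to_chw_float(image)
print(len(tensor_like), len(tensor_like[0]), len(tensor_like[0][0]))  # 3 2 2
print(tensor_like[0])  # red channel: [[1.0, 0.0], [0.0, 1.0]]
```

The channel index moves to the front, which is the layout that convolution layers in PyTorch expect.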

1.2 transforms.Normalize: normalizing to [-1.0, 1.0]

Change transform1 above to the following:

transform2 = transforms.Compose([  
    transforms.ToTensor(),  
    transforms.Normalize(mean = (0.5, 0.5, 0.5), std = (0.5, 0.5, 0.5))  
    ])

transforms.Normalize uses the following formula to normalize each channel:

channel = (channel - mean) / std (because transforms.ToTensor() has already scaled the data into [0, 1], (x - 0.5) / 0.5 lies in [-1.0, 1.0])

In this way, each value in our data becomes a number in [-1, 1].
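
The arithmetic can be checked with a few scalar values. The sketch below is plain Python and works on single numbers rather than tensors; `normalize` and `denormalize` are hypothetical helper names used only for this illustration.

```python
# Per-channel normalization as described above, plus its inverse.
def normalize(x, mean=0.5, std=0.5):
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    return y * std + mean

for x in (0.0, 0.25, 0.5, 1.0):
    y = normalize(x)
    print(x, "->", y, "->", denormalize(y))
```

With mean = std = 0.5 the inverse reduces to (y + 1) / 2, which is the expression used later in this article to display images.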

1.3 torch.utils.data.DataLoader()

If there are special requirements for data input, you can set these parameters:

dataset: the output of one of PyTorch's existing data reading interfaces (such as torchvision.datasets.ImageFolder) or of a custom data interface. It is either an object of the torch.utils.data.Dataset class or an object of a user-defined class that inherits from torch.utils.data.Dataset.
batch_size: the number of samples loaded in each batch (default: 1).
shuffle: set to True to have the data reshuffled at every epoch (default: False).
collate_fn: a function that merges a list of samples into a batch. The default is usually sufficient, unless your custom dataset returns data in an unusual form.
batch_sampler: from the comments in the source, batch_sampler is mutually exclusive with batch_size, shuffle and other parameters; the default is generally used.
sampler: defines the strategy to draw samples from the dataset. It is mutually exclusive with shuffle; the default is generally used.
num_workers: from the comments in the source, this parameter must be greater than or equal to 0. If it is 0, the data is loaded in the main process; if it is greater than 0, the data is loaded by that many worker processes, which can speed up data loading.
pin_memory (bool, optional): if True, the data loader will copy tensors into CUDA pinned memory before returning them.
timeout: the timeout for reading a batch of data; an error is raised if the data has not been read within this time.
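
As a small illustration of batch_size, shuffle and num_workers, the sketch below wraps ten made-up samples in a TensorDataset and iterates over them. The tensors are placeholders rather than CIFAR-10 data, and num_workers=0 is chosen here so that everything runs in the main process.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Ten fake "images" of shape 3 x 32 x 32 with labels 0..9 (made-up data).
features = torch.zeros(10, 3, 32, 32)
targets = torch.arange(10)
dataset = TensorDataset(features, targets)

# num_workers=0 loads in the main process; shuffle=False keeps dataset order.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

batch_shapes = [(x.shape, y.tolist()) for x, y in loader]
for shape, labels_in_batch in batch_shapes:
    print(tuple(shape), labels_in_batch)
```

Ten samples with batch_size=4 give three batches, and the last one holds only the two remaining samples because drop_last defaults to False.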

1.4 A Dataset object can be indexed by subscript and returns a (data, label) tuple.

(data, label) = trainset[100]
print(classes[label])
# (data+1)/2 undoes the normalization so that the image displays correctly
show((data+1)/2).resize((100,100))

Out: ship
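
Subscript access works because a map-style dataset implements `__getitem__` and `__len__`. The following is a rough plain-Python sketch of that protocol; `ToyDataset` is a hypothetical class for illustration, and a real custom dataset would normally subclass torch.utils.data.Dataset and return tensors.

```python
# Hypothetical toy dataset; a real one would subclass torch.utils.data.Dataset
# and usually return tensors instead of plain floats.
class ToyDataset:
    def __init__(self, samples, labels, transform=None):
        assert len(samples) == len(labels)
        self.samples = samples
        self.labels = labels
        self.transform = transform

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        sample = self.samples[index]
        # The transform is applied lazily, each time a sample is fetched.
        if self.transform is not None:
            sample = self.transform(sample)
        return sample, self.labels[index]

toy = ToyDataset([0.0, 0.5, 1.0], [8, 0, 6], transform=lambda x: (x - 0.5) / 0.5)
data, label = toy[2]
print(len(toy), data, label)  # 3 1.0 6
```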

1.5 DataLoader is an iterable object. It stacks the samples returned by the dataset into batches and provides multi-process loading and data shuffling to speed things up. Once the program has traversed all the data in the dataset, one pass over the DataLoader is complete.

dataiter = iter(trainloader)
images, labels = next(dataiter) # Returns 4 images and labels, as shown below
print(''.join('%11s'%classes[labels[j]] for j in range(4)))
show(tv.utils.make_grid((images+1)/2)).resize((400,100))

Out:   horse      plane       frog       deer

2. Define network

import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        # An nn.Module subclass must call the parent class constructor in its own constructor
        # The line below is equivalent to nn.Module.__init__(self)
        super(Net,self).__init__()
        #Convolution layer '3' indicates that the input picture is 3 channels, '6' indicates the number of output channels, '5' indicates that the convolution kernel is 5 * 5
        self.conv1 = nn.Conv2d(3,6,5)
        #Convolution layer
        self.conv2 = nn.Conv2d(6,16,5)
        # Affine layer / fully connected layer, y = Wx + b
        self.fc1 = nn.Linear(16*5*5,120)
        self.fc2 = nn.Linear(120,84)
        self.fc3 = nn.Linear(84,10)
        
    def forward(self,x):
        #Convolution - > activation - > pooling
        x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))
        x = F.max_pool2d(F.relu(self.conv2(x)),2)
        # reshape, '-1' means adaptive
        x = x.view(x.size()[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
print(net)
# Out: Net(
#         (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
#         (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
#         (fc1): Linear(in_features=400, out_features=120, bias=True)
#         (fc2): Linear(in_features=120, out_features=84, bias=True)
#         (fc3): Linear(in_features=84, out_features=10, bias=True)
#         )
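
The in_features=400 of fc1 follows from the spatial size of the feature maps. A short plain-Python sketch of that calculation, assuming 32 x 32 input images, no padding, and stride 1 for the convolutions (the Conv2d defaults); `conv_out` is a hypothetical helper:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Output size of a convolution or pooling window along one dimension.
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                   # CIFAR-10 images are 32 x 32
size = conv_out(size, kernel=5)             # conv1: 32 -> 28
size = conv_out(size, kernel=2, stride=2)   # max pool: 28 -> 14
size = conv_out(size, kernel=5)             # conv2: 14 -> 10
size = conv_out(size, kernel=2, stride=2)   # max pool: 10 -> 5

flattened = 16 * size * size                # 16 output channels from conv2
print(size, flattened)  # 5 400
```

This is why the first fully connected layer is declared as nn.Linear(16*5*5, 120).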

3. Define loss function and optimizer

from torch import optim
criterion = nn.CrossEntropyLoss() #Cross entropy loss function
optimizer = optim.SGD(net.parameters(),lr=0.001, momentum = 0.9)
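
For intuition, the two pieces can be written out for scalars in plain Python. This is a sketch of the underlying arithmetic under simplifying assumptions (one sample, one parameter, no dampening or weight decay), not the PyTorch implementation; both function names are hypothetical.

```python
import math

def cross_entropy(logits, target):
    # -log(softmax(logits)[target]), computed stably via log-sum-exp.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def sgd_momentum_step(param, grad, velocity, lr=0.001, momentum=0.9):
    # The velocity accumulates past gradients; the parameter moves against it.
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Equal scores for all 10 classes give a loss of ln(10).
uniform_loss = cross_entropy([0.0] * 10, target=3)
print(round(uniform_loss, 4))  # 2.3026

param, velocity = 1.0, 0.0
for _ in range(2):
    param, velocity = sgd_momentum_step(param, grad=1.0, velocity=velocity)
print(param, velocity)
```

A loss near 2.3 is therefore what a 10-class network produces when it cannot yet tell the classes apart. nn.CrossEntropyLoss additionally averages this per-sample value over the batch by default.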

4. Training network

The training process is similar for all networks; the following steps are repeated:

Input data
Forward propagation + backward propagation
Update parameters

for epoch in range(2):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        # Update parameters
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print('Finished Training')

Only two epochs are trained here (one epoch is one full pass over the dataset). Let's see whether the network works. Feed the test images into the network, compute the predicted labels, and compare them with the actual labels.

dataiter = iter(testloader)
images, labels = next(dataiter) # One batch returns 4 images

# print images
print('GroundTruth: ', ' '.join('%8s' % classes[labels[j]] for j in range(4)))
# (images+1)/2 undoes the normalization before display
show(tv.utils.make_grid((images+1)/2)).resize((400,100))

Out:         GroundTruth:       cat     ship     ship    plane

Then calculate the label of network prediction:

# Compute the score of each image for every class
outputs = net(images)
# Take the class with the highest score
_, predicted = t.max(outputs.data, 1)

print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]  for j in range(4)))

Out: Predicted: dog ship car plane

Let's take a look at the effect on the whole test set.

correct = 0 # Number of correctly predicted images
total = 0 # Total number of images
with t.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = t.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the 10000 test images: %d %%' % ( 100 * correct / total))

Out: Accuracy of the network on the 10000 test images: 61 %
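
Overall accuracy hides which classes the network confuses. One possible way to break it down per class is sketched below in plain Python; `per_class_accuracy` is a hypothetical helper, and the label lists are made-up values rather than results from the network above. In the real test loop one could collect `predicted.tolist()` and `labels.tolist()` and pass them in.

```python
from collections import Counter

classes = ('plane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

def per_class_accuracy(predicted, actual, class_names):
    # Count, for each true class, how many samples exist and how many were right.
    totals = Counter(actual)
    hits = Counter(a for p, a in zip(predicted, actual) if p == a)
    return {class_names[c]: hits[c] / totals[c] for c in sorted(totals)}

# Made-up label indices for illustration (not results from the network above).
actual = [3, 8, 8, 0, 3, 0]
predicted = [5, 8, 1, 0, 3, 0]
accuracy = per_class_accuracy(predicted, actual, classes)
print(accuracy)  # {'plane': 1.0, 'cat': 0.5, 'ship': 0.5}
```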

5. Training on GPU

if t.cuda.is_available():
    net.cuda()
    images = images.cuda()
    labels = labels.cuda()
    output = net(images)
    loss = criterion(output, labels)
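
The snippet above moves a single batch to the GPU. In a full training loop every batch has to be moved to the same device as the model. A device-agnostic sketch of one training step follows; the tiny nn.Linear model and the random batch are stand-ins chosen for brevity, not the Net and CIFAR-10 data used in this article.

```python
import torch
import torch.nn as nn
from torch import optim

# Pick the GPU when one is available, otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# A tiny stand-in model and random batch (illustrative, not the Net above).
model = nn.Linear(8, 10).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

inputs = torch.randn(4, 8)
labels = torch.randint(0, 10, (4,))

# Each batch must be on the same device as the model before the forward pass.
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()
print(device, loss.item())
```

Applied to the training loop in section 4, this would mean calling `net.to(device)` once before the loop and moving `inputs` and `labels` inside it.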

Sophia$

Posted by Ivan_ on Sun, 19 Jan 2020 01:47:12 -0800
