Image augmentation
As mentioned in Section 5.6 (Deep Convolutional Neural Network), large-scale datasets are a prerequisite for the successful application of deep neural networks. Image augmentation enlarges the training dataset by applying a series of random changes to the training images, producing similar but distinct training samples. Another way to view image augmentation is that randomly changing training samples reduces the model's dependence on particular attributes and so improves its generalization ability. For example, we can crop images in different ways so that objects of interest appear in different locations, reducing the model's dependence on object location. We can also adjust brightness, color, and other factors to reduce the model's sensitivity to color. Image augmentation contributed to the success of AlexNet. In this section we discuss this technique, which is widely used in computer vision.
First, import the packages and modules required for the experiments.
import os
os.listdir("/home/kesci/input/img2083/")
['img']
%matplotlib inline
import os
import time
import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader
import torchvision
import sys
from PIL import Image

sys.path.append("/home/kesci/input/")
# Make only GPU 0 visible to this process
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
import d2lzh1981 as d2l

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

print(torch.__version__)
print(device)
1.3.0
cpu
9.1.1 Common Image Augmentation Methods
Let's read an image of $400 \times 500$ (400 pixels in height and 500 pixels in width) as the example for the experiments.
d2l.set_figsize()
img = Image.open('/home/kesci/input/img2083/img/cat1.jpg')
d2l.plt.imshow(img)
<matplotlib.image.AxesImage at 0x7f8dae7aa198>
The drawing function show_images is defined below.
# This function has been saved in the d2lzh_pytorch package for future use
def show_images(imgs, num_rows, num_cols, scale=2):
    figsize = (num_cols * scale, num_rows * scale)
    _, axes = d2l.plt.subplots(num_rows, num_cols, figsize=figsize)
    for i in range(num_rows):
        for j in range(num_cols):
            axes[i][j].imshow(imgs[i * num_cols + j])
            axes[i][j].axes.get_xaxis().set_visible(False)
            axes[i][j].axes.get_yaxis().set_visible(False)
    return axes
Most image augmentation methods involve some randomness. To make the effect of augmentation easier to observe, we next define an auxiliary function apply. This function runs the augmentation method aug multiple times on the input image img and shows all the results.
def apply(img, aug, num_rows=2, num_cols=4, scale=1.5):
    Y = [aug(img) for _ in range(num_rows * num_cols)]
    show_images(Y, num_rows, num_cols, scale)
9.1.1.1 Flipping and Cropping
Flipping an image left and right usually does not change the category of the object. It is one of the earliest and most widely used image augmentation methods. Next, we use the torchvision.transforms module to create a RandomHorizontalFlip instance, which flips the image horizontally (left and right) with probability 0.5.
apply(img, torchvision.transforms.RandomHorizontalFlip())
Flipping up and down is not as common as flipping left and right. However, at least for this sample image, flipping up and down does not hinder recognition. Next, we create a RandomVerticalFlip instance to flip the image vertically (up and down) with probability 0.5.
apply(img, torchvision.transforms.RandomVerticalFlip())
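To make the two flips concrete, here is a minimal sketch in plain Python that treats an image as a list of pixel rows. The function names random_horizontal_flip and random_vertical_flip are hypothetical and are not part of torchvision; the sketch only illustrates the idea of flipping with a given probability, not the library's implementation.

```python
import random

def random_horizontal_flip(img, p=0.5, rng=random):
    """Flip a row-major image (list of rows) left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return img

def random_vertical_flip(img, p=0.5, rng=random):
    """Flip a row-major image (list of rows) top-to-bottom with probability p."""
    if rng.random() < p:
        return img[::-1]
    return img

# A tiny 2 x 3 "image" whose pixels are just labels.
img = [[1, 2, 3],
       [4, 5, 6]]

# p=1.0 forces the flip so the effect is easy to see.
print(random_horizontal_flip(img, p=1.0))  # -> [[3, 2, 1], [6, 5, 4]]
print(random_vertical_flip(img, p=1.0))    # -> [[4, 5, 6], [1, 2, 3]]
```

With the default p=0.5, roughly half of the calls return the input unchanged, which is why repeated calls on the same image give a mix of flipped and unflipped results.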
In the sample image, the cat is in the middle of the image, but this may not be the case in general. In Section 5.4 (Pooling Layer), we explained that pooling reduces the sensitivity of a convolutional layer to the target location. In addition, by cropping the image randomly we can make objects appear at different locations and at different scales, which also reduces the model's sensitivity to the target location.
In the code below, we randomly crop a region covering $10\% \sim 100\%$ of the original area each time, with the width-to-height ratio of the region drawn randomly from $0.5 \sim 2$. The region is then scaled to 200 pixels in both width and height. Unless otherwise stated, a random number between $a$ and $b$ in this section refers to a continuous value sampled uniformly at random from the interval $[a, b]$.
shape_aug = torchvision.transforms.RandomResizedCrop(200, scale=(0.1, 1), ratio=(0.5, 2))
apply(img, shape_aug)
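The sampling described above can be sketched in plain Python. The helper sample_crop_box is hypothetical and follows the textual description (area fraction and aspect ratio both drawn uniformly); the library's own sampling and fallback behavior may differ in detail, and the final resize to 200 pixels is omitted.

```python
import math
import random

def sample_crop_box(height, width, scale=(0.1, 1.0), ratio=(0.5, 2.0),
                    max_tries=10, rng=random):
    """Return (top, left, crop_h, crop_w) for one random crop.

    Assumption: if no valid box is found within max_tries, fall back to
    the whole image. This is a simplification chosen for the sketch.
    """
    area = height * width
    for _ in range(max_tries):
        target_area = area * rng.uniform(*scale)   # 10% to 100% of the area
        aspect = rng.uniform(*ratio)               # width / height
        crop_w = int(round(math.sqrt(target_area * aspect)))
        crop_h = int(round(math.sqrt(target_area / aspect)))
        if 0 < crop_w <= width and 0 < crop_h <= height:
            top = rng.randint(0, height - crop_h)
            left = rng.randint(0, width - crop_w)
            return top, left, crop_h, crop_w
    return 0, 0, height, width

# Sample one crop box for a 400 x 500 image like the one used above.
top, left, crop_h, crop_w = sample_crop_box(400, 500)
print(top, left, crop_h, crop_w)
```

Because the width and height are derived from the sampled area and aspect ratio, some draws do not fit inside the image, which is why the sketch retries a few times before giving up.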
9.1.1.2 Changing Color
Another type of augmentation is to change colors. We can change four aspects of an image's color: brightness, contrast, saturation, and hue. In the following example, we randomly change the brightness of the image to a value between $50\%$ ($1 - 0.5$) and $150\%$ ($1 + 0.5$) of the original brightness.
apply(img, torchvision.transforms.ColorJitter(brightness=0.5, contrast=0, saturation=0, hue=0))
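As a rough illustration of what brightness=0.5 means, the sketch below draws a factor uniformly from $[1 - 0.5, 1 + 0.5]$ and scales 8-bit pixel values by it, clamping at 255. The function jitter_brightness is hypothetical and operates on a flat list of pixel values; it is a simplified stand-in, not the ColorJitter implementation.

```python
import random

def jitter_brightness(pixels, brightness=0.5, rng=random):
    """Scale 8-bit pixel values by a factor drawn from [1 - b, 1 + b]."""
    low = max(0.0, 1.0 - brightness)
    factor = rng.uniform(low, 1.0 + brightness)
    # Clamp to the valid 8-bit range after scaling.
    scaled = [min(255, int(round(p * factor))) for p in pixels]
    return scaled, factor

pixels = [0, 64, 128, 255]
scaled, factor = jitter_brightness(pixels)
print(round(factor, 3), scaled)
```

A factor below 1 darkens the image and a factor above 1 brightens it, while black pixels (value 0) stay black in either case.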
We can also randomly change the hue of the image.
apply(img, torchvision.transforms.ColorJitter(brightness=0, contrast=0, saturation=0, hue=0.5))
Similarly, we can randomly change the contrast of the image.
apply(img, torchvision.transforms.ColorJitter(brightness=0, contrast=0.5, saturation=0, hue=0))
We can also randomly change the brightness, contrast, saturation, and hue of the image at the same time.
color_aug = torchvision.transforms.ColorJitter(brightness=0.5, contrast=0.5, saturation=0.5, hue=0.5)
apply(img, color_aug)
9.1.1.3 Overlay Multiple Image Augmentation Methods
In practice, we combine several image augmentation methods. We can use a Compose instance to chain the image augmentation methods defined above and apply them to each image.
augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomHorizontalFlip(), color_aug, shape_aug])
apply(img, augs)
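Conceptually, chaining transforms amounts to calling them one after another on the same input. The sketch below shows that idea with a hypothetical SimpleCompose class and two deterministic stand-in transforms (reverse and brighten, both invented for illustration) so that the result is easy to check by hand.

```python
class SimpleCompose:
    """Apply a list of callables to an input, one after another."""

    def __init__(self, transforms):
        self.transforms = list(transforms)

    def __call__(self, x):
        for t in self.transforms:
            x = t(x)
        return x

# Two deterministic stand-in "transforms" on a list of pixel values.
def reverse(pixels):
    return pixels[::-1]

def brighten(pixels):
    return [min(255, p + 10) for p in pixels]

pipeline = SimpleCompose([reverse, brighten])
print(pipeline([0, 100, 250]))  # -> [255, 110, 10]
```

The transforms run in list order, so in the augs pipeline above the flip is applied first, then the color change, then the random crop and resize.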
9.1.2 Using Image Augmentation to Train a Model
Let's look at an example of using image augmentation in real training. Here we use the CIFAR-10 dataset instead of the Fashion-MNIST dataset used previously. This is because the position and size of objects in the Fashion-MNIST dataset have been normalized, while the color and size of objects in the CIFAR-10 dataset vary more significantly. The first 32 training images from the CIFAR-10 dataset are shown below.
CIFAR_ROOT_PATH = '/home/kesci/input/cifar102021'
all_imges = torchvision.datasets.CIFAR10(train=True, root=CIFAR_ROOT_PATH, download=True)
# Every element of all_imges is (image, label)
show_images([all_imges[i][0] for i in range(32)], 4, 8, scale=0.8);
Files already downloaded and verified
To obtain deterministic results at prediction time, we usually apply image augmentation only to the training samples and do not use augmentation with random operations during prediction. Here we use only the simplest random left-right flip. In addition, we use ToTensor to convert images into the format PyTorch requires: 32-bit floating point tensors with values between 0 and 1, which are batched into the shape (batch size, number of channels, height, width).
flip_aug = torchvision.transforms.Compose([
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor()])

no_aug = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])
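The layout change performed by the conversion step can be sketched without any tensor library. The hypothetical helper to_chw_float below takes an image stored as height x width x channels with integer values in 0..255 and returns channels x height x width floats in $[0, 1]$; it only illustrates the reordering and rescaling and produces nested lists, not a PyTorch tensor.

```python
def to_chw_float(image_hwc):
    """Convert an H x W x C nested list of 0..255 ints to C x H x W floats in [0, 1]."""
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    return [[[image_hwc[i][j][c] / 255.0 for j in range(width)]
             for i in range(height)]
            for c in range(channels)]

# A 1 x 2 RGB image: one red pixel followed by one white pixel.
image = [[[255, 0, 0], [255, 255, 255]]]
chw = to_chw_float(image)
print(chw)  # -> [[[1.0, 1.0]], [[0.0, 1.0]], [[0.0, 1.0]]]
```

Stacking several such channels-first images along a new leading axis gives the (batch size, number of channels, height, width) shape mentioned above.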
Next, we define an auxiliary function that reads the images and applies image augmentation. For a detailed description of DataLoader, refer to Section 3.5 (Image Classification Dataset (Fashion-MNIST)).
num_workers = 0 if sys.platform.startswith('win32') else 4

def load_cifar10(is_train, augs, batch_size, root=CIFAR_ROOT_PATH):
    dataset = torchvision.datasets.CIFAR10(root=root, train=is_train, transform=augs, download=False)
    # Shuffle only the training set
    return DataLoader(dataset, batch_size=batch_size, shuffle=is_train, num_workers=num_workers)
9.1.2.1 Using Image Augmentation to Train a Model
We train the ResNet-18 model described in Section 5.11 (Residual Network) on the CIFAR-10 dataset.
First, we define the train function, which trains the model on the selected device and evaluates it after each epoch.
# This function has been saved in the d2lzh_pytorch package for future use
def train(train_iter, test_iter, net, loss, optimizer, device, num_epochs):
    net = net.to(device)
    print("training on ", device)
    # Note: batch_count is not reset between epochs
    batch_count = 0
    for epoch in range(num_epochs):
        train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()
        for X, y in train_iter:
            X = X.to(device)
            y = y.to(device)
            y_hat = net(X)
            l = loss(y_hat, y)
            optimizer.zero_grad()
            l.backward()
            optimizer.step()
            train_l_sum += l.cpu().item()
            train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()
            n += y.shape[0]
            batch_count += 1
        test_acc = d2l.evaluate_accuracy(test_iter, net)
        # The printed loss is this epoch's summed loss divided by the
        # number of batches seen since training started
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'
              % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))
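When reading the loss column printed by train, it helps to see how the division by batch_count behaves, since that counter keeps growing across epochs while the loss sum restarts each epoch. The sketch below reproduces only that arithmetic with made-up per-batch losses; reported_losses is a hypothetical helper and the numbers are not training results.

```python
def reported_losses(per_batch_losses_by_epoch):
    """Mimic the running-count division used in the print statement of train."""
    batch_count = 0
    reported, per_epoch_mean = [], []
    for losses in per_batch_losses_by_epoch:
        epoch_sum = 0.0
        for l in losses:
            epoch_sum += l
            batch_count += 1
        # Value as printed by train: epoch sum over cumulative batch count.
        reported.append(epoch_sum / batch_count)
        # Plain per-epoch mean, for comparison.
        per_epoch_mean.append(epoch_sum / len(losses))
    return reported, per_epoch_mean

# Hypothetical losses: two epochs, two batches each, constant loss 1.0.
reported, per_epoch_mean = reported_losses([[1.0, 1.0], [1.0, 1.0]])
print(reported)        # -> [1.0, 0.5]
print(per_epoch_mean)  # -> [1.0, 1.0]
```

In this toy case the printed value halves in the second epoch even though the per-batch loss did not change, so the loss column is best compared against accuracy rather than read as a per-epoch average.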
We can then define the train_with_data_aug function to train the model with image augmentation. This function uses Adam as the optimization algorithm, applies image augmentation to the training dataset, and finally calls the train function defined above to train and evaluate the model.
def train_with_data_aug(train_augs, test_augs, lr=0.001):
    batch_size, net = 256, d2l.resnet18(10)
    optimizer = torch.optim.Adam(net.parameters(), lr=lr)
    loss = torch.nn.CrossEntropyLoss()
    train_iter = load_cifar10(True, train_augs, batch_size)
    test_iter = load_cifar10(False, test_augs, batch_size)
    train(train_iter, test_iter, net, loss, optimizer, device, num_epochs=10)
Next, we train the model using random left-right flipping as the image augmentation.
train_with_data_aug(flip_aug, no_aug)
training on cpu
epoch 1, loss 1.3790, train acc 0.504, test acc 0.554, time 195.8 sec
epoch 2, loss 0.4992, train acc 0.646, test acc 0.592, time 192.5 sec
epoch 3, loss 0.2821, train acc 0.702, test acc 0.657, time 193.7 sec
epoch 4, loss 0.1859, train acc 0.739, test acc 0.693, time 195.4 sec
epoch 5, loss 0.1349, train acc 0.766, test acc 0.688, time 192.6 sec
epoch 6, loss 0.1022, train acc 0.786, test acc 0.701, time 200.2 sec
epoch 7, loss 0.0797, train acc 0.806, test acc 0.720, time 191.8 sec
epoch 8, loss 0.0633, train acc 0.825, test acc 0.695, time 198.6 sec
epoch 9, loss 0.0524, train acc 0.836, test acc 0.693, time 192.1 sec
epoch 10, loss 0.0437, train acc 0.850, test acc 0.769, time 196.3 sec