PyTorch learning notes -- transforms

Why transforms?

Collected image samples generally differ in size and brightness. In deep learning we usually assume the samples are independent and identically distributed, so we normalize them to a consistent size and value range before training.
Sometimes only a small number of samples can be obtained, and a model trained on too few samples tends to have low accuracy. Data augmentation is often used to address this: it applies transformations to the existing samples to generate additional training data.

Transformations in PyTorch

In PyTorch, transforms are located in the torchvision.transforms package, which mainly contains the following kinds of transformations:

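Type	Effect
Transforms on PIL Image	Transform a PIL.Image image
Transforms on torch.*Tensor	Transform a torch.Tensor
Conversion Transforms	Convert between PIL Image / ndarray and tensor
Generic Transforms	Some general-purpose transformations
Functional Transforms	Function versions of the transforms (torchvision.transforms.functional)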

Practice

Transforms on PIL Image

Type	Explanation
CenterCrop(size)	Crop the center of the image
FiveCrop(size)	Crop the 4 corners and the center; returns a tuple of 5 images
Grayscale(num_output_channels=1)	Convert to grayscale
Pad(padding, fill=0, padding_mode='constant')	Pad the image borders
RandomAffine(degrees, translate, scale, shear, resample, fillcolor)	Random affine transformation
RandomApply(...)	Apply a list of transforms randomly with a given probability
RandomCrop(...)	Crop at a random location
RandomGrayscale(...)	Randomly convert to grayscale with a given probability
Resize(size)	Resize the image

from torchvision import transforms
from PIL import Image

# Prepare the experimental image, an RGB color image
IMG_PATH = './data/lena_rgb.jpg'
img = Image.open(IMG_PATH)

# -----------------Type conversion---------------------------------------
#transforms1 = transforms.Compose([transforms.ToTensor()])
#img1 = transforms1(img)
#print('img1 = ', img1)

# ---------------Actions on Tensor---------------------------------
#transforms2 = transforms.Compose([transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])
#img2 = transforms2(img1)
#print('img2 = ', img2)

# ---------------Operations on PIL.Image---------------------------------
transforms3 = transforms.Compose([transforms.Resize(256)])
img3 = transforms3(img)
print('img3 = ', img3)
img3.show()

transforms4 = transforms.Compose([transforms.CenterCrop(256)])
img4 = transforms4(img)
print('img4 = ', img4)
img4.show()

transforms5 = transforms.Compose([transforms.RandomCrop(224, padding=0)])
img5 = transforms5(img)
print('img5 = ', img5)
img5.show()

transforms6 = transforms.Compose([transforms.Grayscale(num_output_channels=1)])
img6 = transforms6(img)
print('img6 = ', img6)
img6.show()

transforms7 = transforms.Compose([transforms.ColorJitter()])
img7 = transforms7(img)
img7.show()

Detailed usage of transforms

The transforms module is used for image transformations. We can use transforms.Compose to chain a series of transform operations:
torchvision.transforms.Compose([t1, t2, t3, ...]), where each t is a transform operation.

For example:

transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

Common transform operations

Resize: transforms.Resize

torchvision.transforms.Resize(size, interpolation=2)

Resizes the input PIL image to the given size.

size (sequence or int) - the required output size. If size is a sequence such as (h, w), the output size will match it. If size is an int, the smaller edge of the image is matched to this number and the aspect ratio is kept. That is, if height > width, the image is rescaled to (size * height / width, size).
interpolation (int, optional) - the interpolation method to use. The default is PIL.Image.BILINEAR, which is the integer 2 in the signature above.

Standardization: transforms.Normalize

torchvision.transforms.Normalize(mean, std)

Normalizes a tensor image with a mean and a standard deviation. Given mean (M1, ..., Mn) and std (S1, ..., Sn) for n channels, this transform normalizes each channel of the input torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]

mean (sequence) - sequence of means, one for each channel.
std (sequence) - sequence of standard deviations, one for each channel.

For example:

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

This maps the input values from the range [0, 1] to [-1, 1]. Specifically, for each channel, Normalize performs the following operation:

image = (image - mean) / std
where mean and std are given by (0.5, 0.5, 0.5) and (0.5, 0.5, 0.5), respectively. The original minimum value 0 becomes (0 - 0.5) / 0.5 = -1, while the maximum value 1 becomes (1 - 0.5) / 0.5 = 1.

ToTensor: transforms.ToTensor

torchvision.transforms.ToTensor

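Converts a PIL Image or numpy.ndarray of shape (H x W x C) to a tensor of shape (C x H x W) and scales the values to [0, 1].
Note: the scaling to [0, 1] is done by dividing by 255, and for an ndarray it is only applied when the dtype is uint8. If your own ndarray data uses a different dtype or value range, you need to scale it yourself.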

Posted by The Chancer on Sun, 12 Sep 2021 00:49:40 -0700

Programmer Group