PyTorch learning notes -- transforms

Keywords: Python PyTorch Deep Learning


Why transforms?

  1. Collected image samples generally differ in size and brightness. In deep learning we want the samples to be independent and identically distributed, so we need to normalize them.
  2. Sometimes only a small amount of sample data is available, and collecting more is not easy. With too few samples, however, the accuracy of the trained model will be relatively low. To address this, data augmentation is often applied: new samples are generated from the existing ones through transformations.

Transformations in PyTorch

In PyTorch, transforms are located in the torchvision.transforms package, which mainly contains the following groups of transformations:

Transforms on PIL Image: transform a PIL.Image image
Transforms on torch.*Tensor: transform a torch.Tensor
Conversion Transforms: convert between types
Generic Transforms: some general transformations
Functional Transforms: function versions of the transforms


Transforms on PIL Image

CenterCrop(size): center cropping
FiveCrop(size): the 4 corners + the center = 5 crops; returns multiple images
Grayscale(num_output_channels=1): convert to grayscale
Pad(padding, fill=0, padding_mode='constant'): add padding to the image edges
RandomAffine(degrees, translate, scale, shear, resample, fillcolor): random affine transformation
RandomApply(...): apply transforms to the image randomly
RandomCrop(...): crop at a random position
Resize(size): resize the image

For example:

import numpy as np
from torchvision import transforms
from PIL import Image

# Prepare the experimental image, a color (RGB) image
IMG_PATH = './data/lena_rgb.jpg'
img = Image.open(IMG_PATH)

# -----------------Type conversion---------------------------------------
#transforms1 = transforms.Compose([transforms.ToTensor()])
#img1 = transforms1(img)
#print('img1 = ', img1)

# ---------------Actions on Tensor---------------------------------
#transforms2 = transforms.Compose([transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))])
#img2 = transforms2(img1)
#print('img2 = ', img2)

# ---------------Operations on PIL.Image---------------------------------
transforms3 = transforms.Compose([transforms.Resize(256)])
img3 = transforms3(img)
print('img3 = ', img3)

transforms4 = transforms.Compose([transforms.CenterCrop(256)])
img4 = transforms4(img)
print('img4 = ', img4)

transforms5 = transforms.Compose([transforms.RandomCrop(224, padding=0)])
img5 = transforms5(img)
print('img5 = ', img5)

transforms6 = transforms.Compose([transforms.Grayscale(num_output_channels=1)])
img6 = transforms6(img)
print('img6 = ', img6)

transforms7 = transforms.Compose([transforms.ColorJitter()])
img7 = transforms7(img)

Detailed usage of transforms

Transforms are used for image transformation. We can also use transforms.Compose to chain a series of transform operations:
torchvision.transforms.Compose([ts, ts, ts, ...]), where each ts is a transform operation.

For example:

transform = transforms.Compose([
     transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

Common transform operations

  1. Resize: transforms.Resize
torchvision.transforms.Resize(size, interpolation=2)

Resizes the input PIL image to the given size.

  • size (sequence or int) - required output size. If size is a sequence like (h, w), the output size will match it. If size is an int, the smaller edge of the image will match this number and the aspect ratio is kept. That is, if height > width, the image is rescaled to (size * height / width, size).
  • interpolation (int, optional) - the interpolation method to use. The default is PIL.Image.BILINEAR (recent torchvision versions express this as InterpolationMode.BILINEAR).
  2. Standardization: transforms.Normalize
torchvision.transforms.Normalize(mean, std)

Normalizes a tensor image with mean and standard deviation. Given mean (M1, ..., Mn) and std (S1, ..., Sn) for n channels, this transform normalizes each channel of the input torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]

  • mean (sequence) - the mean of each channel.
  • std (sequence) - standard deviation sequence of each channel.

For example:

transform = transforms.Compose([
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

This maps input values from the range [0, 1] to the range [-1, 1]. Specifically, for each channel, Normalize performs the following operation:

image = (image - mean) / std
where mean and std are given by (0.5, 0.5, 0.5) and (0.5, 0.5, 0.5), respectively. The original minimum value 0 becomes (0 - 0.5) / 0.5 = -1, while the maximum value 1 becomes (1 - 0.5) / 0.5 = 1.

  3. ToTensor: transforms.ToTensor

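Converts a PIL Image or a numpy.ndarray of shape (H x W x C) to a tensor of shape (C x H x W) and scales the values to the range [0, 1].
Note: the scaling to [0, 1] is a plain division by 255 and is applied to 8-bit (uint8) data. If your own ndarray uses a different value range or dtype, you need to rescale it yourself.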

Posted by The Chancer on Sun, 12 Sep 2021 00:49:40 -0700