Why transforms?
- Collected image samples usually differ in size and brightness. In deep learning we want the samples to be independent and identically distributed, so we need to normalize them.
- Sometimes only a small amount of sample data is available, and obtaining more is difficult. With too few samples, the trained model's accuracy will be low. To address this, we often apply data augmentation, which generates additional samples by transforming the existing ones.
Transformations in PyTorch
In PyTorch, transforms live in the torchvision.transforms package, which mainly contains the following kinds of transformations:
type | effect |
---|---|
Transforms on PIL Image | Transform a PIL.Image image |
Transforms on torch.*Tensor | Transform a torch.Tensor |
Conversion Transforms | Convert between PIL.Image and torch.Tensor (ToTensor, ToPILImage; see the sketch below) |
Generic Transforms | General-purpose transforms such as Lambda |
Functional Transforms | Functional-style transforms with explicit parameters (torchvision.transforms.functional) |
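As a quick illustration of the conversion category, ToTensor and ToPILImage move data between the two representations. A minimal round-trip sketch (the image path is a placeholder):

```python
from PIL import Image
from torchvision import transforms

img = Image.open('./data/lena_rgb.jpg')   # placeholder path

to_tensor = transforms.ToTensor()    # PIL.Image -> torch.Tensor in [0, 1]
to_pil = transforms.ToPILImage()     # torch.Tensor -> PIL.Image

tensor_img = to_tensor(img)
print(tensor_img.shape)              # (C, H, W)

pil_img = to_pil(tensor_img)
print(pil_img.size)                  # (W, H)
```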
Practice
Transforms on PIL Image
type | explain |
---|---|
CenterCrop(size) | Crop at the center |
FiveCrop(size) | Crop the 4 corners and the center, returning a tuple of 5 images (see the sketch below) |
Grayscale(num_output_channels=1) | Convert to grayscale |
Pad(padding, fill=0, padding_mode='constant') | Pad the image edges |
RandomAffine(degrees, translate, scale, shear, resample, fillcolor) | Random affine transformation |
RandomApply(...) | Randomly apply a list of transforms with a given probability |
RandomCrop(...) | Crop at a random position |
RandomGrayscale(...) | Randomly convert to grayscale with a given probability |
Resize(size) | Resize the image |
```python
import numpy as np
from torchvision import transforms
from PIL import Image

# Prepare the experimental image, a color 32-bit image
IMG_PATH = './data/lena_rgb.jpg'
img = Image.open(IMG_PATH)

# ----------------- Type conversion ---------------------------------------
# transforms1 = transforms.Compose([transforms.ToTensor()])
# img1 = transforms1(img)
# print('img1 = ', img1)

# --------------- Operations on Tensor ------------------------------------
# transforms2 = transforms.Compose([transforms.Normalize(mean=(0.5, 0.5, 0.5),
#                                                        std=(0.5, 0.5, 0.5))])
# img2 = transforms2(img1)
# print('img2 = ', img2)

# --------------- Operations on PIL.Image ---------------------------------
transforms3 = transforms.Compose([transforms.Resize(256)])
img3 = transforms3(img)
print('img3 = ', img3)
img3.show()

transforms4 = transforms.Compose([transforms.CenterCrop(256)])
img4 = transforms4(img)
print('img4 = ', img4)
img4.show()

transforms5 = transforms.Compose([transforms.RandomCrop(224, padding=0)])
img5 = transforms5(img)
print('img5 = ', img5)
img5.show()

transforms6 = transforms.Compose([transforms.Grayscale(num_output_channels=1)])
img6 = transforms6(img)
print('img6 = ', img6)
img6.show()

transforms7 = transforms.Compose([transforms.ColorJitter()])
img7 = transforms7(img)
img7.show()
```
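Note that FiveCrop is the one entry in the table that returns multiple images (a tuple of 5 PIL Images) rather than a single image, so it needs slightly different handling. A minimal sketch, reusing the same image path:

```python
from PIL import Image
from torchvision import transforms

img = Image.open('./data/lena_rgb.jpg')

five_crop = transforms.FiveCrop(224)   # 4 corners + center
crops = five_crop(img)                 # tuple of 5 PIL Images

for i, crop in enumerate(crops):
    print(i, crop.size)                # each crop is 224 x 224
```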
Detailed usage of transforms
transforms is used for image transformations. We can use transforms.Compose to chain a series of transform operations:
`torchvision.transforms.Compose([ts, ts, ts, ...])`, where each ts is a transform operation.
For example:
```python
transforms.Compose([
    transforms.CenterCrop(10),
    transforms.ToTensor(),
])
```

```python
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```
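Applying a composed transform is then a single call. Compose runs the listed operations in order, so PIL-only transforms such as CenterCrop must come before ToTensor. A minimal sketch (the image path is a placeholder):

```python
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([
    transforms.CenterCrop(10),   # runs first, on the PIL image
    transforms.ToTensor(),       # runs second, converting to a tensor
])

img = Image.open('./data/lena_rgb.jpg')
out = transform(img)
print(out.shape)   # torch.Size([3, 10, 10])
```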
Common transform operations
- Resize: `transforms.Resize`
`torchvision.transforms.Resize(size, interpolation=2)`
Resizes the input PIL image to the given size.
- size (sequence or int) - Desired output size. If size is a sequence like (h, w), the output size will match it exactly. If size is an int, the smaller edge of the image is matched to this number while keeping the aspect ratio; i.e., if height > width, the image is rescaled to (size * height / width, size). The sketch below illustrates both cases.
- interpolation (int, optional) - Desired interpolation. The default is PIL.Image.BILINEAR.
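A minimal sketch of the two size behaviors (the image path is a placeholder):

```python
from PIL import Image
from torchvision import transforms

img = Image.open('./data/lena_rgb.jpg')
print(img.size)                                  # PIL reports (width, height)

# int: the smaller edge is matched to 256, aspect ratio preserved
print(transforms.Resize(256)(img).size)

# (h, w) sequence: the output is exactly h x w, here 128 x 256
print(transforms.Resize((128, 256))(img).size)   # (256, 128) in (W, H) order
```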
- Normalize (standardization): `transforms.Normalize`
`torchvision.transforms.Normalize(mean, std)`
Normalizes a tensor image with mean and standard deviation. Given mean (M1, ..., Mn) and std (S1, ..., Sn) for n channels, this transform normalizes each channel of the input torch.*Tensor, i.e. input[channel] = (input[channel] - mean[channel]) / std[channel]
- mean (sequence) - Sequence of means for each channel.
- std (sequence) - Sequence of standard deviations for each channel.
For example:
```python
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```
This maps input values from [0, 1] to [-1, 1]. Specifically, for each channel, Normalize performs:
image = (image - mean) / std
where mean and std are both specified as (0.5, 0.5, 0.5) here. The original minimum 0 becomes (0 - 0.5) / 0.5 = -1, and the maximum 1 becomes (1 - 0.5) / 0.5 = 1. A quick check is sketched below.
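A toy tensor holding only the extreme values confirms the arithmetic:

```python
import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=(0.5, 0.5, 0.5), std=(0.5, 0.5, 0.5))

print(normalize(torch.zeros(3, 2, 2)).unique())   # tensor([-1.]): 0 maps to -1
print(normalize(torch.ones(3, 2, 2)).unique())    # tensor([1.]):  1 maps to 1
```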
- ToTensor: `transforms.ToTensor`
`torchvision.transforms.ToTensor`
Converts a PIL Image or numpy ndarray to a tensor and scales the values to [0, 1].
Note: the scaling to [0, 1] is done by dividing by 255, which assumes 8-bit (uint8) input. If your ndarray has a different value range, you need to rescale it yourself; see the sketch below.
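A minimal sketch of this dtype behavior: uint8 arrays are scaled by 1/255, while float arrays pass through unscaled (the values are illustrative):

```python
import numpy as np
from torchvision import transforms

to_tensor = transforms.ToTensor()

# uint8 HWC array: values are divided by 255 -> float tensor in [0, 1]
arr_u8 = np.array([[[0, 128, 255]]], dtype=np.uint8)   # shape (1, 1, 3)
print(to_tensor(arr_u8).flatten())   # tensor([0.0000, 0.5020, 1.0000])

# float array: converted without scaling -- rescale it yourself if needed
arr_f = np.array([[[0.0, 128.0, 255.0]]], dtype=np.float32)
print(to_tensor(arr_f).flatten())    # tensor([  0., 128., 255.])
```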