nn.Conv2d -- two-dimensional convolution operation
torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
Function: applies a 2D convolution over an input signal composed of several input planes; commonly used in image processing.
Parameters:
- in_channels: number of channels in the input image
- out_channels: number of channels produced by the convolution
- kernel_size: size of the convolution kernel, integer or tuple type
- stride: stride of the convolution, integer or tuple type. The default is 1
- padding: amount of padding added to the boundary, integer or tuple type, 0 by default
- padding_mode: padding mode, one of 'zeros', 'reflect', 'replicate', or 'circular'. 'zeros' is the default
zeros: zero padding, fills the tensor boundary with 0s
reflect: reflection padding, mirrors the tensor about its edges; the outermost elements act as the axis of symmetry and are not repeated
replicate: replication padding, fills the border with copies of the input's boundary values
circular: circular padding, wraps around and repeats the elements from the opposite side of the tensor
(See the code examples below for the concrete differences between these modes.)
- dilation: the spacing between kernel elements, integer or tuple type. The default is 1; when it is greater than 1, the operation is also called dilated (atrous) convolution
(Figure: an animation illustrating dilated convolution. Image source: https://github.com/vdumoulin/conv_arithmetic)
- groups: controls the connections between inputs and outputs. The default is 1.
- When groups=1, all input channels are convolved to produce all output channels
- When groups=2, the operation is equivalent to splitting the input channels into two halves, passing each half through its own convolution (so the number of convolution parameters is halved), and then concatenating the two outputs. groups > 2 works the same way; groups cannot exceed the number of input channels (see the sketch after this parameter list)
- Both in_channels and out_channels must be divisible by groups
- bias: whether to add a learnable bias term. The default is True, i.e. a bias is included by default
- On supported hardware, this module supports TensorFloat32
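To make the effect of groups concrete, here is a minimal sketch (channel counts chosen arbitrarily) comparing the weight shapes for groups=1 and groups=2:

import torch.nn as nn

conv_g1 = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=1)
conv_g2 = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)

# groups=1: every output channel sees all 4 input channels
print(conv_g1.weight.shape)  # torch.Size([8, 4, 3, 3])
# groups=2: each output channel sees only 4/2 = 2 input channels,
# so the weight tensor (and the parameter count) is halved
print(conv_g2.weight.shape)  # torch.Size([8, 2, 3, 3])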
Note:
- in_channels, out_channels, and kernel_size must be specified; the remaining parameters have default values and may be omitted.
- kernel_size, stride, padding, and dilation can each be specified as either an integer or a tuple.
- If specified as an integer, the same value is used for the height and width dimensions (square)
- If specified as a tuple, the first value is used for the height and the second for the width (rectangle)
Supplement:
- Formula for the height and width of the output image (reproduced from the official documentation):
Assume the input has shape $(N, C_{in}, H_{in}, W_{in})$ and the output has shape $(N, C_{out}, H_{out}, W_{out})$. Then:
$$H_{out}=\left\lfloor\frac{H_{in}+2\times padding[0]-dilation[0]\times(kernel\_size[0]-1)-1}{stride[0]}+1\right\rfloor$$
$$W_{out}=\left\lfloor\frac{W_{in}+2\times padding[1]-dilation[1]\times(kernel\_size[1]-1)-1}{stride[1]}+1\right\rfloor$$
- The weights of the convolution layer can be extracted with Conv2d.weight; the returned weight array has size:
$(out\_channels,\ \frac{in\_channels}{groups},\ kernel\_size[0],\ kernel\_size[1])$
The initial weights are drawn from the uniform distribution:
$\mathcal{U}(-\sqrt{k},\ \sqrt{k})$, where $k=\dfrac{groups}{C_{in}\times\prod\limits_{i=0}^{1}kernel\_size[i]}$
- The bias parameters of the convolution layer can be extracted with Conv2d.bias (assuming bias=True). The returned array has the same size as out_channels, and it is initialized from the same distribution as the weights (a numeric check appears after this list)
- The convolution layer's parameters can also be obtained through the .parameters() method
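As a rough numeric check of the initialization described above, one can compute k for a concrete layer (sizes chosen arbitrarily for this sketch) and verify that every weight and bias value lies within $(-\sqrt{k}, \sqrt{k})$:

import math
import torch.nn as nn

conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, groups=4)

# k = groups / (C_in * kernel_size[0] * kernel_size[1]) = 4 / (16 * 3 * 3) = 1/36
k = conv.groups / (conv.in_channels * conv.kernel_size[0] * conv.kernel_size[1])
bound = math.sqrt(k)  # 1/6
print(conv.weight.abs().max().item() <= bound)  # True
print(conv.bias.abs().max().item() <= bound)    # True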
Code examples
General usage
import torch
import torch.nn as nn

img = torch.arange(49, dtype=torch.float32).view(1, 1, 7, 7)
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
img_2 = conv(img)
print(img)
print(img_2)
output
# Before convolution
tensor([[[[ 0.,  1.,  2.,  3.,  4.,  5.,  6.],
          [ 7.,  8.,  9., 10., 11., 12., 13.],
          [14., 15., 16., 17., 18., 19., 20.],
          [21., 22., 23., 24., 25., 26., 27.],
          [28., 29., 30., 31., 32., 33., 34.],
          [35., 36., 37., 38., 39., 40., 41.],
          [42., 43., 44., 45., 46., 47., 48.]]]])
# After convolution
tensor([[[[4.7303, 4.8851, 5.0398, 5.1945, 5.3492],
          [5.8134, 5.9681, 6.1228, 6.2775, 6.4323],
          [6.8964, 7.0512, 7.2059, 7.3606, 7.5153],
          [7.9795, 8.1342, 8.2889, 8.4436, 8.5984],
          [9.0625, 9.2172, 9.3720, 9.5267, 9.6814]]]],
       grad_fn=<ThnnConv2DBackward>)
Changes in image size
import torch
import torch.nn as nn

img = torch.arange(4 * 64 * 28 * 28, dtype=torch.float32).view(4, 64, 28, 28)
conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
img_2 = conv(img)
print(img.shape)
print(img_2.shape)
output
# Before convolution
torch.Size([4, 64, 28, 28])
# After convolution
torch.Size([4, 128, 28, 28])
After the convolution the number of channels doubles, while the spatial size is unchanged because one row/column of padding is added on each side (padding=1 with a 3×3 kernel).
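Plugging this example into the output-size formula above confirms it; a minimal check (same shapes as the example, plus a stride variant for contrast):

import torch
import torch.nn as nn

# kernel_size=3, padding=1, stride=1, dilation=1 leaves H and W unchanged:
# H_out = (28 + 2*1 - 1*(3-1) - 1) // 1 + 1 = 28
h_out = (28 + 2 * 1 - 1 * (3 - 1) - 1) // 1 + 1
print(h_out)  # 28

# With stride=2 instead, the spatial size is halved
conv_s2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1, stride=2)
print(conv_s2(torch.randn(4, 64, 28, 28)).shape)  # torch.Size([4, 128, 14, 14])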
Printing the parameters of the convolution operation
Convolution layer parameter values
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
print(conv.weight)
print(conv.bias)
print(type(conv.weight))

# Access the parameters via the .parameters() method
for i in conv.parameters():
    print(i)
    print(type(i))
output
# Convolution layer weight parameter
Parameter containing:
tensor([[[[-0.1891, -0.2296,  0.0362],
          [-0.1552, -0.0747,  0.2922],
          [-0.1434,  0.0802, -0.0778]]]], requires_grad=True)
# Convolution layer bias parameter
Parameter containing:
tensor([0.1998], requires_grad=True)
# Both parameters are instances of the Parameter class
<class 'torch.nn.parameter.Parameter'>
# The same parameters retrieved via the .parameters() method, identical to the results above
Parameter containing:
tensor([[[[-0.1891, -0.2296,  0.0362],
          [-0.1552, -0.0747,  0.2922],
          [-0.1434,  0.0802, -0.0778]]]], requires_grad=True)
Parameter containing:
tensor([0.1998], requires_grad=True)
# The returned type is the same
<class 'torch.nn.parameter.Parameter'>
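For convenience, nn.Module also provides .named_parameters(), which yields each parameter together with its name; a minimal example:

import torch.nn as nn

conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3)
for name, p in conv.named_parameters():
    print(name, p.shape)
# weight torch.Size([1, 1, 3, 3])
# bias torch.Size([1])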
Convolution layer parameter size
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=[5, 3], padding=2)
print(conv.weight.shape)
print(conv.bias.shape)
output
# Weight parameter; from the first dimension to the fourth, the sizes are:
# number of output channels, number of input channels, kernel height, kernel width
torch.Size([128, 64, 5, 3])
# The bias parameter has the same size as the number of output channels
torch.Size([128])
Padding modes
To remove the influence of the convolution itself on the original image, we set the kernel size to 1, fix the kernel weight to 1, and omit the bias term. To make the padding effect stand out, we set padding to 3.
Initialization process
import torch
import torch.nn as nn

# kernel_size=1 with a fixed weight of 1 and no bias leaves the original
# pixels unchanged, so only the padding is visible in the output
conv_1 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1, bias=False,
                   padding=3, padding_mode='zeros')
conv_1.weight = nn.Parameter(torch.ones((1, 1, 1, 1)))
conv_2 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1, bias=False,
                   padding=3, padding_mode='reflect')
conv_2.weight = nn.Parameter(torch.ones((1, 1, 1, 1)))
conv_3 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1, bias=False,
                   padding=3, padding_mode='replicate')
conv_3.weight = nn.Parameter(torch.ones((1, 1, 1, 1)))
conv_4 = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=1, bias=False,
                   padding=3, padding_mode='circular')
conv_4.weight = nn.Parameter(torch.ones((1, 1, 1, 1)))
img = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
Zero padding
img_1 = conv_1(img)
print(img)
print(img_1)
output
(omitted: the original 5×5 tensor, followed by an 11×11 tensor in which the image is surrounded by a three-pixel border of zeros)
Reflection padding
img_2 = conv_2(img)
print(img)
print(img_2)
output
(omitted: the original 5×5 tensor, followed by an 11×11 tensor mirror-padded about each edge without repeating the edge value; e.g. a row [0, 1, 2, 3, 4] pads to [3, 2, 1, 0, 1, 2, 3, 4, 3, 2, 1])
Replication padding
img_3 = conv_3(img)
print(img)
print(img_3)
output
(omitted: the original 5×5 tensor, followed by an 11×11 tensor in which each border value repeats the nearest edge value; e.g. a row [0, 1, 2, 3, 4] pads to [0, 0, 0, 0, 1, 2, 3, 4, 4, 4, 4])
Circular padding
img_4 = conv_4(img)
print(img)
print(img_4)
output
(omitted: the original 5×5 tensor, followed by an 11×11 tensor in which the padding wraps around from the opposite side; e.g. a row [0, 1, 2, 3, 4] pads to [2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2])
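The same four behaviors can also be inspected directly with torch.nn.functional.pad, which Conv2d effectively uses internally for the non-'zeros' modes; a minimal sketch (mode 'constant' with its default fill value 0 corresponds to 'zeros'):

import torch
import torch.nn.functional as F

img = torch.arange(25, dtype=torch.float32).reshape(1, 1, 5, 5)
for mode in ('constant', 'reflect', 'replicate', 'circular'):
    # Pad 3 pixels on each side: (left, right, top, bottom)
    padded = F.pad(img, (3, 3, 3, 3), mode=mode)
    print(mode, padded.shape)  # every mode yields torch.Size([1, 1, 11, 11])
    print(padded[0, 0, 5])     # the middle row shows how [10, 11, 12, 13, 14] is extended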
Official documentation
nn.Conv2d(): https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html?highlight=conv2d#torch.nn.Conv2d