Explanation of the parameters of TensorFlow's convolution function conv2d

Link to the original text: https://blog.csdn.net/qq_34782535/article/details/87906835

I. Principles of Convolution Operation

Convolution operation

Although the convolutional layer is named after the convolution operation, convolutional layers usually use the more intuitive cross-correlation operation. In a two-dimensional convolutional layer, a two-dimensional input array and a two-dimensional kernel array produce a two-dimensional output array through cross-correlation. We use a concrete example to explain two-dimensional cross-correlation. As shown in Figure 5.1, the input is a two-dimensional array with a height and width of 3; we write its shape as 3 * 3 or (3, 3). The kernel array has a height and width of 2. This array is also called the convolution kernel or filter. The shape of the convolution window (also called the kernel window) is given by the height and width of the convolution kernel, i.e. 2 * 2. The shaded part in Figure 5.1 is the first output element together with the input and kernel elements used to compute it: 0*0+1*1+3*2+4*3=19. The window then slides across the input from left to right and top to bottom, producing one output element at each position.
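
The calculation above can be sketched in plain Python. The function name corr2d is hypothetical, and the full 3 * 3 input (the values 0 to 8 laid out row by row) is an assumption made for illustration; only the top-left window (0, 1, 3, 4) and the kernel (0, 1, 2, 3) can be read off the arithmetic above.

```python
def corr2d(x, k):
    """Two-dimensional cross-correlation of x with kernel k (no padding, stride 1)."""
    nh, nw = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    out = []
    for i in range(nh - kh + 1):
        row = []
        for j in range(nw - kw + 1):
            # Multiply the window whose top-left corner is (i, j)
            # elementwise with the kernel, then sum the products.
            row.append(sum(x[i + a][j + b] * k[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out


# Assumed input: the values 0..8 laid out row by row.
X = [[0, 1, 2],
     [3, 4, 5],
     [6, 7, 8]]
K = [[0, 1],
     [2, 3]]

Y = corr2d(X, K)
print(Y)  # [[19, 25], [37, 43]]
```

The first element, 19, matches the hand calculation; the remaining three come from sliding the same 2 * 2 window one step right, one step down, and one step diagonally.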

Padding and stride

In the example in the previous section, we used an input with a height and width of 3 and a convolution kernel with a height and width of 2 to get an output with a height and width of 2. In general, assuming the input shape is nh * nw and the convolution kernel window shape is kh * kw, the output shape will be

(nh−kh+1)×(nw−kw+1).

So the output shape of the convolutional layer is determined by the input shape and the shape of the convolution kernel window. In this section, we introduce two hyperparameters of the convolutional layer, namely padding and stride. For a given input shape and kernel shape, they change the output shape.

Padding means adding elements (usually zeros) around the height and width of the input. In general, if a total of ph rows are padded on the two sides of the height and a total of pw columns are padded on the two sides of the width, the output shape will be

(nh−kh+ph+1)×(nw−kw+pw+1),

That is, the output height and width increase by ph and pw, respectively.
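
As a quick check of the two shape formulas, here is a minimal sketch. The helper name conv_output_shape is hypothetical; it assumes a stride of 1 and takes p as the total number of padded rows and columns (both sides added together), as in the formula above.

```python
def conv_output_shape(n, k, p=(0, 0)):
    """Output (height, width) for input shape n, kernel shape k and
    total padding p, assuming a stride of 1."""
    (nh, nw), (kh, kw), (ph, pw) = n, k, p
    return (nh - kh + ph + 1, nw - kw + pw + 1)


print(conv_output_shape((3, 3), (2, 2)))          # (2, 2)
print(conv_output_shape((3, 3), (2, 2), (1, 1)))  # (3, 3)
print(conv_output_shape((8, 8), (3, 3), (2, 2)))  # (8, 8)
```

The first call reproduces the 3 * 3 input and 2 * 2 kernel of the earlier example; the other two show that padding with a total of kh-1 rows and kw-1 columns keeps the output the same size as the input.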

In many cases, we set ph=kh−1 and pw=kw−1 so that the input and output have the same height and width. This makes it easy to infer the output shape of each layer when constructing a network. If kh is odd, we pad ph/2 rows on both sides of the height. If kh is even, one possibility is to pad ⌈ph/2⌉ rows on the top of the input and ⌊ph/2⌋ rows on the bottom. Padding on the two sides of the width works the same way.

Convolutional neural networks often use convolution kernels with odd height and width, such as 1, 3, 5 and 7, so that the amount of padding on both sides is equal. For an arbitrary two-dimensional array X, let X[i,j] denote the element in row i and column j. When the padding on both sides is equal and the input and output have the same height and width, the output Y[i,j] is obtained by cross-correlating the convolution kernel with the input window centered on X[i,j].
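
The padding rule can be sketched as follows. The helper names same_padding and pad2d are hypothetical. The split of the total padding p = k - 1 rounds up for the top (or left) side and down for the bottom (or right) side, which is only one possible convention when p is odd; a library may place the extra row on the other side.

```python
import math


def same_padding(k):
    """Split the total padding p = k - 1 into (before, after) for kernel size k."""
    p = k - 1
    return math.ceil(p / 2), math.floor(p / 2)


def pad2d(x, top, bottom, left, right):
    """Surround the two-dimensional list x with zeros."""
    width = len(x[0]) + left + right
    rows = [[0] * left + list(r) + [0] * right for r in x]
    return ([[0] * width for _ in range(top)]
            + rows
            + [[0] * width for _ in range(bottom)])


for k in (1, 2, 3, 4, 5):
    print(k, same_padding(k))
# 1 (0, 0)
# 2 (1, 0)
# 3 (1, 1)
# 4 (2, 1)
# 5 (2, 2)

# Pad a 3 * 3 input for a 3 * 3 kernel and check the output shape.
X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
top, bottom = same_padding(3)
left, right = same_padding(3)
Xp = pad2d(X, top, bottom, left, right)
out_h = len(Xp) - 3 + 1
out_w = len(Xp[0]) - 3 + 1
print((out_h, out_w))  # (3, 3)
```

For odd kernel sizes the two sides receive the same amount of padding, which is why each output element lines up with the input element at the center of its window.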

Multiple Input and Multiple Output Channels

The inputs and outputs used in the first two sections are two-dimensional arrays, but real data often has more dimensions. For example, a color image has three RGB color channels (red, green and blue) in addition to its height and width. Assuming that the height and width of the color image are h and w (in pixels), it can be represented as a 3 * h * w multidimensional array. We call this dimension of size 3 the channel dimension. In this section, we introduce convolution kernels with multiple input channels or multiple output channels.

When the input data contains multiple channels, we need to construct a convolution kernel with the same number of input channels as the input data, so that it can be cross-correlated with the multi-channel input. Let the number of channels of the input data be ci; the number of input channels of the convolution kernel is then also ci. Let the shape of the convolution kernel window be kh * kw. When ci=1, the convolution kernel contains only one two-dimensional array of shape kh * kw. When ci>1, we assign a kernel array of shape kh * kw to each input channel. Concatenating these ci arrays along the input channel dimension gives a convolution kernel of shape ci * kh * kw. Since the input and the convolution kernel both have ci channels, we can cross-correlate the two-dimensional input array with the two-dimensional kernel array on each channel, and then add the ci two-dimensional results together, channel by channel, to get a single two-dimensional array. This is the output of the two-dimensional cross-correlation between multi-channel input data and a convolution kernel with multiple input channels.

Figure 5.4 shows an example of two-dimensional cross-correlation with two input channels. On each channel, the two-dimensional input array and the two-dimensional kernel array are cross-correlated, and the results are then added across channels to get the output. The shaded part in Figure 5.4 is the first output element together with the input and kernel elements used to compute it: (1*1+2*2+4*3+5*4)+(0*0+1*1+3*2+4*3)=56.
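
The two-channel calculation can be sketched in plain Python. The function names corr2d and corr2d_multi_in are hypothetical. The full input arrays (0 to 8 on one channel, 1 to 9 on the other) are assumptions made for illustration; only the top-left windows and the two kernels can be read off the arithmetic above.

```python
def corr2d(x, k):
    """Two-dimensional cross-correlation of x with kernel k (no padding, stride 1)."""
    kh, kw = len(k), len(k[0])
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(len(x[0]) - kw + 1)]
            for i in range(len(x) - kh + 1)]


def corr2d_multi_in(x, k):
    """Cross-correlate each input channel with its own kernel array,
    then add the per-channel results elementwise."""
    per_channel = [corr2d(xc, kc) for xc, kc in zip(x, k)]
    rows, cols = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(ch[i][j] for ch in per_channel) for j in range(cols)]
            for i in range(rows)]


# Assumed input of shape 2 * 3 * 3 (channels, height, width).
X = [[[0, 1, 2], [3, 4, 5], [6, 7, 8]],
     [[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
# Kernel of shape 2 * 2 * 2 (input channels, height, width).
K = [[[0, 1], [2, 3]],
     [[1, 2], [3, 4]]]

Y = corr2d_multi_in(X, K)
print(Y)  # [[56, 72], [104, 120]]
```

The result is a single two-dimensional array: however many input channels there are, one kernel with matching input channels yields one output channel.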

II. The conv2d() function

Function declaration in the TensorFlow Python API

tf.nn.conv2d(
    input,
    filter,
    strides,
    padding,
    use_cudnn_on_gpu=True,
    data_format='NHWC',
    dilations=[1, 1, 1, 1],
    name=None
)

input: A Tensor. Must be one of the following types: half, bfloat16,
float32, float64. A 4-D tensor. The dimension order is interpreted
according to the value of data_format, see below for details.

filter: A Tensor. Must have the same type as input. A 4-D tensor of
shape [filter_height, filter_width, in_channels, out_channels]

strides: A list of ints. 1-D tensor of length 4. The stride of the
sliding window for each dimension of input. The dimension order is
determined by the value of data_format, see below for details.

padding: A string from: "SAME", "VALID". The type of padding
algorithm to use.

use_cudnn_on_gpu: An optional bool. Defaults to True.

data_format: An optional string from: "NHWC", "NCHW". Defaults to
"NHWC". Specify the data format of the input and output data. With
the default format "NHWC", the data is stored in the order of:
[batch, height, width, channels]. Alternatively, the format could be
"NCHW", the data storage order of: [batch, channels, height, width].

dilations: An optional list of ints. Defaults to [1, 1, 1, 1]. 1-D
tensor of length 4. The dilation factor for each dimension of input.
If set to k > 1, there will be k-1 skipped cells between each filter
element on that dimension. The dimension order is determined by the value of data_format, see above for details. Dilations in the batch
and depth dimensions must be 1.

name: A name for the operation (optional).
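
The effect of the dilation factor described above can be illustrated with a small sketch in one dimension. The helper names effective_kernel_size and dilated_taps are hypothetical and are not part of TensorFlow; they only restate the rule that a factor of k leaves k-1 skipped cells between neighboring filter elements.

```python
def effective_kernel_size(k, dilation):
    """Extent of input covered by a kernel of size k whose
    neighboring elements are `dilation` cells apart."""
    return (k - 1) * dilation + 1


def dilated_taps(k, dilation):
    """Offsets of the input cells that a 1-D kernel of size k touches."""
    return [i * dilation for i in range(k)]


for d in (1, 2, 3):
    print(d, effective_kernel_size(3, d), dilated_taps(3, d))
# 1 3 [0, 1, 2]
# 2 5 [0, 2, 4]
# 3 7 [0, 3, 6]
```

With a dilation of 1 the kernel touches adjacent cells, which is ordinary convolution; larger factors widen the area the kernel sees without adding any weights.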

The first four parameters are explained below.

With the default data_format="NHWC", a 4-D tensor has the form [batch, height, width, channels]. Taking image processing as an example, these are the batch size (the number of images processed at a time), the image height in pixels, the image width in pixels, and the number of image channels (3 for a color image, 1 for a grayscale image; other data may carry further channels such as depth).

input: its four dimensions are those of the input tensor described above.

filter: its four dimensions [filter_height, filter_width, in_channels, out_channels] are the height of the convolution kernel (filter) in pixels, its width in pixels, the number of input channels (which must equal the number of channels of input), and the number of output channels (the number of convolution kernels, i.e. the number of features the convolutional layer learns).
Note: a convolution kernel may have multiple channels, and its number of channels equals the number of channels of the input tensor.

strides: a list of four ints, [1, height stride, width stride, 1] under "NHWC". The first and fourth entries correspond to the batch and channel dimensions; they are not used for striding and are set to 1.

padding: the padding mode.
"SAME": pads the edges with zeros so that, with a stride of 1, the output has the same height and width as the input.

"VALID": no padding; rows and columns at the bottom and right that do not fit a full window are discarded.
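
To see how the four parameters interact, the following sketch computes the output shape only, without performing any convolution. The helper name conv2d_output_shape is hypothetical and is not a TensorFlow function. It assumes data_format="NHWC", a dilation of 1, and the output-size rules given in the TensorFlow documentation: ceil(in / stride) for "SAME" and ceil((in - filter + 1) / stride) for "VALID".

```python
import math


def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Shape [batch, out_height, out_width, out_channels] of a 2-D
    convolution, assuming NHWC layout and a dilation of 1."""
    batch, in_h, in_w, in_c = input_shape
    f_h, f_w, f_in_c, out_c = filter_shape
    if in_c != f_in_c:
        raise ValueError("filter in_channels must equal input channels")
    _, s_h, s_w, _ = strides  # batch and channel strides are ignored
    if padding == "SAME":
        out_h = math.ceil(in_h / s_h)
        out_w = math.ceil(in_w / s_w)
    elif padding == "VALID":
        out_h = math.ceil((in_h - f_h + 1) / s_h)
        out_w = math.ceil((in_w - f_w + 1) / s_w)
    else:
        raise ValueError('padding must be "SAME" or "VALID"')
    return [batch, out_h, out_w, out_c]


# Assumed example: 10 RGB images of 28 * 28 pixels, 16 kernels of 5 * 5.
inp = [10, 28, 28, 3]
flt = [5, 5, 3, 16]
print(conv2d_output_shape(inp, flt, [1, 1, 1, 1], "SAME"))   # [10, 28, 28, 16]
print(conv2d_output_shape(inp, flt, [1, 1, 1, 1], "VALID"))  # [10, 24, 24, 16]
print(conv2d_output_shape(inp, flt, [1, 2, 2, 1], "SAME"))   # [10, 14, 14, 16]
print(conv2d_output_shape(inp, flt, [1, 2, 2, 1], "VALID"))  # [10, 12, 12, 16]
```

The batch size passes through unchanged, the last dimension of the output is the out_channels entry of the filter, and only the height and width depend on strides and padding.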