Before looking at these two functions, we need to understand one-dimensional convolution (conv1d) and two-dimensional convolution (conv2d). Two-dimensional convolution slides a window over a feature map along both the width and height directions, multiplying the values at corresponding positions and summing them; one-dimensional convolution slides the window, multiplies, and sums along only one direction (width or height).
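As a rough illustration of the "slide a window, multiply element-wise, and sum" idea, here is a minimal NumPy sketch (it only illustrates the arithmetic, not how TensorFlow implements it; the arrays and names below are made up):

```python
import numpy as np

# 1D convolution (cross-correlation, as used in deep learning): slide a window
# along one axis, multiply element-wise with the kernel, and sum.
signal = np.array([1., 2., 3., 4., 5.])
kernel1d = np.array([1., 0., -1.])
out1d = np.array([np.sum(signal[i:i + 3] * kernel1d)
                  for i in range(len(signal) - 3 + 1)])
print(out1d)        # [-2. -2. -2.]

# 2D convolution: slide the window along both height and width.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
kernel2d = np.ones((2, 2), dtype=np.float32)
out2d = np.array([[np.sum(image[i:i + 2, j:j + 2] * kernel2d)
                   for j in range(4 - 2 + 1)]
                  for i in range(4 - 2 + 1)])
print(out2d.shape)  # (3, 3)
```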
One-dimensional convolution: tf.layers.conv1d()
```python
tf.layers.conv1d(
    inputs,
    filters,
    kernel_size,
    strides=1,
    padding='valid',
    data_format='channels_last',
    dilation_rate=1,
    activation=None,
    use_bias=True,
    kernel_initializer=None,
    bias_initializer=tf.zeros_initializer(),
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    trainable=True,
    name=None,
    reuse=None
)
```
Parameters: [1]
- inputs: the input tensor, generally of shape [batch, length, channels]
- filters: integer, the dimensionality of the output space; can be understood as the number of convolution kernels (filters)
- kernel_size: a single integer or a tuple/list of one integer, specifying the length of the 1D convolution window
- strides: a single integer or a tuple/list of one integer, specifying the stride of the convolution; defaults to 1
- padding: "SAME" or "VALID" (case-insensitive), controlling whether the input is padded with zeros
  - SAME: pads the input with zeros
  - VALID: uses no zero padding; leftover positions that the window cannot cover are dropped
- activation: activation function
- use_bias: Boolean, whether the layer uses a bias vector
- kernel_initializer: initializer for the convolution kernel
- bias_initializer: initializer for the bias vector
- kernel_regularizer: regularization term for the convolution kernel
- bias_regularizer: regularization term for the bias vector
- activity_regularizer: regularization function applied to the output
- reuse: Boolean, whether to reuse the weights of a previous layer with the same name
- trainable: Boolean; if True, the variables are added to the graph collection of trainable variables
- data_format: a string, "channels_last" (default) or "channels_first"; the ordering of the dimensions in the input
  - channels_last: input with shape (batch, length, channels)
  - channels_first: input with shape (batch, channels, length)
- name: the name of the layer
Return value:
Tensor after one-dimensional convolution
Example
import tensorflow as tf x = tf.get_variable(name="x", shape=[32, 512, 1024], initializer=tf.zeros_initializer) x = tf.layers.conv1d( x, filters=1, # The third channel of the result is 1 kernel_size=512, # No matter how big it is, it doesn't affect the output. shape strides=1, padding='same', data_format='channels_last', dilation_rate=1, use_bias=True, bias_initializer=tf.zeros_initializer()) print(x) # Tensor("conv1d/BiasAdd:0", shape=(32, 512, 1), dtype=float32)
Analysis:
- The input has shape [batch, data_length, data_width] = [32, 512, 1024]. The first dimension is the batch_size, i.e. 32 samples; the second and third dimensions are the length and width of the input (512 and 1024), where the "width" here plays the role of the channel dimension.
- A one-dimensional convolution kernel is itself two-dimensional, with a length and a width: its length is kernel_size = 512 and its width equals the input's data_width = 1024, so each kernel has shape [512, 1024]. Since filters = 1, there is only one such kernel (see the sketch below).
- filters is the number of convolution kernels and determines the third dimension of the output; with filters = 1, the third dimension is 1.
- So the output data size after convolution is [32, 512, 1]
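To double-check this analysis, here is a small follow-up sketch (assuming the same TF 1.x tf.layers API in a fresh graph; the variable name x2 is chosen only to avoid clashing with the example above). It compares 'same' and 'valid' padding and prints the shape of the kernel variables the layers create:

```python
import tensorflow as tf

x2 = tf.get_variable(name="x2", shape=[32, 512, 1024], initializer=tf.zeros_initializer)
y_same = tf.layers.conv1d(x2, filters=1, kernel_size=512, padding='same')    # length stays 512
y_valid = tf.layers.conv1d(x2, filters=1, kernel_size=512, padding='valid')  # length 512 - 512 + 1 = 1
print(y_same)   # shape=(32, 512, 1)
print(y_valid)  # shape=(32, 1, 1)

# Each layer creates one kernel of shape [kernel_size, in_channels, filters] = [512, 1024, 1]
for v in tf.trainable_variables():
    if 'kernel' in v.name:
        print(v.name, v.shape)
```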
Two-dimensional convolution: tf.layers.conv2d()
```python
tf.layers.conv2d(
    inputs,
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    data_format='channels_last',
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer=None,
    bias_initializer=tf.zeros_initializer(),
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    trainable=True,
    name=None,
    reuse=None
)
```
Parameters: [4]
- inputs: the input tensor, generally of shape [batch, height, width, channels]
- filters: integer, the dimensionality of the output space; can be understood as the number of convolution kernels (filters)
- kernel_size: an integer or a tuple/list of 2 integers, specifying the height and width of the 2D convolution window; a single integer uses the same value for both spatial dimensions
- strides: an integer or a tuple/list of 2 integers, specifying the stride of the convolution along the height and width; a single integer uses the same value for both spatial dimensions
- padding: "SAME" or "VALID" (case-insensitive), controlling whether the input is padded with zeros
  - SAME: pads the input with zeros
  - VALID: uses no zero padding; leftover positions that the window cannot cover are dropped
- data_format: a string, "channels_last" (default) or "channels_first"; the ordering of the dimensions in the input
  - channels_last: input with shape (batch, height, width, channels)
  - channels_first: input with shape (batch, channels, height, width)
- activation: activation function
- use_bias: Boolean, whether the layer uses a bias term
- kernel_initializer: initializer for the convolution kernel
- bias_initializer: initializer for the bias vector; if None, the default initializer is used
- kernel_regularizer: regularization term for the convolution kernel
- bias_regularizer: regularization term for the bias vector
- activity_regularizer: regularization function applied to the output
- trainable: Boolean; if True, the variables are added to the graph collection of trainable variables
- name: the name of the layer
- reuse: Boolean, whether to reuse the weights of a previous layer with the same name
Return:
Tensor after two-dimensional convolution
Example:
import tensorflow as tf x = tf.get_variable(name="x", shape=[1, 3, 3, 5], initializer=tf.zeros_initializer) x = tf.layers.conv2d( x, filters=1, # The third channel of the result is 1 kernel_size=[1, 1], # No matter how big it is, it doesn't affect the output. shape strides=[1, 1], padding='same', data_format='channels_last', use_bias=True, bias_initializer=tf.zeros_initializer()) print(x) # shape=(1, 3, 3, 1)
Analysis:
- The input is a 3×3 image with 5 channels; the input shape is (batch, height, width, channels) = (1, 3, 3, 5).
- The convolution window (kernel_size) is 1×1, the number of filters is 1, and strides is [1, 1]: a stride of 1 along the height direction and 1 along the width direction.
- The final output is a tensor of shape [1, 3, 3, 1], i.e. a 3×3 feature map, laid out as (batch, height, width, number of output channels).
- With padding='same', the output height and width depend only on the strides, and the last dimension equals filters (see the sketch below).
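As a hedged follow-up sketch (same TF 1.x API; filters=7 and the variable name x2d are arbitrary choices for illustration), this shows that with padding='same' the spatial size is controlled by the strides while the last dimension always equals filters:

```python
import tensorflow as tf

x2d = tf.get_variable(name="x2d", shape=[1, 3, 3, 5], initializer=tf.zeros_initializer)
# More filters only change the last dimension of the output.
a = tf.layers.conv2d(x2d, filters=7, kernel_size=[1, 1], strides=[1, 1], padding='same')
# A larger stride shrinks the spatial dimensions: ceil(3/2) = 2 with 'same' padding.
b = tf.layers.conv2d(x2d, filters=7, kernel_size=[1, 1], strides=[2, 2], padding='same')
print(a)  # shape=(1, 3, 3, 7)
print(b)  # shape=(1, 2, 2, 7)
```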
Calculation of Output Size in Convolution Layer
Let the input image size be W, the filter size F, the stride S, and the padding P; the output image size N is then:
$$N=\left\lfloor\frac{W-F+2P}{S}\right\rfloor+1$$
that is, the division is rounded down and then 1 is added.
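As a quick sanity check, the formula can be written as a small helper function (a sketch; the name conv_output_size is made up for illustration):

```python
def conv_output_size(W, F, S, P):
    """Output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (W - F + 2 * P) // S + 1

# A 32-pixel input with a 3x3 filter and stride 2:
print(conv_output_size(32, 3, 2, 0))  # 15 (no padding)
print(conv_output_size(32, 3, 2, 1))  # 16 (one pixel of padding on each side)
```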
In TensorFlow, padding has two choices, 'SAME' and 'VALID'. Here are some examples to illustrate the difference:
If padding='SAME', the output size is W/S, rounded up.
```python
import tensorflow as tf

input_image = tf.get_variable(shape=[64, 32, 32, 3], dtype=tf.float32, name="input",
                              initializer=tf.zeros_initializer)
conv0 = tf.layers.conv2d(input_image, 64, kernel_size=[3, 3], strides=[2, 2], padding='same')  # 32/2 = 16
conv1 = tf.layers.conv2d(input_image, 64, kernel_size=[5, 5], strides=[2, 2], padding='same')  # kernel_size has no effect on the output size
print(conv0)  # shape=(64, 16, 16, 64)
print(conv1)  # shape=(64, 16, 16, 64)
```
If padding='VALID', the output size is (W - F + 1)/S, rounded up.
```python
import tensorflow as tf

input_image = tf.get_variable(shape=[64, 32, 32, 3], dtype=tf.float32, name="input",
                              initializer=tf.zeros_initializer)
conv0 = tf.layers.conv2d(input_image, 64, kernel_size=[3, 3], strides=[2, 2], padding='valid')  # (32-3+1)/2 = 15
conv1 = tf.layers.conv2d(input_image, 64, kernel_size=[5, 5], strides=[2, 2], padding='valid')  # (32-5+1)/2 = 14
print(conv0)  # shape=(64, 15, 15, 64)
print(conv1)  # shape=(64, 14, 14, 64)
```
Reference:
[1] TensorFlow official API: tf.layers.conv1d
[2] Analysis of the tf.layers.conv1d function (one-dimensional convolution)