Before looking at these two functions, we need to understand one-dimensional convolution (conv1d) and two-dimensional convolution (conv2d). Two-dimensional convolution slides a window over a feature map along both the width and height directions, multiplying the values at corresponding positions and summing them; one-dimensional convolution slides the window, multiplies, and sums along only one direction (width or height).
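As a rough illustration of the "slide a window, multiply element-wise, and sum" idea, here is a minimal NumPy sketch (it only illustrates the arithmetic, not how TensorFlow implements it; the arrays and names below are made up):

```python
import numpy as np

# 1D convolution (cross-correlation, as used in deep learning): slide a window
# along one axis, multiply element-wise with the kernel, and sum.
signal = np.array([1., 2., 3., 4., 5.])
kernel1d = np.array([1., 0., -1.])
out1d = np.array([np.sum(signal[i:i + 3] * kernel1d)
                  for i in range(len(signal) - 3 + 1)])
print(out1d)        # [-2. -2. -2.]

# 2D convolution: slide the window along both height and width.
image = np.arange(16, dtype=np.float32).reshape(4, 4)
kernel2d = np.ones((2, 2), dtype=np.float32)
out2d = np.array([[np.sum(image[i:i + 2, j:j + 2] * kernel2d)
                   for j in range(4 - 2 + 1)]
                  for i in range(4 - 2 + 1)])
print(out2d.shape)  # (3, 3)
```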
One-dimensional convolution: tf.layers.conv1d()
```python
tf.layers.conv1d(
    inputs,
    filters,
    kernel_size,
    strides=1,
    padding='valid',
    data_format='channels_last',
    dilation_rate=1,
    activation=None,
    use_bias=True,
    kernel_initializer=None,
    bias_initializer=tf.zeros_initializer(),
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    trainable=True,
    name=None,
    reuse=None
)
```
Parameters: [1]
- inputs: the input tensor, generally of shape [batch, length, channels]
- filters: integer, the dimensionality of the output space; can be understood as the number of convolution kernels (filters)
- kernel_size: a single integer or a tuple/list of one integer, specifying the length of the 1D convolution window
- strides: a single integer or a tuple/list of one integer, specifying the stride of the convolution; defaults to 1
- padding: "SAME" or "VALID" (case-insensitive), controlling whether the input is padded with zeros
  - SAME: pads the input with zeros
  - VALID: uses no zero padding; leftover positions that the window cannot cover are dropped
- activation: activation function
- use_bias: Boolean, whether the layer uses a bias vector
- kernel_initializer: initializer for the convolution kernel
- bias_initializer: initializer for the bias vector
- kernel_regularizer: regularization term for the convolution kernel
- bias_regularizer: regularization term for the bias vector
- activity_regularizer: regularization function applied to the output
- reuse: Boolean, whether to reuse the weights of a previous layer with the same name
- trainable: Boolean; if True, the variables are added to the graph collection of trainable variables
- data_format: a string, "channels_last" (default) or "channels_first"; the ordering of the dimensions in the input
  - channels_last: input with shape (batch, length, channels)
  - channels_first: input with shape (batch, channels, length)
- name: the name of the layer
Return value:
Tensor after one-dimensional convolution
Example
import tensorflow as tf x = tf.get_variable(name="x", shape=[32, 512, 1024], initializer=tf.zeros_initializer) x = tf.layers.conv1d( x, filters=1, # The third channel of the result is 1 kernel_size=512, # No matter how big it is, it doesn't affect the output. shape strides=1, padding='same', data_format='channels_last', dilation_rate=1, use_bias=True, bias_initializer=tf.zeros_initializer()) print(x) # Tensor("conv1d/BiasAdd:0", shape=(32, 512, 1), dtype=float32)
Analysis:
- The input has shape [batch, data_length, data_width] = [32, 512, 1024]. The first dimension is the batch_size, i.e. 32 samples; the second and third dimensions are the length and width of the input (512 and 1024), where the "width" here plays the role of the channel dimension.
- A one-dimensional convolution kernel is itself two-dimensional, with a length and a width: its length is kernel_size = 512 and its width equals the input's data_width = 1024, so each kernel has shape [512, 1024]. Since filters = 1, there is only one such kernel (see the sketch below).
- filters is the number of convolution kernels and determines the third dimension of the output; with filters = 1, the third dimension is 1.
- So the output data size after convolution is [32, 512, 1]
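To double-check this analysis, here is a small follow-up sketch (assuming the same TF 1.x tf.layers API in a fresh graph; the variable name x2 is chosen only to avoid clashing with the example above). It compares 'same' and 'valid' padding and prints the shape of the kernel variables the layers create:

```python
import tensorflow as tf

x2 = tf.get_variable(name="x2", shape=[32, 512, 1024], initializer=tf.zeros_initializer)
y_same = tf.layers.conv1d(x2, filters=1, kernel_size=512, padding='same')    # length stays 512
y_valid = tf.layers.conv1d(x2, filters=1, kernel_size=512, padding='valid')  # length 512 - 512 + 1 = 1
print(y_same)   # shape=(32, 512, 1)
print(y_valid)  # shape=(32, 1, 1)

# Each layer creates one kernel of shape [kernel_size, in_channels, filters] = [512, 1024, 1]
for v in tf.trainable_variables():
    if 'kernel' in v.name:
        print(v.name, v.shape)
```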
Two-dimensional convolution: tf.layers.conv2d()
```python
tf.layers.conv2d(
    inputs,
    filters,
    kernel_size,
    strides=(1, 1),
    padding='valid',
    data_format='channels_last',
    dilation_rate=(1, 1),
    activation=None,
    use_bias=True,
    kernel_initializer=None,
    bias_initializer=tf.zeros_initializer(),
    kernel_regularizer=None,
    bias_regularizer=None,
    activity_regularizer=None,
    kernel_constraint=None,
    bias_constraint=None,
    trainable=True,
    name=None,
    reuse=None
)
```
Parameters: [4]
- inputs: the input tensor, generally of shape [batch, height, width, channels]
- filters: integer, the dimensionality of the output space; can be understood as the number of convolution kernels (filters)
- kernel_size: an integer or a tuple/list of 2 integers, specifying the height and width of the 2D convolution window; a single integer uses the same value for both spatial dimensions
- strides: an integer or a tuple/list of 2 integers, specifying the stride of the convolution along the height and width; a single integer uses the same value for both spatial dimensions
- padding: "SAME" or "VALID" (case-insensitive), controlling whether the input is padded with zeros
  - SAME: pads the input with zeros
  - VALID: uses no zero padding; leftover positions that the window cannot cover are dropped
- data_format: a string, "channels_last" (default) or "channels_first"; the ordering of the dimensions in the input
  - channels_last: input with shape (batch, height, width, channels)
  - channels_first: input with shape (batch, channels, height, width)
- activation: activation function
- use_bias: Boolean, whether the layer uses a bias term
- kernel_initializer: initializer for the convolution kernel
- bias_initializer: initializer for the bias vector; if None, the default initializer is used
- kernel_regularizer: regularization term for the convolution kernel
- bias_regularizer: regularization term for the bias vector
- activity_regularizer: regularization function applied to the output
- trainable: Boolean; if True, the variables are added to the graph collection of trainable variables
- name: the name of the layer
- reuse: Boolean, whether to reuse the weights of a previous layer with the same name
Return:
Tensor after two-dimensional convolution
Example:
import tensorflow as tf x = tf.get_variable(name="x", shape=[1, 3, 3, 5], initializer=tf.zeros_initializer) x = tf.layers.conv2d( x, filters=1, # The third channel of the result is 1 kernel_size=[1, 1], # No matter how big it is, it doesn't affect the output. shape strides=[1, 1], padding='same', data_format='channels_last', use_bias=True, bias_initializer=tf.zeros_initializer()) print(x) # shape=(1, 3, 3, 1)
Analysis:
- The input is a 3×3 image with 5 channels; the input shape is (batch, height, width, channels) = (1, 3, 3, 5).
- The convolution window (kernel_size) is 1×1, the number of filters is 1, and strides is [1, 1]: a stride of 1 along the height direction and 1 along the width direction.
- The final output is a tensor of shape [1, 3, 3, 1], i.e. a 3×3 feature map, laid out as (batch, height, width, number of output channels).
- With padding='same', the output height and width depend only on the strides, and the last dimension equals filters (see the sketch below).
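As a hedged follow-up sketch (same TF 1.x API; filters=7 and the variable name x2d are arbitrary choices for illustration), this shows that with padding='same' the spatial size is controlled by the strides while the last dimension always equals filters:

```python
import tensorflow as tf

x2d = tf.get_variable(name="x2d", shape=[1, 3, 3, 5], initializer=tf.zeros_initializer)
# More filters only change the last dimension of the output.
a = tf.layers.conv2d(x2d, filters=7, kernel_size=[1, 1], strides=[1, 1], padding='same')
# A larger stride shrinks the spatial dimensions: ceil(3/2) = 2 with 'same' padding.
b = tf.layers.conv2d(x2d, filters=7, kernel_size=[1, 1], strides=[2, 2], padding='same')
print(a)  # shape=(1, 3, 3, 7)
print(b)  # shape=(1, 2, 2, 7)
```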
Calculation of Output Size in Convolution Layer
Let the input image size be W, the filter size F, the stride S, and the padding P; the output image size N is then:
$$N=\left\lfloor\frac{W-F+2P}{S}\right\rfloor+1$$
that is, the division is rounded down and then 1 is added.
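As a quick sanity check, the formula can be written as a small helper function (a sketch; the name conv_output_size is made up for illustration):

```python
def conv_output_size(W, F, S, P):
    """Output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (W - F + 2 * P) // S + 1

# A 32-pixel input with a 3x3 filter and stride 2:
print(conv_output_size(32, 3, 2, 0))  # 15 (no padding)
print(conv_output_size(32, 3, 2, 1))  # 16 (one pixel of padding on each side)
```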
In TensorFlow, padding has two choices, 'SAME' and 'VALID'. Here are some examples to illustrate the difference:
If padding='SAME', the output size is W/S, rounded up.
```python
import tensorflow as tf

input_image = tf.get_variable(shape=[64, 32, 32, 3], dtype=tf.float32, name="input",
                              initializer=tf.zeros_initializer)
conv0 = tf.layers.conv2d(input_image, 64, kernel_size=[3, 3], strides=[2, 2], padding='same')  # 32/2 = 16
conv1 = tf.layers.conv2d(input_image, 64, kernel_size=[5, 5], strides=[2, 2], padding='same')  # kernel_size has no effect on the output size
print(conv0)  # shape=(64, 16, 16, 64)
print(conv1)  # shape=(64, 16, 16, 64)
```
If padding='VALID', the output size is (W - F + 1)/S, rounded up.
```python
import tensorflow as tf

input_image = tf.get_variable(shape=[64, 32, 32, 3], dtype=tf.float32, name="input",
                              initializer=tf.zeros_initializer)
conv0 = tf.layers.conv2d(input_image, 64, kernel_size=[3, 3], strides=[2, 2], padding='valid')  # (32-3+1)/2 = 15
conv1 = tf.layers.conv2d(input_image, 64, kernel_size=[5, 5], strides=[2, 2], padding='valid')  # (32-5+1)/2 = 14
print(conv0)  # shape=(64, 15, 15, 64)
print(conv1)  # shape=(64, 14, 14, 64)
```
Reference:
[1] TensorFlow official API: tf.layers.conv1d
[2] Analysis of the tf.layers.conv1d function (one-dimensional convolution)