[Implementing a convolutional neural network in Python] Convolution layer Conv2D implementation (with stride and padding)

There is no need to say more about how convolution itself works; instead, let's walk through, step by step, how a convolution layer is implemented in code.

Code Source: https://github.com/eriklindernoren/ML-From-Scratch

First, take a look at the basic helper functions, starting with determine_padding(filter_shape, output_shape="same"):

import math

def determine_padding(filter_shape, output_shape="same"):
    # No padding
    if output_shape == "valid":
        return (0, 0), (0, 0)
    # Pad so that the output shape is the same as input shape (given that stride=1)
    elif output_shape == "same":
        filter_height, filter_width = filter_shape

        # Derived from:
        # output_height = (height + pad_h - filter_height) / stride + 1
        # In this case output_height = height and stride = 1. This gives the
        # expression for the padding below.
        pad_h1 = int(math.floor((filter_height - 1) / 2))
        pad_h2 = int(math.ceil((filter_height - 1) / 2))
        pad_w1 = int(math.floor((filter_width - 1) / 2))
        pad_w2 = int(math.ceil((filter_width - 1) / 2))

        return (pad_h1, pad_h2), (pad_w1, pad_w2)

Description: the padding amounts (top, bottom, left, right) are computed from the kernel shape and the padding mode; output_shape="valid" means no padding.

Supplement:

math.floor(x) returns the largest integer less than or equal to x.
math.ceil(x) returns the smallest integer greater than or equal to x.
Calling the function with concrete arguments gives:

pad_h, pad_w = determine_padding((3, 3), output_shape="same")
Output: pad_h = (1, 1), pad_w = (1, 1)
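The floor/ceil pair matters for even-sized kernels, where 'same' padding is asymmetric. A quick check (a minimal sketch, assuming the determine_padding definition above is in scope):

print(determine_padding((4, 4), output_shape="same"))   # ((1, 2), (1, 2)) - asymmetric
print(determine_padding((5, 5), output_shape="same"))   # ((2, 2), (2, 2))
print(determine_padding((3, 3), output_shape="valid"))  # ((0, 0), (0, 0)) - no padding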

Next, the image_to_column(images, filter_shape, stride, output_shape='same') function:

import numpy as np

def image_to_column(images, filter_shape, stride, output_shape='same'):
    filter_height, filter_width = filter_shape
    pad_h, pad_w = determine_padding(filter_shape, output_shape)
    # Add padding to the image
    images_padded = np.pad(images, ((0, 0), (0, 0), pad_h, pad_w), mode='constant')
    # Calculate the indices where the dot products are to be applied between weights
    # and the image
    k, i, j = get_im2col_indices(images.shape, filter_shape, (pad_h, pad_w), stride)

    # Get content from image at those indices
    cols = images_padded[:, k, i, j]
    channels = images.shape[1]
    # Reshape content into column shape
    cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)
    return cols

Description: the input images have shape [batch_size, channels, height, width], the same image layout PyTorch uses. That is, images_padded pads only the height and width axes.
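To see the padding step in isolation, here is a minimal sketch of what np.pad does here: the ((0, 0), (0, 0), pad_h, pad_w) tuple leaves the batch and channel axes untouched and pads only height and width.

import numpy as np

images = np.zeros((1, 3, 32, 32))
# Pad only the last two axes (height and width) by 1 on each side
padded = np.pad(images, ((0, 0), (0, 0), (1, 1), (1, 1)), mode='constant')
print(padded.shape)  # (1, 3, 34, 34)

image_to_column() also calls get_im2col_indices(), so let's see what that looks like next: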

def get_im2col_indices(images_shape, filter_shape, padding, stride=1):
    # First figure out what the size of the output should be
    batch_size, channels, height, width = images_shape
    filter_height, filter_width = filter_shape
    pad_h, pad_w = padding
    out_height = int((height + np.sum(pad_h) - filter_height) / stride + 1)
    out_width = int((width + np.sum(pad_w) - filter_width) / stride + 1)

    i0 = np.repeat(np.arange(filter_height), filter_width)
    i0 = np.tile(i0, channels)
    i1 = stride * np.repeat(np.arange(out_height), out_width)
    j0 = np.tile(np.arange(filter_width), filter_height * channels)
    j1 = stride * np.tile(np.arange(out_width), out_height)
    i = i0.reshape(-1, 1) + i1.reshape(1, -1)
    j = j0.reshape(-1, 1) + j1.reshape(1, -1)
    k = np.repeat(np.arange(channels), filter_height * filter_width).reshape(-1, 1)

    return (k, i, j)

Note: this is hard to understand just by reading the code, so let's step through it with concrete arguments.

get_im2col_indices((1,3,32,32), (3,3), ((1,1),(1,1)), stride=1)
Note: watching how each variable changes, out_height and out_width are simply the height and width of the output feature map after the convolution.

i0: np.repeat(np.arange(3), 3): [0,0,0,1,1,1,2,2,2]
i0: np.tile([0,0,0,1,1,1,2,2,2], 3): [0,0,0,1,1,1,2,2,2, 0,0,0,1,1,1,2,2,2, 0,0,0,1,1,1,2,2,2], size: (27,)
i1: 1*np.repeat(np.arange(32), 32): [0,0,0,..., 31,31,31], size: (1024,)
j0: np.tile(np.arange(3), 3*3): [0,1,2,0,1,2,...], size: (27,)
j1: 1*np.tile(np.arange(32), 32): [0,1,2,...,31, 0,1,2,...,31, ...], size: (1024,)
i: i0.reshape(-1,1) + i1.reshape(1,-1): size (27, 1024)
j: j0.reshape(-1,1) + j1.reshape(1,-1): size (27, 1024)
k: np.repeat(np.arange(3), 3*3).reshape(-1,1): size (27, 1)
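The walkthrough above is easy to verify directly (assuming the get_im2col_indices definition above is in scope):

k, i, j = get_im2col_indices((1, 3, 32, 32), (3, 3), ((1, 1), (1, 1)), stride=1)
print(k.shape, i.shape, j.shape)  # (27, 1) (27, 1024) (27, 1024)
print(i[:, 0])  # row offsets of the 27 values in the first patch: [0 0 0 1 1 1 2 2 2 ...]
print(j[:, 0])  # column offsets of the first patch: [0 1 2 0 1 2 ...]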
Supplement:

numpy.pad(array, pad_width, mode, **kwargs): array is the data to pad, pad_width gives the amount of padding on each side of each axis, and mode specifies how to fill; with mode='constant' the fill value defaults to 0.
numpy.arange(start, stop, step, dtype=None): e.g. numpy.arange(3) gives [0, 1, 2].
numpy.repeat(array, repeats, axis=None): e.g. numpy.repeat([0,1,2], 3) gives [0,0,0,1,1,1,2,2,2].
numpy.tile(array, reps): e.g. numpy.tile([0,1,2], 3) gives [0,1,2,0,1,2,0,1,2].
More advanced uses have to be looked up; only those relevant to this code are listed here.
These sizes alone are still hard to interpret, so let's make it concrete: k indexes the channels, i indexes the height of the feature map, and j indexes its width. A 3*3 convolution kernel convolving over one channel touches 3*3 = 9 pixels at a time; with 3 channels that is 9*3 = 27 pixel values per output position. The image is 32 by 32, for 1024 output positions in total. Now go back to these three lines of code:

cols = images_padded[:, k, i, j]
channels = images.shape[1]
# Reshape content into column shape
cols = cols.transpose(1, 2, 0).reshape(filter_height * filter_width * channels, -1)

If images_padded has size (1, 3, 34, 34), then cols = images_padded[:, k, i, j] has size (1, 27, 1024).

channels is 3.

The final cols = cols.transpose(1, 2, 0).reshape(3*3*3, -1) has size (27, 1024).

When the batch size is not 1, say 64, the final output cols has size (27, 1024*64) = (27, 65536).
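A good sanity check is that a single column of cols really is one flattened 3*3*3 patch of the padded image. A minimal sketch, assuming image_to_column and its helpers from above are in scope:

import numpy as np

images = np.random.randn(1, 3, 32, 32)
cols = image_to_column(images, (3, 3), stride=1, output_shape='same')
print(cols.shape)  # (27, 1024)

padded = np.pad(images, ((0, 0), (0, 0), (1, 1), (1, 1)), mode='constant')
# Column 0 corresponds to the patch whose top-left corner sits at (0, 0) in the padded image
patch = padded[0, :, 0:3, 0:3].reshape(-1)
print(np.allclose(cols[:, 0], patch))  # True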

With the helpers in place, the convolution layer itself can be implemented.

First there is a generic Layer base class that can be inherited to implement different layer types, such as convolution, pooling, batch normalization, and so on:

class Layer(object):

    def set_input_shape(self, shape):
        """ Sets the shape that the layer expects of the input in the forward
        pass method """
        self.input_shape = shape

    def layer_name(self):
        """ The name of the layer. Used in model summary. """
        return self.__class__.__name__

    def parameters(self):
        """ The number of trainable parameters used by the layer """
        return 0

    def forward_pass(self, X, training):
        """ Propagates the signal forward in the network """
        raise NotImplementedError()

    def backward_pass(self, accum_grad):
        """ Propagates the accumulated gradient backwards in the network.
        If the layer has trainable weights then these weights are also tuned in this method.
        As input (accum_grad) it receives the gradient with respect to the output of the layer and
        returns the gradient with respect to the output of the previous layer. """
        raise NotImplementedError()

    def output_shape(self):
        """ The shape of the output produced by forward_pass """
        raise NotImplementedError()

Methods that every subclass must implement raise NotImplementedError(), so that forgetting to override one of them fails with an explicit error.
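For example, a subclass that forgets to override forward_pass fails loudly (a minimal sketch):

class BrokenLayer(Layer):
    pass

layer = BrokenLayer()
try:
    layer.forward_pass(None, training=False)
except NotImplementedError:
    print("forward_pass must be implemented by the subclass")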

Then Conv2D can be implemented based on this base class:

import copy
import math
import numpy as np

class Conv2D(Layer):
    """A 2D Convolution Layer.
    Parameters:
    -----------
    n_filters: int
        The number of filters that will convolve over the input matrix. The number of channels
        of the output shape.
    filter_shape: tuple
        A tuple (filter_height, filter_width).
    input_shape: tuple
        The shape of the expected input of the layer: (channels, height, width).
        Only needs to be specified for the first layer in the network.
    padding: string
        Either 'same' or 'valid'. 'same' results in padding being added so that the output height and width
        matches the input height and width. For 'valid' no padding is added.
    stride: int
        The stride length of the filters during the convolution over the input.
    """
    def __init__(self, n_filters, filter_shape, input_shape=None, padding='same', stride=1):
        self.n_filters = n_filters
        self.filter_shape = filter_shape
        self.padding = padding
        self.stride = stride
        self.input_shape = input_shape
        self.trainable = True

    def initialize(self, optimizer):
        # Initialize the weights
        filter_height, filter_width = self.filter_shape
        channels = self.input_shape[0]
        limit = 1 / math.sqrt(np.prod(self.filter_shape))
        self.W = np.random.uniform(-limit, limit, size=(self.n_filters, channels, filter_height, filter_width))
        self.w0 = np.zeros((self.n_filters, 1))
        # Weight optimizers
        self.W_opt = copy.copy(optimizer)
        self.w0_opt = copy.copy(optimizer)

    def parameters(self):
        return np.prod(self.W.shape) + np.prod(self.w0.shape)

    def forward_pass(self, X, training=True):
        batch_size, channels, height, width = X.shape
        self.layer_input = X
        # Turn image shape into column shape
        # (enables dot product between input and weights)
        self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
        # Turn weights into column shape
        self.W_col = self.W.reshape((self.n_filters, -1))
        # Calculate output
        output = self.W_col.dot(self.X_col) + self.w0
        # Reshape into (n_filters, out_height, out_width, batch_size)
        output = output.reshape(self.output_shape() + (batch_size, ))
        # Redistribute axes so that batch size comes first
        return output.transpose(3, 0, 1, 2)

    def backward_pass(self, accum_grad):
        # Reshape accumulated gradient into column shape
        accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)

        if self.trainable:
            # Take dot product between column shaped accum. gradient and column shape
            # layer input to determine the gradient at the layer with respect to layer weights
            grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
            # The gradient with respect to bias terms is the sum similarly to in Dense layer
            grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)

            # Update the layer's weights
            self.W = self.W_opt.update(self.W, grad_w)
            self.w0 = self.w0_opt.update(self.w0, grad_w0)

        # Recalculate the gradient which will be propagated back to the previous layer
        accum_grad = self.W_col.T.dot(accum_grad)
        # Reshape from column shape to image shape
        accum_grad = column_to_image(accum_grad,
                                     self.layer_input.shape,
                                     self.filter_shape,
                                     stride=self.stride,
                                     output_shape=self.padding)

        return accum_grad

    def output_shape(self):
        channels, height, width = self.input_shape
        pad_h, pad_w = determine_padding(self.filter_shape, output_shape=self.padding)
        output_height = (height + np.sum(pad_h) - self.filter_shape[0]) / self.stride + 1
        output_width = (width + np.sum(pad_w) - self.filter_shape[1]) / self.stride + 1
        return self.n_filters, int(output_height), int(output_width)

Assuming an input of shape (1, 3, 32, 32) convolved with 16 3*3 kernels, self.W has size (16, 3, 3, 3) and self.w0 has size (16, 1).

self.X_col has size (27, 1024) and self.W_col has size (16, 27), so output = self.W_col.dot(self.X_col) + self.w0 has size (16, 1024).

Now let's try it out:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='same', stride=1)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
Output: (1, 16, 32, 32)
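To double-check the im2col-based forward pass, we can compare it against a direct nested-loop convolution. This is a hedged sketch: conv_naive is a hypothetical helper written only for this comparison, not part of the original repository, and its default pad=1 assumes the 'same' padding that a 3*3 kernel gets.

import numpy as np

def conv_naive(X, W, w0, stride=1, pad=1):
    # Direct (slow) cross-correlation, for verification only
    X = np.pad(X, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    batch, _, height, width = X.shape
    n_filters, _, fh, fw = W.shape
    out_h = (height - fh) // stride + 1
    out_w = (width - fw) // stride + 1
    out = np.zeros((batch, n_filters, out_h, out_w))
    for b in range(batch):
        for f in range(n_filters):
            for y in range(out_h):
                for x in range(out_w):
                    patch = X[b, :, y*stride:y*stride+fh, x*stride:x*stride+fw]
                    out[b, f, y, x] = np.sum(patch * W[f]) + w0[f, 0]
    return out

expected = conv_naive(image.astype(np.float64), conv2d.W, conv2d.w0)
print(np.allclose(output, expected))  # True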

Now count the parameters:

print(conv2d.parameters())
Output: 448

That is, 448 = 3*3*3*16 + 16.

Next, with padding='valid':

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=1)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
print(conv2d.parameters())

Note that the size of cols has changed, because the output of the convolution is now (1, 16, 30, 30).

Output:

Size of cols: (27,900)

(1,16,30,30)

448

Finally, with a stride of 2:

image = np.random.randint(0,255,size=(1,3,32,32)).astype(np.uint8)
input_shape=image.squeeze().shape
conv2d = Conv2D(16, (3,3), input_shape=input_shape, padding='valid', stride=2)
conv2d.initialize(None)
output=conv2d.forward_pass(image,training=True)
print(output.shape)
print(conv2d.parameters())

Size of cols: (27,225)

(1,16,15,15)

448

To wrap up, two useful formulas.

Number of parameters in a convolution layer: params = kernel_height * kernel_width * input_channels * n_filters + biases (one per filter, i.e. n_filters).

Output size after convolution (with padding counted per side):

output_height = (input_height + 2 * pad_h - kernel_height) / stride + 1

output_width = (input_width + 2 * pad_w - kernel_width) / stride + 1
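These formulas are straightforward to encode as a helper (a minimal sketch; conv_output_size is a hypothetical name, not from the original code):

def conv_output_size(in_size, kernel_size, padding, stride):
    # Spatial output size of a convolution along one axis
    return (in_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(32, 3, 1, 1))  # 32 -> 'same' padding, stride 1
print(conv_output_size(32, 3, 0, 1))  # 30 -> 'valid' padding, stride 1
print(conv_output_size(32, 3, 0, 2))  # 15 -> 'valid' padding, stride 2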

We have now seen what each transformation in get_im2col_indices() does; why the indices are constructed this way is worth thinking through carefully. Backpropagation and the optimizer will be covered in a later update, after further study.

Original Address https://www.cnblogs.com/xiximayou/p/12706576.html
