Code source: https://github.com/eriklindernoren/ML-From-Scratch
The concrete implementation of the convolutional layer Conv2D (with stride and padding) in a convolutional neural network: https://www.cnblogs.com/xiximayou/p/12706576.html
Implementation of activation functions (sigmoid, softmax, tanh, relu, leakyrelu, elu, selu, softplus): https://www.cnblogs.com/xiximayou/p/12713081.html
Definition of loss functions (mean squared error, cross-entropy loss): https://www.cnblogs.com/xiximayou/p/12713198.html
Implementation of optimizers (SGD, Nesterov, Adagrad, Adadelta, RMSprop, Adam): https://www.cnblogs.com/xiximayou/p/12713594.html
This section continues by working through the backpropagation process of the convolutional layer, following the code.
Here, only the forward_pass and backward_pass code of Conv2D is shown:
def forward_pass(self, X, training=True):
    batch_size, channels, height, width = X.shape
    self.layer_input = X
    # Turn image shape into column shape
    # (enables dot product between input and weights)
    self.X_col = image_to_column(X, self.filter_shape, stride=self.stride, output_shape=self.padding)
    # Turn weights into column shape
    self.W_col = self.W.reshape((self.n_filters, -1))
    # Calculate output
    output = self.W_col.dot(self.X_col) + self.w0
    # Reshape into (n_filters, out_height, out_width, batch_size)
    output = output.reshape(self.output_shape() + (batch_size, ))
    # Redistribute axes so that batch size comes first
    return output.transpose(3, 0, 1, 2)

def backward_pass(self, accum_grad):
    # Reshape accumulated gradient into column shape
    accum_grad = accum_grad.transpose(1, 2, 3, 0).reshape(self.n_filters, -1)

    if self.trainable:
        # Take dot product between column shaped accum. gradient and column shaped
        # layer input to determine the gradient at the layer with respect to layer weights
        grad_w = accum_grad.dot(self.X_col.T).reshape(self.W.shape)
        # The gradient with respect to bias terms is the sum, similarly to in the Dense layer
        grad_w0 = np.sum(accum_grad, axis=1, keepdims=True)

        # Update the layer's weights
        self.W = self.W_opt.update(self.W, grad_w)
        self.w0 = self.w0_opt.update(self.w0, grad_w0)

    # Recalculate the gradient which will be propagated back to the previous layer
    accum_grad = self.W_col.T.dot(accum_grad)
    # Reshape from column shape to image shape
    accum_grad = column_to_image(accum_grad,
                                 self.layer_input.shape,
                                 self.filter_shape,
                                 stride=self.stride,
                                 output_shape=self.padding)

    return accum_grad
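To make the shape bookkeeping above more concrete, here is a small walkthrough with made-up sizes (batch 2, 3 channels, 8x8 images, 16 filters of 3x3, stride 1, 'same' padding). The naive_image_to_column helper below is a deliberately simple loop version written only for this illustration; the repository's image_to_column is vectorized and its signature differs.

import numpy as np

# Naive im2col, for illustration only: stride 1, 'same' padding assumed.
def naive_image_to_column(X, fh, fw, pad):
    batch, ch, h, w = X.shape
    Xp = np.pad(X, ((0, 0), (0, 0), (pad, pad), (pad, pad)))
    out_h, out_w = h, w                               # stride 1, 'same' padding
    cols = np.zeros((ch * fh * fw, out_h * out_w * batch))
    for b in range(batch):
        for i in range(out_h):
            for j in range(out_w):
                patch = Xp[b, :, i:i + fh, j:j + fw].reshape(-1)
                cols[:, (i * out_w + j) * batch + b] = patch
    return cols

X = np.random.randn(2, 3, 8, 8)
W = np.random.randn(16, 3, 3, 3)                      # (n_filters, channels, fh, fw)
X_col = naive_image_to_column(X, 3, 3, pad=1)         # (3*3*3, 8*8*2) = (27, 128)
W_col = W.reshape(16, -1)                             # (16, 27)
output = W_col.dot(X_col)                             # (16, 128): one conv output per column
output = output.reshape(16, 8, 8, 2).transpose(3, 0, 1, 2)
print(output.shape)                                   # (2, 16, 8, 8): batch first again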
The training of the convolutional neural network itself is defined in neural_network.py, in the train_on_batch() method:
def train_on_batch(self, X, y):
    """ Single gradient update over one batch of samples """
    y_pred = self._forward_pass(X)
    loss = np.mean(self.loss_function.loss(y, y_pred))
    acc = self.loss_function.acc(y, y_pred)
    # Calculate the gradient of the loss function wrt y_pred
    loss_grad = self.loss_function.gradient(y, y_pred)
    # Backpropagate. Update weights
    self._backward_pass(loss_grad=loss_grad)

    return loss, acc
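train_on_batch relies on a loss object that exposes loss(), gradient() and acc(). As a rough sketch of that interface, using mean squared error (the repository's own loss classes may differ in detail):

import numpy as np

class SquareLoss:
    def loss(self, y, y_pred):
        # Elementwise squared error; train_on_batch takes the mean afterwards
        return 0.5 * np.power(y - y_pred, 2)

    def gradient(self, y, y_pred):
        # dL/dy_pred, the first gradient fed into _backward_pass
        return -(y - y_pred)

    def acc(self, y, y_pred):
        # Accuracy is not meaningful for a regression loss, so return 0 here
        return 0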
You also need to look at self._forward_pass and self._backward_pass:
def _forward_pass(self, X, training=True):
    """ Calculate the output of the NN """
    layer_output = X
    for layer in self.layers:
        layer_output = layer.forward_pass(layer_output, training)

    return layer_output

def _backward_pass(self, loss_grad):
    """ Propagate the gradient 'backwards' and update the weights in each layer """
    for layer in reversed(self.layers):
        loss_grad = layer.backward_pass(loss_grad)
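The important point is the protocol: every layer exposes forward_pass() and backward_pass(), the forward loop threads one layer's output into the next, and the backward loop walks the same list in reverse, threading the gradient. A toy example of that pattern (not the repository's classes):

import numpy as np

class ScaleLayer:
    """Multiplies its input by a fixed scalar."""
    def __init__(self, s):
        self.s = s
    def forward_pass(self, x, training=True):
        return self.s * x
    def backward_pass(self, accum_grad):
        # dL/dx = dL/dy * dy/dx = accum_grad * s
        return self.s * accum_grad

layers = [ScaleLayer(2.0), ScaleLayer(3.0)]
out = np.array([1.0, 2.0])
for layer in layers:
    out = layer.forward_pass(out)          # forward: x -> 2x -> 6x
loss_grad = np.ones_like(out)              # pretend dL/dy = 1
for layer in reversed(layers):
    loss_grad = layer.backward_pass(loss_grad)
print(out, loss_grad)                      # [ 6. 12.] [6. 6.]  (d(6x)/dx = 6)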
We can see that forward propagation computes the output of every layer in self.layers in turn, covering convolution, pooling, activation and normalization layers, and back propagation then updates the gradients layer by layer from back to front. Take a network consisting of a convolutional layer, a fully connected layer and a loss function as an example. After the forward pass, the first gradient obtained is the gradient of the loss function with respect to the predictions. This gradient is passed into the fully connected layer, which computes its own gradient and passes it on to the convolutional layer, at which point the convolutional layer's backward_pass() method is called. Inside backward_pass(), if self.trainable is set, the gradients of the weights W and the bias term w0 are computed, the optimizers W_opt and w0_opt are used to update those parameters, and then the gradient to be propagated to the previous layer is computed. Finally, there is the column_to_image() method.
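Because the im2col trick turns the convolution into a plain matrix product Y = W_col · X_col + w0, the two dot products in backward_pass() are ordinary matrix-calculus gradients. The small numerical check below (a toy setup with made-up shapes, not code from the repository) verifies the weight-gradient formula accum_grad.dot(X_col.T) with finite differences; the input gradient W_col.T.dot(accum_grad) can be checked the same way.

import numpy as np

rng = np.random.default_rng(0)
W_col = rng.standard_normal((4, 27))     # (n_filters, ch*fh*fw)
X_col = rng.standard_normal((27, 50))    # (ch*fh*fw, positions*batch)
w0 = rng.standard_normal((4, 1))
G = rng.standard_normal((4, 50))         # upstream gradient dL/dY

loss = lambda W: np.sum(G * (W.dot(X_col) + w0))   # toy loss whose dL/dY is exactly G
grad_w = G.dot(X_col.T)                  # analytic dL/dW_col, as in backward_pass

eps = 1e-6
num = np.zeros_like(W_col)
for idx in np.ndindex(W_col.shape):
    Wp, Wm = W_col.copy(), W_col.copy()
    Wp[idx] += eps
    Wm[idx] -= eps
    num[idx] = (loss(Wp) - loss(Wm)) / (2 * eps)

print(np.max(np.abs(num - grad_w)))      # tiny (~1e-9): analytic and numeric gradients agree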
def column_to_image(cols, images_shape, filter_shape, stride, output_shape='same'):
    batch_size, channels, height, width = images_shape
    pad_h, pad_w = determine_padding(filter_shape, output_shape)

    height_padded = height + np.sum(pad_h)
    width_padded = width + np.sum(pad_w)
    # Start from zeros so that overlapping patches accumulate correctly below
    images_padded = np.zeros((batch_size, channels, height_padded, width_padded))

    # Calculate the indices where the dot products are applied between weights
    # and the image
    k, i, j = get_im2col_indices(images_shape, filter_shape, (pad_h, pad_w), stride)

    cols = cols.reshape(channels * np.prod(filter_shape), -1, batch_size)
    cols = cols.transpose(2, 0, 1)
    # Add column content to the images at the indices
    np.add.at(images_padded, (slice(None), k, i, j), cols)

    # Return image without padding
    return images_padded[:, :, pad_h[0]:height+pad_h[0], pad_w[0]:width+pad_w[0]]
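The np.add.at call is the key detail: when the stride is smaller than the filter size, neighbouring receptive fields overlap, so several columns write back to the same pixel and their gradients must be accumulated rather than overwritten. A tiny standalone illustration of the difference (not repository code):

import numpy as np

idx = np.array([1, 2, 2, 3])           # index 2 appears twice (overlapping windows)
vals = np.array([1.0, 1.0, 1.0, 1.0])

a = np.zeros(5)
a[idx] += vals                          # buffered: the two writes to index 2 collapse into one
print(a)                                # [0. 1. 1. 1. 0.]

b = np.zeros(5)
np.add.at(b, idx, vals)                 # unbuffered: repeated indices accumulate
print(b)                                # [0. 1. 2. 1. 0.]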
This method restores the column form produced by image_to_column() back to the image format, stripping off the padding before returning.
Admittedly, the various shape transformations involved in the computation are quite a headache, and a number of numpy functions have to be looked up along the way. Still, as long as we understand the general process, working through the code deepens our understanding of the underlying ideas.