In the previous chapter we learned about automatic differentiation with autograd. torch.nn can be used to build neural networks in PyTorch. nn depends on autograd to define models and differentiate them. An nn.Module contains layers and a method forward(input) that returns the output.
Artificial Neural Networks (abbreviated as ANNs), also referred to as neural networks (NNs) or connectionist models, are mathematical models for distributed, parallel information processing that imitate the behavior of biological neural networks. Such a network relies on the complexity of the system and processes information by adjusting the interconnections between a large number of internal nodes.
The typical training procedure for a neural network is as follows:
- Define a neural network with some learnable parameters (or weights)
- Iterate over the input dataset
- Process the input through the network
- Compute the loss (how far the output is from the correct answer)
- Propagate the gradients back into the network's parameters
- Update the weights of the network, typically with a simple update rule: weight = weight - learning_rate * gradient
1. Define network
1.1 custom network
Define the following network:
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features


net = Net()
print(net)
Output:
Net(
  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=576, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)
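As a side note, the in_features=576 of fc1 comes from 16 channels of 6x6 feature maps. Below is a minimal sketch (my own check, not part of the original code) that traces the shape arithmetic using the net defined above:

x = torch.randn(1, 1, 32, 32)              # dummy batch of one 32x32 image
x = F.max_pool2d(F.relu(net.conv1(x)), 2)  # 32 -> 30 after the 3x3 conv, -> 15 after 2x2 pooling
x = F.max_pool2d(F.relu(net.conv2(x)), 2)  # 15 -> 13 after the 3x3 conv, -> 6 after 2x2 pooling
print(x.size())                            # torch.Size([1, 16, 6, 6]) -> 16 * 6 * 6 = 576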
1.2 automatic gradients using custom networks
Once you have defined the forward function, the backward function (which computes the gradients) is automatically defined for you by autograd. The learnable parameters of the model are returned by net.parameters(). For example, print them:
params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight
Output:
10
torch.Size([6, 1, 3, 3])
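The count of 10 comes from the five layers above, each contributing a weight and a bias. If you want to check this yourself, here is a small sketch (my own addition) that lists every parameter by name and shape:

for name, param in net.named_parameters():
    print(name, tuple(param.size()))  # e.g. conv1.weight (6, 1, 3, 3), conv1.bias (6,), ...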
1.3 test network
- Test with a random input
Let's try a 32x32 random input.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
Output:
tensor([[ 0.1002, -0.0694, -0.0436, 0.0103, 0.0488, -0.0429, -0.0941, -0.0146, -0.0031, -0.0923]], grad_fn=<AddmmBackward>)
- Zero the gradient buffers of all parameters and backpropagate with random gradients:
net.zero_grad()
out.backward(torch.randn(1, 10))
Note:
- torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are mini-batches of samples, not single samples. For example, nn.Conv2d takes a 4D tensor of nSamples x nChannels x Height x Width. If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension, as shown in the sketch below.
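A minimal sketch (my own example) of turning a single sample into a batch of size 1:

single = torch.randn(1, 32, 32)  # nChannels x Height x Width (no batch dimension)
batched = single.unsqueeze(0)    # 1 x nChannels x Height x Width
print(batched.size())            # torch.Size([1, 1, 32, 32])
out = net(batched)               # now acceptable to nn.Conv2d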
1.4 review
- torch.Tensor: a multidimensional array that supports automatic differentiation operations such as backward(). It also holds the gradient with respect to the tensor.
- nn.Module: neural network module. A convenient way of encapsulating parameters, with helpers for moving them to the GPU, exporting, loading, etc.
- nn.Parameter: a kind of tensor that is automatically registered as a parameter when assigned as an attribute of a Module (see the sketch after this list).
- autograd.Function: implements the forward and backward definitions of an autograd operation. Every Tensor operation creates at least one Function node that connects to the functions that created the Tensor and encodes its history.
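To illustrate the nn.Parameter point, here is a minimal sketch with a hypothetical TinyModule (my own example): a tensor wrapped in nn.Parameter and assigned as an attribute is registered automatically, while a plain tensor is not.

class TinyModule(nn.Module):
    def __init__(self):
        super(TinyModule, self).__init__()
        self.w = nn.Parameter(torch.randn(3))  # registered as a learnable parameter
        self.plain = torch.randn(3)            # plain tensor: not registered

m = TinyModule()
print([name for name, _ in m.named_parameters()])  # ['w']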
2. Loss function
A loss function takes a pair of (output, target) inputs and computes a value that estimates how far the output is from the target. There are several different loss functions under the nn package. A simple one is nn.MSELoss, which computes the mean squared error between the output and the target.
Use the network defined above to compute the loss:
output = net(input)          # use the network defined above
target = torch.randn(10)     # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
Output:
tensor(0.7870, grad_fn=<MseLossBackward>)
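If you follow loss backwards along its grad_fn attribute, you can see the graph of computations that produced it. A minimal sketch (the printed node names can vary between PyTorch versions):

print(loss.grad_fn)                                            # the MSELoss node
print(loss.grad_fn.next_functions[0][0])                       # one step further back in the graph
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # and one more step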
3. Back propagation
To backpropagate the error, all we have to do is call loss.backward(). However, you need to clear the existing gradients first, otherwise the new gradients will be accumulated into the existing ones.
net.zero_grad()  # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)
Output:
conv1.bias.grad before backward
None
conv1.bias.grad after backward
tensor([-0.0341, -0.0014,  0.0153,  0.0203, -0.0092,  0.0030])
4. Update the weights
The simplest update rule used in practice is stochastic gradient descent (SGD): weight = weight - learning_rate * gradient
We can do this with simple Python code:
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
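As a side note, modifying .data works but bypasses autograd's bookkeeping; an equivalent, more idiomatic sketch (my own variant) wraps the in-place update in torch.no_grad():

learning_rate = 0.01
with torch.no_grad():                # no graph is recorded for the update itself
    for f in net.parameters():
        f -= learning_rate * f.grad  # in-place SGD step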
However, when using neural networks, you often want to use different update rules, such as SGD, Nesterov-SGD, Adam, RMSProp, etc. To enable this, PyTorch provides a small package, torch.optim, which implements all these methods. Using it is very simple:
import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()        # does the update
Note that you need to manually zero the gradient buffers with optimizer.zero_grad(), because gradients are accumulated as explained in the Backpropagation section.
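To see why zeroing matters, here is a minimal sketch (my own example) showing that gradients from repeated backward() calls add up unless the buffers are cleared:

net.zero_grad()                   # start from clean gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward(retain_graph=True)  # keep the graph so we can call backward again
first = net.conv1.bias.grad.clone()
loss.backward()                   # second backward without zeroing in between
print(torch.allclose(net.conv1.bias.grad, 2 * first))  # True: gradients accumulated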