First, we define a multilayer perceptron with a single hidden layer, let PyTorch initialize its parameters with the default method, and run one forward pass.
import torch
from torch import nn
from torch.nn import init

net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))  # PyTorch initializes the parameters by default
print(net)
X = torch.rand(2, 4)
Y = net(X).sum()
Sequential(
  (0): Linear(in_features=4, out_features=3, bias=True)
  (1): ReLU()
  (2): Linear(in_features=3, out_features=1, bias=True)
)
1, Access model parameters
For the layers of a Sequential instance that have model parameters, we can use the parameters() or named_parameters() method of the Module class to access all parameters. Both return iterators; named_parameters() returns each parameter's name in addition to the parameter Tensor. The following code accesses all parameters of the net defined above:
for name, param in net.named_parameters():
    print(name, param.size())
0.weight torch.Size([3, 4])
0.bias torch.Size([3])
2.weight torch.Size([1, 3])
2.bias torch.Size([1])
As we can see, each returned name is automatically prefixed with the index of its layer.
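As a quick check on these shapes, the sketch below counts the elements of each named parameter. It rebuilds the same architecture so that it runs on its own; the variable names counts and total are chosen here for illustration. Note that no name starts with 1, because the ReLU layer at index 1 has no parameters.

```python
import torch
from torch import nn

# Same architecture as the net defined above, rebuilt so the snippet is self-contained
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

# named_parameters() yields (name, tensor) pairs; numel() is the number of elements
counts = {name: param.numel() for name, param in net.named_parameters()}
total = sum(counts.values())

print(counts)  # {'0.weight': 12, '0.bias': 3, '2.weight': 3, '2.bias': 1}
print(total)   # 19
```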
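Next, we access the parameters of a single layer in net. For neural networks constructed with the Sequential class, we can access any layer of the network through square brackets []. Index 0 is the hidden layer, the first layer added to the Sequential instance.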
for name, param in net[0].named_parameters():
    print(name, param.size(), type(param))
weight torch.Size([3, 4]) <class 'torch.nn.parameter.Parameter'>
bias torch.Size([3]) <class 'torch.nn.parameter.Parameter'>
Because this is a single layer, the names carry no layer-index prefix. In addition, the returned param is of type torch.nn.parameter.Parameter, which is a subclass of Tensor. Unlike a plain Tensor, a Parameter assigned as an attribute of a Module is automatically added to the model's parameter list.
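The difference between a Parameter and a plain Tensor can be seen in a small sketch. The class MyModel and the attribute names weight1 and weight2 are hypothetical, introduced only for this illustration.

```python
import torch
from torch import nn

class MyModel(nn.Module):  # hypothetical module, for illustration only
    def __init__(self):
        super().__init__()
        # An nn.Parameter attribute is registered in the parameter list
        self.weight1 = nn.Parameter(torch.rand(20, 20))
        # A plain Tensor attribute is not registered
        self.weight2 = torch.rand(20, 20)

    def forward(self, x):
        return x @ self.weight1 + self.weight2

model = MyModel()
names = [name for name, _ in model.named_parameters()]
print(names)  # ['weight1']
```

Only weight1 appears, so an optimizer built from model.parameters() would update weight1 and leave weight2 alone.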
Because a Parameter is a Tensor, it has all the attributes of a Tensor: we can use data to access its values and grad to access its gradient.
weight_0 = list(net[0].parameters())[0]
print(weight_0.data)
# The gradient is None before backpropagation
print(weight_0.grad)
Y.backward()
print(weight_0.grad)
tensor([[ 0.0285,  0.4555,  0.3370,  0.2170],
        [ 0.2281,  0.0616, -0.4615, -0.1053],
        [-0.2828, -0.4555, -0.4292,  0.4989]])
None
tensor([[0.1976, 0.3890, 0.3261, 0.5621],
        [0.0000, 0.0000, 0.0000, 0.0000],
        [0.0000, 0.0000, 0.0000, 0.0000]])
2, Initialize model parameters
The modules derived from nn.Module in PyTorch initialize their parameters with a reasonable default strategy (refer to the source code for the specific method used by each type of layer). But we often need to initialize the weights in other ways. PyTorch's init module provides a variety of preset initialization methods.
In the following example, we initialize the weights with normally distributed random numbers with a mean of 0 and a standard deviation of 0.01. The bias parameters can be cleared to zero in the same way with a constant initializer.
for name, param in net.named_parameters():
    if 'weight' in name:
        torch.nn.init.normal_(param, mean=0, std=0.01)
        print(name, param.data)
0.weight tensor([[ 0.0098,  0.0141, -0.0097,  0.0053],
        [ 0.0224, -0.0045,  0.0074,  0.0173],
        [-0.0044,  0.0200,  0.0056, -0.0107]])
2.weight tensor([[ 0.0119, -0.0109, -0.0076]])
We can also use constants to initialize weight parameters:
# Replace the initialization call above with:
torch.nn.init.constant_(param, val=0)
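Written out as a complete loop, the constant initializer might be used as in the sketch below. Applying it to the bias parameters is a choice made here for illustration; the same call works on weights. The network is rebuilt so that the snippet runs on its own.

```python
import torch
from torch import nn
from torch.nn import init

# Same architecture as the net defined above, rebuilt so the snippet is self-contained
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

for name, param in net.named_parameters():
    if 'bias' in name:
        # Fill the bias in place with the constant 0
        init.constant_(param, val=0)
        print(name, param.data)
```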
If we only want to initialize one specific parameter, we can select it, for example net[0].weight, and pass it directly to one of the functions in the init module. The Parameter class in PyTorch has no initialize function of its own.
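In PyTorch, one way to initialize a single parameter is to pass only that parameter to an init function. The sketch below assumes the same architecture as above; the names before and unchanged are introduced for illustration.

```python
import torch
from torch import nn
from torch.nn import init

# Same architecture as the net defined above, rebuilt so the snippet is self-contained
net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1))

# Keep a copy of the output layer's weight to confirm it is left alone
before = net[2].weight.detach().clone()

# Initialize only the hidden layer's weight, in place
init.normal_(net[0].weight, mean=0, std=0.01)

unchanged = torch.equal(before, net[2].weight)
print(unchanged)  # True
```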
3, Custom initialization method
To be supplemented