1. First, the cross entropy loss function in torch is called as follows:
torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')
It is usually written as:
import torch.nn.functional as F
F.cross_entropy(input, target)
2. Parameter description
input (Tensor) – (N, C), where C = number of classes; or (N, C, H, W) in the case of 2D loss; or (N, C, d1, d2, ..., dK) with K ≥ 1 in the case of K-dimensional loss.
target (Tensor) – (N), where each value satisfies 0 ≤ target[i] ≤ C-1; or (N, d1, d2, ..., dK) with K ≥ 1 in the case of K-dimensional loss.
weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it must be a Tensor of size C.
size_average (bool, optional) – Deprecated. By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If size_average is set to False, the losses are instead summed for each minibatch. Ignored when reduce is False. Default: True
ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient. When size_average is True, the loss is averaged over the non-ignored targets. Default: -100
reduce (bool, optional) – Deprecated. By default, the losses are averaged or summed over observations for each minibatch depending on size_average. When reduce is False, a loss per batch element is returned instead, and size_average is ignored. Default: True
reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied; 'mean': the sum of the output will be divided by the number of elements in the output; 'sum': the output will be summed. Note: size_average and reduce are in the process of being deprecated, and in the meantime specifying either of those two arguments will override reduction. Default: 'mean'
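The interaction of weight and reduction can be sketched as follows. This is an illustrative check, not part of the original example; the variable names are my own, and the division by the sum of the selected weights under reduction='mean' matches PyTorch's documented weighted-mean behavior:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)            # batch of 4 samples, 3 classes
target = torch.tensor([0, 2, 1, 2])

# reduction='none' returns one loss value per sample
per_sample = F.cross_entropy(logits, target, reduction='none')

# 'mean' (the default) and 'sum' reduce that per-sample vector
mean_loss = F.cross_entropy(logits, target, reduction='mean')
sum_loss = F.cross_entropy(logits, target, reduction='sum')
assert torch.allclose(per_sample.mean(), mean_loss)
assert torch.allclose(per_sample.sum(), sum_loss)

# weight rescales each class's contribution; with 'mean', the
# denominator becomes the sum of the weights of the selected targets
w = torch.tensor([1.0, 2.0, 0.5])
weighted = F.cross_entropy(logits, target, weight=w, reduction='mean')
manual = (per_sample * w[target]).sum() / w[target].sum()
assert torch.allclose(weighted, manual)
```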
import torch
import torch.nn.functional as F

input = torch.randn(3, 5, requires_grad=True)
target = torch.randint(5, (3,), dtype=torch.int64)
loss = F.cross_entropy(input, target)
loss.backward()
input: tensor([[-0.6314,  0.6876,  0.8655, -1.8212,  0.0963],
               [-0.5437,  0.2778, -0.1662, -0.0784, -0.6565],
               [-0.1164,  0.3882,  0.2487, -0.5318,  0.3943]], requires_grad=True)
target: tensor([1, 0, 0])
loss: tensor(1.6557, grad_fn=<NllLossBackward>)
In PyTorch, torch.nn.functional.cross_entropy is implemented as follows:
def cross_entropy(input, target, weight=None, size_average=None,
                  ignore_index=-100, reduce=None, reduction='mean'):
    if size_average is not None or reduce is not None:
        reduction = _Reduction.legacy_get_string(size_average, reduce)
    return nll_loss(log_softmax(input, 1), target, weight, None,
                    ignore_index, None, reduction)
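The decomposition above (cross entropy = log_softmax followed by nll_loss) can be verified directly from the public API. A minimal sketch:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 5)
target = torch.randint(5, (3,))

# cross_entropy composes log_softmax and nll_loss internally,
# so computing the two stages by hand gives the same result
a = F.cross_entropy(logits, target)
b = F.nll_loss(F.log_softmax(logits, dim=1), target)
assert torch.allclose(a, b)
```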
Note 1: the input tensor does not need to go through softmax first. The raw logits taken directly from the final fully connected (fc) layer can be fed to cross_entropy, because cross_entropy already applies softmax to the input internally.
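A quick sketch of why this matters: applying softmax yourself before calling cross_entropy effectively applies softmax twice and silently produces a different loss (the variable names below are illustrative):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(2, 4)        # raw fc-layer outputs (logits)
target = torch.tensor([1, 3])

# correct: pass raw logits straight in
correct = F.cross_entropy(logits, target)

# wrong: softmax is now applied twice (once here, once internally),
# so the resulting loss value differs from the correct one
double = F.cross_entropy(F.softmax(logits, dim=1), target)
assert not torch.allclose(correct, double)
```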
Note 2: there is no need to one-hot encode the labels, because the nll_loss function already performs an equivalent indexing internally. The only caveat is that class labels must start from 0: e.g. classes [1, 2, 3] should be relabeled as [0, 1, 2].
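The one-hot equivalence can be sketched as follows: nll_loss picks out -log_softmax at the target index, which is exactly what multiplying by an explicit one-hot vector and summing would do. This is a check under my own variable names, not part of the original post:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(3, 4)
target = torch.tensor([0, 3, 1])   # integer class indices, starting at 0

# manual version: build one-hot targets and take the weighted sum
log_probs = F.log_softmax(logits, dim=1)
one_hot = F.one_hot(target, num_classes=4).float()
manual = -(one_hot * log_probs).sum(dim=1).mean()

# cross_entropy with plain index targets gives the same value
assert torch.allclose(F.cross_entropy(logits, target), manual)
```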
The official documentation is here: torch.nn.functional — PyTorch master documentation, https://pytorch.org/docs/1.2.0/nn.functional.html#torch.nn.functional.cross_entropy