The difference and connection between torch.nn and torch.nn.functional, seen through the various implementations of relu

Keywords: Python, network

The relationship between relu's multiple implementations

The relu function appears in three places in PyTorch:

  1. torch.nn.ReLU()
  2. torch.nn.functional.relu() and torch.nn.functional.relu_()
  3. torch.relu() and torch.relu_()

These three implementations are actually related by a fixed wrapping hierarchy: going from top to bottom is a progression from the outermost interface to the innermost implementation.

The last pair is not covered in PyTorch's official documentation, and no corresponding Python source can be found; they appear only in __init__.pyi, because they come from the THNN library written in C++.
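As a quick sanity check, here is a minimal sketch (assuming any recent PyTorch install) showing that all three entry points produce the same result, and that the underscore variants operate in place:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4)

out_module = nn.ReLU()(x)      # 1. the class wrapper in torch.nn
out_functional = F.relu(x)     # 2. the function in torch.nn.functional
out_builtin = torch.relu(x)    # 3. the low-level op exposed on torch

assert torch.equal(out_module, out_functional)
assert torch.equal(out_functional, out_builtin)

# The underscore variants modify their argument in place.
y = x.clone()
torch.relu_(y)
assert torch.equal(y, out_builtin)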

The following walks through the source code in detail:

  1. torch.nn.ReLU()
    The classes in torch.nn represent neural network layers. Here we see that ReLU(), which appears as a class, actually calls the relu and relu_ implementations in torch.nn.functional (a short check of this wrapping follows the two source listings below).
class ReLU(Module):
    r"""Applies the rectified linear unit function element-wise:

    :math:`\text{ReLU}(x)= \max(0, x)`

    Args:
        inplace: can optionally do the operation in-place. Default: ``False``

    Shape:
        - Input: :math:`(N, *)` where `*` means, any number of additional
          dimensions
        - Output: :math:`(N, *)`, same shape as the input

    .. image:: scripts/activation_images/ReLU.png

    Examples::

        >>> m = nn.ReLU()
        >>> input = torch.randn(2)
        >>> output = m(input)


      An implementation of CReLU - https://arxiv.org/abs/1603.05201

        >>> m = nn.ReLU()
        >>> input = torch.randn(2).unsqueeze(0)
        >>> output = torch.cat((m(input),m(-input)))
    """
    __constants__ = ['inplace']

    def __init__(self, inplace=False):
        super(ReLU, self).__init__()
        self.inplace = inplace

    @weak_script_method
    def forward(self, input):
        # F is torch.nn.functional, imported in this module as F
        return F.relu(input, inplace=self.inplace)

    def extra_repr(self):
        inplace_str = 'inplace' if self.inplace else ''
        return inplace_str
  2. torch.nn.functional.relu() and torch.nn.functional.relu_()
    These two functions, in turn, call torch.relu() and torch.relu_() respectively.
def relu(input, inplace=False):
    # type: (Tensor, bool) -> Tensor
    r"""relu(input, inplace=False) -> Tensor

    Applies the rectified linear unit function element-wise. See
    :class:`~torch.nn.ReLU` for more details.
    """
    if inplace:
        result = torch.relu_(input)
    else:
        result = torch.relu(input)
    return result


relu_ = _add_docstr(torch.relu_, r"""
relu_(input) -> Tensor

In-place version of :func:`~relu`.
""")

At this point we have a clear picture of how the relu function surfaces in PyTorch: the relationship between torch.nn and torch.nn.functional is one of wrapping and delegation.

The Differences and Relations between torch.nn and torch.nn.functional

Combined with the analysis of relu above, we can understand the relationship between the two modules more clearly.

Generally speaking, torch.nn.functional calls the THNN library to carry out the core computation, but it does not manage learnable parameters such as weight and bias, which is inconvenient when building models. The modules implemented in torch.nn wrap torch.nn.functional; they are essentially official, ready-made examples of how to use torch.nn.functional. By calling these modules directly we can use PyTorch quickly and conveniently, but they cannot cover everyone's needs, so torch.nn.functional is kept available to give such users the flexibility to assemble the models they need themselves. In this way PyTorch balances flexibility against ease of use (the two styles are contrasted in the sketch below).
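To make the trade-off concrete, here is a minimal sketch contrasting the two styles (the class names Net and FunctionalNet are made up for illustration): with torch.nn the Linear module creates and registers its parameters for us, while with torch.nn.functional we must hold and pass them ourselves.

import torch
import torch.nn as nn
import torch.nn.functional as F

# torch.nn style: the Linear module creates, registers, and initializes
# weight and bias on its own.
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(20, 30)

    def forward(self, x):
        return F.relu(self.fc(x))

# torch.nn.functional style: we create and register the parameters
# ourselves (initialization is also our job) and pass them to F.linear
# on every call.
class FunctionalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(30, 20) * 0.01)
        self.bias = nn.Parameter(torch.zeros(30))

    def forward(self, x):
        return F.relu(F.linear(x, self.weight, self.bias))

x = torch.randn(128, 20)
print(Net()(x).shape, FunctionalNet()(x).shape)  # both: torch.Size([128, 30])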

In particular, not everything in torch.nn is a wrapper around torch.nn.functional; some modules call functions from other backend libraries instead. For example, the commonly used RNN family does not appear in torch.nn.functional at all.
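For instance, nn.LSTM (one member of that family) is used directly as a module, and a quick check (a sketch assuming a standard PyTorch install) confirms that torch.nn.functional exposes no lstm counterpart:

import torch
import torch.nn as nn
import torch.nn.functional as F

# torch.nn.functional has no lstm function to fall back on.
print(hasattr(F, 'lstm'))      # False

# The recurrent layer is used through its module interface instead.
rnn = nn.LSTM(input_size=10, hidden_size=20, num_layers=2)
inp = torch.randn(5, 3, 10)    # (seq_len, batch, input_size)
h0 = torch.randn(2, 3, 20)     # (num_layers, batch, hidden_size)
c0 = torch.randn(2, 3, 20)
output, (hn, cn) = rnn(inp, (h0, c0))
print(output.shape)            # torch.Size([5, 3, 20])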

Let's conclude by examining one more example with this in mind:

For Linear, compare the implementations in the two modules with respect to the following points (a summary sketch follows the two source listings below):

  1. Management of learnable parameters
  2. Call relationship between each other
  3. Initialization process
class Linear(Module):
    r"""Applies a linear transformation to the incoming data: :math:`y = xA^T + b`

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        bias: If set to ``False``, the layer will not learn an additive bias.
            Default: ``True``

    Shape:
        - Input: :math:`(N, *, H_{in})` where :math:`*` means any number of
          additional dimensions and :math:`H_{in} = \text{in\_features}`
        - Output: :math:`(N, *, H_{out})` where all but the last dimension
          are the same shape as the input and :math:`H_{out} = \text{out\_features}`.

    Attributes:
        weight: the learnable weights of the module of shape
            :math:`(\text{out\_features}, \text{in\_features})`. The values are
            initialized from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})`, where
            :math:`k = \frac{1}{\text{in\_features}}`
        bias:   the learnable bias of the module of shape :math:`(\text{out\_features})`.
                If :attr:`bias` is ``True``, the values are initialized from
                :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                :math:`k = \frac{1}{\text{in\_features}}`

    Examples::

        >>> m = nn.Linear(20, 30)
        >>> input = torch.randn(128, 20)
        >>> output = m(input)
        >>> print(output.size())
        torch.Size([128, 30])
    """
    __constants__ = ['bias']

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    @weak_script_method
    def forward(self, input):
        return F.linear(input, self.weight, self.bias)

    def extra_repr(self):
        return 'in_features={}, out_features={}, bias={}'.format(
            self.in_features, self.out_features, self.bias is not None
        )
def linear(input, weight, bias=None):
    # type: (Tensor, Tensor, Optional[Tensor]) -> Tensor
    r"""
    Applies a linear transformation to the incoming data: :math:`y = xA^T + b`.

    Shape:

        - Input: :math:`(N, *, in\_features)` where `*` means any number of
          additional dimensions
        - Weight: :math:`(out\_features, in\_features)`
        - Bias: :math:`(out\_features)`
        - Output: :math:`(N, *, out\_features)`
    """
    if input.dim() == 2 and bias is not None:
        # fused op is marginally faster
        ret = torch.addmm(bias, input, weight.t())
    else:
        output = input.matmul(weight.t())
        if bias is not None:
            output += bias
        ret = output
    return ret
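Relating the two listings back to the three comparison points: nn.Linear owns and registers weight and bias and initializes them in reset_parameters(), and its forward simply delegates to F.linear; F.linear itself is stateless, so the caller supplies (and, if desired, initializes) the parameters. A minimal sketch of that comparison:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

m = nn.Linear(20, 30)

# 1. Parameter management: the module registers weight and bias itself.
print([name for name, _ in m.named_parameters()])   # ['weight', 'bias']

# 2. Call relationship: the module's forward is just F.linear applied to
#    its own parameters, so this reproduces m(input) exactly.
inp = torch.randn(128, 20)
assert torch.equal(m(inp), F.linear(inp, m.weight, m.bias))

# 3. Initialization: reset_parameters() uses kaiming_uniform_ for the
#    weight and a uniform distribution bounded by 1/sqrt(fan_in) for the
#    bias; with F.linear alone, any such scheme is the caller's job.
w = torch.empty(30, 20)
nn.init.kaiming_uniform_(w, a=math.sqrt(5))
b = torch.empty(30).uniform_(-1 / math.sqrt(20), 1 / math.sqrt(20))
out = F.linear(inp, w, b)
print(out.shape)                                     # torch.Size([128, 30])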
