[natural language processing] Introduction to PyTorch (essential basic knowledge)

Keywords: Machine Learning, PyTorch, Deep Learning, NLP

PyTorch Foundation

In this book, we make extensive use of PyTorch to implement our deep learning models. PyTorch is an open-source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow, PyTorch implements "tape-based automatic differentiation", which allows us to define and execute computational graphs dynamically. This is extremely helpful for debugging and for constructing complex models with minimal effort.

Dynamic versus static computational graphs. Static frameworks such as Theano, Caffe, and TensorFlow require the computational graph to be declared, compiled, and then executed. Although this leads to very efficient implementations (which is useful in production and mobile settings), it can become quite cumbersome during research and development.

Modern frameworks such as Chainer, DyNet, and PyTorch implement dynamic computational graphs, which allow a more flexible, imperative style of development without requiring the model to be compiled before each execution.

Dynamic computational graphs are especially useful for modeling NLP tasks, where each input can potentially lead to a different graph structure.
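As a minimal sketch of what "dynamic" means in practice (the toy recurrence and the names below are illustrative, not from the book), the graph is rebuilt on every forward pass, so ordinary Python control flow, such as a loop over a variable-length sentence, just works:

import torch

def encode(tokens):
    # tokens: a 1D tensor of word ids of any length
    embedding = torch.randn(100, 4)   # toy embedding table for a vocabulary of 100
    state = torch.zeros(4)            # toy hidden state
    for t in tokens:                  # the loop length differs per input, so each call builds a different graph
        state = torch.tanh(state + embedding[t])
    return state

print(encode(torch.tensor([3, 17, 42])))   # a 3-step graph
print(encode(torch.tensor([5, 9])))        # a 2-step graph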

PyTorch is an optimized tensor manipulation library that offers an array of packages for deep learning.

At the core of the library is the tensor, a mathematical object that holds multidimensional data.

A tensor of order 0 is a number, or scalar.

A first-order tensor is an array of numbers, or a vector. Similarly, a second-order tensor is an array of vectors, or a matrix.

Therefore, a tensor can be generalized as an n-dimensional array of scalars.
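To make these orders concrete (a small sketch, not from the book), torch.tensor can build each one directly, and .dim() reports the order:

import torch

scalar = torch.tensor(3.14)                      # order 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])           # order 1: an array of numbers
matrix = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # order 2: an array of vectors

print(scalar.dim(), vector.dim(), matrix.dim())  # 0 1 2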

In the following sections, we will use PyTorch to learn the following:

  • Creating tensors
  • Operations with tensors
  • Indexing, slicing, and joining tensors
  • Computing gradients with tensors
  • Using CUDA tensors with GPUs

In the rest of this section, we will first familiarize ourselves with various PyTorch operations. We recommend that you have PyTorch installed and a Python 3.5+ notebook ready so that you can follow along with the examples in this section. We also recommend that you complete the exercises later in this section.

Install PyTorch

The first step is to install PyTorch on your machine by choosing your system preferences at pytorch.org. Choose your operating system, then the package manager (we recommend conda or pip), and then the version of Python that you are using (we recommend 3.5+). That generates the command for you to execute to install PyTorch. At the time of writing, the install command for the conda environment is as follows:

conda install pytorch torchvision -c pytorch

Note: if you have a graphics processing unit (GPU) that supports CUDA, you should also choose the appropriate CUDA version. For additional details, follow the installation instructions at pytorch.org.

Please refer to: PyTorch latest installation tutorial (July 27, 2021)

Create tensor

First, we define a helper function, describe(x), which summarizes various properties of a tensor x, such as the type of the tensor, its dimensions, and its contents:

Input[0]:
def describe(x):
  print("Type: {}".format(x.type()))
  print("Shape/size: {}".format(x.shape))
  print("Values: \n{}".format(x))

PyTorch allows us to create tensors in many different ways using the torch package. One way to create a tensor is to initialize a random one by specifying its dimensions, as shown in Example 1-3.

Example 1-3: creating tensors using torch.Tensor in PyTorch

Input[0]:
import torch
describe(torch.Tensor(2, 3))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 3.2018e-05,  4.5747e-41,  2.5058e+25],
        [ 3.0813e-41,  4.4842e-44,  0.0000e+00]])

We can also create a tensor by randomly initializing it with values drawn from a uniform distribution on the interval [0, 1) or from the standard normal distribution (random initialization from the uniform distribution turns out to be important, as you will see in Chapters 3 and 4); see Example 1-4.

Example 1-4: creating randomly initialized tensors

Input[0]: 
import torch
describe(torch.rand(2, 3))   # uniform random
describe(torch.randn(2, 3))  # random normal
Output[0]: 
Type:  torch.FloatTensor
Shape/size:  torch.Size([2, 3])
Values:
 tensor([[ 0.0242,  0.6630,  0.9787],
        [ 0.1037,  0.3920,  0.6084]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[-0.1330, -2.9222, -1.3649],
        [ 2.3648,  1.1561,  1.5042]])

We can also create tensors all filled with the same scalar. For creating a tensor of zeros or ones, we have built-in functions, and for filling a tensor with a specific value we can use the fill_() method.

Any PyTorch method with a trailing underscore (_) refers to an in-place operation; that is, it modifies the content in place without creating a new object, as shown in Example 1-5.

Example 1-5: creating filled tensors

Input[0]:
import torch
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  0.,  0.],
        [ 0.,  0.,  0.]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  1.,  1.],
        [ 1.,  1.,  1.]])

Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 5.,  5.,  5.],
        [ 5.,  5.,  5.]])

Example 1-6 demonstrates how to create a tensor declaratively by using a Python list.

Example 1-6: creating and initializing tensors from lists

Input[0]:
x = torch.Tensor([[1, 2, 3],  
                  [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2., 3.],
        [ 4.,  5., 6.]])

The values can either come from a list, as in the preceding example, or from a NumPy array. And, of course, we can always go from a PyTorch tensor to a NumPy array as well.

Note that the type of this tensor is a DoubleTensor instead of the default FloatTensor. This corresponds to the float64 data type of the NumPy random matrix, as shown in Example 1-7.

Example 1-7: creating and initializing tensors from NumPy

Input[0]:
import torch
import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
Output[0]:
Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.8360,  0.8836,  0.0545],
        [ 0.6928,  0.2333,  0.7984]], dtype=torch.float64)

The ability to convert between NumPy arrays and PyTorch tensors becomes important when working with legacy libraries that use NumPy-formatted numerical values.
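Going the other way is just as easy (a brief sketch, assuming a CPU tensor): the .numpy() method returns a NumPy array that shares memory with the tensor, so in-place changes to one are visible through the other:

import torch

t = torch.ones(2, 3)
arr = t.numpy()    # NumPy view of the same underlying storage
t.fill_(7)         # modify the tensor in place
print(arr)         # the array reflects the change as well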

Tensor type and size

Each tensor has an associated type and size. The default tensor type when you use the torch.Tensor constructor is torch.FloatTensor. However, you can specify a tensor type at initialization or cast a tensor to another type (float, long, double, etc.) later using a type-casting method. There are two ways to specify the initialization type: either directly call the constructor of a specific tensor type (such as FloatTensor or LongTensor), or use the special method torch.tensor and provide the dtype, as shown in Example 1-8.

Example 1-8: tensor properties

Input[0]:
x = torch.FloatTensor([[1, 2, 3],  
                       [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])
Input[1]:
x = x.long()
describe(x)
Output[1]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Input[2]:
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.int64)
describe(x)
Output[2]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1,  2,  3],
        [ 4,  5,  6]])
Input[3]:
x = x.float()
describe(x)
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1.,  2.,  3.],
        [ 4.,  5.,  6.]])

We use the shape property and the size() method of a tensor object to obtain the measurements of its dimensions. The two ways of accessing these measurements are essentially equivalent. Inspecting the shape of tensors becomes an indispensable tool when debugging PyTorch code.
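For instance (a quick sketch), the two forms report exactly the same information:

import torch

x = torch.rand(2, 3)
print(x.shape)              # torch.Size([2, 3])
print(x.size())             # torch.Size([2, 3])
print(x.shape == x.size())  # True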

Tensor operation

After you have created your tensors, you can operate on them as you would with traditional programming language types, using operators such as +, -, *, and /. Instead of the operators, you can also use functions like .add(), which correspond to the symbolic operators, as shown in Example 1-9.

Example 1-9: tensor operation: addition

Input[0]:
import torch
x = torch.randn(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0461,  0.4024, -1.0115],
        [ 0.2167, -0.6123,  0.5036]])
Input[1]:
describe(torch.add(x, x))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])
Input[2]:
describe(x + x)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923,  0.8048, -2.0231],
        [ 0.4335, -1.2245,  1.0072]])

There are also operations that you can apply to a specific dimension of a tensor. As you might have already noticed, for a 2D tensor we represent rows as dimension 0 and columns as dimension 1, as shown in Example 1-10.

Example 1-10: Dimension Based tensor operation

Input[0]:
import torch
x = torch.arange(6)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([6])
Values:
tensor([ 0.,  1.,  2.,  3.,  4.,  5.])
Input[1]:
x = x.view(2, 3)
describe(x)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[2]:
describe(torch.sum(x, dim=0))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([3])
Values:
tensor([ 3.,  5.,  7.])
Input[3]:
describe(torch.sum(x, dim=1))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([  3.,  12.])
Input[4]:
describe(torch.transpose(x, 0, 1))
Output[4]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 0.,  3.],
        [ 1.,  4.],
        [ 2.,  5.]])

In general, we need to perform more complex operations that involve a combination of indexing, slicing, joining, and mutation. Like NumPy and other numerical libraries, PyTorch has built-in functions that make such tensor manipulations very simple.

Indexing, slicing, and joining

If you are a NumPy user, the indexing and slicing scheme of PyTorch, shown in Example 1-11, might look very familiar to you.

Example 1-11: slice and index tensors

Input[0]:
import torch
x = torch.arange(6).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
describe(x[:1, :2])
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([1, 2])
Values:
tensor([[ 0.,  1.]])
Input[2]:
describe(x[0, 1])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
1.0

Example 1-12 demonstrates that PyTorch also has functions for complex indexing and slicing operations, in which you might be interested for efficiently accessing noncontiguous locations of a tensor.

Example 1-12: complex indexing: noncontiguous indexing of a tensor

Input[0]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 0.,  2.],
        [ 3.,  5.]])
Input[1]:
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=0, index=indices))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 0.,  1.,  2.]])
Input[2]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])
describe(x[row_indices, col_indices])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 0.,  4.])

Note that the indices are a LongTensor; this is a requirement for indexing using PyTorch functions. We can also join tensors using the built-in concatenation functions, as shown in Example 1-13, by specifying the tensors and the dimension.

Example 1-13: concatenating tensors

Input[0]:
import torch
x = torch.arange(6).view(2,3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
describe(torch.cat([x, x], dim=0))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([4, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.],
        [ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[2]:
describe(torch.cat([x, x], dim=1))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 6])
Values:
tensor([[ 0.,  1.,  2.,  0.,  1.,  2.],
        [ 3.,  4.,  5.,  3.,  4.,  5.]])
Input[3]:
describe(torch.stack([x, x]))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2, 3])
Values:
tensor([[[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]],

        [[ 0.,  1.,  2.],
         [ 3.,  4.,  5.]]])

PyTorch also implements efficient linear algebra operations on tensors, such as multiplication, inverse, and trace, as shown in Example 1-14.

Example 1-14: linear algebra on tensors: multiplication

Input[0]:
import torch
x1 = torch.arange(6).view(2, 3)
describe(x1)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.,  1.,  2.],
        [ 3.,  4.,  5.]])
Input[1]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 1.,  2.],
        [ 1.,  2.],
        [ 1.,  2.]])
Input[2]:
describe(torch.mm(x1, x2))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[  3.,   6.],
        [ 12.,  24.]])
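
Example 1-14 only shows multiplication; as a small additional sketch (not from the book), the inverse and trace mentioned above are available as well, for example via torch.inverse and torch.trace on a square matrix:

import torch

m = torch.tensor([[2., 1.],
                  [1., 3.]])
print(torch.inverse(m))   # matrix inverse
print(torch.trace(m))     # sum of the diagonal elements: tensor(5.)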

So far, we have looked at ways of creating and manipulating constant PyTorch tensor objects. Just as a programming language (such as Python) has variables that encapsulate a piece of data and carry additional information about that data (such as the memory address where it is stored), PyTorch tensors handle the bookkeeping needed for building computational graphs; this bookkeeping is enabled simply by setting a Boolean flag at instantiation time.

Tensors and computational graphs

The PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as algebraic operations, indexing, and reshaping operations.

However, as shown in Example 1-15, when the requires_grad Boolean flag is set to True on a tensor, bookkeeping operations are enabled that can track the gradient at the tensor as well as the gradient function, both of which are needed to facilitate the gradient-based learning of the supervised learning paradigm.

Example 1-15: creating tensors for gradient records

Input[0]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 1.,  1.],
        [ 1.,  1.]])
True
Input[1]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 21.,  21.],
        [ 21.,  21.]])
True
Input[2]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
21.0
False

When you create a tensor with requires_grad=True, you are requiring PyTorch to manage the bookkeeping information needed to compute gradients.

First, PyTorch keeps track of the values of the forward pass. Then, at the end of the computation, a single scalar is used to compute a backward pass.

The backward pass is initiated by calling the backward() method on a tensor resulting from the evaluation of a loss function. The backward pass computes a gradient value for each tensor object that participated in the forward pass.

Generally speaking, a gradient is a value that represents the slope of a function's output with respect to its input.

In the computational-graph setting, gradients exist for each parameter in the model and can be thought of as the parameter's contribution to the error signal. In PyTorch, you can access the gradients of the nodes in the computational graph by using the .grad member variable. Optimizers use the .grad variable to update the values of the parameters.
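As a small self-contained sketch tying this back to Example 1-15 (the printed value is just the result of the arithmetic): z = ((x + 2) * (x + 5) + 3).mean() has derivative (2x + 7) / 4 with respect to each element of x, so at x = 1 every entry of x.grad is 2.25:

import torch

x = torch.ones(2, 2, requires_grad=True)
z = ((x + 2) * (x + 5) + 3).mean()
z.backward()
print(x.grad)   # every entry is (2 * 1 + 7) / 4 = 2.25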

So far, we have been allocating our tensors in CPU memory. When doing linear algebra operations, it might make sense to utilize a GPU, if you have one.

To use a GPU, you first need to allocate the tensor on the GPU's memory. Access to GPUs is through a specialized API called CUDA.

The CUDA API was created by NVIDIA and works only on NVIDIA GPUs. The CUDA tensor objects that PyTorch provides are indistinguishable in use from regular CPU-bound tensors, except for the way they are allocated internally.

CUDA tensor

PyTorch makes it very easy to create these CUDA tensors (Example 1-16), transferring the tensor from the CPU to the GPU while maintaining its underlying type. The preferred approach in PyTorch is to be device agnostic and write code that works whether it's on the GPU or the CPU.

In the following code snippet, we first check whether a GPU is available by using torch.cuda.is_available(), and retrieve the device name with torch.device(). Then, all future tensors are instantiated and moved to the target device by using the .to(device) method.

Example 1-16: creating CUDA tensors

Input[0]:
import torch
print (torch.cuda.is_available())
Output[0]:
True
Input[1]:
# preferred method: device agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print (device)
Output[1]:
cuda
Input[2]:
x = torch.rand(3, 3).to(device)
describe(x)
Output[2]:
Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values:
tensor([[ 0.9149,  0.3993,  0.1100],
        [ 0.2541,  0.4333,  0.4451],
        [ 0.4966,  0.7865,  0.6604]], device='cuda:0')

To operate on CUDA and non-CUDA objects, we need to make sure that they are on the same device. If we don't, the computation will break, as shown in the following code snippet.

This situation arises, for example, when computing monitoring metrics that aren't part of the computational graph. When operating on two tensor objects, make sure they're both on the same device, as shown in Example 1-17.

Example 1-17: mixing CUDA tensors with CPU-bound tensors

Input[0]
y = torch.rand(3, 3)
x + y
Output[0]
----------------------------------------------------------------------
RuntimeError                         Traceback (most recent call last)
      1 y = torch.rand(3, 3)
----> 2 x + y

RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #3 'other'
Input[1]
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y
Output[1]
tensor([[ 0.7159,  1.0685,  1.3509],
        [ 0.3912,  0.2838,  1.3202],
        [ 0.2967,  0.0420,  0.6559]])

Keep in mind that it is expensive to move data back and forth from the GPU. Therefore, the typical procedure involves doing many of the parallelizable computations on the GPU and then transferring just the final result back to the CPU; a brief sketch of this pattern appears after the following command. This will allow you to fully utilize the GPU. If you have several CUDA-visible devices (i.e., multiple GPUs), the best practice is to use the CUDA_VISIBLE_DEVICES environment variable when executing the program, as shown here:

CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
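Separately, as a minimal sketch of the compute-on-GPU, copy-back-at-the-end pattern mentioned above (purely illustrative; it requires a CUDA-capable machine to actually use the GPU), the heavy computation stays on the device and only the small final result is brought back:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.rand(1000, 1000, device=device)
b = torch.rand(1000, 1000, device=device)
result = (a @ b).mean()       # all of the heavy computation happens on the GPU
print(result.cpu().item())    # transfer only the scalar result back to the CPU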

We do not cover parallelism and multi-GPU training in this book, but they are essential for scaling experiments, and sometimes even for training large models. We recommend that you refer to the PyTorch documentation and discussion forums for additional help and support on this topic.

Exercises

The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of the problems will require going through the official documentation [1] and finding helpful functions.

  1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.

  2. Remove the extra dimension you just added to the previous tensor.

  3. Create a random tensor of shape 5x3 in the interval [3, 7)

  4. Create a tensor with values from a normal distribution (mean=0, std=1).

  5. Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

  6. Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

  7. Return the batch matrix-matrix product of two 3-dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

  8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).

    Solutions

  1. a = torch.rand(3, 3); a.unsqueeze(0)

  2. a.squeeze(0)

  3. 3 + torch.rand(5, 3) * (7 - 3)

  4. a = torch.rand(3, 3); a.normal_()

  5. a = torch.Tensor([1, 1, 1, 0, 1]); torch.nonzero(a)

  6. a = torch.rand(3, 1); a.expand(3, 4)

  7. a = torch.rand(3, 4, 5); b = torch.rand(3, 5, 4); torch.bmm(a, b)

  8. a = torch.rand(3, 4, 5); b = torch.rand(5, 4); torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))

Summary

In this chapter, we introduced the main topics of this book, natural language processing (NLP) and deep learning, and developed a detailed understanding of the supervised learning paradigm.

By the end of this chapter, you should be familiar with, or at least aware of, various terms such as observations, targets, model, parameters, predictions, loss function, representations, learning/training, and inference. You also saw how to encode the inputs (observations and targets) of a learning task using one-hot encoding.

We also examined count-based representations such as TF and TF-IDF. We then learned what computational graphs are, the difference between static and dynamic computational graphs, and PyTorch's tensor manipulation operations. In Chapter 2, we provide an overview of traditional NLP. If you are new to the subject matter of this book, these two chapters should lay down the necessary foundation for you and prepare you for the rest of the book.

Focus on TF-IDF

Term frequency (TF) = (number of times a word appears in the document) / (total number of words in the document)

Inverse document frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the word + 1))

TF should be easy to understand, and IDF measures how rare a word is across documents. To compute IDF, we need a corpus prepared in advance to simulate the language-use environment: the more common a word is, the larger the denominator in the formula becomes, and the closer its inverse document frequency gets to 0. The +1 in the denominator avoids the case where the denominator is 0.

TF-IDF = term frequency (TF) × inverse document frequency (IDF)

TF-IDF can be used to extract keywords from an article.
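As a small illustrative sketch of these formulas (the toy corpus and helper function names below are made up for demonstration), the computation can be written out directly:

import math

corpus = [
    "the cat sat on the mat",
    "the dog ate my homework",
    "the cat chased the dog",
]
docs = [doc.split() for doc in corpus]

def tf(word, doc):
    # term frequency: occurrences of the word / total words in the document
    return doc.count(word) / len(doc)

def idf(word, docs):
    # inverse document frequency, with +1 in the denominator to avoid division by zero
    containing = sum(1 for doc in docs if word in doc)
    return math.log(len(docs) / (containing + 1))

def tf_idf(word, doc, docs):
    return tf(word, doc) * idf(word, docs)

print(tf_idf("cat", docs[0], docs))   # "cat" is moderately distinctive
print(tf_idf("the", docs[0], docs))   # "the" appears in every document, so its score is pushed toward (or below) zero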

For learning purposes, I quote the content of this book here for non-commercial use. I recommend that you read the book and study along!

Keep going!

Thank you!

Keep striving!
