PyTorch Basics
In this book, we use PyTorch extensively to implement our deep learning models. PyTorch is an open source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow, PyTorch implements a "tape-based automatic differentiation" method that allows us to define and execute computational graphs dynamically. This is very helpful for debugging and for building sophisticated models with minimal effort.
Dynamic versus static computational graphs. Static frameworks such as Theano, Caffe, and TensorFlow require the computational graph to be declared and compiled before it is executed. Although this leads to very efficient implementations (useful in production and mobile settings), it can become quite cumbersome during research and development.
Modern frameworks such as Chainer, DyNet, and PyTorch implement dynamic computational graphs, which allow a more flexible, imperative style of development without the need to compile the model before every execution.
Dynamic computational graphs are especially useful in modeling NLP tasks, for which each input can lead to a different graph structure.
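To make the idea of an input-dependent graph concrete, here is a minimal sketch of our own (not taken from the book). The function name sum_of_squares and the toy inputs are hypothetical; the point is only that ordinary Python control flow decides how many operations are recorded for each input.

```python
import torch

def sum_of_squares(values):
    """Accumulate the squares of `values` with an ordinary Python loop.

    The loop runs once per input element, so the graph that autograd
    records has a different number of nodes for each input length.
    """
    total = torch.zeros(())
    for v in values:          # plain Python control flow, no compile step
        total = total + v * v
    return total

short = torch.tensor([1.0, 2.0], requires_grad=True)
longer = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)

out_short = sum_of_squares(short)     # graph built from 2 loop iterations
out_long = sum_of_squares(longer)     # graph built from 4 loop iterations

out_short.backward()
out_long.backward()

print(out_short.item(), short.grad)   # gradient of sum(v^2) is 2 * v
print(out_long.item(), longer.grad)
```

No graph is declared ahead of time here: each call builds its own graph as the loop executes, which is the behavior that makes variable-length NLP inputs convenient to handle.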
PyTorch is an optimized tensor manipulation library that offers an array of packages for deep learning.
At the core of the library is the tensor, a mathematical object holding some multidimensional data.
A tensor of order 0 is just a number, or a scalar.
A tensor of order 1 (a first-order tensor) is an array of numbers, or a vector. Similarly, a second-order tensor is an array of vectors, or a matrix.
A tensor can therefore be generalized as an n-dimensional array of scalars.
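As a quick illustration of the orders described above, the following sketch builds one tensor of each order and prints its number of dimensions and its shape. The variable names are ours and purely illustrative.

```python
import torch

scalar = torch.tensor(3.14)                 # order 0: a single number
vector = torch.tensor([1.0, 2.0, 3.0])      # order 1: an array of numbers
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])         # order 2: an array of vectors
cube = torch.zeros(2, 3, 4)                 # order 3: an array of matrices

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("cube", cube)]:
    # dim() reports the order (number of axes); shape lists each axis length
    print(name, t.dim(), tuple(t.shape))
```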
In the following sections, we will use PyTorch to learn about the following:
 Creating tensors
 Operations with tensors
 Indexing, slicing, and joining with tensors
 Computing gradients with tensors
 Using CUDA tensors with GPUs
In the rest of this section, we first use PyTorch to familiarize ourselves with its various operations. We recommend that you have PyTorch installed and a Python 3.5+ notebook ready so that you can follow along with the examples. We also recommend working through the exercises later in this section.
Installing PyTorch
The first step is to install PyTorch on your machine by choosing your system preferences at pytorch.org. Choose your operating system, then the package manager (we recommend conda or pip), and then the version of Python you are using (we recommend 3.5+). This generates the command you need to run to install PyTorch. At the time of writing, the install command for the conda environment was as follows:
conda install pytorch torchvision -c pytorch
Note: if you have a CUDA-enabled graphics processing unit (GPU), you should also choose the appropriate version of CUDA. For more details, follow the installation instructions at pytorch.org.
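Once the install command has finished, a short check along the following lines can confirm that the package imports and whether a CUDA device is visible. This is a minimal sketch of our own, not part of the book's examples.

```python
import torch

# Version string of the installed package
print(torch.__version__)

# True only if PyTorch was built with CUDA support and a usable GPU is found
print(torch.cuda.is_available())
```

If the second line prints False on a machine with an NVIDIA GPU, the installed build or the CUDA version selected at install time is a likely place to look.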
Please refer to: PyTorch latest installation tutorial (July 27, 2021)
Creating tensors
First, we define a helper function, describe(x), that summarizes various properties of a tensor x, such as its type, its dimensions, and its contents:
Input[0]:
def describe(x):
    print("Type: {}".format(x.type()))
    print("Shape/size: {}".format(x.shape))
    print("Values: \n{}".format(x))
PyTorch allows us to create tensors in many different ways using the torch package. One way is to specify only the dimensions of the tensor, as shown in Example 1-3. Note that torch.Tensor(2, 3) allocates memory without initializing it, so the values are arbitrary leftovers rather than samples from a random distribution.
Example 1-3: Creating a tensor in PyTorch with torch.Tensor
Input[0]:
import torch
describe(torch.Tensor(2, 3))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 3.2018e-05, 4.5747e-41, 2.5058e+25],
        [ 3.0813e-41, 4.4842e-44, 0.0000e+00]])
We can also create a tensor by randomly initializing it with values drawn from the uniform distribution on the interval [0, 1) or from the standard normal distribution, as shown in Example 1-4. Randomly initialized tensors, for instance from the uniform distribution, are important, as you will see in Chapters 3 and 4.
Example 1-4: Creating a randomly initialized tensor
Input[0]:
import torch
describe(torch.rand(2, 3))   # uniform random
describe(torch.randn(2, 3))  # random normal
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0242, 0.6630, 0.9787],
        [ 0.1037, 0.3920, 0.6084]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[0.1330, 2.9222, 1.3649],
        [ 2.3648, 1.1561, 1.5042]])
We can also create tensors that are all filled with the same scalar. For creating a tensor of zeros or ones, there are built-in functions; for filling a tensor with a specific value, we can use the fill_() method.
Any PyTorch method with a trailing underscore (_) is an in-place operation; that is, it modifies the content in place without creating a new object, as shown in Example 1-5.
Example 1-5: Creating a filled tensor
Input[0]:
import torch
describe(torch.zeros(2, 3))
x = torch.ones(2, 3)
describe(x)
x.fill_(5)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 0., 0.],
        [ 0., 0., 0.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 1., 1.],
        [ 1., 1., 1.]])
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 5., 5., 5.],
        [ 5., 5., 5.]])
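The trailing-underscore convention applies beyond fill_(). The sketch below, which is our own illustration, contrasts add() with add_() to show the difference between returning a new tensor and modifying the existing one.

```python
import torch

x = torch.ones(2, 3)

y = x.add(1)      # out-of-place: returns a new tensor, x is unchanged
print(x[0, 0].item(), y[0, 0].item())

z = x.add_(1)     # in-place: x itself is modified and returned
print(x[0, 0].item())

# The result of the in-place call points at the same memory as x,
# whereas the out-of-place result has its own storage
print(z.data_ptr() == x.data_ptr())
print(y.data_ptr() == x.data_ptr())
```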
Example 1-6 demonstrates how to create a tensor declaratively by using Python lists.
Example 1-6: Creating and initializing a tensor from lists
Input[0]:
x = torch.Tensor([[1, 2, 3],
                  [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 2., 3.],
        [ 4., 5., 6.]])
The values can come from a list (as in the previous example) or from a NumPy array. Naturally, we can also go in the other direction, from a PyTorch tensor to a NumPy array.
Notice that the type of the tensor created from NumPy is DoubleTensor rather than the default FloatTensor. This corresponds to the data type of the NumPy random matrix, float64, as shown in Example 1-7.
Example 1-7: Creating and initializing a tensor from NumPy
Input[0]:
import torch
import numpy as np
npy = np.random.rand(2, 3)
describe(torch.from_numpy(npy))
Output[0]:
Type: torch.DoubleTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.8360, 0.8836, 0.0545],
        [ 0.6928, 0.2333, 0.7984]], dtype=torch.float64)
The ability to switch between NumPy arrays and PyTorch tensors becomes important when working with legacy libraries that use NumPy-formatted numerical values.
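The reverse direction mentioned above, from a tensor back to NumPy, is not shown in the example. The following sketch of our own covers it and also illustrates a detail worth knowing when mixing the two libraries: torch.from_numpy and Tensor.numpy share memory with the source on the CPU, whereas torch.tensor makes a copy.

```python
import numpy as np
import torch

npy = np.ones((2, 3))                 # float64 by default in NumPy
t = torch.from_numpy(npy)             # shares memory with npy (CPU only)

npy[0, 0] = 7.0                       # a write through NumPy ...
print(t[0, 0].item())                 # ... is visible through the tensor

back = t.numpy()                      # tensor -> NumPy, again without a copy
print(type(back), back.dtype)

# torch.tensor copies the data, so later edits to npy do not affect it
copied = torch.tensor(npy, dtype=torch.float32)
npy[0, 0] = -1.0
print(copied[0, 0].item())
```

Whether sharing or copying is preferable depends on the situation; copying is the safer choice when the NumPy array may be modified elsewhere.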
Tensor types and size
Each tensor has an associated type and size. The default tensor type when you use the torch.Tensor constructor is torch.FloatTensor. However, you can specify the type at initialization, or convert a tensor to another type (float, long, double, and so on) later using a typecasting method. There are two ways to specify the type at initialization: either call the constructor of a specific tensor type directly, such as FloatTensor or LongTensor, or use the special function torch.tensor and provide the dtype, as shown in Example 1-8.
Example 1-8: Tensor properties
Input[0]:
x = torch.FloatTensor([[1, 2, 3],
                       [4, 5, 6]])
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 2., 3.],
        [ 4., 5., 6.]])
Input[1]:
x = x.long()
describe(x)
Output[1]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1, 2, 3],
        [ 4, 5, 6]])
Input[2]:
x = torch.tensor([[1, 2, 3],
                  [4, 5, 6]], dtype=torch.int64)
describe(x)
Output[2]:
Type: torch.LongTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1, 2, 3],
        [ 4, 5, 6]])
Input[3]:
x = x.float()
describe(x)
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 1., 2., 3.],
        [ 4., 5., 6.]])
We use the shape property and the size() method of a tensor object to access the measurements of its dimensions. The two ways of accessing these measurements are essentially equivalent. Inspecting the shape of a tensor is an indispensable tool when debugging PyTorch code.
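A brief sketch of our own showing the two access paths side by side, including how to ask for the length of a single dimension:

```python
import torch

x = torch.zeros(2, 3)

print(x.shape)                  # property
print(x.size())                 # method; reports the same torch.Size
print(x.shape == x.size())

# Length of one dimension, via either route
print(x.size(0), x.shape[1])
```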
Tensor operations
After you have created your tensors, you can operate on them as you would with traditional programming language types, using operators such as +, -, *, and /. Instead of the operators, you can also use functions such as .add(), which correspond to the symbolic operators, as shown in Example 1-9.
Example 1-9: Tensor operations: addition
Input[0]:
import torch
x = torch.randn(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0461, 0.4024, 1.0115],
        [ 0.2167, 0.6123, 0.5036]])
Input[1]:
describe(torch.add(x, x))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923, 0.8048, 2.0231],
        [ 0.4335, 1.2245, 1.0072]])
Input[2]:
describe(x + x)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0.0923, 0.8048, 2.0231],
        [ 0.4335, 1.2245, 1.0072]])
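The example above covers addition only. As a sketch of our own, the same operator-to-function correspondence can be checked for subtraction, multiplication, and division; note that * is elementwise, not a matrix product.

```python
import torch

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
b = torch.tensor([[2.0, 2.0], [2.0, 2.0]])

# Each symbolic operator has an equivalent function; all are elementwise
print(torch.equal(a - b, torch.sub(a, b)))
print(torch.equal(a * b, torch.mul(a, b)))   # elementwise, not matrix product
print(torch.equal(a / b, torch.div(a, b)))
```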
There are also operations that you can apply to a specific dimension of a tensor. As you might have already noticed, for a 2D tensor we represent rows as dimension 0 and columns as dimension 1, as shown in Example 1-10.
Example 1-10: Dimension-based tensor operations
Input[0]:
import torch
# Recent PyTorch versions return int64 from arange(6) by default;
# requesting float32 matches the FloatTensor output shown below.
x = torch.arange(6, dtype=torch.float32)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([6])
Values:
tensor([ 0., 1., 2., 3., 4., 5.])
Input[1]:
x = x.view(2, 3)
describe(x)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
        [ 3., 4., 5.]])
Input[2]:
describe(torch.sum(x, dim=0))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([3])
Values:
tensor([ 3., 5., 7.])
Input[3]:
describe(torch.sum(x, dim=1))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 3., 12.])
Input[4]:
describe(torch.transpose(x, 0, 1))
Output[4]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 0., 3.],
        [ 1., 4.],
        [ 2., 5.]])
Often, we need to perform more complex operations that involve a combination of indexing, slicing, joining, and mutation. Like NumPy and other numeric libraries, PyTorch has built-in functions that make such tensor manipulations very simple.
Indexing, slicing, and joining
If you are a NumPy user, the indexing and slicing scheme of PyTorch shown in Example 1-11 will be very familiar.
Example 1-11: Slicing and indexing a tensor
Input[0]:
import torch
# float32 is requested explicitly; recent PyTorch versions default to int64 here
x = torch.arange(6, dtype=torch.float32).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
        [ 3., 4., 5.]])
Input[1]:
describe(x[:1, :2])
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([1, 2])
Values:
tensor([[ 0., 1.]])
Input[2]:
describe(x[0, 1])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
1.0
Example 1-12 demonstrates that PyTorch also has functions for complex indexing and slicing operations, which are useful when you need to access noncontiguous locations of a tensor efficiently.
Example 1-12: Complex indexing: noncontiguous indexing of a tensor
Input[0]:
indices = torch.LongTensor([0, 2])
describe(torch.index_select(x, dim=1, index=indices))
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 0., 2.],
        [ 3., 5.]])
Input[1]:
indices = torch.LongTensor([0, 0])
describe(torch.index_select(x, dim=0, index=indices))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
        [ 0., 1., 2.]])
Input[2]:
row_indices = torch.arange(2).long()
col_indices = torch.LongTensor([0, 1])
describe(x[row_indices, col_indices])
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2])
Values:
tensor([ 0., 4.])
Notice that the indices are a LongTensor; this is a requirement for indexing with PyTorch functions. We can also join tensors using the built-in concatenation functions, as shown in Example 1-13, by specifying the tensors and the dimension.
Example 1-13: Concatenating tensors
Input[0]:
import torch
# float32 is requested explicitly; recent PyTorch versions default to int64 here
x = torch.arange(6, dtype=torch.float32).view(2, 3)
describe(x)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
        [ 3., 4., 5.]])
Input[1]:
describe(torch.cat([x, x], dim=0))
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([4, 3])
Values:
tensor([[ 0., 1., 2.],
        [ 3., 4., 5.],
        [ 0., 1., 2.],
        [ 3., 4., 5.]])
Input[2]:
describe(torch.cat([x, x], dim=1))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 6])
Values:
tensor([[ 0., 1., 2., 0., 1., 2.],
        [ 3., 4., 5., 3., 4., 5.]])
Input[3]:
describe(torch.stack([x, x]))
Output[3]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2, 3])
Values:
tensor([[[ 0., 1., 2.],
         [ 3., 4., 5.]],
        [[ 0., 1., 2.],
         [ 3., 4., 5.]]])
PyTorch also implements highly efficient linear algebra operations on tensors, such as multiplication, inverse, and trace, as shown in Example 1-14.
Example 1-14: Linear algebra on tensors: multiplication
Input[0]:
import torch
# torch.mm needs both operands to share a dtype, so x1 is created as float32
# (recent PyTorch versions return int64 from arange(6) by default)
x1 = torch.arange(6, dtype=torch.float32).view(2, 3)
describe(x1)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 3])
Values:
tensor([[ 0., 1., 2.],
        [ 3., 4., 5.]])
Input[1]:
x2 = torch.ones(3, 2)
x2[:, 1] += 1
describe(x2)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([3, 2])
Values:
tensor([[ 1., 2.],
        [ 1., 2.],
        [ 1., 2.]])
Input[2]:
describe(torch.mm(x1, x2))
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 3., 6.],
        [ 12., 24.]])
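The text mentions inverse and trace alongside multiplication, but the example covers multiplication only. The following sketch of our own uses torch.inverse and torch.trace on a small diagonal matrix chosen so that the results are easy to verify by hand.

```python
import torch

m = torch.tensor([[2.0, 0.0],
                  [0.0, 4.0]])

m_inv = torch.inverse(m)        # matrix inverse (m must be square, invertible)
print(m_inv)

print(torch.trace(m))           # sum of the diagonal entries: 2 + 4

# Multiplying a matrix by its inverse should give the identity matrix
print(torch.allclose(torch.mm(m, m_inv), torch.eye(2)))
```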
So far, we have looked at ways to create and manipulate constant PyTorch tensor objects. Just as a programming language (such as Python) has variables that encapsulate a piece of data together with additional information about it (such as the memory address where it is stored), PyTorch tensors handle the bookkeeping needed to build computational graphs. This bookkeeping is instantiated simply by enabling a Boolean flag.
Tensors and computational graphs
The PyTorch tensor class encapsulates the data (the tensor itself) and a range of operations, such as algebraic operations, indexing, and reshaping operations.
However, as shown in Example 1-15, when the requires_grad Boolean flag is set to True on a tensor, bookkeeping operations are enabled that can track the gradient at the tensor as well as the gradient function. Both are needed to facilitate the gradient-based learning discussed in "The Supervised Learning Paradigm".
Example 1-15: Creating tensors for gradient bookkeeping
Input[0]:
import torch
x = torch.ones(2, 2, requires_grad=True)
describe(x)
print(x.grad is None)
Output[0]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 1., 1.],
        [ 1., 1.]])
True
Input[1]:
y = (x + 2) * (x + 5) + 3
describe(y)
print(x.grad is None)
Output[1]:
Type: torch.FloatTensor
Shape/size: torch.Size([2, 2])
Values:
tensor([[ 21., 21.],
        [ 21., 21.]])
True
Input[2]:
z = y.mean()
describe(z)
z.backward()
print(x.grad is None)
Output[2]:
Type: torch.FloatTensor
Shape/size: torch.Size([])
Values:
21.0
False
When you create a tensor with requires_grad=True, you are asking PyTorch to manage the bookkeeping information needed to compute gradients.
First, PyTorch keeps track of the values of the forward pass. Then, at the end of the computation, a single scalar is used to compute a backward pass.
The backward pass is initiated by calling the backward() method on a tensor that results from evaluating a loss function. The backward pass computes a gradient value for each tensor object that participated in the forward pass.
In general, the gradient is a value that represents the slope of a function's output with respect to the function's input.
In the computational graph setting, a gradient exists for each parameter in the model and can be thought of as that parameter's contribution to the error signal. In PyTorch, you can access the gradients of the nodes in the computational graph through the .grad member variable. Optimizers use the .grad variable to update the values of the parameters.
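For the computation in Example 1-15 the gradient can be worked out by hand, which gives a useful check on what backward() stores in .grad. With y = (x + 2)(x + 5) + 3 we have dy/dx = 2x + 7, and because z is the mean over four elements, dz/dx = (2x + 7) / 4, which is 2.25 at x = 1. The sketch below is our own: it compares this value with x.grad and then applies one hand-written gradient-descent step, where the learning rate of 0.1 is an arbitrary choice for illustration.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
y = (x + 2) * (x + 5) + 3
z = y.mean()
z.backward()

# Analytic gradient: d/dx [(x + 2)(x + 5) + 3] = 2x + 7, and the mean over
# four elements contributes a factor of 1/4.
expected = (2 * x.detach() + 7) / 4

print(x.grad)
print(torch.allclose(x.grad, expected))

# A hand-written gradient-descent step (learning rate 0.1 is arbitrary);
# torch.no_grad() keeps the update itself out of the graph.
with torch.no_grad():
    x -= 0.1 * x.grad
print(x)
```

In practice an optimizer object performs this update, but the arithmetic it applies to each parameter follows the same pattern of reading .grad and adjusting the value.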
CUDA tensors
So far, we have been allocating our tensors in CPU memory. When doing linear algebra operations, it can make sense to use a GPU if you have one.
To use a GPU, you first need to allocate the tensor in the GPU's memory. Access to the GPU goes through a specialized API called CUDA.
The CUDA API was created by NVIDIA and is limited to use on NVIDIA GPUs. PyTorch offers CUDA tensor objects that are indistinguishable in use from regular CPU-bound tensors, apart from the way they are allocated internally.
PyTorch makes it easy to create these CUDA tensors (Example 1-16), transferring a tensor from the CPU to the GPU while maintaining its underlying type. The preferred approach in PyTorch is to be device agnostic and to write code that works whether it runs on the GPU or the CPU.
In the following code snippet, we first check whether a GPU is available by using torch.cuda.is_available(), and then retrieve the device name with torch.device. All subsequent tensors are then instantiated and moved to the target device by using the .to(device) method.
Example 1-16: Creating CUDA tensors
Input[0]:
import torch
print(torch.cuda.is_available())
Output[0]:
True
Input[1]:
# preferred method: device-agnostic tensor instantiation
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)
Output[1]:
cuda
Input[2]:
x = torch.rand(3, 3).to(device)
describe(x)
Output[2]:
Type: torch.cuda.FloatTensor
Shape/size: torch.Size([3, 3])
Values:
tensor([[ 0.9149, 0.3993, 0.1100],
        [ 0.2541, 0.4333, 0.4451],
        [ 0.4966, 0.7865, 0.6604]], device='cuda:0')
To operate on CUDA and non-CUDA objects together, we need to ensure that they are on the same device. If we don't, the computation fails with an error, as shown in Example 1-17.
This situation arises, for example, when computing monitoring metrics that are not part of the computational graph. Whenever you operate on two tensor objects, make sure that both are on the same device.
Example 1-17: Mixing CUDA tensors with CPU-bound tensors
Input[0]:
y = torch.rand(3, 3)
x + y
Output[0]:
RuntimeError Traceback (most recent call last)
      1 y = torch.rand(3, 3)
----> 2 x + y
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #3 'other'
Input[1]:
cpu_device = torch.device("cpu")
y = y.to(cpu_device)
x = x.to(cpu_device)
x + y
Output[1]:
tensor([[ 0.7159, 1.0685, 1.3509],
        [ 0.3912, 0.2838, 1.3202],
        [ 0.2967, 0.0420, 0.6559]])
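The fix in Example 1-17 moves both tensors to the CPU. An alternative, sketched below as our own illustration, is to move the CPU tensor to whichever device the other tensor already lives on and to bring only the final result back. The snippet is written to run unchanged on machines with or without a GPU.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 3).to(device)   # lives on the GPU when one is available
y = torch.rand(3, 3)              # created on the CPU

# Move y to wherever x lives before combining them
y = y.to(x.device)
result = x + y

# Bring only the final result back to the CPU, e.g. for logging or NumPy
result_cpu = result.cpu()
print(result.device, result_cpu.device)
```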
Keep in mind that moving data back and forth between the CPU and the GPU is expensive. The typical procedure is therefore to perform many of the parallelizable computations on the GPU and then transfer only the final result back to the CPU. This allows you to make full use of the GPU. If you have several CUDA-visible devices (that is, multiple GPUs), the best practice is to use the CUDA_VISIBLE_DEVICES environment variable when executing the program, as shown here:
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
We do not cover parallelism and multi-GPU training in this book, but they are essential for scaling experiments and sometimes even for training large models. We recommend that you refer to the PyTorch documentation and discussion forums for additional help and support on this topic.
Exercises
The best way to master a topic is to solve problems. Here are some warm-up exercises. Many of them will require going through the official documentation [1] and finding helpful functions.

1. Create a 2D tensor and then add a dimension of size 1 inserted at dimension 0.

2. Remove the extra dimension you just added to the previous tensor.

3. Create a random tensor of shape 5x3 in the interval [3, 7).

4. Create a tensor with values from a normal distribution (mean=0, std=1).

5. Retrieve the indexes of all the nonzero elements in the tensor torch.Tensor([1, 1, 1, 0, 1]).

6. Create a random tensor of size (3,1) and then horizontally stack 4 copies together.

7. Return the batch matrix-matrix product of two 3-dimensional matrices (a=torch.rand(3,4,5), b=torch.rand(3,5,4)).

8. Return the batch matrix-matrix product of a 3D matrix and a 2D matrix (a=torch.rand(3,4,5), b=torch.rand(5,4)).
Solutions

1. a = torch.rand(3, 3)
   a.unsqueeze(0)

2. a.squeeze(0)

3. 3 + torch.rand(5, 3) * (7 - 3)

4. a = torch.rand(3, 3)
   a.normal_()

5. a = torch.Tensor([1, 1, 1, 0, 1])
   torch.nonzero(a)

6. a = torch.rand(3, 1)
   a.expand(3, 4)

7. a = torch.rand(3, 4, 5)
   b = torch.rand(3, 5, 4)
   torch.bmm(a, b)

8. a = torch.rand(3, 4, 5)
   b = torch.rand(5, 4)
   torch.bmm(a, b.unsqueeze(0).expand(a.size(0), *b.size()))
Summary
In this chapter, we introduced the target topics of this book, natural language processing (NLP) and deep learning, and developed a detailed understanding of the supervised learning paradigm.
You should now be familiar with, or at least aware of, various related terms such as observations, targets, models, parameters, predictions, loss functions, representations, learning/training, and inference. You also saw how to encode the inputs (observations and targets) of learning tasks using one-hot encoding.
We also examined count-based representations such as TF and TF-IDF. We then learned what computational graphs are, how static and dynamic computational graphs differ, and how to use PyTorch's tensor manipulation operations. In Chapter 2, we provide an overview of traditional NLP. Chapter 2 and this chapter should lay the necessary foundation if you are new to the book's subject matter and prepare you for the rest of the chapters.
Focus on TF-IDF
Term frequency (TF) = number of times a word appears in the document / total number of words in the document
Inverse document frequency (IDF) = log(total number of documents in the corpus / (number of documents containing the word + 1))
TF is straightforward. IDF measures how rare a word is across documents. To compute IDF, we need a corpus prepared in advance that models the language environment. The more common a word is, the larger the denominator in the formula and the closer the inverse document frequency is to 0. The 1 is added to the denominator to avoid division by zero when no document in the corpus contains the word.
TF-IDF = term frequency (TF) × inverse document frequency (IDF)
TF-IDF can be used to extract keywords from a document.
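The formulas above translate directly into a few lines of Python. The sketch below is our own minimal illustration: the function names tf, idf, and tf_idf and the toy corpus are hypothetical, documents are represented as lists of lowercase tokens, and the natural logarithm is assumed.

```python
import math

def tf(word, document):
    """Term frequency: occurrences of `word` / number of words in `document`."""
    return document.count(word) / len(document)

def idf(word, corpus):
    """Inverse document frequency with the +1 in the denominator."""
    containing = sum(1 for document in corpus if word in document)
    return math.log(len(corpus) / (containing + 1))

def tf_idf(word, document, corpus):
    return tf(word, document) * idf(word, corpus)

# Toy corpus: each document is a list of lowercase tokens
corpus = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "the bird sang".split(),
    "a fish swam".split(),
]

doc = corpus[0]
print(tf("the", doc))               # 2 of the 6 tokens
print(tf_idf("the", doc, corpus))   # "the" is in 3 of 4 documents: log(4/4) = 0
print(tf_idf("mat", doc, corpus))   # "mat" is in 1 of 4 documents: log(4/2) > 0
```

The frequent word "the" scores zero while the rarer word "mat" gets a positive score, which is the keyword-extraction behavior described above. Note that with this particular formula, a word that appears in every document receives a slightly negative IDF because of the added 1.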
For study purposes, I have quoted the content of this book for noncommercial use. I recommend that you read the book and study along with it!
Keep going.
Thank you!
Keep working hard!