Chapter 1 - Introduction to PyTorch
NLP learning notes
Introduction to PyTorch
PyTorch basic operations
Tensor
from __future__ import print_function
import torch
# Create an uninitialized matrix
x = torch.empty(5,3)
print(x)
# Create a randomly initialized matrix (values drawn uniformly from [0, 1))
x = torch.rand(5, 3)
print(x)
# The output is tensor([n,n,n])
It can be found that when using the empty met ...
Posted by Codewarrior123 on Tue, 19 Oct 2021 21:17:13 -0700
The first layer of the model! Detailed explanation of the differences between torch.nn.Embedding and torch.nn.Linear
1. General
torch.nn.Embedding turns an integer index into a vector of a specified dimension. For example, the number 1 becomes a 128-dimensional vector and the number 2 becomes another 128-dimensional vector. However, these 128-dimensional vectors are not immutable. These 128-dimensional vectors are the real input of the model (that is, the firs ...
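To make the idea of "a number becomes a vector" concrete, here is a minimal pure-Python sketch. It is not PyTorch's implementation. The sizes, the `table` values and the helper names (`embed`, `one_hot`, `linear_no_bias`) are assumptions made for illustration. It shows that looking up row 2 of a table gives the same result as multiplying a one-hot vector for index 2 by that table, which is what a bias-free linear layer would compute.

```python
import random

random.seed(0)
VOCAB_SIZE, EMBED_DIM = 5, 4  # hypothetical sizes; the text above uses 128 dimensions

# Lookup table: one row of EMBED_DIM floats per index.
# In a real model these values are parameters that training updates.
table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)]
         for _ in range(VOCAB_SIZE)]


def embed(index):
    """Embedding-style lookup: return the row stored for this index."""
    return table[index]


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def linear_no_bias(vector, weights):
    """Linear-style map without bias: vector (length n) times weights (n x m)."""
    columns = len(weights[0])
    return [sum(v * row[j] for v, row in zip(vector, weights))
            for j in range(columns)]


looked_up = embed(2)
multiplied = linear_no_bias(one_hot(2, VOCAB_SIZE), table)
print(looked_up)
print(multiplied)
```

The lookup avoids building the one-hot vector and doing the multiplication, which matters when the vocabulary is large.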
Posted by HalfaBee on Fri, 15 Oct 2021 01:11:14 -0700
pkuseg word segmentation library and its application
1. What is pkuseg
pkuseg is a new Chinese word segmentation toolkit developed by the Language Computing and Machine Learning research group of Peking University. GitHub address: https://github.com/lancopku/pkuseg-python
2. Characteristics
Multi-domain word segmentation. Unlike earlier general-purpose Chinese word segmentation tools, ...
Posted by klapy on Wed, 13 Oct 2021 22:23:03 -0700
Detailed tutorial on the environment configuration and preparation required to run BERT for the first time: running the official model and testing it with the MRPC dataset
Step 1 download the required
Download the BERT source code and model
First, download the BERT source code and the official model from the official repository:
https://github.com/google-research/bert
Download the official website source code: Download the official model: ...
Posted by TRUCKIN on Sat, 09 Oct 2021 20:42:01 -0700
Source code analysis of "Chinese information extraction" 1
2021SC@SDUSC
Continuing from the previous post: according to the division of labor, I am responsible for analyzing the source code in the parser package. I will therefore first give a basic overview of the parser package and then analyze its source code.
1. Basic overview of the parser package
As shown in the figure, there are many sub package ...
Posted by Helminthophobe on Thu, 07 Oct 2021 08:31:58 -0700
Chapter 3 News classification: a multi-class classification problem
Reuters dataset
Classifying this news dataset is a multi-class classification problem.
Dataset characteristics:
Text classification dataset
Contains 46 different topics
There are at least 10 samples for each topic in the training set
The dataset is included in Keras and can be imported directly
Difference between multi-class classification problems and ...
Posted by novice_php on Mon, 04 Oct 2021 17:00:15 -0700
PyTorch + TextCNN + word2vec movie review practice
0. Preface
Reference: The blogger . I am writing my own post to make later review easier.
1. Movie review dataset
Dataset download: Link: https://pan.baidu.com/s/1zultY2ODRFaW3XiQFS-36w Extraction code: mgh2. There are four files in the compressed package. Put the unzipped folder in the project directory. The training data set is to ...
Posted by highphilosopher on Mon, 04 Oct 2021 13:06:59 -0700
CBOW & skip-gram of Word2Vec
We introduced the distributional hypothesis earlier, mainly using context to construct a co-occurrence matrix. Cosine similarity, Jaccard similarity and pointwise mutual information can then be used to measure the similarity or relevance of words based on the co-occurrence matrix. In order to avoid the statistical unrelia ...
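As a rough illustration of the co-occurrence approach mentioned in this excerpt, the sketch below builds window-based co-occurrence counts for a toy corpus and computes cosine similarity and pointwise mutual information from them. The corpus, the window size of 1 and all function names are assumptions chosen for the example, not taken from the original post.

```python
import math
from collections import Counter

# Toy corpus and window size: hypothetical values for illustration only.
corpus = [["i", "like", "nlp"],
          ["i", "like", "pytorch"],
          ["you", "like", "nlp"]]
WINDOW = 1


def build_cooccurrence(sentences, window):
    """Count ordered (word, context_word) pairs within `window` positions."""
    counts = Counter()
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, sent[j])] += 1
    return counts


def row(counts, word, vocab):
    """Co-occurrence vector of `word` over the whole vocabulary."""
    return [counts[(word, other)] for other in vocab]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def pmi(counts, w1, w2):
    """Pointwise mutual information (base 2) of the ordered pair (w1, w2)."""
    total = sum(counts.values())
    joint = counts[(w1, w2)]
    if joint == 0:
        return float("-inf")
    p1 = sum(c for (a, _), c in counts.items() if a == w1) / total
    p2 = sum(c for (_, b), c in counts.items() if b == w2) / total
    return math.log2((joint / total) / (p1 * p2))


counts = build_cooccurrence(corpus, WINDOW)
vocab = sorted({w for sent in corpus for w in sent})
print(cosine(row(counts, "nlp", vocab), row(counts, "pytorch", vocab)))
print(pmi(counts, "like", "nlp"))
```

In this toy corpus "nlp" and "pytorch" only ever appear next to "like", so their count vectors point in the same direction even though the two words never co-occur with each other.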
Posted by The Jackel on Mon, 04 Oct 2021 10:10:36 -0700
[Natural language processing] Introduction to PyTorch (essential basic knowledge)
PyTorch Basics
In this book, we use PyTorch extensively to implement our deep learning models. PyTorch is an open-source, community-driven deep learning framework. Unlike Theano, Caffe and TensorFlow, PyTorch implements a "tape-based automatic differentiation" method that allows us to dynamically define and execute computational gr ...
Posted by persia on Fri, 01 Oct 2021 16:39:16 -0700
Natural language processing - rule-based word segmentation
What is rule-based word segmentation
Rule-based word segmentation is a mechanical word segmentation method that relies mainly on maintaining a dictionary. When segmenting a sentence, each substring of the sentence is matched against the words in the dictionary one by one. If a match is found, the substring is split off as a word; otherwise it is not.
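The dictionary matching described above can be sketched with forward maximum matching, one common variant of rule-based segmentation. It is an assumption that this variant fits here, since the excerpt does not say which one the post uses. The function name `forward_max_match` and the toy dictionary are hypothetical.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Segment `sentence` left to right, always taking the longest dictionary word.

    Characters that match no dictionary word are emitted as single-character tokens.
    """
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy dictionary, hypothetical and far smaller than a real one.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}
print(forward_max_match("研究生命的起源", toy_dictionary))
```

On this input the greedy longest match takes "研究生" first and leaves "命" stranded, which shows the typical weakness of purely mechanical matching: the result depends on scan direction and dictionary contents rather than on meaning.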
According to the wa ...
Posted by xjake88x on Thu, 30 Sep 2021 19:01:07 -0700