NLP [Page 2] - Programmer Group - a programming skills sharing group

NLP

Chapter 1 - Introduction to PyTorch

NLP learning notes: Introduction to PyTorch. PyTorch basic operations: tensors.

    from __future__ import print_function
    import torch

    # Create an uninitialized matrix
    x = torch.empty(5, 3)
    print(x)

    # Create a randomly initialized matrix
    x = torch.rand(5, 3)
    print(x)
    # The output is tensor([n,n,n])

It can be found that when using the empty met ...

Posted by Codewarrior123 on Tue, 19 Oct 2021 21:17:13 -0700

The first layer of the model! Detailed explanation of the differences between torch.nn.Embedding and torch.nn.Linear

1. Overview. torch.nn.Embedding turns an integer into a vector of a specified dimension. For example, the number 1 becomes one 128-dimensional vector and the number 2 becomes another 128-dimensional vector. These 128-dimensional vectors are not fixed, however. They are the real input of the model (that is, the firs ...

Posted by HalfaBee on Fri, 15 Oct 2021 01:11:14 -0700
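
As a rough illustration of the torch.nn.Embedding vs torch.nn.Linear entry above: an embedding layer can be read as a table lookup, and that lookup gives the same result as multiplying a one-hot vector by the same table in a bias-free linear map. The sketch below is plain Python with no PyTorch calls, so it shows the idea rather than the library's API. The names `table`, `embed`, `one_hot`, and `linear_no_bias` are hypothetical, and the toy sizes stand in for the 128 dimensions mentioned in the excerpt.

```python
import random

random.seed(0)
num_embeddings, embedding_dim = 4, 3  # hypothetical toy sizes, not the post's 128

# One row of weights per integer id; in a real model these rows are trained.
table = [[random.uniform(-1, 1) for _ in range(embedding_dim)]
         for _ in range(num_embeddings)]


def embed(index):
    """Embedding-style lookup: return the table row for this id."""
    return table[index]


def one_hot(index, size):
    """Vector of zeros with a single 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def linear_no_bias(vector, weights):
    """Linear-style map without bias: (1 x n) vector times (n x d) weights."""
    return [sum(v * row[j] for v, row in zip(vector, weights))
            for j in range(len(weights[0]))]


idx = 1
looked_up = embed(idx)
projected = linear_no_bias(one_hot(idx, num_embeddings), table)
print(looked_up == projected)  # the lookup and the one-hot product agree
```

The lookup skips the multiplication entirely, which is why an embedding layer is the usual choice when the input is an integer id rather than a dense feature vector.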

pkuseg word segmentation library and its application

1. What is pkuseg? pkuseg is a new Chinese word segmentation toolkit developed by the Language Computing and Machine Learning research group at Peking University. GitHub address: https://github.com/lancopku/pkuseg-python 2. Characteristics: multi-domain word segmentation. Unlike earlier general-purpose Chinese word segmentation tools, ...

Posted by klapy on Wed, 13 Oct 2021 22:23:03 -0700

A detailed tutorial on the environment configuration and preparation required to run BERT for the first time: running the official model and testing it on the MRPC dataset

Step 1: Download the BERT source code and model. First, we download the BERT source code and the official model from the official repository: https://github.com/google-research/bert Download the source code: Download the official model: ...

Posted by TRUCKIN on Sat, 09 Oct 2021 20:42:01 -0700

Source code analysis of "Chinese information extraction" (1)

2021SC@SDUSC This post continues from the previous one. Under our division of labor, I am responsible for analyzing the source code in the parser package, so I will first give a basic overview of the package and then analyze its source code. 1. Basic overview of the parser package. As shown in the figure, there are many sub package ...

Posted by Helminthophobe on Thu, 07 Oct 2021 08:31:58 -0700

Chapter 3 - News classification: a multi-class classification problem

Reuters dataset. Classifying this news dataset is a multi-class classification problem. Dataset characteristics: it is a text classification dataset; it contains 46 different topics; each topic has at least 10 samples in the training set; the dataset ships with Keras and can be imported directly. Difference between multi-class classification problem and ...

Posted by novice_php on Mon, 04 Oct 2021 17:00:15 -0700

PyTorch + TextCNN + word2vec movie review practice

0. Preface. Reference: the blogger. I wrote this post for my own later review. 1. Movie review dataset. Dataset download link: https://pan.baidu.com/s/1zultY2ODRFaW3XiQFS-36w Extraction code: mgh2 The archive contains four files. Put the unzipped folder in the project directory. The training data set is to ...

Posted by highphilosopher on Mon, 04 Oct 2021 13:06:59 -0700

CBOW & skip-gram of Word2Vec

We introduced the distributional hypothesis earlier, mainly using context to construct a co-occurrence matrix. Cosine similarity, Jaccard similarity, and pointwise mutual information can then be used to measure the similarity or relatedness of words based on the co-occurrence matrix. In order to avoid the statistical unrelia ...

Posted by The Jackel on Mon, 04 Oct 2021 10:10:36 -0700
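
To make the co-occurrence idea in the Word2Vec entry above more concrete, here is a minimal sketch that counts context pairs in a symmetric window and computes pointwise mutual information from those counts. The two-sentence corpus, the window size of 1, and the function names `cooccurrence_counts` and `pmi` are all assumptions made for illustration; they do not come from the post.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=1):
    """Count (word, context) pairs within `window` tokens on either side."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def pmi(pair_counts, word, context):
    """PMI = log2( P(word, context) / (P(word) * P(context)) )."""
    total = sum(pair_counts.values())
    joint = pair_counts[(word, context)]
    if joint == 0:
        return float("-inf")  # the pair was never observed
    word_total = sum(n for (w, _), n in pair_counts.items() if w == word)
    context_total = sum(n for (_, c), n in pair_counts.items() if c == context)
    return math.log2(joint * total / (word_total * context_total))


# Hypothetical toy corpus, already tokenized.
corpus = [["i", "like", "nlp"], ["i", "like", "pytorch"]]
counts = cooccurrence_counts(corpus, window=1)
print(counts[("i", "like")])
print(pmi(counts, "i", "like"))
```

Counts like these become unreliable for rare pairs, which is one motivation for smoothing or for learning dense vectors instead.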

[Natural language processing] Introduction to PyTorch (essential basic knowledge)

PyTorch basics. In this book, we use PyTorch extensively to implement our deep learning models. PyTorch is an open-source, community-driven deep learning framework. Unlike Theano, Caffe, and TensorFlow, PyTorch implements a "tape-based automatic differentiation" method that allows us to dynamically define and execute computational gr ...

Posted by persia on Fri, 01 Oct 2021 16:39:16 -0700

Natural language processing - rule-based word segmentation

What is rule-based word segmentation? Rule-based word segmentation is a mechanical method that relies mainly on maintaining a dictionary. To segment a sentence, each substring of the sentence is matched against the words in the dictionary one by one; if a match is found, the substring is split off as a word, otherwise it is not. According to the wa ...

Posted by xjake88x on Thu, 30 Sep 2021 19:01:07 -0700
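
The dictionary matching described in the rule-based segmentation entry above can be sketched with forward maximum matching, one common variant. The excerpt is cut off before it names the variants it covers, so the choice of forward matching is an assumption, as are the function name `forward_max_match` and the toy dictionary.

```python
def forward_max_match(sentence, dictionary, max_len=None):
    """Greedy left-to-right segmentation: at each position take the longest
    dictionary word that matches, falling back to a single character."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first and shrink until something matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary for illustration.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}
print(forward_max_match("研究生命的起源", toy_dictionary))
# → ['研究生', '命', '的', '起源']
```

The result shows the known weakness of greedy matching: the longest match "研究生" is taken first, so "生命" is never found. Running the match in the reverse direction, or in both directions, is a usual way to reduce such errors.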
