NLP (natural language processing) is a branch of machine learning. Both Google and Baidu have claimed great achievements in machine translation in recent years, and when I open Bing to search for material, I also like to chat with Microsoft's chatbot.
import nltk
from nltk.stem.lancaster import LancasterStemmer
import numpy
import tflearn
import tensorflow
import random
import json

with open("intents.json") as file:
    data = json.load(file)

print(data)
Preparing data
{'intents': [
    {'tag': 'greeting',
     'patterns': ['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day', 'Whats up'],
     'response': ['Hello!', 'Good to see you again', 'Hi there, how can i help?'],
     'context_set': ''}
]}
In this format, patterns holds example sentences a user might type, and response holds the replies the chatbot can return for that intent. We train our chatbot model on these pairs. You might think the bot can only answer inputs that appear here verbatim, but in fact, after training, the chatbot matches on the content of a question, so it can handle inputs that are not listed here word for word.
Notice that each intent carries a tag; the chatbot classifies the user's input to decide which tag it belongs to, and answers accordingly.
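For example, the tokenized output further below shows that the real file also contains a goodbye intent. An additional entry might look like the following (only the patterns are taken from the output below; the reply strings are invented for illustration):

{'tag': 'goodbye',
 'patterns': ['cya', 'see you later', 'Goodbye', 'I am Leaving', 'Have a Good day'],
 'response': ['Sad to see you go!', 'Talk to you later', 'Goodbye!'],
 'context_set': ''}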
Prepare development environment
Because tflearn has some problems with Python 3.7, we use Anaconda here to create a clean Python 3.6 environment for developing our application.
After installing Anaconda from the official website, run the following command on the command line:
conda create -n chatbot python=3.6
Then activate our Anaconda environment so we can develop under Python 3.6:
activate chatbot
Next come the dependencies we need to install. The first is nltk, a natural language processing toolkit.
pip install nltk
Then we also need to install TensorFlow and tflearn; tflearn is built on top of TensorFlow and provides a higher-level API that makes it easier for developers to build machine learning systems.
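Both install with pip. tflearn targets the TensorFlow 1.x API, so pinning a 1.x release avoids import errors; the version below is just one combination commonly used with Python 3.6, not the only option.

pip install tensorflow==1.14
pip install tflearn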
Start development
import nltk
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

import numpy
import tflearn
import tensorflow
import random
import json
import pickle
with open("intents.json") as file:
    data = json.load(file)

print(data)
First we load the data from the JSON file and print it to check what we have.
The next thing to do is work out which tag each sentence in patterns belongs to.
words = []
labels = []
docs = []

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = nltk.word_tokenize(pattern)
        print(wrds)
First we use nltk to tokenize each pattern, converting the sentence into a list of words.
The output is
['Hi']
['How', 'are', 'you']
['Is', 'anyone', 'there', '?']
['Hello']
['Good', 'day']
['Whats', 'up']
['cya']
['see', 'you', 'later']
['Goodbye']
['I', 'am', 'Leaving']
['Have', 'a', 'Good', 'day']
['how', 'old']
['how', 'old', 'is', 'tim']
['Goodbye']
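A note in passing: nltk.word_tokenize depends on the punkt tokenizer models, which do not ship with the pip package. If the call raises a LookupError, download them once:

import nltk
nltk.download("punkt")  # one-time download of the punkt tokenizer data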
words.extend(wrds)
Then we put all the extracted words into the words list. Here is a brief description of the difference between append and extend.
list.append(object) adds an object to the list
l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]
l1.append(l2)
print(l1)
The output is
[1, 2, 3, 4, 5, [1, 2, 3]]
list.extend(sequence) appends each element of a sequence to the list
l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]
l1.extend(l2)
print(l1)
The output is
[1, 2, 3, 4, 5, 1, 2, 3]
Next, save the tag data in labels:
words = []
labels = []
docs = []

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = nltk.word_tokenize(pattern)
        words.extend(wrds)
        docs.append(pattern)

    if intent["tag"] not in labels:
        labels.append(intent["tag"])
With the code above we have saved every pattern sentence in docs, every token in words, and every tag in labels.
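As a quick sanity check, with only the single greeting intent shown at the top of this post, printing the three lists would give something like:

print(docs)    # ['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day', 'Whats up']
print(labels)  # ['greeting']
print(words)   # ['Hi', 'How', 'are', 'you', 'Is', 'anyone', 'there', '?', 'Hello', 'Good', 'day', 'Whats', 'up']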