NLP (natural language processing) is a branch of machine learning. Both Google and Baidu have claimed great achievements in machine translation in recent years, and when I open Bing to search for material, I also like to chat with Microsoft's chatbot.
import nltk
from nltk.stem.lancaster import LancasterStemmer
import numpy
import tflearn
import tensorflow
import random
import json

with open("intents.json") as file:
    data = json.load(file)

print(data)
Preparing data
{'intents': [
    {'tag': 'greeting',
     'patterns': ['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day', 'Whats up'],
     'response': ['Hello!', 'Good to see you again', 'Hi there, how can i help?'],
     'context_set': ''}
]}
In this format, patterns holds example sentences a user might type, and response holds the replies the chatbot can return for that intent. We train our chatbot model on these pairs. You might think the bot can only answer inputs that appear here verbatim, but in fact, after training, the chatbot matches on the content of a question, so it can handle inputs that are not listed here word for word.
Notice that each intent carries a tag; the chatbot classifies the user's input to decide which tag it belongs to, and answers accordingly.
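For example, the tokenized output further below shows that the real file also contains a goodbye intent. An additional entry might look like the following (only the patterns are taken from the output below; the reply strings are invented for illustration):

{'tag': 'goodbye',
 'patterns': ['cya', 'see you later', 'Goodbye', 'I am Leaving', 'Have a Good day'],
 'response': ['Sad to see you go!', 'Talk to you later', 'Goodbye!'],
 'context_set': ''}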
Prepare development environment
Because tflearn has some problems with Python 3.7, we use Anaconda here to create a clean Python 3.6 environment for developing our application.
After installing Anaconda from the official website, run the following command on the command line:
conda create -n chatbot python=3.6
Then activate our Anaconda environment so we can develop under Python 3.6:
activate chatbot
Next come the dependencies we need to install. The first is nltk, a natural language processing toolkit.
pip install nltk
Then we also need to install TensorFlow and tflearn; tflearn is built on top of TensorFlow and provides a higher-level API that makes it easier for developers to build machine learning systems.
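Both install with pip. tflearn targets the TensorFlow 1.x API, so pinning a 1.x release avoids import errors; the version below is just one combination commonly used with Python 3.6, not the only option.

pip install tensorflow==1.14
pip install tflearn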
Start development
import nltk
from nltk.stem.lancaster import LancasterStemmer
stemmer = LancasterStemmer()

import numpy
import tflearn
import tensorflow
import random
import json
import pickle
with open("intents.json") as file:
    data = json.load(file)

print(data)
First we load the data from the JSON file and print it to check what we have.
The next thing to do is work out which tag each sentence in patterns belongs to.
words = []
labels = []
docs = []

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = nltk.word_tokenize(pattern)
        print(wrds)
First we use nltk to tokenize each pattern, converting the sentence into a list of words.
The output is
['Hi']
['How', 'are', 'you']
['Is', 'anyone', 'there', '?']
['Hello']
['Good', 'day']
['Whats', 'up']
['cya']
['see', 'you', 'later']
['Goodbye']
['I', 'am', 'Leaving']
['Have', 'a', 'Good', 'day']
['how', 'old']
['how', 'old', 'is', 'tim']
['Goodbye']
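A note in passing: nltk.word_tokenize depends on the punkt tokenizer models, which do not ship with the pip package. If the call raises a LookupError, download them once:

import nltk
nltk.download("punkt")  # one-time download of the punkt tokenizer data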
words.extend(wrds)
Then we put all the extracted words into the words list. Here is a brief description of the difference between append and extend.
list.append(object) adds an object to the list
l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]
l1.append(l2)
print(l1)
The output is
[1, 2, 3, 4, 5, [1, 2, 3]]
list.extend(sequence) appends each element of a sequence to the list
l1 = [1, 2, 3, 4, 5]
l2 = [1, 2, 3]
l1.extend(l2)
print(l1)
The output is
[1, 2, 3, 4, 5, 1, 2, 3]
Next, save the tag data in labels:
words = []
labels = []
docs = []

for intent in data["intents"]:
    for pattern in intent["patterns"]:
        wrds = nltk.word_tokenize(pattern)
        words.extend(wrds)
        docs.append(pattern)

    if intent["tag"] not in labels:
        labels.append(intent["tag"])
With the code above we have saved every pattern sentence in docs, every token in words, and every tag in labels.
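As a quick sanity check, with only the single greeting intent shown at the top of this post, printing the three lists would give something like:

print(docs)    # ['Hi', 'How are you', 'Is anyone there?', 'Hello', 'Good day', 'Whats up']
print(labels)  # ['greeting']
print(words)   # ['Hi', 'How', 'are', 'you', 'Is', 'anyone', 'there', '?', 'Hello', 'Good', 'day', 'Whats', 'up']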