Offline voice: Snowboy hotword wake-up + Raspberry Pi voice interaction to switch a light

Keywords: Python github sudo git

Offline voice: Snowboy hotword wake-up

Speech recognition now has a very wide range of application scenarios, such as phone voice assistants and smart speakers (Xiaoai, Dingdong, Tmall Genie, and so on).
Speech recognition generally consists of three stages: hotword wake-up, speech input, and recognition plus logic control.

Hotword wake-up is what tells the device to start parsing what you say next. The device is usually capturing the surrounding sound all the time, but it does not respond to it. Once it hears a wake word such as "Hi, Siri", it starts processing the sound that follows. Hotword wake-up is the beginning of speech recognition.

Snowboy is a popular hotword wake-up framework (its maker, Kitt.ai, was acquired by Baidu). Snowboy handles Chinese well and is simpler to use than PocketSphinx, so Snowboy is recommended.

Snowboy official documentation (in English): http://docs.kitt.ai/snowboy

Installation

1. Get the source code and compile it
Install dependencies
The Raspberry Pi's on-board audio device does not support voice input (it cannot record), so you need to buy a driver-free USB sound card online; it works as soon as it is plugged in.
It is recommended to install pulseaudio to reduce the audio configuration steps:

$ sudo apt-get install pulseaudio

Install sox to test the recording and playback functions:

$ sudo apt-get install sox

After installation, run the sox -d -d command, speak into the microphone, and make sure you can hear your own voice.

Install the other software dependencies:

Install PyAudio: $ sudo apt-get install python3-pyaudio
 Install SWIG (>3.0.10): $ sudo apt-get install swig
 Install ATLAS: $ sudo apt-get install libatlas-base-dev
 Compile the source code
 Get the source code: $ git clone https://github.com/Kitt-AI/snowboy.git
 Compile the Python 3 binding: $ cd snowboy/swig/Python3 && make

Test:

If you are using a Raspberry Pi, you also need to change the sound card settings in ~/.asoundrc:

pcm.!default {
   type asym
   playback.pcm {
     type plug
     slave.pcm "hw:0,0"
   }
   capture.pcm {
     type plug
     slave.pcm "hw:1,0"
   }
}

Enter the official example directory snowboy/examples/Python3 and run the following command:

$ python3 demo.py resources/models/snowboy.umdl

(the snowboy.umdl file in the command is the pre-trained hotword model)

Then say "snowboy" clearly into the microphone. If you hear a "ding" sound, the installation and configuration were successful.

PS: the official source code throws an error when tested with Python 3. Based on testing, you need to modify the snowboydecoder.py file in the snowboy/examples/Python3 directory:
change line 5 from "from . import snowboydetect" to "import snowboydetect", and then it runs directly.
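
For reference, the one-line change in snowboydecoder.py looks like this (the line number refers to the downloaded source):

# snowboy/examples/Python3/snowboydecoder.py, line 5
# from . import snowboydetect   # original relative import, fails when the demo is run as a script
import snowboydetect             # absolute import used instead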

Quick start

There is a more detailed demo on GitHub, and it is strongly recommended to read it first. First, create a HotwordDetect class, which holds the wake-up model, audio gain, sensitivity, and other parameters, and then initialize the Detector object. Snowboy's Detector class is in the downloaded source code. The model can be a single model or a list of models.

from .. import snowboydetect

class HotwordDetect(object):
    def __init__(self, decoder_model,
                 resource,
                 sensitivity=0.38,
                 audio_gain=1):
        """Initialize the underlying Snowboy detector."""
        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(),
            model_str=decoder_model.encode())
        self.detector.SetAudioGain(audio_gain)
        self.detector.SetSensitivity(str(sensitivity).encode())  # apply the sensitivity parameter too

After initialization, you can create a start method. The start method usually takes a wake-up callback, i.e. the "ding" you may hear after "Hi, Siri"; you can also pass a recording callback, i.e. what to do with the audio once the device is awake:

class HotwordDetect(object):
    ...
    def listen(self, detected_callback,
               interrupt_check=lambda: False,
               audio_recorder_callback=None):
        """Begin to listen."""
        ...
        state = "PASSIVE"
        while True:
            status = self.detector.RunDetection(data)
            ...
            if state == "PASSIVE":
                detected_callback()        # hotword heard: play the prompt
                state = "ACTIVE"
                continue
            elif state == "ACTIVE":
                audio_recorder_callback()  # device is awake: handle the recording
                state = "PASSIVE"          # go back to waiting for the wake word
                continue

The logic here can be defined however you like; it mainly switches between two states. When the device hears a wake word, status indicates the index of the recognized wake word. For example, if you define two wake words, "Siri" and "Xiaowei", a status of 1 means "Siri" was detected and a status of 2 means "Xiaowei". The state then changes to ACTIVE, the audio_recorder_callback method runs, and the state switches back to PASSIVE when it finishes.
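
As a sketch of how the return value of RunDetection can be interpreted, assuming two personal models were loaded in the order "Siri", "Xiaowei" (the negative codes follow the official snowboydecoder's convention; check them against the source):

status = self.detector.RunDetection(data)
if status == -2:
    pass                         # silence
elif status == -1:
    print("detector error")
elif status == 1:
    print("'Siri' woken up")     # first model in the list
elif status == 2:
    print("'Xiaowei' woken up")  # second model in the list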

Online speech recognition

When the device wakes up, you can get the recording data to do anything you want, including calling Baidu and other voice recognition interfaces. These logic are contained in the audio recorder callback method. It should be noted that Snowboy only supports 16000 recording sampling rate at present, and the recording data of other sampling rates cannot be used. You can solve this problem through two methods:

Use sound card with 16000 sampling rate
Sample rate conversion of recording data
At present, the general products of C-Media and RealTek, two large sound card chip companies, are more than 48k, and chips supporting 16k are generally more expensive, which may be around 60 yuan. Green link has two products to support. Please check the product parameters when purchasing, and check whether the chip company's product model supports 16k sampling.
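
If you are stuck with a 48 kHz sound card, a minimal sketch of the second approach using Python's standard-library audioop module is shown below (the file names are placeholders; it assumes a 16-bit, mono input recording):

import audioop
import wave

# Read a 48 kHz, 16-bit, mono recording (placeholder file name)
with wave.open('input_48k.wav', 'rb') as src:
    pcm = src.readframes(src.getnframes())

# Resample 48000 Hz -> 16000 Hz; width=2 bytes per sample, 1 channel
converted, _ = audioop.ratecv(pcm, 2, 1, 48000, 16000, None)

# Write the 16 kHz result that Snowboy and the ASR API expect
with wave.open('output_16k.wav', 'wb') as dst:
    dst.setnchannels(1)
    dst.setsampwidth(2)
    dst.setframerate(16000)
    dst.writeframes(converted)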

Training the voice model

There are two ways to create a personalized voice model:

Website. As long as you have a GitHub, Google, or Facebook account, you can log in and record your training samples.
Train API. Pass the parameters specified in the documentation and the API returns your trained model data (a request sketch follows the quote below).
Both methods produce a private voice model as a .pmdl file. The general universal model is not available this way; official business cooperation has to be arranged for that. The more people who test the model, the higher the accuracy, so you can invite more people to test your model to improve it. The type of microphone also affects accuracy, so training the model on the device you will actually use helps. Speech recognition is a delicate technology with many details to get right; as Chen Guo said:

Speech Recognition is not that easy.
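
For the train-api route, a hedged sketch of the HTTP request is shown below. The endpoint and field names follow the API documentation linked in the references (http://docs.kitt.ai/snowboy/#api-v1-train); the token, recordings, and field values are placeholders you must supply and verify against the docs:

import base64
import requests

ENDPOINT = 'https://snowboy.kitt.ai/api/v1/train/'

def encode_wave(path):
    with open(path, 'rb') as f:
        return base64.b64encode(f.read()).decode()

payload = {
    'name': 'xiaowei',            # hotword name (placeholder)
    'language': 'zh',
    'age_group': '20_29',
    'gender': 'M',
    'microphone': 'usb microphone',
    'token': 'YOUR_API_TOKEN',    # from your snowboy.kitt.ai account
    'voice_samples': [
        {'wave': encode_wave('record1.wav')},
        {'wave': encode_wave('record2.wav')},
        {'wave': encode_wave('record3.wav')},
    ],
}

resp = requests.post(ENDPOINT, json=payload)
if resp.ok:
    with open('xiaowei.pmdl', 'wb') as out:
        out.write(resp.content)   # the response body is the .pmdl model
else:
    print('training failed:', resp.text)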

Use in your own projects
Copy the following files into your own project directory (the resulting layout is sketched after this list):

The trained model.pmdl model file you downloaded
The snowboydetect.so library compiled in the snowboy/swig/Python3 directory
The demo.py, snowboydecoder.py, and snowboydetect.py files and the resources directory from snowboy/examples/Python3
Run $ python3 demo.py model.pmdl in the project directory and test with your own wake word
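
The project directory then looks roughly like this (an illustration, not a required layout):

your_project/
├── model.pmdl          # your trained hotword model
├── snowboydetect.so    # compiled in snowboy/swig/Python3
├── demo.py
├── snowboydecoder.py
├── snowboydetect.py
└── resources/          # copied from snowboy/examples/Python3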
On Orange Pi, online speech recognition is used to implement a voice-controlled light switch; this part needs a network connection.

gpio.py

#!/usr/bin/env python
# encoding: utf-8
#
# Orange Pi GPIO control; see the earlier posts for details
#

"""
@version: ??
@author: lvusyy
@license: Apache Licence 
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: gpio.py
@time: 2018/3/13 18:45
"""
import wiringpi as wp


class GPIO():

    def __init__(self):
        self.wp = wp
        wp.wiringPiSetupGpio()  # use BCM GPIO pin numbering
        # wp.pinMode(18, 1)
        # wp.pinMode(23, 0)

    def setPinMode(self, pin, mode):
        """Set a pin to OUTPUT (1) or INPUT (0)."""
        self.wp.pinMode(pin, mode)

    def setV(self, pin, v):
        """Write a digital value (0 or 1) to the pin."""
        self.wp.digitalWrite(pin, v)

    def getV(self, pin):
        """Read the current digital value of the pin."""
        return self.wp.digitalRead(pin)

control.py, modified from the earlier example:

#!/usr/bin/env python
# encoding: utf-8
#
# After the hotword wakes the device, the Baidu speech recognition API recognizes the voice command, which is then matched against operation commands such as "turn off the light" and "turn on the light"
### Using several of Snowboy's hotwords for wake-up directly would work even better: it needs no network and is free to test


"""
@version: ??
@author: lvusyy
@license: Apache Licence 
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: control.py
@time: 2018/3/13 17:30
"""
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import time
import pyaudio
import wave
import pygame
import snowboydecoder
import signal
from gpio import GPIO
from aip import AipSpeech

APP_ID = '109472xxx'
API_KEY = 'd3zd5wuaMrL21IusNqdQxxxx'
SECRET_KEY = '84e98541331eb1736ad80457b4faxxxx'

APIClient = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

interrupted = False

# Parameters for capturing the voice recording
CHUNK = 1024
FORMAT = pyaudio.paInt16 # 16-bit samples
CHANNELS = 1             # mono
RATE = 16000             # sampling rate
RECORD_SECONDS = 5       # recording length: 5 seconds
WAVE_OUTPUT_FILENAME = "./myvoice.pcm"  # path of the captured sound file


class Light():

    def __init__(self):
        self.pin = 18
        self.mode = 1  # 1 is on, 0 is off
        self.mgpio = GPIO()
        self.mgpio.setPinMode(pin=self.pin, mode=1)  # 1 = OUTPUT, 0 = INPUT

    def on(self):
        """Drive the pin high to turn the light on."""
        self.mgpio.setV(self.pin, self.mode)

    def off(self):
        """Drive the pin low to turn the light off."""
        self.mgpio.setV(self.pin, 0)

    def status(self):
        """Return the pin value: 0 is off, 1 is on."""
        return self.mgpio.getV(self.pin)



def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


def word_to_voice(text):
    """Synthesize text with the Baidu TTS API and play it back."""
    result = APIClient.synthesis(text, 'zh', 1, {
        'vol': 5, 'spd': 3, 'per': 3})
    if not isinstance(result, dict):  # a dict means the API returned an error
        with open('./audio.mp3', 'wb') as f:
            f.write(result)
    time.sleep(.2)
    pygame.mixer.music.load('./audio.mp3')  # the synthesized speech file
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy() == True:
        print('waiting')


def get_mic_voice_file(p):
    word_to_voice('Please turn on or off the light.')
 
    stream = p.open(format=FORMAT,
                    channels=CHANNELS,
                    rate=RATE,
                    input=True,
                    frames_per_buffer=CHUNK)
    print("* recording")
 
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()
    # Do not call p.terminate() here, otherwise the shared PyAudio instance p would have to be re-initialized before the next recording.
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    print('recording finished')



def baidu_get_words(client):
    """Send the recorded pcm file to Baidu ASR and return the recognized text."""
    results = client.asr(get_file_content(WAVE_OUTPUT_FILENAME), 'pcm', 16000, {'dev_pid': 1536, })
    # print(results['result'])
    words = results['result'][0]
    return words


# The wake-up and recognition logic below is adapted from zdl's example:
# offline voice wake-up plus speech recognition for simple voice interaction


def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_callback():
    global interrupted
    return interrupted

# Callback function; speech recognition is implemented here
def callbacks():
    global detector

    # When the hotword wakes the device, play a "ding" prompt
    # snowboydecoder.play_audio_file()
    pygame.mixer.music.load('./resources/ding.wav')  # prompt sound
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy() == True:
        print('waiting')
    # snowboydecoder.play_audio_file()

    # Stop snowboy
    detector.terminate()
    # Start speech recognition
    get_mic_voice_file(p)
    rText = baidu_get_words(client=APIClient)

    if rText.find("turn on the light") != -1:
        light.on()
    elif rText.find("turn off the light") != -1:
        light.off()

    # Restart snowboy
    wake_up()    # wake up -> listen -> wake up again (recursive call)

# Hotword wake-up
def wake_up():

    global detector
    model = './resources/models/snowboy.umdl'  # the wake word is "SnowBoy"
    # capture SIGINT signal, e.g., Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)

    # Wake word detector; adjust the sensitivity parameter to tune detection accuracy
    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
    print('Listening... please say wake-up word: SnowBoy')
    # main loop
    # The default callback is detected_callback=snowboydecoder.play_audio_file;
    # replace it with our own callback to get the behaviour we want
    detector.start(detected_callback=callbacks,      # custom callback function
                   interrupt_check=interrupt_callback,
                   sleep_time=0.03)
    # Release resources
    detector.terminate()

if __name__ == '__main__':
    # Initialize pygame so the synthesized audio files can be played later
    pygame.mixer.init()
    p = pyaudio.PyAudio()
    light = Light()
    wake_up()
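
To try the whole thing, a typical session (assuming the files above are in the current directory) looks like this:

$ python3 control.py
Listening... please say wake-up word: SnowBoy

Say "SnowBoy", wait for the "ding" and the spoken prompt, then give the light command; the output on GPIO pin 18 switches accordingly.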

Relevant reference documents:

http://docs.kitt.ai/snowboy/#api-v1-train

https://github.com/Kitt-AI/snowboy

https://looker53.github.io/2018/03/29/20180329-%E8%AF%AD%E9%9F%B3%E8%AF%86%E5%88%AB%E4%B9%8BSnowboy%E7%83%AD%E8%AF%8D%E5%94%A4%E9%86%92/
