Offline hot word wake-up with Snowboy
Speech recognition now has a wide range of application scenarios, such as phone voice assistants and smart speakers (XiaoAI, DingDong, Tmall Genie, and so on).
Speech recognition generally consists of three stages: hot word wake-up, speech input, and recognition plus logic control.
Hot word wake-up means using a wake-up word to make the device start parsing what you say next. The device is usually capturing the surrounding sound all the time, but it does not respond to it. Once it hears a wake-up word such as "Hi, Siri", it begins processing the sound that follows. Hot word wake-up is therefore the entry point of speech recognition.
Snowboy is a popular hot word wake-up framework whose developer, KITT.AI, has been acquired by Baidu. Snowboy handles Chinese well and is simpler to use than PocketSphinx, so it is the recommended choice.
Snowboy official documentation [in English]: http://docs.kitt.ai/snowboy
install
1. Get the source code and compile
Install dependencies
The Raspberry Pi's onboard audio device does not support voice input (it cannot record). You need to buy a driver-free USB sound card online, which works as soon as it is plugged in.
Installing pulseaudio is recommended, as it reduces the audio configuration steps:
$ sudo apt-get install pulseaudio
Install the sox software to test recording and playback:
$ sudo apt-get install sox
After installation, run the sox -d -d command, speak into the microphone, and make sure you can hear your own voice.
Install the other software dependencies:

Install PyAudio: $ sudo apt-get install python3-pyaudio
Install SWIG (>3.0.10): $ sudo apt-get install swig
Install ATLAS: $ sudo apt-get install libatlas-base-dev

Compile the source code

Get the source code: $ git clone https://github.com/Kitt-AI/snowboy.git
Compile the Python 3 binding: $ cd snowboy/swig/Python3 && make
Test:
If you are using a Raspberry Pi, you also need to change the sound card settings in ~/.asoundrc:
pcm.!default {
    type asym
    playback.pcm {
        type plug
        slave.pcm "hw:0,0"
    }
    capture.pcm {
        type plug
        slave.pcm "hw:1,0"
    }
}
Enter the official example directory snowboy/examples/Python3 and run the following command:
$ python3 demo.py resources/models/snowboy.umdl
(the snowboy.umdl file in the command is the wake-up word model)
Then say "snowboy" clearly into the microphone. If you hear a "ding" sound, the installation and configuration are successful.
PS: the official source code raises an error when run under Python 3. To fix it, modify the snowboydecoder.py file in the snowboy/examples/Python3 directory: change the import on line 5 from "from . import snowboydetect" to "import snowboydetect", and it will run directly.
Quick start
There is a more detailed demo on GitHub, which is strongly recommended reading. First, create a HotwordDetect class that holds the wake-up model, audio gain, sensitivity, and other parameters, then initialize the Detector object. Snowboy's Detector class lives in the downloaded source code; the training model can be a single model or a list of models.
from .. import snowboydetect

class HotwordDetect(object):
    def __init__(self, decoder_model,
                 resource,
                 sensitivity=0.38,
                 audio_gain=1):
        """init"""
        self.detector = snowboydetect.SnowboyDetect(
            resource_filename=resource.encode(),
            model_str=decoder_model.encode())
        self.detector.SetAudioGain(audio_gain)
After initialization, you can create a startup method. The startup method usually takes a wake-up callback, i.e. the "ding" you may hear after "Hi, Siri"; it can also take a recording callback, i.e. what to do with the captured audio after the device wakes up:
class HotwordDetect(object):
    ...
    def listen(self, detected_callback,
               interrupt_check=lambda: False,
               audio_recorder_callback=None):
        """begin to listen"""
        ...
        state = "PASSIVE"
        while True:
            status = self.detector.RunDetection(data)
            ...
            if state == "PASSIVE":
                detected_callback()
                state = "ACTIVE"
                continue
            elif state == "ACTIVE":
                audio_recorder_callback()
                state = "PASSIVE"
                continue
The logic here can be defined however you like; it mainly switches between two states. When the device hears a wake-up word, status holds the index of the recognized wake-up word. For example, if you define two wake-up words, "Siri" and "Xiaowei", a status of 1 means "Siri" was triggered and a status of 2 means "Xiaowei" was. The state then switches to ACTIVE, the audio_recorder_callback method runs, and the state switches back to PASSIVE afterwards.
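The two-state loop described above can be sketched as follows. This is a minimal illustration, not Snowboy's actual API: the integer status values stand in for what detector.RunDetection would return (0 for silence, a 1-based index for a recognized hot word).

```python
# Minimal sketch of the PASSIVE/ACTIVE state machine described above.
def dispatch(status, state, wake_callbacks, recorder_callback):
    """Handle one detection result and return the next state."""
    if state == "PASSIVE" and status > 0:
        wake_callbacks[status - 1]()   # e.g. status 1 -> "Siri", 2 -> "Xiaowei"
        return "ACTIVE"
    if state == "ACTIVE":
        recorder_callback()            # hand the follow-up audio to ASR
        return "PASSIVE"               # then go back to waiting for a hot word
    return state

events = []
state = "PASSIVE"
for status in [0, 2, 0]:               # silence, then "Xiaowei", then audio
    state = dispatch(status, state,
                     [lambda: events.append("siri"),
                      lambda: events.append("xiaowei")],
                     lambda: events.append("recorded"))
print(events)  # ['xiaowei', 'recorded']
```

The point of the sketch is only the bookkeeping: the PASSIVE state consumes a wake-up word and the ACTIVE state consumes exactly one round of recording before returning to PASSIVE.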
Online speech recognition
When the device wakes up, you can do whatever you want with the recorded data, including sending it to Baidu's or other speech recognition APIs; this logic lives in the audio_recorder_callback method. Note that Snowboy currently only supports a 16000 Hz recording sampling rate, and recordings at other sampling rates cannot be used. You can work around this in two ways:
Use a sound card that supports a 16000 Hz sampling rate
Convert the sampling rate of the recorded data
At present, the mainstream products of C-Media and Realtek, the two big sound card chip vendors, are 48 kHz and above; chips that support 16 kHz are generally more expensive, around 60 RMB. UGREEN has a couple of products that support it. When buying, check the product parameters and verify that the chip model supports 16 kHz sampling.
Training a voice model
There are two ways to create a personalized voice model:
Website: as long as you have a GitHub, Google, or Facebook account, you can log in and record your training samples.
Train API: pass the parameters specified in the documentation and the API will return your trained model data.
Both methods produce a private voice model as a .pmdl file. The general universal models (.umdl) are not available this way; official business cooperation must be arranged. The more people who record against your model, the higher the tested accuracy, so to improve accuracy you can invite more people to test it. The type of microphone also affects accuracy, so training the model on the device you will actually use helps. Speech recognition is a delicate technology with many details to get right; as Chen Guo put it:
Speech Recognition is not that easy.
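As a rough illustration of the train-api route, the request body might be assembled like this. Everything here is an assumption based on the api-v1-train page (http://docs.kitt.ai/snowboy/#api-v1-train): the field names, the endpoint, and the token are placeholders that must be checked against the actual documentation before use.

```python
import base64
import json

def build_train_request(wav_paths, name, token,
                        language="zh", gender="M", age_group="20_29"):
    """Assemble a JSON body for Snowboy's training API.

    The field names below are assumptions modeled on the api-v1-train
    docs; verify the exact schema before sending. `token` is your
    personal kitt.ai API token (a placeholder here).
    """
    samples = [{"wave": base64.b64encode(open(p, "rb").read()).decode()}
               for p in wav_paths]
    return json.dumps({
        "name": name,
        "language": language,
        "gender": gender,
        "age_group": age_group,
        "token": token,
        "voice_samples": samples,  # three recordings of the hot word
    })

# The returned string would then be POSTed to the training endpoint
# (https://snowboy.kitt.ai/api/v1/train/ at the time of writing) and the
# response body saved as model.pmdl.
```

The website route does the same three-recording enrollment interactively, so the API is mainly useful for automating enrollment inside your own product.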
Using it in your own projects
Copy the following files to your own project directory:
The trained model.pmdl model file
The snowboydetect.so library compiled in the snowboy/swig/Python3 directory
The demo.py, snowboydecoder.py, and snowboydetect.py files and the resources directory from snowboy/examples/Python3
Run $ python3 demo.py model.pmdl in the project directory and test with your own wake-up word
On an Orange Pi, use speech recognition to implement a voice-controlled light switch (this part requires an internet connection).
gpio.py
#!/usr/bin/env python
# encoding: utf-8
#
# Orange Pi GPIO control; see the earlier posts for details
#
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: gpio.py
@time: 2018/3/13 18:45
"""
import wiringpi as wp


class GPIO():

    def __init__(self):
        self.wp = wp
        wp.wiringPiSetupGpio()
        # wp.pinMode(18, 1)
        # wp.pinMode(23, 0)

    def setPinMode(self, pin, mode):
        self.wp.pinMode(pin, mode)

    def setV(self, pin, v):
        self.wp.digitalWrite(pin, v)

    def getV(self, pin):
        return self.wp.digitalRead(pin)
The earlier example is modified as follows. control.py
#!/usr/bin/env python
# encoding: utf-8
#
# After a hot word wakes the device, use the Baidu speech recognition API
# to recognize voice commands and match them to actions such as turning
# the light on or off.
# Using Snowboy with multiple hot words would work even better, needs no
# network, and is free.
"""
@version: ??
@author: lvusyy
@license: Apache Licence
@contact: lvusyy@gmail.com
@site: https://github.com/lvusyy/
@software: PyCharm
@file: control.py
@time: 2018/3/13 17:30
"""
import os
import sys
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import time
import pyaudio
import wave
import pygame
import snowboydecoder
import signal
from gpio import GPIO
from aip import AipSpeech

APP_ID = '109472xxx'
API_KEY = 'd3zd5wuaMrL21IusNqdQxxxx'
SECRET_KEY = '84e98541331eb1736ad80457b4faxxxx'

APIClient = AipSpeech(APP_ID, API_KEY, SECRET_KEY)

interrupted = False

# Parameters for recording audio
CHUNK = 1024
FORMAT = pyaudio.paInt16  # 16-bit sampling
CHANNELS = 1              # mono
RATE = 16000              # sampling rate
RECORD_SECONDS = 5        # recording length in seconds
WAVE_OUTPUT_FILENAME = "./myvoice.pcm"  # where the recording is saved


class Light():

    def __init__(self):
        self.pin = 18
        self.mode = 1  # on is 1, off is 0
        self.mgpio = GPIO()
        self.mgpio.setPinMode(pin=self.pin, mode=1)  # OUTPUT 1, INPUT 0

    def on(self):
        self.mgpio.setV(self.pin, self.mode)

    def off(self):
        self.mgpio.setV(self.pin, self.mode & 0)

    def status(self):
        # 0 is off, 1 is on
        return self.mgpio.getV(self.pin)


def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()


def word_to_voice(text):
    result = APIClient.synthesis(text, 'zh', 1, {
        'vol': 5, 'spd': 3, 'per': 3})
    if not isinstance(result, dict):
        with open('./audio.mp3', 'wb') as f:
            f.write(result)
    time.sleep(.2)
    pygame.mixer.music.load('./audio.mp3')  # play the synthesized speech
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy() == True:
        print('waiting')


def get_mic_voice_file(p):
    word_to_voice('Please turn on or off the light.')
    stream = p.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
    print("* recording")
    frames = []
    for i in range(0, int(RATE / CHUNK * RECORD_SECONDS)):
        data = stream.read(CHUNK)
        frames.append(data)
    print("* done recording")
    stream.stop_stream()
    stream.close()
    # Do not call p.terminate() here, otherwise p = pyaudio.PyAudio()
    # becomes invalid and would have to be re-initialized.
    wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
    wf.setnchannels(CHANNELS)
    wf.setsampwidth(p.get_sample_size(FORMAT))
    wf.setframerate(RATE)
    wf.writeframes(b''.join(frames))
    wf.close()
    print('recording finished')


def baidu_get_words(client):
    results = client.asr(get_file_content(WAVE_OUTPUT_FILENAME), 'pcm', 16000, {
        'dev_pid': 1536,
    })
    # print(results['result'])
    words = results['result'][0]
    return words


# _*_ coding:UTF-8 _*_
# @author: zdl
# Implements offline voice wake-up and speech recognition with some
# voice-interactive control

def signal_handler(signal, frame):
    global interrupted
    interrupted = True


def interrupt_check():
    global interrupted
    return interrupted


# Callback function; speech recognition happens here
def callbacks():
    global detector
    # When the hot word wakes the device, play a "ding" prompt
    # snowboydecoder.play_audio_file()
    pygame.mixer.music.load('./resources/ding.wav')
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy() == True:
        print('waiting')
    # Shut down Snowboy
    detector.terminate()
    # Start speech recognition
    get_mic_voice_file(p)
    rText = baidu_get_words(client=APIClient)
    if rText.find("turn on the light") != -1:
        light.on()
    elif rText.find("turn off the light") != -1:
        light.off()
    # Restart Snowboy: wake up -> listen -> wake up, called recursively
    wake_up()


# Hot word wake-up
def wake_up():
    global detector
    model = './resources/models/snowboy.umdl'  # the wake-up word is "snowboy"
    # Capture the SIGINT signal, e.g. Ctrl+C
    signal.signal(signal.SIGINT, signal_handler)
    # Adjust the sensitivity parameter to tune wake-up word detection accuracy
    detector = snowboydecoder.HotwordDetector(model, sensitivity=0.5)
    print('Listening... please say the wake-up word: SnowBoy')
    # Main loop; replace the default detected_callback
    # (snowboydecoder.play_audio_file) with our own callback
    detector.start(detected_callback=callbacks,       # custom callback
                   interrupt_check=interrupt_check,
                   sleep_time=0.03)
    # Release resources
    detector.terminate()


if __name__ == '__main__':
    # Initialize pygame so the synthesized audio file can be played later
    pygame.mixer.init()
    p = pyaudio.PyAudio()
    light = Light()
    wake_up()
Related reference documents:
http://docs.kitt.ai/snowboy/#api-v1-train
https://github.com/Kitt-AI/snowboy
https://looker53.github.io/2018/03/29/20180329-%E8%AF%AD%E9%9F%B3%E8%AF%86%E5%88%AB%E4%B9%8BSnowboy%E7%83%AD%E8%AF%8D%E5%94%A4%E9%86%92/