Project introduction: Baidu Voice is used for Chinese speech recognition and synthesis, and the Turing Robot is used for intelligent dialogue. On Linux, the pyaudio module handles audio capture. Because pyaudio is incompatible with the Raspberry Pi, the Pi version records with arecord instead. The final code is about 150 lines and is published on GitHub: https://github.com/luyishisi/python_yuyinduihua
0. Contents:
- 1: Environment setup
- 2: Baidu speech synthesis and recognition
- 3: Turing Robot
- 4: Audio capture on Linux using pyaudio
- 5: Recording on the Raspberry Pi with arecord
- 6: Linux debugging
- 7: Major bug resolution
- 8: Source code for the Raspberry Pi
1. Environment setup
This step is critical: most of the problems that show up later are caused by an incompatible environment.
1.1: Linux version
# -*- coding: utf-8 -*-
from pyaudio import PyAudio, paInt16
import numpy as np
from datetime import datetime
import wave
import time
import urllib, urllib2, pycurl
import base64
import json
import os
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
This environment is the easiest one to set up: you just need installation commands like apt-get install python-wave*. Installing a module mostly comes down to finding the right install command. My usual trick is to append * to the name the module must contain, so that apt-get does fuzzy matching.
If you do not know how to install a module, search for it on Baidu; it is not difficult. mpg123 is also needed, for playback.
1.2: Raspberry Pi version
If you run into pyaudio errors at this step, give up on that pit decisively: switch to command-line recording and don't bother with pyaudio.
## Update the packages first
sudo apt-get update
sudo apt-get upgrade
## Install the necessary programs
sudo apt-get -y install alsa-utils alsa-tools alsa-tools-gui alsamixergui
Main tools used
To adjust the speaker volume from the terminal, just run alsamixer. It also controls the recording device you use: the recording volume has to be set here, and you can see clearly whether your sound card has any problems.
The recording device I used was https://item.taobao.com/item.htm?Spm=a1z10.5-c.w4002-3667091491.40.mktumv&id=41424706506.
Recording is done with the arecord command.
arecord and aplay are the command-line recording and playback tools for the ALSA sound card driver. arecord supports multiple file formats and multiple sound cards; aplay supports multiple file formats.
Command format: this section is worth reading. The recording mainly uses three options: -d, -f and -r.
arecord [flags] [filename]
aplay [flags] [filename [filename]] ...
Options:
-h, --help: show help.
--version: print version information.
-l, --list-devices: list all sound cards and digital audio devices.
-L, --list-pcms: list all PCM definitions.
-D, --device=NAME: specify the PCM device name.
-q, --quiet: quiet mode.
-t, --file-type TYPE: file type (voc, wav, raw or au).
-c, --channels=#: set the number of channels.
-f, --format=FORMAT: sample format. Formats include: S8 U8 S16_LE S16_BE U16_LE U16_BE S24_LE S24_BE U24_LE U24_BE S32_LE S32_BE U32_LE U32_BE FLOAT_LE FLOAT_BE FLOAT64_LE FLOAT64_BE IEC958_SUBFRAME_LE IEC958_SUBFRAME_BE MU_LAW A_LAW IMA_ADPCM MPEG GSM
-r, --rate=#: set the sampling rate in Hz.
-d, --duration=#: set the duration in seconds.
-s, --sleep-min=#: set the minimum sleep time.
-M, --mmap: mmap stream.
-N, --nonblock: non-blocking mode.
-B, --buffer-time=#: buffer duration, in microseconds.
-v, --verbose: show the PCM structure and settings.
-I, --separate-channels: write a separate file for each channel.
Example:
aplay -c 1 -t raw -r 22050 -f mu_law foobar
Plays the raw file foobar at 22050 Hz, mono, 8 bits, mu_law format.
arecord -d 10 -f cd -t wav -D copy foobar.wav
Records foobar.wav in CD quality for 10 seconds, using the PCM "copy".
2: Baidu speech synthesis and recognition
This part is not very difficult. The test code is as follows.
# -*- coding: utf-8 -*-
# Speech recognition test: upload a recorded clip to Baidu
import wave
import urllib, urllib2, pycurl
import base64
import json

## Get an access token from the API key and secret key.
## Fill in your own apiKey and secretKey.
def get_token():
    apiKey = "Ll0c53MSac6GBOtpg22ZSGAU"
    secretKey = "44c8af396038a24e34936227d4a19dc2"
    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey
    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):
    print (buf)

## Post the audio to the server
def use_cloud(token):
    fp = wave.open('2.wav', 'rb')  # a sound clip that has already been recorded
    nf = fp.getnframes()
    f_len = nf * 2                 # 2 bytes per frame: 16-bit mono
    audio_data = fp.readframes(nf)
    fp.close()
    cuid = "7519663"               # your product id
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]
    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url))    # curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header)   # must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform()                           # pycurl.perform() has no return val

if __name__ == "__main__":
    token = get_token()   # get the token
    use_cloud(token)      # processing; the output happens inside the function
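The test code hard-codes f_len = nf * 2, which is only right for 16-bit mono audio. Here is a minimal sketch of how the payload length could be derived from the WAV header instead. It is written for Python 3 (the post itself uses Python 2), and pcm_payload and make_silent_wav are hypothetical helper names, with an in-memory file standing in for 2.wav.

```python
import io
import wave


def pcm_payload(wav_file):
    """Return (raw PCM bytes, sample rate) for a WAV path or file-like object."""
    with wave.open(wav_file, "rb") as fp:
        n_frames = fp.getnframes()
        # Bytes per frame = bytes per sample * number of channels.
        # For 16-bit mono this is 2, i.e. the "nf * 2" in the test code.
        frame_bytes = fp.getsampwidth() * fp.getnchannels()
        audio_data = fp.readframes(n_frames)
        rate = fp.getframerate()
    assert len(audio_data) == n_frames * frame_bytes
    return audio_data, rate


def make_silent_wav(seconds=1, rate=8000):
    """Build an in-memory 16-bit mono WAV (a stand-in for 2.wav)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * rate * seconds)
    buf.seek(0)
    return buf


audio, rate = pcm_payload(make_silent_wav())
print(len(audio), rate)
```

len(audio) is the value that would go into Content-Length and POSTFIELDSIZE, and rate is the value for the rate= part of the Content-Type header.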
3: Turing Robot
Official website: http://www.tuling123.com/
Test code for the Turing Robot part
This part is very easy. You have to register and use the key and API they give you. The rest is extracting text from the JSON reply.
# -*- coding: utf-8 -*-
import urllib
import json

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

if __name__ == '__main__':
    key = '05ba411481c8cfa61b91124ef7389767'
    api = 'http://www.tuling123.com/openapi/api?key=' + key + '&info='
    while True:
        info = raw_input('I: ')
        request = api + urllib.quote(info)  # percent-encode spaces and Chinese text
        response = getHtml(request)
        dic_json = json.loads(response)
        print 'Robot: '.decode('utf-8') + dic_json['text']
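The two steps in the test code (build the request URL, then pull the text field out of the JSON reply) can be tried offline. This is only a sketch in Python 3, not the official client: build_request_url and extract_reply are hypothetical names, YOUR_KEY is a placeholder, and the sample reply body is made up for illustration.

```python
import json
from urllib.parse import quote

API_BASE = "http://www.tuling123.com/openapi/api"


def build_request_url(key, info):
    # Percent-encode the user text so spaces and Chinese characters
    # survive inside the query string.
    return API_BASE + "?key=" + quote(key) + "&info=" + quote(info)


def extract_reply(response_body):
    # The test code reads the robot's answer from the "text" field.
    return json.loads(response_body)["text"]


url = build_request_url("YOUR_KEY", "你好 机器人")
sample_body = '{"text": "hello"}'  # made-up reply, not a real server response
print(url)
print(extract_reply(sample_body))
```

Keeping the URL building separate from the network call makes it easy to check the encoding before sending anything to the server.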
4: Audio capture on Linux using pyaudio
On a normal computer this part is very easy, as long as the environment has no major problems. The complete code is in the overall source code; here is a short explanation.
The snippet below does not run on its own. It is extracted from the overall source code only to make the idea easier to follow.
pa is a PyAudio object. The program reads the current volume (the peak sample value) from its input stream, and when the peak exceeds 2000 it starts recording. There is also an additional time limit of 5 seconds.
NUM_SAMPLES = 2000    # size of the cached block in pyAudio
SAMPLING_RATE = 8000  # sampling frequency
LEVEL = 1500          # threshold above which sound is kept
COUNT_NUM = 20        # a block is recorded when more than COUNT_NUM of its NUM_SAMPLES samples exceed LEVEL
SAVE_LENGTH = 8       # minimum recording length: SAVE_LENGTH * NUM_SAMPLES samples
t = 0                 # 1 while a recording is in progress

# Open the sound input
pa = PyAudio()
stream = pa.open(format=paInt16, channels=1, rate=SAMPLING_RATE, input=True,
                 frames_per_buffer=NUM_SAMPLES)

string_audio_data = stream.read(NUM_SAMPLES)
# Convert the data that was read into an array
audio_data = np.fromstring(string_audio_data, dtype=np.short)
# Count the samples larger than LEVEL
large_sample_count = np.sum(audio_data > LEVEL)
temp = np.max(audio_data)
if temp > 2000 and t == 0:
    t = 1  # start recording
    print "Signal detected, recording starts; the time limit is five seconds."
    begin = time.time()
    print temp
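The trigger logic can be tried without a microphone or pyaudio. The sketch below (Python 3, standard library only) applies the same two measurements to a block of raw 16-bit samples: the peak value, and the number of samples above LEVEL. analyse_block, should_start and the TRIGGER constant are names invented for this illustration; the two test blocks are synthetic data.

```python
import array

LEVEL = 1500     # per-sample threshold, same value as in the snippet
COUNT_NUM = 20   # number of loud samples a block needs to be kept
TRIGGER = 2000   # peak value that starts a recording


def analyse_block(raw_bytes):
    """Return (peak, loud_count) for a block of 16-bit native-endian samples."""
    samples = array.array("h")
    samples.frombytes(raw_bytes)
    peak = max(samples)
    loud_count = sum(1 for s in samples if s > LEVEL)
    return peak, loud_count


def should_start(raw_bytes):
    """True when the block's peak is high enough to start recording."""
    peak, _ = analyse_block(raw_bytes)
    return peak > TRIGGER


# Synthetic blocks standing in for stream.read(NUM_SAMPLES)
quiet = array.array("h", [10, -20, 30] * 100).tobytes()
loud = array.array("h", [10, 2500, 1800] * 100).tobytes()

print(analyse_block(quiet), should_start(quiet))
print(analyse_block(loud), should_start(loud))
```

With real input, raw_bytes would be whatever stream.read(NUM_SAMPLES) returns, and loud_count would be compared against COUNT_NUM to decide whether the block is saved.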
5: Recording on the Raspberry Pi with arecord
This section mainly records some general information. If the following command runs successfully on the Raspberry Pi, you are fine. The rest is reference material I collected while studying the topic.
sudo arecord -D "plughw:1,0" -d 5 f1.wav
Parameter explanation: -D selects the device. The external device is plughw:1,0 and the internal device is plughw:0,0; the Raspberry Pi itself has no recording module, so there is no internal device.
-d 5 means that the recording time is 5 seconds. Without this parameter, recording continues until you stop it with Ctrl+C. The generated file is named f1.wav.
Baidu Voice requires 16-bit audio, so you need to set -f.
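When the recording is launched from Python, passing the options as a list avoids quoting problems with the shell. This is a small sketch in Python 3, assuming arecord is installed and the USB sound card is plughw:1,0 as above; arecord_command and record are hypothetical helper names, and only options described in this post are used.

```python
import subprocess


def arecord_command(path, device="plughw:1,0", seconds=5, rate=8000, fmt="S16_LE"):
    """Build the arecord argument list; nothing is executed here."""
    return [
        "arecord",
        "-D", device,        # which sound card to record from
        "-f", fmt,           # S16_LE = signed 16-bit little endian
        "-d", str(seconds),  # stop after this many seconds
        "-r", str(rate),     # sampling rate in Hz
        path,
    ]


def record(path, **options):
    # Raises subprocess.CalledProcessError if arecord exits with an error,
    # for example when the device name is wrong.
    subprocess.check_call(arecord_command(path, **options))


cmd = arecord_command("f1.wav")
print(" ".join(cmd))
```

Calling record("f1.wav") on the Pi would then run the command; on a machine without arecord it fails, which is why the example only prints the command line.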
The PCM formats are explained below:
These formats are different ways of expressing the PCM value range. The minimum values of all formats are equivalent to each other, and so are the maximum values; intermediate values scale proportionally, so every format can be mapped onto the range -1 to 1.
- S8: signed 8 bits, i.e. signed char, range -128 to 127
- U8: unsigned 8 bits, i.e. unsigned char, range 0 to 255
- S16_LE: little endian signed 16 bits, i.e. short, range -32768 to 32767
- S16_BE: big endian signed 16 bits, i.e. short in reverse byte order (PPC), range -32768 to 32767
- U16_LE: little endian unsigned 16 bits, i.e. unsigned short, range 0 to 65535
- U16_BE: big endian unsigned 16 bits, i.e. unsigned short in reverse byte order (PPC), range 0 to 65535
- There are also S24_LE, S32_LE and so on; PCM can use any of these representations.
- For PCM, the minimum values of the formats above (-128, 0, -32768, -32768, 0, 0) all mean the same thing: the lowest level, which can be quantized to floating point -1. Likewise, all the maximum values represent the same level, which can be quantized to floating point 1, and other values are converted proportionally.
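The proportional mapping described in the list can be written down directly. This is a sketch in Python 3 of exactly that linear mapping (minimum to -1, maximum to 1); PCM_RANGES and normalize are names chosen for the example. Many audio libraries instead divide signed samples by 32768, which is a slightly different convention.

```python
# Value range of each format, taken from the list above
PCM_RANGES = {
    "S8": (-128, 127),
    "U8": (0, 255),
    "S16_LE": (-32768, 32767),
    "S16_BE": (-32768, 32767),
    "U16_LE": (0, 65535),
    "U16_BE": (0, 65535),
}


def normalize(sample, fmt):
    """Map an integer sample of the given format linearly onto [-1.0, 1.0]."""
    lo, hi = PCM_RANGES[fmt]
    return 2.0 * (sample - lo) / (hi - lo) - 1.0


for fmt, (lo, hi) in PCM_RANGES.items():
    print(fmt, normalize(lo, fmt), normalize(hi, fmt))
```

The byte order (LE or BE) only matters when the raw bytes are decoded into integers; once a sample is an integer, both variants share the same range.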
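Note that PCMU and PCMA do not mean unsigned and signed PCM. PCMU is G.711 mu-law encoded audio and PCMA is G.711 A-law encoded audio; they correspond to the MU_LAW and A_LAW formats in the arecord list, not to U8, U16_LE, U16_BE or S8, S16_LE, S16_BE.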
View the sound cards
cat /proc/asound/cards
cat /proc/asound/modules
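The card index printed by /proc/asound/cards is the first number in a plughw:N,0 device name. The sketch below (Python 3) parses that listing; parse_cards and plughw are hypothetical helpers, and SAMPLE is an invented listing that only imitates the usual layout, so the real output on your board may differ.

```python
import re

# One card per line: index, [id], then a description.
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[([^\]]*)\]:\s*(.+?)\s*$")


def parse_cards(text):
    """Return a list of (index, id, description) tuples."""
    cards = []
    for line in text.splitlines():
        match = CARD_LINE.match(line)
        if match:  # continuation lines have no "[id]:" part and are skipped
            cards.append((int(match.group(1)), match.group(2).strip(), match.group(3)))
    return cards


def plughw(card_index, device_index=0):
    return "plughw:%d,%d" % (card_index, device_index)


# Invented sample, for illustration only
SAMPLE = "\n".join([
    " 0 [ALSA           ]: bcm2835 - bcm2835 ALSA",
    "                      bcm2835 ALSA",
    " 1 [Device         ]: USB-Audio - USB PnP Sound Device",
    "                      USB PnP Sound Device at usb-example, full speed",
])

cards = parse_cards(SAMPLE)
for index, card_id, description in cards:
    print(plughw(index), card_id, description)
```

On the Pi, the text would come from open("/proc/asound/cards").read() instead of SAMPLE.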
6: Linux debugging
The full source code follows; the explanations are in the comments.
# -*- coding: utf-8 -*-
from pyaudio import PyAudio, paInt16
import numpy as np
from datetime import datetime
import wave
import time
import urllib, urllib2, pycurl
import base64
import json
import os
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

# Some global variables
save_count = 0
save_buffer = []
t = 0            # 1 while a recording is in progress
sum = 0          # number of quiet blocks seen during the recording
time_flag = 0    # set to 1 once the five-second limit is reached
flag_num = 0     # counter used for the file name
filename = ''
duihua = '1'     # the recognized sentence

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def get_token():
    apiKey = "Ll0c53MSac6GBOtpg22ZSGAU"
    secretKey = "44c8af396038a24e34936227d4a19dc2"
    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey
    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):  # handles the output of Baidu speech recognition
    global duihua
    print "String type"
    print (buf)
    a = json.loads(buf)  # parse as JSON instead of calling eval() on network data
    print type(a)
    if a['err_msg'] == 'success.':
        # Finally, the recognized sentence can be output and returned here.
        duihua = a['result'][0].encode("utf-8")
        print duihua

def use_cloud(token):  # speech recognition: upload the recording
    fp = wave.open(filename, 'rb')
    nf = fp.getnframes()
    f_len = nf * 2       # 2 bytes per frame: 16-bit mono
    audio_data = fp.readframes(nf)
    fp.close()
    cuid = "7519663"     # product id
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]
    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url))    # curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header)   # must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform()                           # pycurl.perform() has no return val

# Save the data in data to a WAV file called filename
def save_wave_file(filename, data):
    wf = wave.open(filename, 'wb')
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(SAMPLING_RATE)
    wf.writeframes("".join(data))
    wf.close()

NUM_SAMPLES = 2000    # size of the cached block in pyAudio
SAMPLING_RATE = 8000  # sampling frequency
LEVEL = 1500          # threshold above which sound is kept
COUNT_NUM = 20        # a block is recorded when more than COUNT_NUM of its NUM_SAMPLES samples exceed LEVEL
SAVE_LENGTH = 8       # minimum recording length: SAVE_LENGTH * NUM_SAMPLES samples

# Open the sound input through a PyAudio object
pa = PyAudio()
stream = pa.open(format=paInt16, channels=1, rate=SAMPLING_RATE, input=True,
                 frames_per_buffer=NUM_SAMPLES)

token = get_token()  # get the token
key = '05ba411481c8cfa61b91124ef7389767'  # key and API settings
api = 'http://www.tuling123.com/openapi/api?key=' + key + '&info='

while True:
    # Read in NUM_SAMPLES samples
    string_audio_data = stream.read(NUM_SAMPLES)
    # Convert the data that was read into an array
    audio_data = np.fromstring(string_audio_data, dtype=np.short)
    # Count the samples larger than LEVEL
    large_sample_count = np.sum(audio_data > LEVEL)

    temp = np.max(audio_data)
    if temp > 2000 and t == 0:
        t = 1  # start recording
        print "Signal detected, recording starts; the time limit is five seconds."
        begin = time.time()
        print temp
    if t:
        print np.max(audio_data)
        if np.max(audio_data) < 1000:
            sum += 1
            print sum
        end = time.time()
        if end - begin > 5:
            time_flag = 1
            print "Five seconds are up, ready to end"
        # If the count is greater than COUNT_NUM, save at least SAVE_LENGTH blocks
        if large_sample_count > COUNT_NUM:
            save_count = SAVE_LENGTH
        else:
            save_count -= 1

        if save_count < 0:
            save_count = 0

        if save_count > 0:
            # Store the data to be saved in save_buffer
            save_buffer.append(string_audio_data)
        else:
            # Write the data in save_buffer to a WAV file
            #if time_flag:
            if len(save_buffer) > 0 or time_flag:
                #filename = datetime.now().strftime("%Y-%m-%d_%H_%M_%S") + ".wav"  # originally the time was used as the name
                filename = str(flag_num) + ".wav"
                flag_num += 1

                save_wave_file(filename, save_buffer)
                save_buffer = []
                t = 0
                sum = 0
                time_flag = 0
                print filename, "saved successfully, speech recognition in progress"

                use_cloud(token)
                print duihua
                info = duihua
                duihua = ""

                request = api + urllib.quote(info)  # percent-encode the Chinese text
                response = getHtml(request)
                dic_json = json.loads(response)

                #print 'Robot: '.decode('utf-8') + dic_json['text']  # the trouble here is character encoding
                #huida = ' '.decode('utf-8') + dic_json['text']
                a = dic_json['text']
                print type(a)
                unicodestring = a

                # Converting Unicode into a normal Python string: "encode"
                utf8string = unicodestring.encode("utf-8")

                print type(utf8string)
                print str(a)
                # Reuse the token from get_token(); a token pasted into the URL expires after 30 days
                url = "http://tsn.baidu.com/text2audio?tex=" + urllib.quote(utf8string) + "&lan=zh&per=0&pit=1&spd=7&cuid=7519663&ctp=1&tok=" + token
                os.system('mpg123 "%s"' % (url))  # play with mpg123
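The speech synthesis URL is built by string concatenation in the source, which breaks as soon as the reply contains spaces or punctuation. A safer way is to let the standard library encode the query string. The sketch below is Python 3 and only reuses the parameter names that appear in the URL above (tex, lan, per, pit, spd, cuid, ctp, tok); tts_url is a hypothetical helper and TOKEN is a placeholder.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

TTS_URL = "http://tsn.baidu.com/text2audio"


def tts_url(text, token, cuid="7519663"):
    """Build the text2audio URL with every parameter percent-encoded."""
    params = [
        ("tex", text),    # the sentence to speak
        ("lan", "zh"),
        ("per", "0"),
        ("pit", "1"),
        ("spd", "7"),
        ("cuid", cuid),
        ("ctp", "1"),
        ("tok", token),   # access token from get_token()
    ]
    return TTS_URL + "?" + urlencode(params)


url = tts_url("你好, 世界", "TOKEN")
print(url)

# Round trip: decoding the query gives back the original text
decoded = parse_qs(urlsplit(url).query)
print(decoded["tex"][0])
```

The resulting URL contains only ASCII characters, so it can be handed to mpg123 without any of the encoding tricks.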
7: Major bug resolution
Apart from environment issues, the major bugs were Chinese encoding and object parsing. Baidu speech recognition returns a dictionary object in which some values are plain strings and others are arrays. First read the err_msg string to check whether it is 'success.', then read the result array; its first element is the recognized Chinese text.
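The parsing order described here (check err_msg first, then take the first element of result) looks like this as a standalone function. It is a sketch in Python 3; parse_recognition is a hypothetical name and both sample replies are made up, using only the two fields the source code reads.

```python
import json


def parse_recognition(buf):
    """Return the recognized text, or None when recognition failed."""
    if isinstance(buf, bytes):
        buf = buf.decode("utf-8")  # the HTTP callback delivers bytes
    reply = json.loads(buf)
    if reply.get("err_msg") != "success.":
        return None
    results = reply.get("result") or []
    return results[0] if results else None


# Made-up replies for illustration
ok = '{"err_msg": "success.", "result": ["你好，"]}'
failed = '{"err_msg": "some error"}'

print(parse_recognition(ok))
print(parse_recognition(failed))
```

Using json.loads instead of eval also means the reply is never executed as code, and the text comes back as a proper Unicode string.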
Another bug is Chinese encoding.
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
# Also
#print 'Robot: '.decode('utf-8') + dic_json['text']
#huida = ' '.decode('utf-8') + dic_json['text']
a = dic_json['text']
print type(a)
unicodestring = a
# Converting Unicode into a normal Python string: "encode"
utf8string = unicodestring.encode("utf-8")
The main problem when porting to the Raspberry Pi is that the arecord command reports that the file or directory cannot be found, which means you chose the wrong sound card. Another problem is that the recording volume is too low. Use alsamixer to select the right card and check the levels.
There is also the problem of recognition accuracy. Baidu has its own requirements, so the recording has to be set to 16 bits. After that, listen to the recorded audio to check whether the volume is too high and whether the sound is harsh or distorted. It is best to test each part separately.
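Before blaming the recognizer, the recording itself can be checked from Python. The sketch below (Python 3, standard library only) reports whether a WAV file is 16-bit mono at the expected rate and what its peak level is. check_recording and make_wav are hypothetical helpers, and the two test files are generated in memory rather than recorded.

```python
import array
import io
import sys
import wave


def check_recording(wav_file, want_rate=8000):
    """Return (problems, peak) for a WAV path or file-like object."""
    with wave.open(wav_file, "rb") as fp:
        width = fp.getsampwidth()
        channels = fp.getnchannels()
        rate = fp.getframerate()
        frames = fp.readframes(fp.getnframes())

    problems = []
    if width != 2:
        problems.append("sample width is %d bits, not 16" % (width * 8))
    if channels != 1:
        problems.append("%d channels, not mono" % channels)
    if rate != want_rate:
        problems.append("rate is %d Hz, not %d" % (rate, want_rate))

    peak = 0
    if width == 2 and frames:
        samples = array.array("h")
        samples.frombytes(frames)
        if sys.byteorder == "big":
            samples.byteswap()  # WAV samples are stored little-endian
        peak = max(abs(s) for s in samples)
    return problems, peak


def make_wav(samples, width=2, rate=8000):
    """Build a small mono WAV in memory for the demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(width)
        wf.setframerate(rate)
        if width == 2:
            data = array.array("h", samples)
            if sys.byteorder == "big":
                data.byteswap()
            wf.writeframes(data.tobytes())
        else:
            wf.writeframes(bytes(samples))
    buf.seek(0)
    return buf


good = check_recording(make_wav([0, 1000, -3000]))
bad = check_recording(make_wav([128, 130], width=1))
print(good)
print(bad)
```

A peak close to 32767 suggests the input is clipping (volume too high), while a very small peak suggests the microphone level in alsamixer is too low. Where exactly to draw those lines is a matter of experiment.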
8: Source code for the Raspberry Pi
pyaudio kept producing errors on the Raspberry Pi that I did not want to fight, so I bypass it: the recording is done with the arecord command, which is invoked from Python. The code is much shorter, but it loses the ability to process sound in real time.
# -*- coding: utf-8 -*-
# pyaudio and numpy are not imported here: recording is done by arecord
import wave
import urllib, urllib2, pycurl
import json
import os
import sys
reload(sys)
sys.setdefaultencoding("utf-8")

filename = '/home/luyi/yuyinduihua/2.wav'  # arecord writes here, use_cloud reads it
duihua = '1'                               # the recognized sentence

def getHtml(url):
    page = urllib.urlopen(url)
    html = page.read()
    return html

def get_token():
    apiKey = "Ll0c53MSac6GBOtpg22ZSGAU"
    secretKey = "44c8af396038a24e34936227d4a19dc2"
    auth_url = "https://openapi.baidu.com/oauth/2.0/token?grant_type=client_credentials&client_id=" + apiKey + "&client_secret=" + secretKey
    res = urllib2.urlopen(auth_url)
    json_data = res.read()
    return json.loads(json_data)['access_token']

def dump_res(buf):
    global duihua
    print "String type"
    print (buf)
    a = json.loads(buf)  # parse as JSON instead of calling eval() on network data
    print type(a)
    if a['err_msg'] == 'success.':
        # Finally, the recognized sentence can be output and returned here.
        duihua = a['result'][0].encode("utf-8")
        print duihua

def use_cloud(token):  # speech recognition: upload the recording
    fp = wave.open(filename, 'rb')
    nf = fp.getnframes()
    f_len = nf * 2       # 2 bytes per frame: 16-bit mono
    audio_data = fp.readframes(nf)
    fp.close()
    cuid = "7519663"     # product id
    srv_url = 'http://vop.baidu.com/server_api' + '?cuid=' + cuid + '&token=' + token
    http_header = [
        'Content-Type: audio/pcm; rate=8000',
        'Content-Length: %d' % f_len
    ]
    c = pycurl.Curl()
    c.setopt(pycurl.URL, str(srv_url))    # curl doesn't support unicode
    #c.setopt(c.RETURNTRANSFER, 1)
    c.setopt(c.HTTPHEADER, http_header)   # must be list, not dict
    c.setopt(c.POST, 1)
    c.setopt(c.CONNECTTIMEOUT, 30)
    c.setopt(c.TIMEOUT, 30)
    c.setopt(c.WRITEFUNCTION, dump_res)
    c.setopt(c.POSTFIELDS, audio_data)
    c.setopt(c.POSTFIELDSIZE, f_len)
    c.perform()                           # pycurl.perform() has no return val

token = get_token()
key = '05ba411481c8cfa61b91124ef7389767'
api = 'http://www.tuling123.com/openapi/api?key=' + key + '&info='

while True:
    # Record 5 seconds of 16-bit, 8000 Hz audio from the USB sound card
    os.system('arecord -D "plughw:1,0" -f S16_LE -d 5 -r 8000 %s' % filename)
    use_cloud(token)
    print duihua
    info = duihua
    duihua = ""
    request = api + urllib.quote(info)  # percent-encode the Chinese text
    response = getHtml(request)
    dic_json = json.loads(response)

    a = dic_json['text']
    print type(a)
    unicodestring = a

    # Converting Unicode into a normal Python string: "encode"
    utf8string = unicodestring.encode("utf-8")

    print type(utf8string)
    print str(a)
    # Reuse the token from get_token(); a token pasted into the URL expires after 30 days
    url = "http://tsn.baidu.com/text2audio?tex=" + urllib.quote(utf8string) + "&lan=zh&per=0&pit=1&spd=7&cuid=7519663&ctp=1&tok=" + token
    os.system('mpg123 "%s"' % (url))  # play with mpg123