Batch conversion of mobile WeChat voice messages to text using Baidu speech recognition

Batch recognition of WeChat voice messages with Baidu AI Cloud

If you are not reading this article on the cnblogs page of the author, carr0t2, it is recommended to visit the original page for better typesetting and images.
I am not sure whether I will go on to develop a graphical interface or add more features; if you really need them, please leave a comment below.

Prepare tools and environment

Python 3.7
silk-v3-decoder: https://github.com/kn007/silk-v3-decoder
A Baidu AI Cloud account (log in with a Baidu account); apply for an API Key and a Secret Key.
The Baidu short speech recognition API demo, modified from the official demo code: https://github.com/Baidu-AIP/speech-demo/tree/master/rest-api-asr/python
The environment used in this article is Windows with Python 3.7.

General idea

Find where mobile WeChat saves its voice files and export them
Use silk-v3-decoder to convert the recordings to wav format
Use ffmpeg to convert the wav files into pcm with a sampling rate of 16000 Hz
Recognize them with Python
This is only my personal approach; please point out any problems
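
The ffmpeg step in the outline can be sketched roughly as below. This is only an illustration: the function names and file names are made up for the example, and it assumes ffmpeg is installed and on PATH. The options used (-f s16le, -acodec pcm_s16le, -ar, -ac) are standard ffmpeg options for writing raw 16-bit mono PCM. Note that the code later in this article asks silk_v3_decoder for 16 kHz PCM directly, so this step is only needed when going through wav.

```python
import subprocess


def ffmpeg_wav_to_pcm_cmd(wav_path, pcm_path, rate=16000):
    """Build an ffmpeg command that writes raw 16-bit mono PCM."""
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-i", wav_path,          # input wav
        "-f", "s16le",           # raw signed 16-bit little-endian samples
        "-acodec", "pcm_s16le",
        "-ar", str(rate),        # resample to the target rate
        "-ac", "1",              # mono
        pcm_path,
    ]


def wav_to_pcm(wav_path, pcm_path, rate=16000):
    # Assumes ffmpeg is installed and on PATH; raises if ffmpeg fails.
    subprocess.run(ffmpeg_wav_to_pcm_cmd(wav_path, pcm_path, rate), check=True)
```

Building the command as a list rather than one string avoids quoting problems with paths that contain spaces.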

Specific operation

Export the WeChat voice files

WeChat voice files on a phone are generally saved in Internal storage\Tencent\MicroMsg\******************************\voice2

The asterisks stand for a long string of letters and digits

This folder contains many subfolders
Copy it to the computer so that all the audio files can be extracted

In Windows, search the copied folder for .amr

Copy and paste all the results into a new folder
These are the recordings, but they are in an unusual format and need to be converted into a common one
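
As an alternative to searching in Windows Explorer, the gathering step could be scripted. The following is a minimal sketch, not part of the original workflow: collect_amr is a hypothetical helper, and it assumes the destination folder lies outside the folder being searched. It uses shutil.copy2 because that call also copies the modification time, which the rename step below relies on.

```python
import os
import shutil


def collect_amr(src_root, dest_dir):
    """Copy every .amr file found under src_root into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for folder, _subdirs, files in os.walk(src_root):
        for fname in sorted(files):
            if not fname.lower().endswith(".amr"):
                continue
            target = os.path.join(dest_dir, fname)
            if os.path.exists(target):
                # Same name in two folders: keep both by prefixing a counter.
                target = os.path.join(dest_dir, "%d_%s" % (len(copied), fname))
            # copy2 keeps the modification time, which the rename step needs.
            shutil.copy2(os.path.join(folder, fname), target)
            copied.append(target)
    return copied
```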

Processing exported voice files

Rename the files

The relative order of the recordings has to be kept, but converting the files directly changes their modification times, after which the original order of the voice messages cannot be restored
So use Python to read each file's modification time and rename the file after it

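import os
import time

path = '.\\lecture'  # folder holding the exported .amr files
for file in os.listdir(path):
    old_path = os.path.join(path, file)
    # Use the modification time as the new file name so the order is kept
    timeArray = time.localtime(os.stat(old_path).st_mtime)
    nametime = time.strftime("%Y_%m_%d_%H_%M_%S", timeArray)
    # Note: two files modified within the same second would get the same name
    os.rename(old_path, os.path.join(path, nametime + '.amr'))
    print(nametime)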
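
If several voice messages were saved within the same second, the timestamp names would collide. One possible way around that, shown here only as a sketch with hypothetical helper names, is to append a counter to repeated timestamps. It assumes the files still carry their original WeChat names rather than timestamp names from an earlier run.

```python
import os
import time


def timestamp_name(mtime, taken):
    """Return a unique 'YYYY_mm_dd_HH_MM_SS[_n].amr' name for an mtime."""
    base = time.strftime("%Y_%m_%d_%H_%M_%S", time.localtime(mtime))
    name, n = base + ".amr", 1
    while name in taken:
        name = "%s_%d.amr" % (base, n)
        n += 1
    taken.add(name)
    return name


def rename_by_mtime(path):
    taken = set()
    # Read all modification times first, then rename oldest to newest.
    entries = sorted(
        (os.stat(os.path.join(path, f)).st_mtime, f)
        for f in os.listdir(path)
        if f.lower().endswith(".amr")
    )
    for mtime, fname in entries:
        new_name = timestamp_name(mtime, taken)
        os.rename(os.path.join(path, fname), os.path.join(path, new_name))
    return sorted(taken)
```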

Convert to pcm format

Python calls silk_v3_decoder.exe from the command line to do the decoding. The specific command is given in the code changes below
pcm files apparently cannot be played directly by ordinary players, but Audacity can open them
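
A raw pcm file has no header, which is why ordinary players refuse it. If Audacity is not at hand, one option is to wrap the samples in a WAV container using Python's standard wave module. This is a minimal sketch under the assumption that the decoder output is 16-bit mono at 16000 Hz; the article does not state the sample width, so check the result by ear. The function name is made up for the example.

```python
import wave


def pcm_to_wav(pcm_path, wav_path, rate=16000, channels=1, sample_width=2):
    """Wrap headerless PCM samples in a WAV container so players can open it."""
    with open(pcm_path, "rb") as src:
        frames = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)   # 2 bytes = 16-bit samples (assumed)
        dst.setframerate(rate)
        dst.writeframes(frames)
```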

Modify the demo code

Copy the code first

https://github.com/Baidu-AIP/speech-demo/tree/master/rest-api-asr/python

For small-scale use, I don't think the json and raw upload methods differ much
Fill in the API Key and Secret Key
As for the Python, I changed only a little and otherwise pasted the code from GitHub as is

Changes at the beginning

silk_v3_decoder.exe converts the files to 16k pcm

    FORMAT = 'pcm'
    pathamr=r'.\amr'
    pathpcm=r'.\pcm'
    dirs = os.listdir(pathamr)
    #dirs.remove('desktop.ini')### Windows may have this file
    for file in dirs:
        time.sleep(0.3)
        name=file[:-3]
        commandstring= ' silk_v3_decoder.exe ' + str(pathamr) + '\\' + name + 'amr ' + str(pathpcm) +'\\'+ str(name) + 'pcm' +' -Fs_API 16000 '
        os.system(commandstring)
        AUDIO_FILE =str(pathpcm)+'\\'+ str(name) + 'pcm'
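
The same call can also be written with subprocess and an argument list, which copes better with paths that contain spaces and skips stray files such as desktop.ini. This is a sketch only: decoder_command and decode_all are hypothetical helpers, and the arguments simply mirror the command string above (input file, output file, -Fs_API 16000).

```python
import os
import subprocess


def decoder_command(amr_path, pcm_dir, decoder="silk_v3_decoder.exe", rate=16000):
    """Return (argument list, output path) for one input file."""
    stem = os.path.splitext(os.path.basename(amr_path))[0]
    pcm_path = os.path.join(pcm_dir, stem + ".pcm")
    # Same arguments as the command string above: input, output, -Fs_API <rate>.
    return [decoder, amr_path, pcm_path, "-Fs_API", str(rate)], pcm_path


def decode_all(amr_dir, pcm_dir):
    os.makedirs(pcm_dir, exist_ok=True)
    outputs = []
    for fname in sorted(os.listdir(amr_dir)):
        if not fname.lower().endswith(".amr"):
            continue  # skips desktop.ini and other stray files
        cmd, pcm_path = decoder_command(os.path.join(amr_dir, fname), pcm_dir)
        # Assumes the decoder is in the working directory or on PATH.
        subprocess.run(cmd, check=True)
        outputs.append(pcm_path)
    return outputs
```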

Changes at the end

Append to the output file and add a time field. No further processing is done yet, so each exported line is still the raw json-style record

        with open("result.txt","a") as of:
            # The response is JSON, so parse it with json.loads rather than eval
            result_dict=json.loads(result_str)
            result_dict["time"]=name
            of.write(str(result_dict)+'\n')
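
For the follow-up processing that is still missing, one possible direction is to reduce each response to a small record before writing it. The sketch below is an assumption-laden illustration: parse_asr_response is a hypothetical helper, and it assumes the response carries err_no and err_msg fields next to the result list, as I understand the Baidu short speech API to do; only the result field is confirmed by the code in this article.

```python
import json


def parse_asr_response(result_str, name):
    """Turn one JSON response string into a dict tagged with the file's time."""
    result = json.loads(result_str)
    # err_no == 0 is assumed to mean success.
    record = {"time": name, "ok": result.get("err_no") == 0}
    # "result" is a list of candidate transcriptions; join them into one string.
    record["text"] = "".join(result.get("result", []))
    if not record["ok"]:
        record["error"] = result.get("err_msg", "")
    return record
```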

Epilogue

I have only just learned Python and wrote this casually; corrections are welcome
Post-processing of the output file is still rough. The output puts the time on one line and the recognized text on the next, so if the recognition is far off it is easy to locate the recording and listen to it again
It is not fully automated; the content still has to be tidied by hand
Baidu's speech self-training platform is not used
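
Since the output alternates a time line in braces with a text line, it can be read back into pairs for manual checking. The following is a minimal sketch with a hypothetical read_results helper; it assumes the layout written by the code below, where the time line looks like {2020_04_04_09_25_29.} (the trailing dot comes from cutting the extension with file[:-3]).

```python
def read_results(lines):
    """Pair each '{timestamp}' header line with the text line that follows it."""
    pairs = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("{") and line.endswith("}"):
            # Drop the braces and the trailing dot left over from the file name.
            pairs.append([line[1:-1].rstrip("."), ""])
        elif line and pairs and not pairs[-1][1]:
            pairs[-1][1] = line
    return [tuple(p) for p in pairs]
```

It can be fed an open file directly, for example read_results(open("result.txt")), using whatever encoding the file was written with.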

Code (for reference only)

import sys
import json
import base64
import time
import os
import subprocess

IS_PY3 = sys.version_info.major == 3

if IS_PY3:
    from urllib.request import urlopen
    from urllib.request import Request
    from urllib.error import URLError
    from urllib.parse import urlencode
    timer = time.perf_counter
else:
    from urllib2 import urlopen
    from urllib2 import Request
    from urllib2 import URLError
    from urllib import urlencode
    if sys.platform == "win32":
        timer = time.clock
    else:
        # On most other platforms the best timer is time.time()
        timer = time.time

API_KEY = '****************'  ### Fill in your own
SECRET_KEY = '*****************'

# File to be recognized
# File format
FORMAT = 'pcm'  # Only pcm/wav/amr are supported; the speed version additionally supports m4a
### Hard-coded to pcm for convenience
CUID = '****************'
# Sampling rate
RATE = 16000  # Fixed value

DEV_PID = 1537  # 1537 is Mandarin with the input-method model. Choose the PID for your language and recognition model from the documentation
ASR_URL = 'http://vop.baidu.com/server_api'
SCOPE = 'audio_voice_assistant_get'  # Having this scope means the application has ASR capability. If it is missing, enable it on the web console. Very old applications may not have it


class DemoError(Exception):
    pass


"""  TOKEN start """

TOKEN_URL = 'http://openapi.baidu.com/oauth/2.0/token'

def fetch_token():
    params = {'grant_type': 'client_credentials',
              'client_id': API_KEY,
              'client_secret': SECRET_KEY}
    post_data = urlencode(params)
    if (IS_PY3):
        post_data = post_data.encode( 'utf-8')
    req = Request(TOKEN_URL, post_data)
    try:
        f = urlopen(req)
        result_str = f.read()
    except URLError as err:
        print('token http response http code : ' + str(err.code))
        result_str = err.read()
    if (IS_PY3):
        result_str =  result_str.decode()

    print(result_str)
    result = json.loads(result_str)
    print(result)
    if ('access_token' in result.keys() and 'scope' in result.keys()):
        print(SCOPE)
        if SCOPE and (not SCOPE in result['scope'].split(' ')):  # SCOPE = False ignore check
            raise DemoError('scope is not correct')
        print('SUCCESS WITH TOKEN: %s  EXPIRES IN SECONDS: %s' % (result['access_token'], result['expires_in']))
        return result['access_token']
    else:
        raise DemoError('MAYBE API_KEY or SECRET_KEY not correct: access_token or scope not found in token response')

"""  TOKEN end """

if __name__ == '__main__':
    token = fetch_token()

    pathamr=r'.\amr'
    pathpcm=r'.\pcm'
    dirs = os.listdir(pathamr)
    #dirs.remove('desktop.ini')### Windows may have this file
    for file in dirs:
        time.sleep(0.2)
        name=file[:-3]
        commandstring= ' silk_v3_decoder.exe ' + str(pathamr) + '\\' + name + 'amr ' + str(pathpcm) +'\\'+ str(name) + 'pcm' +' -Fs_API 16000 '
        os.system(commandstring)
        ###### Little is changed below this point
        AUDIO_FILE =str(pathpcm)+'\\'+ str(name) + 'pcm'
        speech_data = []
        with open(AUDIO_FILE, 'rb') as speech_file:
            speech_data = speech_file.read()

        length = len(speech_data)
        if length == 0:
            raise DemoError('file %s length read 0 bytes' % AUDIO_FILE)
        speech = base64.b64encode(speech_data)
        if (IS_PY3):
            speech = str(speech, 'utf-8')
        params = {'dev_pid': DEV_PID,
                 #"lm_id" : LM_ID,    # Enable this item when testing the self-training platform
                  'format': FORMAT,
                  'rate': RATE,
                  'token': token,
                  'cuid': CUID,
                  'channel': 1,
                  'speech': speech,
                  'len': length
                  }
        post_data = json.dumps(params, sort_keys=False)
        # print post_data
        req = Request(ASR_URL, post_data.encode('utf-8'))
        req.add_header('Content-Type', 'application/json')
        try:
            begin = timer()
            f = urlopen(req)
            result_str = f.read()
            print ("Request time cost %f" % (timer() - begin))
        except URLError as err:
            print('asr http response http code : ' + str(err.code))
            result_str = err.read()

        if (IS_PY3):
            result_str = str(result_str, 'utf-8')
        print(result_str)
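        with open("result.txt","a") as of:
            # The response is JSON, so parse it with json.loads rather than eval
            result_dict=json.loads(result_str)
            #result_dict["time"]=name
            #of.write(str(result_dict)+'\n')
            of.write('{'+name+'}'+'\n')
            try:
                # "result" is a list of strings; [2:-2] strips the surrounding ['...']
                of.write(str(result_dict["result"])[2:-2]+'\n\n')
            except Exception:
                # No "result" field (the API reported an error) or the write failed
                of.write('Error'+'\n')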

Posted by mjh513 on Sat, 04 Apr 2020 09:25:29 -0700
