Super simple integrated ML kit for dictation

Keywords: Mobile Android Maven SDK Google

background

I believe that all of us have dictation when we first learn a language. Now when primary school students learn Chinese, an important homework is dictation of new words in the text. Many parents have this experience. However, on the one hand, the action of reading words is relatively simple, on the other hand, parents' time is also very valuable. Now there are many pronunciation dictation after class in the market. These announcers record the dictation words in Chinese textbooks and download them for parents to use. However, the recording is not flexible enough. If the teacher leaves some extra words that are not in the after-school exercises today, this The recording can't meet the needs of parents and children. This paper introduces an APP that uses the common text recognition function and speech synthesis function of our ML kit to realize automatic voice broadcast. It only needs to take pictures of the dictation words or texts, and then it can automatically broadcast the words in the photos, and the broadcast tone and tone can be adjusted.

Preparation before development

Open Android studio project level build.gradle file

configure the maven address of HMS SDK in allprojects > repositories

allprojects { 
    repositories { 
        google() 
        jcenter() 
        maven {url 'http://developer.huawei.com/repo/'} 
    } 
 }

configure the maven address of HMS SDK in buildscript > repositories

buildscript { 
    repositories { 
        google() 
        jcenter() 
        maven {url 'http://developer.huawei.com/repo/'} 
    } 
 }

configure the maven address of HMS SDK in buildscript > repositories

buildscript { 
    repositories { 
        google() 
        jcenter() 
        maven {url 'http://developer.huawei.com/repo/'} 
    } 
 }

in buildscript > dependencies, configure AGC plug-in

dependencies {
    classpath 'com.huawei.agconnect:agcp:1.2.1.301'
}

Add compilation dependency

open application level build.gradle file

integrated SDK

dependencies{  
    implementation 'com.huawei.hms:ml-computer-voice-tts:1.0.4.300'
    implementation 'com.huawei.hms:ml-computer-vision-ocr:1.0.4.300'
    implementation 'com.huawei.hms:ml-computer-vision-ocr-cn-model:1.0.4.300'
}

apply ACG plug-in and add it to the file header

apply plugin: 'com.huawei.agconnect'

specify permissions and attributes: in AndroidManifest.xml Statement in the

<uses-permission android:name="android.permission.CAMERA" />
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
<uses-feature android:name="android.hardware.camera" />
<uses-feature android:name="android.hardware.camera.autofocus" />

Key steps of reading code

there are two main functions: one is to identify the text of the assignment, the other is to read the assignment. The assignment can be read aloud through OCR+TTS. After taking pictures, click play to read aloud.

Dynamic permission application

private static final int PERMISSION_REQUESTS = 1;
@Override 
public void onCreate(Bundle savedInstanceState) { 
    // Checking camera permission   
   if (!allPermissionsGranted()) {
       getRuntimePermissions();
   }
}

Start reading interface

public void takePhoto(View view) {
        Intent intent = new Intent(MainActivity.this, ReadPhotoActivity.class);
        startActivity(intent);
}

Call createLocalTextAnalyzer() in onCreate() method to create end side text recognizer.

private void createLocalTextAnalyzer() {
        MLLocalTextSetting setting = new MLLocalTextSetting.Factory()
                .setOCRMode(MLLocalTextSetting.OCR_DETECT_MODE)
                .setLanguage("zh")
                .create();
        this.textAnalyzer = MLAnalyzerFactory.getInstance().getLocalTextAnalyzer(setting);
 
}

In the onCreate() method, createTtsEngine () is called to create the speech synthesis engine, and the speech synthesis callback is used to process the speech synthesis results, and the speech synthesis callback is introduced into the new speech synthesis engine.

private void createTtsEngine() {
        MLTtsConfig mlConfigs = new MLTtsConfig()
                .setLanguage(MLTtsConstants.TTS_ZH_HANS)
                .setPerson(MLTtsConstants.TTS_SPEAKER_FEMALE_ZH)
                .setSpeed(0.2f)
                .setVolume(1.0f);
        this.mlTtsEngine = new MLTtsEngine(mlConfigs);
        MLTtsCallback callback = new MLTtsCallback() {
            @Override
            public void onError(String taskId, MLTtsError err) {
            }
 
            @Override
            public void onWarn(String taskId, MLTtsWarn warn) {
            }
 
            @Override
            public void onRangeStart(String taskId, int start, int end) {
            }
 
            @Override
            public void onEvent(String taskId, int eventName, Bundle bundle) {
                if (eventName == MLTtsConstants.EVENT_PLAY_STOP) {
                    if (!bundle.getBoolean(MLTtsConstants.EVENT_PLAY_STOP_INTERRUPTED)) {
                        Toast.makeText(ReadPhotoActivity.this.getApplicationContext(), R.string.read_finish, Toast.LENGTH_SHORT).show();
                    }
                }
            }
        };
        mlTtsEngine.setTtsCallback(callback);
}

Set the read photos, take pictures and read aloud buttons

this.relativeLayoutLoadPhoto.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
ReadPhotoActivity.this.selectLocalImage(ReadPhotoActivity.this.REQUEST_CHOOSE_ORIGINPIC);
            }
        });
this.relativeLayoutTakePhoto.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
ReadPhotoActivity.this.takePhoto(ReadPhotoActivity.this.REQUEST_TAKE_PHOTO);
            }
        });

Start text recognition in the callback of taking pictures and reading photos. startTextAnalyzer()

private void startTextAnalyzer() {
        if (this.isChosen(this.originBitmap)) {
            MLFrame mlFrame = new MLFrame.Creator().setBitmap(this.originBitmap).create();
            Task<MLText> task = this.textAnalyzer.asyncAnalyseFrame(mlFrame);
            task.addOnSuccessListener(new OnSuccessListener<MLText>() {
                @Override
                public void onSuccess(MLText mlText) {
                    // Transacting logic for segment success.
                    if (mlText != null) {
                        ReadPhotoActivity.this.remoteDetectSuccess(mlText);
                    } else {
                        ReadPhotoActivity.this.displayFailure();
                    }
                }
            }).addOnFailureListener(new OnFailureListener() {
                @Override
                public void onFailure(Exception e) {
                    // Transacting logic for segment failure.
                    ReadPhotoActivity.this.displayFailure();
                    return;
                }
            });
        } else {
            Toast.makeText(this.getApplicationContext(), R.string.please_select_picture, Toast.LENGTH_SHORT).show();
            return;
        }
}

After the recognition is successful, click the play button to start playing

this.relativeLayoutRead.setOnClickListener(new View.OnClickListener() {
            @Override
            public void onClick(View v) {
                if (ReadPhotoActivity.this.sourceText == null) {
                    Toast.makeText(ReadPhotoActivity.this.getApplicationContext(), R.string.please_select_picture, Toast.LENGTH_SHORT).show();
                } else {
                    ReadPhotoActivity.this.mlTtsEngine.speak(sourceText, MLTtsEngine.QUEUE_APPEND);
                    Toast.makeText(ReadPhotoActivity.this.getApplicationContext(), R.string.read_start, Toast.LENGTH_SHORT).show();
                }
            }
        });

Demo effect

Previous links: Ultra simple integration of HMS Scan Kit code scanning SDK, easy to realize scanning code purchase

Link to the original text: https://developer.huawei.com/consumer/cn/forum/topicview?tid=0201283755975150303&fid=18

Original author: littlewhite

Posted by abch624 on Mon, 29 Jun 2020 21:02:34 -0700

Programmer Group