Practical Guide | Calling an AI API to Read Web Page Text Aloud

Keywords: Programming Java SDK JSON html5

JD Cloud (Jingdong Cloud) offers a wide range of AI APIs, all exposed over HTTP, so you can easily bring JD Cloud's AI capabilities into your own systems. This article shows how to use JD Cloud's speech synthesis API, with only a small amount of code, to read the text on a web page aloud. The result has low latency, works on mainstream devices, has a pleasant voice, and can switch between a male and a female voice.

Final result

In the end, you open the link in WeChat and tap the play button, and the text is read aloud.

Introduction to the API

The JD Cloud AI APIs follow a RESTful interface style, and SDKs are provided for Java and Python. The SDK makes it easy to package the parameters and call an API to get data.

To improve response time for the caller, the speech synthesis API synthesizes audio piece by piece. The back-end logic therefore calls the API repeatedly and writes each piece of audio data back to the front end as a data stream.
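
The piecewise mode described above can be sketched roughly as follows. This is a minimal illustration and does not use the JD Cloud SDK: the class `SegmentedStreamSketch`, its `Segment` type, and the `fakeService` stand-in are all hypothetical names introduced here. The only behavior taken from the article is that the sequence id starts at 1 and increases by one per request, that audio arrives Base64-encoded, and that a negative index marks the last piece.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical sketch of piece-by-piece streaming; not the JD Cloud SDK. */
final class SegmentedStreamSketch {

    /** One synthesized piece: Base64 audio plus an index that is negative on the last piece. */
    static final class Segment {
        final int index;
        final String audioBase64;

        Segment(int index, String audioBase64) {
            this.index = index;
            this.audioBase64 = audioBase64;
        }
    }

    /**
     * Requests pieces with sequence ids 1, 2, 3, ... and writes each decoded piece
     * to out right away, stopping after the piece whose index is negative.
     * Returns the number of pieces written.
     */
    static int stream(IntFunction<Segment> fetchSegment, OutputStream out) {
        int sequenceId = 0;
        Segment segment;
        try {
            do {
                sequenceId++;
                segment = fetchSegment.apply(sequenceId);
                out.write(Base64.getDecoder().decode(segment.audioBase64));
                out.flush(); // push this piece to the client before asking for the next one
            } while (segment.index >= 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sequenceId;
    }

    /** Simulated service: serves a non-empty list of chunks in order and marks the last one with index -1. */
    static IntFunction<Segment> fakeService(List<String> chunks) {
        return sequenceId -> {
            boolean last = sequenceId == chunks.size();
            String encoded = Base64.getEncoder()
                    .encodeToString(chunks.get(sequenceId - 1).getBytes(StandardCharsets.UTF_8));
            return new Segment(last ? -1 : sequenceId, encoded);
        };
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int count = stream(fakeService(Arrays.asList("seg1-", "seg2-", "seg3")), out);
        // prints "3 pieces: seg1-seg2-seg3"
        System.out.println(count + " pieces: " + new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Because each piece is flushed as soon as it is decoded, the listener can start hearing audio before the whole text has been synthesized, which is presumably where the low latency comes from.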

Getting the AK/SK

To call the JD Cloud APIs, you need an AK/SK (Access Key / Secret Key) pair, which is used together with the SDK.
In the JD Cloud console, go to Account Management > Access Key Management, then create and obtain an Access Key.
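
Once you have the key pair, one common way to keep it out of source control is to read it from the environment instead of hard-coding it. The sketch below is only an illustration under assumptions: the class `CredentialsSketch` and the variable names `JDCLOUD_APP_KEY` and `JDCLOUD_SECRET_KEY` are made up for this example and are not defined by JD Cloud or its SDK.

```java
import java.util.Map;

/** Hypothetical helper for loading the AK/SK pair; the variable names are assumptions. */
final class CredentialsSketch {
    static final String APP_KEY_VAR = "JDCLOUD_APP_KEY";       // assumed name, not defined by JD Cloud
    static final String SECRET_KEY_VAR = "JDCLOUD_SECRET_KEY"; // assumed name, not defined by JD Cloud

    final String appKey;
    final String secretKey;

    private CredentialsSketch(String appKey, String secretKey) {
        this.appKey = appKey;
        this.secretKey = secretKey;
    }

    /** Builds the pair from a map of variables, failing fast if either one is missing or blank. */
    static CredentialsSketch from(Map<String, String> env) {
        return new CredentialsSketch(require(env, APP_KEY_VAR), require(env, SECRET_KEY_VAR));
    }

    private static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("Missing environment variable " + name);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        try {
            CredentialsSketch credentials = from(System.getenv());
            System.out.println("Loaded app key of length " + credentials.appKey.length());
        } catch (IllegalStateException e) {
            System.err.println(e.getMessage());
        }
    }
}
```

Failing at startup when a key is missing tends to be easier to diagnose than an authentication error on the first API call.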

Back-end audio stream synthesis

The back-end source code is given below. It implements a controller with a single GET request method. The parameter-packaging logic is factored out into separate methods, so the code structure is simple and easy to follow. Apart from Spring MVC and the Servlet API, the code uses fastjson to process parameters and references the JD Cloud SDK; everything else is the JDK's own API, so there are very few dependencies.

    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONObject;
    import com.wxapi.WxApiCall.WxApiCall;
    import com.wxapi.model.RequestModel;

    import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestHeader;

    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.util.Base64;
    import java.util.HashMap;
    import java.util.Map;

    @Controller
    public class TTSControllerExample {
        // API URL, app key and secret key
        private static final String url = "https://aiapi.jdcloud.com/jdai/tts";
        private static final String appKey = "";
        private static final String secretKey = "";

        @GetMapping("/tts/stream/example")
        public void ttsStream(
                @RequestHeader(value = "Range", required = false) String range,
                HttpServletRequest req,
                HttpServletResponse resp) {

            // Safari's first probe request carries the header "Range: bytes=0-1";
            // write back 1 byte of placeholder data to prevent playback errors
            if ("bytes=0-1".equals(range)) {
                try {
                    byte[] temp = new byte[1];
                    resp.setHeader("Content-Type", "audio/mp3");
                    OutputStream out = resp.getOutputStream();
                    out.write(temp);
                } catch (IOException e) {
                    e.printStackTrace();
                }
                return;
            }
            // Package the input parameters
            Map<String, Object> queryMap = processQueryParam(req);
            String text = req.getParameter("text");
            // Build the request object for the API call
            RequestModel requestModel = getBaseRequestModel(queryMap, text);
            try {
                // Write the audio data back to the front end
                writeTtsStream(resp, requestModel);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }

        /**
         * Wraps the front-end input parameters in a request object for the API call,
         * and sets the url, appKey and secretKey.
         * @param queryMap query parameters for the API call
         * @param bodyStr  text to synthesize, sent as the request body
         * @return the populated request object
         */
        private RequestModel getBaseRequestModel(Map<String, Object> queryMap, String bodyStr) {
            RequestModel requestModel = new RequestModel();
            requestModel.setGwUrl(url);
            requestModel.setAppkey(appKey);
            requestModel.setSecretKey(secretKey);
            requestModel.setQueryParams(queryMap);
            requestModel.setBodyStr(bodyStr);
            return requestModel;
        }

        /**
         * Streaming API calls require an incrementing sequenceId; this method sets
         * the sequenceId on the request object.
         * @param sequenceId   sequence number of the next audio segment
         * @param requestModel request object to update
         * @return the same request object
         */
        private RequestModel changeSequenceId(int sequenceId, RequestModel requestModel) {
            requestModel.getQueryParams().put("Sequence-Id", sequenceId);
            return requestModel;
        }

        /**
         * Converts the parameters of the incoming request into the queryMap used
         * by the request object of the API call.
         * @param req incoming HTTP request
         * @return query parameters for the API call
         */
        private Map<String, Object> processQueryParam(HttpServletRequest req) {
            String reqid = req.getParameter("reqid");
            int tim = Integer.parseInt(req.getParameter("tim"));
            String sp = req.getParameter("sp");

            JSONObject parameters = new JSONObject(8);
            parameters.put("tim", tim);
            parameters.put("sr", 24000);
            parameters.put("sp", sp);
            parameters.put("vol", 2.0);
            parameters.put("tte", 0);
            parameters.put("aue", 3);

            JSONObject property = new JSONObject(4);
            property.put("platform", "Linux");
            property.put("version", "1.0.0");
            property.put("parameters", parameters);

            Map<String, Object> queryMap = new HashMap<>();
            // Access parameters
            queryMap.put("Service-Type", "synthesis");
            queryMap.put("Request-Id", reqid);
            queryMap.put("Protocol", 1);
            queryMap.put("Net-State", 1);
            queryMap.put("Applicator", 1);
            queryMap.put("Property", property.toJSONString());

            return queryMap;
        }

        /**
         * Calls the API in a loop and writes the audio data back to the response object.
         * @param resp         HTTP response that receives the audio stream
         * @param requestModel request object for the API call
         * @throws IOException if writing to the response fails
         */
        public void writeTtsStream(HttpServletResponse resp, RequestModel requestModel) throws IOException {
            // The sequence id of the audio segments starts at 1 and increases
            int sequenceId = 1;
            changeSequenceId(sequenceId, requestModel);
            // Set the Content-Type of the response to audio/mp3
            resp.setHeader("Content-Type", "audio/mp3");
            // SDK object that performs the API request
            WxApiCall call = new WxApiCall();
            // Output stream used to send the audio
            OutputStream out = resp.getOutputStream();
            call.setModel(requestModel);
            // Parse the returned message to get the status
            String response = call.request();
            JSONObject jsonObject = JSON.parseObject(response);
            JSONObject data = jsonObject.getJSONObject("result");
            // Check the first request; on failure, return a 500 error to the front end
            if (data.getIntValue("status") != 0) {
                resp.sendError(500, data.getString("message"));
                return;
            }
            // Push the actual audio data
            String audio = data.getString("audio");
            byte[] part = Base64.getDecoder().decode(audio);
            out.write(part);
            out.flush();
            // Each request returns one index; index < 0 marks the last packet,
            // so keep requesting the remaining audio until it appears
            while (data.getIntValue("index") >= 0) {
                // Increment the sequence id
                sequenceId = sequenceId + 1;
                changeSequenceId(sequenceId, requestModel);
                // Request the next piece of audio data from the API
                call.setModel(requestModel);
                response = call.request();
                jsonObject = JSON.parseObject(response);
                data = jsonObject.getJSONObject("result");
                // Stop if a later segment fails; audio has already been sent,
                // so an error code can no longer be returned
                if (data.getIntValue("status") != 0) {
                    break;
                }
                audio = data.getString("audio");
                part = Base64.getDecoder().decode(audio);
                // Write back the new audio data
                out.write(part);
                out.flush();
            }
        }
    }

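
One detail worth noting in `processQueryParam`: `Integer.parseInt(req.getParameter("tim"))` throws a `NumberFormatException` when the `tim` parameter is missing or not numeric. A defensive variant might look like the sketch below. The class `TtsParamSketch`, its method names, and the fallback values are hypothetical, and the assumption that only 0 and 1 are valid voices comes from the front-end comment in this article, not from the API documentation.

```java
/** Hypothetical request-parameter helpers; names and defaults are assumptions for illustration. */
final class TtsParamSketch {

    /** Parses the tim (voice) parameter, falling back when it is missing, non-numeric, or not 0/1. */
    static int parseVoice(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        try {
            int voice = Integer.parseInt(raw.trim());
            // Assumption: only 0 (female) and 1 (male) are accepted
            return (voice == 0 || voice == 1) ? voice : fallback;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    /** Returns the trimmed value, or the fallback when the parameter is missing or blank. */
    static String orDefault(String raw, String fallback) {
        return (raw == null || raw.trim().isEmpty()) ? fallback : raw.trim();
    }

    public static void main(String[] args) {
        System.out.println(parseVoice("0", 1));   // prints 0
        System.out.println(parseVoice(null, 1));  // prints 1
        System.out.println(parseVoice("abc", 1)); // prints 1
        System.out.println(orDefault("  ", "1")); // prints 1
    }
}
```

In the controller, `parseVoice(req.getParameter("tim"), 1)` could then replace the bare `Integer.parseInt` call, so a malformed link falls back to a default voice rather than failing the request.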

Front-end audio playback

The front-end code below is the script section of a Vue single-file component. Playback uses the HTML5 audio element; for cross-browser compatibility it relies on howler.js (npm install howler). The main logic builds a URL from the configured parameters and the text to be read aloud, then plays it through the howler.js API.

    <script>
    import { Howl } from 'howler'
    export default {
      data() {
        return {
          news: { // News content
            ......
          },
          role: 1, // 0 = female voice, 1 = male voice
          speed: 1, // Playback speed
          curIndex: -1, // Index of the paragraph being played; used only for the UI display, not for streaming playback
          sound: null, // The single howler instance on the page
          status: 'empty' // load, pause, stop, empty: used only for the UI display, not for streaming playback
        }
      },
      methods: {
        generateUUID() { // Generate a UUID
          let d = Date.now()
          return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, c => {
            const r = (d + Math.random() * 16) % 16 | 0
            d = Math.floor(d / 16)
            return (c === 'x' ? r : (r & 0x3) | 0x8).toString(16)
          })
        },
        audioSrc(txt) { // Build the URL that returns the audio
          // encodeURIComponent also escapes &, =, ? and #, which encodeURI would leave intact
          const content = encodeURIComponent(txt)
          return 'http://neuhubdemo.jd.com/api/tts/streamv2' +
            `?reqid=${this.generateUUID()}` + // Request ID
            `&text=${content}` + // Encoded text content
            `&tim=${this.role}` + // Male or female voice
            `&sp=${this.speed}` // Playback speed
        },
        /**
         * Gets the streaming audio for a piece of text.
         *
         * Using howler solves the compatibility problems of some mobile browsers (e.g. UC).
         * To also work in WeChat and Safari on iOS, the back end must handle the
         * probe request identified by the header Range: bytes=0-1.
         * @param {String} txt Text to convert to audio
         */
        howlerPlay(txt) {
          if (this.sound) {
            this.sound.unload() // If sound already has a value, destroy the old instance
          }
          this.status = 'load'
          this.sound = new Howl({
            src: [this.audioSrc(txt)],
            html5: true, // Required: a live stream can only be played through HTML5 Audio
            format: ['mp3', 'aac'],
            // onplay, onpause and onend below only drive the control display
            onplay: () => {
              this.status = 'pause'
            },
            onpause: () => {
              this.status = 'stop'
            },
            onend: () => {
              this.status = 'stop'
            }
          })
          this.sound.play()
        },
        // Handle user interaction
        play(txt, index) {
          if (this.curIndex === index) {
            if (this.status === 'stop') {
              this.sound.play()
            } else {
              this.sound.pause()
            }
          } else {
            this.curIndex = index
            this.howlerPlay(txt)
          }
        }
      }
    }
    </script>
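
To exercise the back-end endpoint without a browser, the same kind of URL can be assembled on the Java side. The following is a rough sketch, not part of the original project: the class `TtsUrlSketch` is hypothetical, and the host `localhost:8080` is an assumption about where the example controller might be running. It also shows why the text has to be percent-encoded as a query component: characters such as `&` and `=` would otherwise be read as parameter separators.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.UUID;

/** Hypothetical helper that builds a request URL for the TTS endpoint. */
final class TtsUrlSketch {

    /** Assembles the query string with the same parameter names the front end uses. */
    static String build(String baseUrl, String reqid, String text, int tim, String sp) {
        return baseUrl
                + "?reqid=" + encode(reqid)
                + "&text=" + encode(text)
                + "&tim=" + tim
                + "&sp=" + encode(sp);
    }

    /** Form-encodes one value as UTF-8; spaces become '+', which servlet containers decode. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Assumed local address of the example controller
        String url = build("http://localhost:8080/tts/stream/example",
                UUID.randomUUID().toString(), "Q&A: 1 + 1 = 2", 1, "1");
        System.out.println(url);
    }
}
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the MP3 stream if the back end is running and configured with valid keys.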

Eager to try it yourself after reading this guide? Want to learn more about AI?

This Saturday we are hosting the "JD Cloud Technology Salon AI Special", which covers the practice and application of JD's artificial intelligence technology, from "Intelligent Retail" to "Unmanned Storage". Technical experts will be on hand to answer your questions.

Click "Read the original text" to sign up for free!

Posted by darksniperx on Fri, 19 Jul 2019 03:56:06 -0700
