Hardware-Decoding AAC Audio Files (or Real-Time AAC Audio Frames) with Android MediaCodec and Playing Them

Today I will briefly show how to use Android's MediaCodec to decode AAC audio files or real-time AAC audio frames and play the result through AudioTrack. The main idea is to obtain one AAC frame at a time from a file or from the network, feed it to the decoder, and play the decoded output.

Wrapping AudioTrack

AudioTrack is used to play sound, but it only accepts audio streams in PCM format. The class below is a simple wrapper around AudioTrack with some error checks added:

import android.media.AudioManager;
import android.media.AudioTrack;
import android.util.Log;

/**
 * Created by ZhangHao on 2017/5/10.
 * Plays PCM data
 */
public class MyAudioTrack {
    private int mFrequency; // sample rate in Hz
    private int mChannel;   // channel configuration
    private int mSampBit;   // sample format (bit depth)
    private AudioTrack mAudioTrack;

    public MyAudioTrack(int frequency, int channel, int sampbit) {
        this.mFrequency = frequency;
        this.mChannel = channel;
        this.mSampBit = sampbit;
    }

    /**
     * Initialization
     */
    public void init() {
        if (mAudioTrack != null) {
            release();
        }
        // Minimum buffer size needed to construct the AudioTrack
        int minBufSize = getMinBufferSize();
        mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC,
                mFrequency, mChannel, mSampBit, minBufSize, AudioTrack.MODE_STREAM);
        // In MODE_STREAM, playback must be started before written data is heard
        mAudioTrack.play();
    }

    /**
     * Release resources
     */
    public void release() {
        if (mAudioTrack != null) {
            mAudioTrack.stop();
            mAudioTrack.release();
        }
    }

    /**
     * Write decoded PCM data to the AudioTrack for playback
     *
     * @param data   PCM data
     * @param offset offset of the first byte to play
     * @param length number of bytes to play
     */
    public void playAudioTrack(byte[] data, int offset, int length) {
        if (data == null || data.length == 0) {
            return;
        }
        try {
            mAudioTrack.write(data, offset, length);
        } catch (Exception e) {
            Log.e("MyAudioTrack", "AudioTrack Exception : " + e.toString());
        }
    }

    public int getMinBufferSize() {
        return AudioTrack.getMinBufferSize(mFrequency,
                mChannel, mSampBit);
    }
}

Here is a brief explanation of the parameters of the constructor AudioTrack(int streamType, int sampleRateInHz, int channelConfig, int audioFormat, int bufferSizeInBytes, int mode):
1.streamType: the stream type, mainly one of the following:
- STREAM_MUSIC: music
- STREAM_SYSTEM: system sounds
- STREAM_VOICE_CALL: phone calls
Because Android manages different kinds of sound separately, this parameter tells the system which kind of sound this AudioTrack plays.

2.sampleRateInHz: the sample rate

3.channelConfig: the channel configuration (for example mono or stereo)

4.audioFormat: the sample format (for example 16-bit PCM)

5.bufferSizeInBytes: the buffer size; the minimum can be obtained through AudioTrack.getMinBufferSize(int sampleRateInHz, int channelConfig, int audioFormat)

6.mode: the playback mode, one of:
- MODE_STATIC: all data is loaded into the buffer at once, with no repeated writes. It is generally used for sounds that take little memory and need low latency.
- MODE_STREAM: data is written repeatedly, which suits data received from the network or decoded in real time. This is the case in this example.

This is only a brief introduction; more detailed material is easy to find online.
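
To make the relationship between sampleRateInHz, channelConfig, audioFormat and bufferSizeInBytes more concrete, here is a small sketch in plain Java. It does not call any Android API; the class name PcmMath and its methods are hypothetical helpers introduced only for illustration, and the channel count and bit depth are passed as plain numbers rather than as AudioFormat constants.

```java
public class PcmMath {
    /** Bytes of PCM data consumed per second of playback. */
    public static int bytesPerSecond(int sampleRateHz, int channelCount, int bitsPerSample) {
        return sampleRateHz * channelCount * (bitsPerSample / 8);
    }

    /** Bytes of PCM data for a given number of samples per channel. */
    public static int bytesForSamples(int samplesPerChannel, int channelCount, int bitsPerSample) {
        return samplesPerChannel * channelCount * (bitsPerSample / 8);
    }

    /** Playback time, in milliseconds, covered by a buffer of the given size. */
    public static double bufferMillis(int bufferSizeInBytes, int sampleRateHz,
                                      int channelCount, int bitsPerSample) {
        return bufferSizeInBytes * 1000.0
                / bytesPerSecond(sampleRateHz, channelCount, bitsPerSample);
    }

    public static void main(String[] args) {
        // 48 kHz, stereo, 16-bit: the settings used in this article
        System.out.println(bytesPerSecond(48000, 2, 16));  // 192000
        // One decoded AAC-LC frame holds 1024 samples per channel
        System.out.println(bytesForSamples(1024, 2, 16));  // 4096
        System.out.println(bufferMillis(4096, 48000, 2, 16));
    }
}
```

With these settings, each decoded AAC-LC frame yields 4096 bytes of PCM, which is roughly 21 ms of audio, so the buffer passed to AudioTrack should comfortably hold at least one such frame.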

AAC decoder

MediaCodec is wrapped in a class that decodes AAC one frame at a time.

import android.media.AudioFormat;
import android.media.MediaCodec;
import android.media.MediaCodecInfo;
import android.media.MediaFormat;
import android.util.Log;

import java.io.IOException;
import java.nio.ByteBuffer;

/**
 * Created by ZhangHao on 2017/5/17.
 * Decodes AAC audio
 */
public class AACDecoderUtil {
    private static final String TAG = "AACDecoderUtil";
    // Number of channels
    private static final int KEY_CHANNEL_COUNT = 2;
    // Sample rate
    private static final int KEY_SAMPLE_RATE = 48000;
    // Plays the decoded PCM
    private MyAudioTrack mPlayer;
    // Decoder
    private MediaCodec mDecoder;
    // Counts the calls for which no decoded output was available
    private int count = 0;

    /**
     * Initialize all variables
     */
    public void start() {
        prepare();
    }

    /**
     * Initialize the decoder
     *
     * @return false if initialization fails, true on success
     */
    public boolean prepare() {
        // Initialize AudioTrack
        mPlayer = new MyAudioTrack(KEY_SAMPLE_RATE, AudioFormat.CHANNEL_OUT_STEREO, AudioFormat.ENCODING_PCM_16BIT);
        mPlayer.init();
        try {
            // MIME type of the data to decode
            String mime = "audio/mp4a-latm";
            // Create the decoder
            mDecoder = MediaCodec.createDecoderByType(mime);
            // MediaFormat describes the parameters of the audio/video data
            MediaFormat mediaFormat = new MediaFormat();
            // Data type
            mediaFormat.setString(MediaFormat.KEY_MIME, mime);
            // Number of channels
            mediaFormat.setInteger(MediaFormat.KEY_CHANNEL_COUNT, KEY_CHANNEL_COUNT);
            // Sample rate
            mediaFormat.setInteger(MediaFormat.KEY_SAMPLE_RATE, KEY_SAMPLE_RATE);
            // Bit rate
            mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 128000);
            // Marks whether the AAC data carries ADTS headers, 1 -> yes
            mediaFormat.setInteger(MediaFormat.KEY_IS_ADTS, 1);
            // AAC profile
            mediaFormat.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
            // Codec-specific data (csd-0): AAC-LC, 48 kHz, 2 channels
            byte[] data = new byte[]{(byte) 0x11, (byte) 0x90};
            ByteBuffer csd_0 = ByteBuffer.wrap(data);
            mediaFormat.setByteBuffer("csd-0", csd_0);
            // Configure the decoder
            mDecoder.configure(mediaFormat, null, null, 0);
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
        if (mDecoder == null) {
            return false;
        }
        mDecoder.start();
        return true;
    }

    /**
     * Decode AAC and play it
     */
    public void decode(byte[] buf, int offset, int length) {
        // Input buffers (getInputBuffers/getOutputBuffers are deprecated since API 21)
        ByteBuffer[] codecInputBuffers = mDecoder.getInputBuffers();
        // Output buffers
        ByteBuffer[] codecOutputBuffers = mDecoder.getOutputBuffers();
        // Timeout in microseconds: 0 -> do not wait, -1 -> wait indefinitely
        long kTimeOutUs = 0;
        try {
            // Index of an input buffer that is free to be filled, -1 -> none available
            int inputBufIndex = mDecoder.dequeueInputBuffer(kTimeOutUs);
            if (inputBufIndex >= 0) {
                // Get the input ByteBuffer
                ByteBuffer dstBuf = codecInputBuffers[inputBufIndex];
                // Clear the ByteBuffer
                dstBuf.clear();
                // Fill in the data
                dstBuf.put(buf, offset, length);
                // Submit the input buffer at this index to the decoder
                mDecoder.queueInputBuffer(inputBufIndex, 0, length, 0, 0);
            }
            // Metadata of the decoded output buffer
            MediaCodec.BufferInfo info = new MediaCodec.BufferInfo();
            // Index of an output buffer holding decoded data, negative -> none available
            int outputBufferIndex = mDecoder.dequeueOutputBuffer(info, kTimeOutUs);

            if (outputBufferIndex < 0) {
                // Record that no decoded output was available
                count++;
            }
            ByteBuffer outputBuffer;
            while (outputBufferIndex >= 0) {
                // Get the decoded ByteBuffer
                outputBuffer = codecOutputBuffers[outputBufferIndex];
                // Holds the decoded data
                byte[] outData = new byte[info.size];
                outputBuffer.get(outData);
                // Clear the buffer
                outputBuffer.clear();
                // Play the decoded data
                mPlayer.playAudioTrack(outData, 0, info.size);
                // Release the output buffer back to the decoder
                mDecoder.releaseOutputBuffer(outputBufferIndex, false);
                // Fetch any remaining decoded data
                outputBufferIndex = mDecoder.dequeueOutputBuffer(info, kTimeOutUs);
            }
        } catch (Exception e) {
            Log.e(TAG, e.toString());
        }
    }

    // Returns the number of calls that produced no decoded output
    public int getCount() {
        return count;
    }

    /**
     * Release resources
     */
    public void stop() {
        try {
            if (mPlayer != null) {
                mPlayer.release();
                mPlayer = null;
            }
            if (mDecoder != null) {
                mDecoder.stop();
                mDecoder.release();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
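
The two bytes {0x11, 0x90} passed as "csd-0" in prepare() are not arbitrary: they follow the MPEG-4 AudioSpecificConfig layout, which packs the audio object type (5 bits), the sampling-frequency index (4 bits) and the channel configuration (4 bits), followed by 3 zero bits. The sketch below shows how these bytes could be derived for other settings. It is plain Java with no Android dependency, and the class name AacCsd and its methods are hypothetical helpers, not part of the original code.

```java
public class AacCsd {
    // Sampling-frequency table from the MPEG-4 audio specification;
    // the array index is the 4-bit sampling-frequency index.
    private static final int[] SAMPLE_RATES = {
            96000, 88200, 64000, 48000, 44100, 32000,
            24000, 22050, 16000, 12000, 11025, 8000, 7350
    };

    /** Maps a sample rate in Hz to its 4-bit index. */
    public static int sampleRateIndex(int sampleRateHz) {
        for (int i = 0; i < SAMPLE_RATES.length; i++) {
            if (SAMPLE_RATES[i] == sampleRateHz) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unsupported sample rate: " + sampleRateHz);
    }

    /**
     * Builds a 2-byte AudioSpecificConfig.
     *
     * @param audioObjectType 2 for AAC-LC
     * @param sampleRateHz    sample rate, must be in the table above
     * @param channelConfig   1 for mono, 2 for stereo
     */
    public static byte[] buildCsd0(int audioObjectType, int sampleRateHz, int channelConfig) {
        int freqIndex = sampleRateIndex(sampleRateHz);
        int config = (audioObjectType << 11) | (freqIndex << 7) | (channelConfig << 3);
        return new byte[]{(byte) (config >> 8), (byte) config};
    }

    public static void main(String[] args) {
        byte[] csd = buildCsd0(2, 48000, 2); // AAC-LC, 48 kHz, stereo
        System.out.printf("%02X %02X%n", csd[0], csd[1]); // prints "11 90"
    }
}
```

If the stream uses a different sample rate or channel count, KEY_SAMPLE_RATE, KEY_CHANNEL_COUNT, the AudioTrack parameters and these two bytes all need to change together.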

This is very similar to my earlier use of MediaCodec to decode H264. The main difference is the type of data being decoded, so the initialization differs. The other difference is that decoded H264 is rendered directly onto a Surface, whereas decoded AAC has to be taken out of the decoder and played through AudioTrack.

Reading AAC files

Here a thread reads the AAC file, extracts one AAC frame at a time, and sends each frame to the decoder for playback.

import android.util.Log;

import java.io.File;
import java.io.FileInputStream;
import java.util.Arrays;

/**
 * Created by ZhangHao on 2017/4/18.
 * Plays an AAC audio file
 */
public class ReadAACFileThread extends Thread {

    // Audio decoder
    private AACDecoderUtil audioUtil;
    // File path
    private String filePath;
    // Set once the file has been read completely
    private boolean isFinish = false;
    // After the first frame header is found, the search for the second header starts this many bytes later. If decoding fails, try reducing this value.
    private int FRAME_MIN_LEN = 50;
    // Size of the frame buffer. An ADTS frame is at most 8191 bytes (13-bit length field), so 100 KB is ample. If decoding fails, try increasing this value.
    private static int FRAME_MAX_LEN = 100 * 1024;
    // Time budget per frame in milliseconds; the thread sleeps for whatever part of it is left after reading and decoding. Adjust to the actual frame rate.
    private int PRE_FRAME_TIME = 1000 / 50;
    // Number of frames extracted
    private int count = 0;

    public ReadAACFileThread(String path) {
        this.audioUtil = new AACDecoderUtil();
        this.filePath = path;
        // Set up the decoder and the AudioTrack before any frame is decoded
        this.audioUtil.start();
    }

    @Override
    public void run() {
        File file = new File(filePath);
        // Check that the file exists
        if (file.exists()) {
            try (FileInputStream fis = new FileInputStream(file)) {
                // Holds the data of the frame being assembled
                byte[] frame = new byte[FRAME_MAX_LEN];
                // Current frame length
                int frameLen = 0;
                // Data read from the file in one call
                byte[] readData = new byte[10 * 1024];
                // Start time
                long startTime = System.currentTimeMillis();
                // Read in a loop
                while (!isFinish) {
                    if (fis.available() > 0) {
                        int readLen = fis.read(readData);
                        // The buffered length stays below the maximum
                        if (frameLen + readLen < FRAME_MAX_LEN) {
                            // Append readData to frame
                            System.arraycopy(readData, 0, frame, frameLen, readLen);
                            // Update frameLen
                            frameLen += readLen;
                            // Find the first frame header
                            int headFirstIndex = findHead(frame, 0, frameLen);
                            while (headFirstIndex >= 0 && isHead(frame, headFirstIndex)) {
                                // Find the second frame header
                                int headSecondIndex = findHead(frame, headFirstIndex + FRAME_MIN_LEN, frameLen);
                                // If the second header exists, the bytes between the two headers are one complete frame
                                if (headSecondIndex > 0 && isHead(frame, headSecondIndex)) {
                                    count++;
                                    // Audio decoding
                                    Log.e("ReadAACFileThread", "Length : " + (headSecondIndex - headFirstIndex));
                                    audioUtil.decode(frame, headFirstIndex, headSecondIndex - headFirstIndex);
                                    // Move the data from headSecondIndex onward to the front of frame
                                    byte[] temp = Arrays.copyOfRange(frame, headSecondIndex, frameLen);
                                    System.arraycopy(temp, 0, frame, 0, temp.length);
                                    // Update frameLen
                                    frameLen = temp.length;
                                    // Sleep for the rest of the frame time
                                    sleepThread(startTime, System.currentTimeMillis());
                                    // Reset the start time
                                    startTime = System.currentTimeMillis();
                                    // Continue looking for frames
                                    headFirstIndex = findHead(frame, 0, frameLen);
                                } else {
                                    // No second frame header was found
                                    headFirstIndex = -1;
                                }
                            }
                        } else {
                            // The length exceeds the maximum, so discard the buffered data
                            frameLen = 0;
                        }
                    } else {
                        // End of file
                        isFinish = true;
                    }
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
            Log.e("ReadAACFileThread", "AllCount:" + count + " Error Count : " + audioUtil.getCount());
        } else {
            Log.e("ReadAACFileThread", "File not found");
        }
        // Release the decoder and the AudioTrack
        audioUtil.stop();
    }

    /**
     * Find the position of the next AAC frame header in the buffer
     *
     * @param data       data
     * @param startIndex position at which to start searching
     * @param max        end of the valid data (exclusive)
     * @return index of the header, or -1 if none was found
     */
    private int findHead(byte[] data, int startIndex, int max) {
        // Stop early enough that isHead() never reads past max
        for (int i = startIndex; i + 3 < max; i++) {
            // Found a frame header
            if (isHead(data, i)) {
                return i;
            }
        }
        // Reached the limit without finding a header
        return -1;
    }

    /**
     * Check whether an AAC frame header starts at offset
     */
    private boolean isHead(byte[] data, int offset) {
        return data[offset] == (byte) 0xFF && data[offset + 1] == (byte) 0xF1
                && data[offset + 3] == (byte) 0x80;
    }

    // Sleep to keep the playback pace
    private void sleepThread(long startTime, long endTime) {
        // The sleep time is the frame time minus the time already spent reading and decoding
        long time = PRE_FRAME_TIME - (endTime - startTime);
        if (time > 0) {
            try {
                Thread.sleep(time);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}
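
The value PRE_FRAME_TIME = 1000 / 50 (20 ms) approximates the duration of one frame. An AAC-LC frame carries 1024 samples per channel, so its duration can also be derived from the sample rate instead of being hard-coded. The following is a minimal sketch of that calculation in plain Java; the class name AacPacing and its methods are hypothetical and are not part of the original demo.

```java
public class AacPacing {
    /** Samples per channel in one AAC-LC frame. */
    public static final int SAMPLES_PER_FRAME = 1024;

    /** Duration of one AAC-LC frame in microseconds (rounded down). */
    public static long frameDurationUs(int sampleRateHz) {
        return SAMPLES_PER_FRAME * 1_000_000L / sampleRateHz;
    }

    /**
     * Time left to sleep, in milliseconds, once elapsedMs has already been
     * spent on reading and decoding. Never negative.
     */
    public static long remainingSleepMs(int sampleRateHz, long elapsedMs) {
        long frameMs = frameDurationUs(sampleRateHz) / 1000;
        return Math.max(0, frameMs - elapsedMs);
    }

    public static void main(String[] args) {
        System.out.println(frameDurationUs(48000));      // 21333
        System.out.println(remainingSleepMs(48000, 5));  // 16
        System.out.println(remainingSleepMs(48000, 30)); // 0
    }
}
```

At 48 kHz a frame lasts about 21.3 ms, so the 20 ms used above feeds the decoder slightly faster than real time. That is usually harmless here, because a blocking AudioTrack.write() in MODE_STREAM holds the thread back once the playback buffer is full.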

There is not much to this part: AAC frames are located by their frame header, and each frame is cut out and passed to the decoder. The header check is a simple one that happens to fit my data. It will not necessarily match every AAC frame header, so adapt it to your own stream.
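
The bytes checked by isHead (0xFF, 0xF1, and 0x80 in the fourth byte) correspond to an ADTS header for an MPEG-4 stream without CRC, with two channels and a frame shorter than 2048 bytes. A more general alternative, shown here only as a sketch and not as the method used in this demo, is to test the 12-bit syncword and read the 13-bit frame length from the header, which also removes the need to search for the next header. The class name AdtsHeader and its methods are hypothetical.

```java
public class AdtsHeader {
    /** Length of an ADTS header without CRC, in bytes. */
    public static final int HEADER_LEN = 7;

    /** True if the 12-bit syncword 0xFFF starts at offset and a full header fits before limit. */
    public static boolean isSync(byte[] data, int offset, int limit) {
        return offset >= 0 && offset + HEADER_LEN <= limit
                && (data[offset] & 0xFF) == 0xFF
                && (data[offset + 1] & 0xF0) == 0xF0;
    }

    /** Reads the 13-bit frame length (header included) from header bytes 3 to 5. */
    public static int frameLength(byte[] data, int offset) {
        return ((data[offset + 3] & 0x03) << 11)
                | ((data[offset + 4] & 0xFF) << 3)
                | ((data[offset + 5] & 0xE0) >> 5);
    }

    /** Reads the 4-bit sampling-frequency index from header byte 2. */
    public static int sampleRateIndex(byte[] data, int offset) {
        return (data[offset + 2] & 0x3C) >> 2;
    }

    public static void main(String[] args) {
        // Hand-built example header: MPEG-4, AAC-LC, 48 kHz, stereo, 371-byte frame
        byte[] header = {(byte) 0xFF, (byte) 0xF1, (byte) 0x4C, (byte) 0x80,
                (byte) 0x2E, (byte) 0x7F, (byte) 0xFC};
        System.out.println(isSync(header, 0, header.length)); // true
        System.out.println(frameLength(header, 0));           // 371
        System.out.println(sampleRateIndex(header, 0));       // 3
    }
}
```

With the frame length known, a reader can wait until that many bytes are buffered, pass exactly that range to decode(), and expect the next header immediately after it. The syncword can also occur by chance inside audio data, so checking that another syncword follows at offset + frameLength is a common extra safeguard.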


Audio frames can also be separated with the MediaExtractor class. My actual data source is the network, though, so this demo splits the frames by hand and is a little more complex as a result.

