AAC ADTS stream format analysis, and how to extract the AAC stream of an mp4 file with FFmpeg

Keywords: ffmpeg aac

AAC itself only defines how audio data is coded. The coded data is then organized into a stream in one of two formats: ADIF (Audio Data Interchange Format) or ADTS (Audio Data Transport Stream).
The key difference is where the decoding information lives: ADIF stores it once, at a fixed place at the start of the stream, while ADTS repeats it in the header of every frame. ADIF is therefore mainly used for files stored on disk, and ADTS for streaming over a network, where decoding may have to start at any frame. This article focuses on ADTS.

ADTS stream format

An ADTS stream is a sequence of frames, each consisting of an ADTS header followed by AAC raw data.
[ADTS Header](AAC ES data) | [ADTS Header](AAC ES data) | ...
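
Because every header carries the total length of its frame (the 13-bit field M in the layout described below), a reader can hop from one header to the next without decoding any audio. The following is a minimal sketch of that walk, assuming the whole stream is already in memory; count_adts_frames is a hypothetical helper name, not an FFmpeg function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Counts complete ADTS frames in a buffer by hopping from header to header.
// Returns -1 if a sync word is missing or a frame runs past the end of the buffer.
// Fewer than 7 trailing bytes are ignored.
// (count_adts_frames is an illustrative name, not part of any library.)
int count_adts_frames(const uint8_t *buf, size_t size)
{
    assert(buf != nullptr || size == 0);
    int frames = 0;
    size_t pos = 0;
    while (pos + 7 <= size) {
        const uint8_t *h = buf + pos;
        // 12-bit sync word: all ones
        if (h[0] != 0xFF || (h[1] & 0xF0) != 0xF0)
            return -1;
        // 13-bit frame length (header + payload) spans bytes 3..5
        size_t frame_len = ((size_t)(h[3] & 0x03) << 11) | ((size_t)h[4] << 3) | (size_t)(h[5] >> 5);
        if (frame_len < 7 || pos + frame_len > size)
            return -1;
        pos += frame_len;
        ++frames;
    }
    return frames;
}
```

A check like this is a quick way to verify that a file written with per-frame headers is structurally consistent before handing it to a player.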

The ADTS header is 7 bytes long, or 9 bytes when the optional CRC is present. Its layout can be written as the following letter sequence, where each letter stands for one bit of a field. The bit length and meaning of each field are listed in the table below.
AAAAAAAA AAAABCCD EEFFFFGH HHIJKLMM MMMMMMMM MMMOOOOO OOOOOOPP (QQQQQQQQ QQQQQQQQ)
Fields A-J form the ADTS fixed header, fields K-P form the variable header, and Q is the optional CRC.

Field  Bits  Description
A      12    Sync word, all bits set to 1 (0xFFF)
B      1     MPEG version (ID): 0 for MPEG-4, 1 for MPEG-2. For AAC taken from mp4 this is 0
C      2     Layer: always 0
D      1     protection_absent: 1 means no CRC, 0 means a CRC follows the header
E      2     Profile: MPEG-4 Audio Object Type minus 1. 0: Main, 1: LC, 2: SSR, 3: reserved. Low Complexity (LC) is the most commonly used
F      4     Sampling frequency index into the MPEG-4 sampling rate table. Note that this is the table index, not the sampling rate value
G      1     Private bit: set to 0 when encoding, ignored when decoding
H      3     Channel configuration: 1-6 give the number of channels directly, 7 means 8 channels, 0 means the channel layout is described inside the AAC data
I      1     Original/copy flag: set to 0 when encoding, ignored when decoding
J      1     Home: set to 0 when encoding, ignored when decoding
K      1     Copyright identification bit: set to 0 when encoding, ignored when decoding
L      1     Copyright identification start: set to 0 when encoding, ignored when decoding
M      13    Frame length in bytes: AAC raw data length + ADTS header length (protection_absent == 1 ? 7 : 9)
O      11    Buffer fullness: 0x7FF indicates a variable bit rate stream
P      2     Number of AAC frames (raw data blocks) in this ADTS frame, minus 1. With one AAC frame the value is 0
Q      16    CRC: present only when protection_absent is 0
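
To make the field layout concrete, here is a minimal sketch that decodes the fields above from the first 7 bytes of a frame using shifts and masks. The AdtsInfo struct and the parse_adts_header function are illustrative names introduced for this example; they are not taken from FFmpeg or any other library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decoded view of an ADTS header (struct and member names are illustrative).
struct AdtsInfo {
    bool mpeg2;               // B: false = MPEG-4, true = MPEG-2
    bool protection_absent;   // D: true means no CRC
    int profile;              // E: audio object type minus 1
    int sampling_freq_index;  // F: index into the sampling rate table
    int channel_config;       // H
    int frame_length;         // M: header + payload, in bytes
    int buffer_fullness;      // O
    int raw_blocks;           // P + 1
    int header_size;          // 7, or 9 when a CRC is present
};

// Returns true and fills *out if p points at a plausible ADTS header.
bool parse_adts_header(const uint8_t *p, size_t size, AdtsInfo *out)
{
    assert(out != nullptr);
    if (p == nullptr || size < 7)
        return false;
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0)  // A: sync word
        return false;
    if ((p[1] & 0x06) != 0)                     // C: layer must be 0
        return false;
    out->mpeg2 = ((p[1] >> 3) & 0x01) != 0;
    out->protection_absent = (p[1] & 0x01) != 0;
    out->profile = p[2] >> 6;
    out->sampling_freq_index = (p[2] >> 2) & 0x0F;
    out->channel_config = ((p[2] & 0x01) << 2) | (p[3] >> 6);
    out->frame_length = ((p[3] & 0x03) << 11) | (p[4] << 3) | (p[5] >> 5);
    out->buffer_fullness = ((p[5] & 0x1F) << 6) | (p[6] >> 2);
    out->raw_blocks = (p[6] & 0x03) + 1;
    out->header_size = out->protection_absent ? 7 : 9;
    return out->frame_length >= out->header_size;
}
```

For example, the bytes FF F1 4C 80 01 3F FC decode to MPEG-4, no CRC, LC profile, sampling frequency index 3, channel configuration 2, a frame length of 9 bytes and a buffer fullness of 0x7FF.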

[Figure: ADTS header layout from the official specification; image not available]


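The sampling rate table is shown below.

Index  Sampling rate (Hz)
0      96000
1      88200
2      64000
3      48000
4      44100
5      32000
6      24000
7      22050
8      16000
9      12000
10     11025
11     8000
12     7350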

Parsing and saving the AAC stream of an mp4 file with FFmpeg

FFmpeg reads packets with av_read_frame(AVFormatContext *s, AVPacket *pkt). For an mp4 file, the AVPacket data returned for the audio stream is raw AAC data without any header. If it is written to a file as is, the file cannot be played, because it carries no sampling rate, channel, or frame length information. As described in the ADTS header structure above, a header must therefore be written in front of each frame.

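The following is the ADTS header definition I use, for reference. Note that within each byte the bit-fields are declared in reverse order, least significant bits first, because that is how common compilers on x86 (GCC, Clang, MSVC) lay out bit-fields. This layout is implementation-defined, so the struct may not produce the right bytes with every compiler.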
ADTSHeader.h

#ifndef _ADTSHEADER_H
#define _ADTSHEADER_H

#include <stdint.h>  // uint8_t
#include <string.h>  // memset

// 7 bytes
struct ADTS_Header {
    // fixed header
    // 1st byte
    uint8_t sync_word_l : 8;  // 0xFF
    // 2nd byte
    // sync_word_h + id + layer + protection_absent
    uint8_t protection_absent : 1;  // 1 no CRC, 0 has CRC
    uint8_t layer : 2;              // 00
    uint8_t id : 1;                 // 0: MPEG-4, 1: MPEG-2
    uint8_t sync_word_h : 4;        // 0xF

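    // 3rd byte
    // profile + sampling_frequency_index + private_bit + channel_configuration_l
    // Note: for fields split across bytes, the _l part comes first in the stream
    // and holds the most significant bits; the _h part holds the least significant.
    uint8_t channel_configuration_l : 1;  // top bit of the 3-bit channel configuration
    uint8_t private_bit : 1;
    uint8_t sampling_frequency_index : 4;
    uint8_t profile : 2;  // 0:main, 1: LC, 2: SSR, 3: reserved

    // 4th byte
    uint8_t aac_frame_length_l : 2;  // top 2 bits of the 13-bit frame length
    uint8_t copyright_identification_start : 1;
    uint8_t copyright_identification_bit : 1;
    uint8_t home : 1;
    uint8_t original_copy : 1;
    uint8_t channel_configuration_h : 2;  // low 2 bits of the channel configuration

    // 5th byte
    uint8_t aac_frame_length_m : 8;  // middle 8 bits of the frame length
    // 6th byte
    uint8_t adts_buffer_fullness_l : 5;  // top 5 bits of the 11-bit buffer fullness
    uint8_t aac_frame_length_h : 3;      // low 3 bits of the frame length
    // 7th byte
    uint8_t number_of_raw_data_blocks_in_frame : 2;
    uint8_t adts_buffer_fullness_h : 6;  // low 6 bits; adts_buffer_fullness 0x7ff VBR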

    ADTS_Header()
    {
        memset(this, 0, sizeof(ADTS_Header));
        setSyncWord();
        protection_absent = 1;
        profile = 1;
    }

    ADTS_Header(int samplingFreq, int channel, int length)
    {
        memset(this, 0, sizeof(ADTS_Header));
        setSyncWord();
        setSamplingFrequency(samplingFreq);
        setChannel(channel);
        setLength(length);
        protection_absent = 1;
        profile = 1;
    }

    ADTS_Header &setSyncWord()
    {
        sync_word_l = 0xff;
        sync_word_h = 0xf;
        return *this;
    }

    ADTS_Header &setSamplingFrequency(int sf)
    {
        int sampling_frequency_table[13] = {96000, 88200, 64000, 48000, 44100, 32000, 24000,
                                            22050, 16000, 12000, 11025, 8000,  7350};
        sampling_frequency_index = 0xf;
        for (int i = 0; i < 13; ++i) {
            if (sampling_frequency_table[i] == sf) {
                sampling_frequency_index = i;
                break;
            }
        }
        return *this;
    }

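    ADTS_Header &setChannel(int ch)
    {
        // Channel configuration is 3 bits: 1-6 equal the channel count, 7 means 8 channels.
        int cfg = 0;
        if (ch > 0 && ch < 7)
            cfg = ch;
        else if (ch == 8)
            cfg = 7;
        channel_configuration_l = (cfg >> 2) & 0x01;  // top bit, stored in the 3rd byte
        channel_configuration_h = cfg & 0x03;         // low 2 bits, stored in the 4th byte
        return *this;
    }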

    // length = header length + aac es stream length
    ADTS_Header &setLength(int length)
    {
        aac_frame_length_l = (length >> 11) & 0x03;
        aac_frame_length_m = (length >> 3) & 0xff;
        aac_frame_length_h = (length & 0x07);
        return *this;
    }

    int getLength()
    {
        int l = (aac_frame_length_l << 11) | (aac_frame_length_m << 3) | (aac_frame_length_h);
        return l;
    }

    int getChannel()
    {
        // Reassemble the 3-bit channel configuration from its two parts.
        int cfg = (channel_configuration_l << 2) | channel_configuration_h;
        if (cfg == 7)
            return 8;
        return cfg;
    }
};

#endif  // _ADTSHEADER_H
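
Because bit-field layout is implementation-defined, a more portable alternative to the struct above is to assemble the 7 header bytes with shifts and masks. The following is a minimal sketch, assuming AAC-LC, MPEG-4, no CRC, one raw data block per frame and a buffer fullness of 0x7FF; write_adts_header and its parameter names are hypothetical and not part of FFmpeg.

```cpp
#include <cassert>
#include <cstdint>

// Fills out[0..6] with a 7-byte ADTS header (illustrative helper, not a library API).
// sf_index:     index into the sampling rate table (0..12)
// channel_cfg:  3-bit channel configuration (1-6 = channel count, 7 = 8 channels)
// payload_size: size of the raw AAC frame in bytes
void write_adts_header(uint8_t out[7], int sf_index, int channel_cfg, int payload_size)
{
    const int profile = 1;                   // LC: audio object type 2, minus 1
    const int frame_len = payload_size + 7;  // header + payload
    assert(sf_index >= 0 && sf_index <= 12);
    assert(channel_cfg >= 0 && channel_cfg <= 7);
    assert(payload_size >= 0 && frame_len < (1 << 13));

    out[0] = 0xFF;  // first 8 bits of the sync word
    out[1] = 0xF1;  // last 4 sync bits, MPEG-4, layer 0, protection_absent = 1
    out[2] = (uint8_t)((profile << 6) | (sf_index << 2) | (channel_cfg >> 2));
    out[3] = (uint8_t)(((channel_cfg & 0x03) << 6) | (frame_len >> 11));
    out[4] = (uint8_t)((frame_len >> 3) & 0xFF);
    out[5] = (uint8_t)(((frame_len & 0x07) << 5) | 0x1F);  // plus top 5 bits of 0x7FF
    out[6] = 0xFC;  // low 6 bits of 0x7FF, number of raw data blocks minus 1 = 0
}
```

For 48 kHz stereo audio the call would be write_adts_header(hdr, 3, 2, avPacket->size), followed by writing the 7 bytes of hdr and then the packet data.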

When saving the file, write an ADTS header before each frame so that the result can be played.

ADTS_Header header(48000, 2, avPacket->size + 7);  // assumes 2-channel audio at 48 kHz; 7 is the ADTS header length
fwrite(&header, 1, sizeof(ADTS_Header), _file);
fwrite(avPacket->data, 1, avPacket->size, _file);

Note that one AAC frame holds 1024 samples per channel. No timestamp information needs to be saved in the file: on playback, the frames are played out according to the sampling rate.
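
Since each AAC frame carries 1024 samples, the playback time of any frame follows from its index and the sampling rate alone, which is why no timestamps are needed. A small sketch of that arithmetic, with aac_frame_time_us and kSamplesPerAacFrame as illustrative names:

```cpp
#include <cassert>
#include <cstdint>

// Samples per channel carried by one AAC frame, as stated in the text.
const int kSamplesPerAacFrame = 1024;

// Start time of frame number frame_index in microseconds, rounded down.
int64_t aac_frame_time_us(int64_t frame_index, int sample_rate)
{
    assert(frame_index >= 0 && sample_rate > 0);
    return frame_index * kSamplesPerAacFrame * 1000000LL / sample_rate;
}
```

At 48 kHz one frame lasts 1024 / 48000 s, about 21.3 ms, so frame 375 starts exactly 8 seconds into the stream.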

Posted by homer.favenir on Sat, 04 Dec 2021 22:18:39 -0800