AAC itself only defines how the audio data is encoded. The encoded stream is packaged in one of two formats: ADIF (Audio Data Interchange Format) or ADTS (Audio Data Transport Stream).
The key difference between them is that ADIF stores the decoding information once, at a fixed place at the start of the data, whereas ADTS repeats it in the header of every frame. ADIF is therefore mainly used for files stored on disk, and ADTS for network streams, where decoding can start at any frame. This article focuses on ADTS streams.
ADTS stream format
An ADTS stream is a sequence of frames, each made up of an ADTS header followed by raw AAC data.
ADTS Header | AAC ES data | ADTS Header | AAC ES data | ...
The ADTS header is 7 bytes long, or 9 bytes when the optional CRC is present. Its layout can be represented by the following letter sequence, where each letter stands for one field and the number of repetitions gives the field's length in bits.
AAAAAAAA AAAABCCD EEFFFFGH HHIJKLMM MMMMMMMM MMMOOOOO OOOOOOPP (QQQQQQQQ QQQQQQQQ)
Fields A-J form the ADTS fixed header and fields K-P form the variable header. Q is the optional CRC that follows them.
Field | Bits | Description |
---|---|---|
A | 12 | Sync word, all bits set to 1 |
B | 1 | MPEG version: 0 for MPEG-4, 1 for MPEG-2. For audio taken from an mp4 file this is 0 |
C | 2 | Layer: always 0 |
D | 1 | protection_absent: 1 means there is no CRC, 0 means a CRC is present |
E | 2 | AAC profile, equal to the MPEG-4 Audio Object Type minus 1. 0: Main Profile, 1: LC, 2: SSR, 3: reserved. Low Complexity (LC) is the one commonly used. |
F | 4 | Index into the MPEG-4 sampling rate table. Note that this is the index, not the sampling rate itself. Refer to the sampling rate table. |
G | 1 | Private bit, set to 0 when encoding, ignored when decoding |
H | 3 | Channel configuration, value range 1-7 (a value of 7 denotes 8 channels) |
I | 1 | Original/copy flag, set to 0 when encoding, ignored when decoding |
J | 1 | Home flag, set to 0 when encoding, ignored when decoding |
K | 1 | Copyright identification bit, set to 0 when encoding, ignored when decoding |
L | 1 | Copyright identification start bit, set to 0 when encoding, ignored when decoding |
M | 13 | Frame length: AAC raw data length + ADTS header length (protection_absent == 1 ? 7 : 9) |
O | 11 | Buffer fullness, 0x7FF indicates a variable bit rate stream |
P | 2 | Number of AAC frames in this ADTS frame minus 1. With a single frame this value is 0 |
Q | 16 | CRC check field (2 bytes), present only when protection_absent is 0 |
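To make the bit positions in the table concrete, here is a minimal sketch of a parser that reads the fields out of the first 7 header bytes using shifts and masks. The `AdtsFields` struct and the `parseAdtsHeader` function are hypothetical names introduced for this illustration. They are not part of FFmpeg or any other library, and the CRC field Q is not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decoded view of the header fields from the table (hypothetical type).
struct AdtsFields {
    bool     mpeg2;              // B: 0 = MPEG-4, 1 = MPEG-2
    bool     protection_absent;  // D: 1 = no CRC
    unsigned profile;            // E: Audio Object Type minus 1
    unsigned sampling_index;     // F: index into the sampling rate table
    unsigned channel_config;     // H
    unsigned frame_length;       // M: header + raw data, in bytes
    unsigned buffer_fullness;    // O
    unsigned raw_blocks_minus1;  // P
};

// Returns false if the buffer is too short or the sync word / layer is wrong.
inline bool parseAdtsHeader(const uint8_t *p, size_t size, AdtsFields &out) {
    assert(p != nullptr);
    if (size < 7) return false;
    if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0) return false;  // A: 12 sync bits
    if ((p[1] & 0x06) != 0) return false;                      // C: layer must be 0
    out.mpeg2             = (p[1] >> 3) & 0x01;
    out.protection_absent = p[1] & 0x01;
    out.profile           = (p[2] >> 6) & 0x03;
    out.sampling_index    = (p[2] >> 2) & 0x0F;
    out.channel_config    = ((p[2] & 0x01u) << 2) | ((p[3] >> 6) & 0x03u);
    out.frame_length      = ((p[3] & 0x03u) << 11) | (p[4] << 3) | ((p[5] >> 5) & 0x07u);
    out.buffer_fullness   = ((p[5] & 0x1Fu) << 6) | ((p[6] >> 2) & 0x3Fu);
    out.raw_blocks_minus1 = p[6] & 0x03;
    return true;
}
```

For example, the bytes `FF F1 4C 80 2E 7F FC` decode as MPEG-4, no CRC, LC profile, sampling index 3, 2 channels, and a frame length of 371 bytes.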
[Figure: the ADTS header layout as drawn in the official document]
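The sampling rate table is shown below.
Index | Sampling rate (Hz) |
---|---|
0 | 96000 |
1 | 88200 |
2 | 64000 |
3 | 48000 |
4 | 44100 |
5 | 32000 |
6 | 24000 |
7 | 22050 |
8 | 16000 |
9 | 12000 |
10 | 11025 |
11 | 8000 |
12 | 7350 |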
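Since field F carries an index rather than a rate, a pair of small lookup helpers is handy on both the reading and the writing side. The following is only a sketch: `kAdtsSampleRates`, `adtsSampleRate`, and `adtsSampleRateIndex` are hypothetical names, and the 13 values are the ones used by the header code later in this article.

```cpp
#include <cassert>

// Sampling rates in Hz, ordered by index (0..12).
static const int kAdtsSampleRates[13] = {96000, 88200, 64000, 48000, 44100,
                                         32000, 24000, 22050, 16000, 12000,
                                         11025, 8000,  7350};

// Maps a 4-bit index from the header to a rate in Hz.
// Returns 0 for indices that are not in the table.
inline int adtsSampleRate(unsigned index) {
    assert(index < 16);  // the field is only 4 bits wide
    return index < 13 ? kAdtsSampleRates[index] : 0;
}

// Maps a rate in Hz to its table index, or -1 if the rate is not listed.
inline int adtsSampleRateIndex(int hz) {
    for (int i = 0; i < 13; ++i)
        if (kAdtsSampleRates[i] == hz) return i;
    return -1;
}
```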
Parsing and saving the AAC stream of an mp4 file with FFmpeg
FFmpeg reads packets with av_read_frame(AVFormatContext *s, AVPacket *pkt). The AVPacket data this function returns from an mp4 file is raw AAC data. Saved directly to a file, it cannot be played, because the file then carries no sampling rate or other decoding information. Following the ADTS header structure described above, a header has to be written in front of every frame.
The following is the ADTS header definition I use, for reference. Note that within each byte the bit-fields are declared in reverse order (least significant bits first), which may not be easy to follow at first. The layout relies on the compiler allocating bit-fields starting from the least significant bit, as GCC, Clang, and MSVC do on little-endian platforms.
ADTSHeader.h
    #ifndef ADTSHEADER_H
    #define ADTSHEADER_H

    #include <stdint.h>
    #include <string.h>

    // 7 bytes. In the field names, _l / _m / _h refer to the byte position
    // (earlier to later byte), so _l holds the most significant bits.
    struct ADTS_Header
    {
        // fixed header
        // 1st byte
        uint8_t sync_word_l : 8; // 0xFF
        // 2nd byte: sync_word_h + id + layer + protection_absent
        uint8_t protection_absent : 1; // 1: no CRC, 0: has CRC
        uint8_t layer : 2;             // 00
        uint8_t id : 1;                // 0: MPEG-4, 1: MPEG-2
        uint8_t sync_word_h : 4;       // 0xF
        // 3rd byte: profile + sampling_frequency_index + private_bit + channel_configuration_l
        uint8_t channel_configuration_l : 1; // top bit of the 3-bit channel configuration
        uint8_t private_bit : 1;
        uint8_t sampling_frequency_index : 4;
        uint8_t profile : 2; // 0: main, 1: LC, 2: SSR, 3: reserved
        // 4th byte
        uint8_t aac_frame_length_l : 2; // top 2 bits of the 13-bit frame length
        uint8_t copyright_identification_start : 1;
        uint8_t copyright_identification_bit : 1;
        uint8_t home : 1;
        uint8_t original_copy : 1;
        uint8_t channel_configuration_h : 2; // low 2 bits of the channel configuration
        // 5th byte
        uint8_t aac_frame_length_m : 8;
        // 6th byte
        uint8_t adts_buffer_fullness_l : 5;
        uint8_t aac_frame_length_h : 3; // low 3 bits of the frame length
        // 7th byte
        uint8_t number_of_raw_data_blocks_in_frame : 2;
        uint8_t adts_buffer_fullness_h : 6; // adts_buffer_fullness 0x7ff: VBR

        ADTS_Header()
        {
            memset(this, 0, sizeof(ADTS_Header));
            setSyncWord();
            protection_absent = 1;
            profile = 1;
        }

        ADTS_Header(int samplingFreq, int channel, int length)
        {
            memset(this, 0, sizeof(ADTS_Header));
            setSyncWord();
            setSamplingFrequency(samplingFreq);
            setChannel(channel);
            setLength(length);
            protection_absent = 1;
            profile = 1;
        }

        ADTS_Header &setSyncWord()
        {
            sync_word_l = 0xff;
            sync_word_h = 0xf;
            return *this;
        }

        ADTS_Header &setSamplingFrequency(int sf)
        {
            static const int sampling_frequency_table[13] = {
                96000, 88200, 64000, 48000, 44100, 32000, 24000,
                22050, 16000, 12000, 11025, 8000, 7350};
            sampling_frequency_index = 0xf; // stays 0xf if the rate is not in the table
            for (int i = 0; i < 13; ++i)
            {
                if (sampling_frequency_table[i] == sf)
                {
                    sampling_frequency_index = i;
                    break;
                }
            }
            return *this;
        }

        ADTS_Header &setChannel(int ch)
        {
            int config;
            if (ch > 0 && ch < 7)
                config = ch;
            else if (ch == 8)
                config = 7; // 8 channels are signalled as configuration 7
            else
                return *this; // unsupported channel count: leave the field unchanged
            // The 3-bit value is split across the 3rd and 4th bytes.
            channel_configuration_l = (config >> 2) & 0x01;
            channel_configuration_h = config & 0x03;
            return *this;
        }

        // length = header length + aac es stream length
        ADTS_Header &setLength(int length)
        {
            aac_frame_length_l = (length >> 11) & 0x03;
            aac_frame_length_m = (length >> 3) & 0xff;
            aac_frame_length_h = (length & 0x07);
            return *this;
        }

        int getLength() const
        {
            return (aac_frame_length_l << 11) | (aac_frame_length_m << 3) | aac_frame_length_h;
        }

        int getChannel() const
        {
            int config = (channel_configuration_l << 2) | channel_configuration_h;
            return config == 7 ? 8 : config;
        }
    };

    #endif // ADTSHEADER_H
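Because bit-field layout is compiler-dependent, the same 7 bytes can also be produced with plain shifts and masks. The sketch below is an alternative to the struct, not the definition used in this article. `writeAdtsHeader` is a hypothetical helper, it always writes an MPEG-4 header without CRC, and it sets the buffer fullness to 0x7FF (an assumption of this sketch, whereas the struct leaves that field at 0).

```cpp
#include <cassert>
#include <cstdint>

// Writes a 7-byte ADTS header (no CRC) into out[0..6].
// profile:       0 Main, 1 LC, 2 SSR
// samplingIndex: index into the sampling rate table (0..12)
// channelConfig: channel configuration (0..7)
// frameLength:   7 header bytes + size of the raw AAC data
inline void writeAdtsHeader(uint8_t out[7], unsigned profile,
                            unsigned samplingIndex, unsigned channelConfig,
                            unsigned frameLength) {
    assert(profile < 4 && samplingIndex < 13 && channelConfig < 8);
    assert(frameLength >= 7 && frameLength < (1u << 13));
    const unsigned fullness = 0x7FF;  // assumption: mark the stream as VBR
    out[0] = 0xFF;                    // sync word, high 8 bits
    out[1] = 0xF1;                    // sync low 4 bits, MPEG-4, layer 0, no CRC
    out[2] = static_cast<uint8_t>((profile << 6) | (samplingIndex << 2) |
                                  (channelConfig >> 2));
    out[3] = static_cast<uint8_t>(((channelConfig & 0x03) << 6) |
                                  (frameLength >> 11));
    out[4] = static_cast<uint8_t>((frameLength >> 3) & 0xFF);
    out[5] = static_cast<uint8_t>(((frameLength & 0x07) << 5) | (fullness >> 6));
    out[6] = static_cast<uint8_t>((fullness & 0x3F) << 2);  // one raw data block
}
```

Calling `writeAdtsHeader(h, 1, 3, 2, 371)` fills `h` with `FF F1 4C 80 2E 7F FC`, that is, LC profile, 48 kHz, 2 channels, and a 364-byte payload.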
When saving the file, write an ADTS header before each frame so that the result can be played.

    ADTS_Header header(48000, 2, avPacket->size + 7); // assume 2-channel audio at a 48 kHz sampling rate
    fwrite(&header, 1, sizeof(ADTS_Header), _file);
    fwrite(avPacket->data, 1, avPacket->size, _file);
Note that one AAC frame contains 1024 samples per channel. No timestamp information is needed when saving the file: on playback, the timing is derived from the sampling rate.
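The 1024-samples-per-frame rule also gives a quick way to estimate the duration of a saved file: count the frames by hopping from header to header using the 13-bit frame length, then multiply by 1024 and divide by the sampling rate. The sketch below does this over an in-memory buffer. `countAdtsFrames` and `adtsDurationSeconds` are hypothetical helpers, and the calculation assumes every ADTS frame holds exactly one AAC frame (field P equal to 0).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Walks an in-memory ADTS stream and returns the number of complete frames.
// Stops at the first position that has no sync word or a bad frame length.
inline size_t countAdtsFrames(const uint8_t *data, size_t size) {
    assert(data != nullptr || size == 0);
    size_t frames = 0;
    size_t pos = 0;
    while (size - pos >= 7) {
        const uint8_t *p = data + pos;
        if (p[0] != 0xFF || (p[1] & 0xF0) != 0xF0) break;  // lost sync
        // 13-bit frame length spread over header bytes 3, 4 and 5.
        size_t len = ((p[3] & 0x03u) << 11) | (p[4] << 3) | (p[5] >> 5);
        if (len < 7 || len > size - pos) break;  // corrupt or truncated frame
        pos += len;
        ++frames;
    }
    return frames;
}

// Duration in seconds, assuming 1024 samples per frame.
inline double adtsDurationSeconds(size_t frames, int sampleRateHz) {
    assert(sampleRateHz > 0);
    return static_cast<double>(frames) * 1024.0 / sampleRateHz;
}
```

For instance, 375 frames at 48000 Hz come to 375 * 1024 / 48000 = 8 seconds of audio.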