
FFmpeg Data Structures Explained

I have recently been working on a media box project that involves audio/video encoding and decoding, so I set out on a journey into FFmpeg.

I remember an expert once saying "show me your data", so let's start with the data structures.


Let's first get an overall sense of how the structures relate to one another, then analyze each one in detail:



AVCodecContext

This structure describes a codec context and contains the many parameters a codec needs.

Some of the more important fields are listed below:

typedef struct AVCodecContext {

    ......

    /**
     * some codecs need / can use extradata like Huffman tables.
     * mjpeg: Huffman tables
     * rv10: additional flags
     * mpeg4: global headers (they can be in the bitstream or here)
     * The allocated memory should be FF_INPUT_BUFFER_PADDING_SIZE bytes larger
     * than extradata_size to avoid problems if it is read with the bitstream reader.
     * The bytewise contents of extradata must not depend on the architecture or CPU endianness.
     * - encoding: Set/allocated/freed by libavcodec.
     * - decoding: Set/allocated/freed by user.
     */
    uint8_t *extradata;
    int extradata_size;
    /**
     * This is the fundamental unit of time (in seconds) in terms
     * of which frame timestamps are represented. For fixed-fps content,
     * timebase should be 1/framerate and timestamp increments should be
     * identically 1.
     * - encoding: MUST be set by user.
     * - decoding: Set by libavcodec.
     */
    AVRational time_base;

    /* video only */
    /**
     * picture width / height.
     * - encoding: MUST be set by user.
     * - decoding: Set by libavcodec.
     * Note: For compatibility it is possible to set this instead of
     * coded_width/height before decoding.
     */
    int width, height;

    ......

    /* audio only */
    int sample_rate;  ///< samples per second
    int channels;     ///< number of audio channels

    /**
     * audio sample format
     * - encoding: Set by user.
     * - decoding: Set by libavcodec.
     */
    enum SampleFormat sample_fmt;  ///< sample format

    /* The following data should not be initialized. */
    /**
     * Samples per packet, initialized when calling 'init'.
     */
    int frame_size;
    int frame_number;  ///< audio or video frame number

    ......

    char codec_name[32];
    enum AVMediaType codec_type; /* see AVMEDIA_TYPE_xxx */
    enum CodecID codec_id;       /* see CODEC_ID_xxx */

    /**
     * fourcc (LSB first, so "ABCD" -> ('D'<<24) + ('C'<<16) + ('B'<<8) + 'A').
     * This is used to work around some encoder bugs.
     * A demuxer should set this to what is stored in the field used to identify the codec.
     * If there are multiple such fields in a container then the demuxer should choose the one
     * which maximizes the information about the used codec.
     * If the codec tag field in a container is larger than 32 bits then the demuxer should
     * remap the longer ID to 32 bits with a table or other structure. Alternatively a new
     * extra_codec_tag + size could be added but for this a clear advantage must be demonstrated
     * first.
     * - encoding: Set by user, if not then the default based on codec_id will be used.
     * - decoding: Set by user, will be converted to uppercase by libavcodec during init.
     */
    unsigned int codec_tag;

    ......

    /**
     * Size of the frame reordering buffer in the decoder.
     * For MPEG-2 it is 1 IPB or 0 low delay IP.
     * - encoding: Set by libavcodec.
     * - decoding: Set by libavcodec.
     */
    int has_b_frames;

    /**
     * number of bytes per packet if constant and known or 0
     * Used by some WAV based audio codecs.
     */
    int block_align;

    ......

    /**
     * bits per sample/pixel from the demuxer (needed for huffyuv).
     * - encoding: Set by libavcodec.
     * - decoding: Set by user.
     */
    int bits_per_coded_sample;

    ......

} AVCodecContext;

If you use libavcodec on its own, the caller must initialize this information; if you use the full FFmpeg library, it is filled in during avformat_open_input and avformat_find_stream_info from the file header and the headers inside the media streams. The main fields are explained below:

  1. extradata/extradata_size: this buffer holds extra information the decoder may need; it is filled in av_read_frame. Generally, the demuxer for a given container format fills in extradata while reading the format header. If the demuxer does not (for example, because the header contains no codec information at all), the corresponding parser keeps searching the already-demuxed media stream for it. If no extra information is found, this buffer pointer is NULL.
  2. time_base: the fundamental unit of time in which this codec's frame timestamps are expressed; for fixed-fps content it should be 1/framerate.
  3. width/height: video width and height.
  4. sample_rate/channels: audio sample rate and channel count.
  5. sample_fmt: the raw audio sample format.
  6. codec_name/codec_type/codec_id/codec_tag: codec identification.


AVStream

This structure describes a media stream. It is defined as follows:

typedef struct AVStream {
    int index;    /**< stream index in AVFormatContext */
    int id;       /**< format-specific stream ID */
    AVCodecContext *codec; /**< codec context */
    /**
     * Real base framerate of the stream.
     * This is the lowest framerate with which all timestamps can be
     * represented accurately (it is the least common multiple of all
     * framerates in the stream). Note, this value is just a guess!
     * For example, if the time base is 1/90000 and all frames have either
     * approximately 3600 or 1800 timer ticks, then r_frame_rate will be 50/1.
     */
    AVRational r_frame_rate;

    ......

    /**
     * This is the fundamental unit of time (in seconds) in terms
     * of which frame timestamps are represented. For fixed-fps content,
     * time base should be 1/framerate and timestamp increments should be 1.
     */
    AVRational time_base;

    ......

    /**
     * Decoding: pts of the first frame of the stream, in stream time base.
     * Only set this if you are absolutely 100% sure that the value you set
     * it to really is the pts of the first frame.
     * This may be undefined (AV_NOPTS_VALUE).
     * @note The ASF header does NOT contain a correct start_time the ASF
     * demuxer must NOT set this.
     */
    int64_t start_time;
    /**
     * Decoding: duration of the stream, in stream time base.
     * If a source file does not specify a duration, but does specify
     * a bitrate, this value will be estimated from bitrate and file size.
     */
    int64_t duration;

#if LIBAVFORMAT_VERSION_INT < (53<<16)
    char language[4]; /** ISO 639-2/B 3-letter language code (empty string if undefined) */
#endif

    /* av_read_frame() support */
    enum AVStreamParseType need_parsing;
    struct AVCodecParserContext *parser;

    ......

    /* av_seek_frame() support */
    AVIndexEntry *index_entries; /**< Only used if the format does not
                                      support seeking natively. */
    int nb_index_entries;
    unsigned int index_entries_allocated_size;

    int64_t nb_frames;           ///< number of frames in this stream if known or 0

    ......

    /**
     * Average framerate
     */
    AVRational avg_frame_rate;
    ......
} AVStream;
     
The main fields are explained below. Most of their values can be determined by avformat_open_input from the file-header information; whatever is missing is obtained by avformat_find_stream_info, which reads frames and soft-decodes them as needed:
     
    index/id: index is the stream's index, generated automatically, and can be used to look the stream up in the AVFormatContext::streams array; id is the stream's identifier and depends on the container format (for MPEG TS, for example, id is the PID).
    time_base: the stream's time base, a rational number; the pts and dts of media data in this stream are expressed at this granularity. av_rescale/av_rescale_q convert between different time bases.
    start_time: the stream's start time, in units of its time base, usually the pts of the stream's first frame.
    duration: the stream's total duration, in units of its time base.
    need_parsing: controls how the stream is parsed.
    nb_frames: the number of frames in the stream, if known.
    r_frame_rate/framerate/avg_frame_rate: frame-rate related fields.
    codec: points to the stream's AVCodecContext, created when avformat_open_input is called.
    parser: points to the stream's AVCodecParserContext, created when avformat_find_stream_info is called.




     

AVFormatContext
     
This structure describes the composition and basic information of a media file or media stream.

It is defined as follows:
     
typedef struct AVFormatContext {
    const AVClass *av_class; /**< Set by avformat_alloc_context. */
    /* Can only be iformat or oformat, not both at the same time. */
    struct AVInputFormat *iformat;
    struct AVOutputFormat *oformat;
    void *priv_data;
    ByteIOContext *pb;
    unsigned int nb_streams;
    AVStream *streams[MAX_STREAMS];
    char filename[1024]; /**< input or output filename */
    /* stream info */
    int64_t timestamp;
#if LIBAVFORMAT_VERSION_INT < (53<<16)
    char title[512];
    char author[512];
    char copyright[512];
    char comment[512];
    char album[512];
    int year;  /**< ID3 year, 0 if none */
    int track; /**< track number, 0 if none */
    char genre[32]; /**< ID3 genre */
#endif

    int ctx_flags; /**< Format-specific flags, see AVFMTCTX_xx */
    /* private data for pts handling (do not modify directly). */
    /** This buffer is only needed when packets were already buffered but
       not decoded, for example to get the codec parameters in MPEG
       streams. */
    struct AVPacketList *packet_buffer;

    /** Decoding: position of the first frame of the component, in
       AV_TIME_BASE fractional seconds. NEVER set this value directly:
       It is deduced from the AVStream values.  */
    int64_t start_time;
    /** Decoding: duration of the stream, in AV_TIME_BASE fractional
       seconds. Only set this value if you know none of the individual stream
       durations and also don't set any of them. This is deduced from the
       AVStream values if not set.  */
    int64_t duration;
    /** decoding: total file size, 0 if unknown */
    int64_t file_size;
    /** Decoding: total stream bitrate in bit/s, 0 if not
       available. Never set it directly if the file_size and the
       duration are known as FFmpeg can compute it automatically. */
    int bit_rate;

    /* av_read_frame() support */
    AVStream *cur_st;
#if LIBAVFORMAT_VERSION_INT < (53<<16)
    const uint8_t *cur_ptr_deprecated;
    int cur_len_deprecated;
    AVPacket cur_pkt_deprecated;
#endif

    /* av_seek_frame() support */
    int64_t data_offset; /** offset of the first packet */
    int index_built;

    int mux_rate;
    unsigned int packet_size;
    int preload;
    int max_delay;

#define AVFMT_NOOUTPUTLOOP -1
#define AVFMT_INFINITEOUTPUTLOOP 0
    /** number of times to loop output in formats that support it */
    int loop_output;

    int flags;
#define AVFMT_FLAG_GENPTS       0x0001 ///< Generate missing pts even if it requires parsing future frames.
#define AVFMT_FLAG_IGNIDX       0x0002 ///< Ignore index.
#define AVFMT_FLAG_NONBLOCK     0x0004 ///< Do not block when reading packets from input.
#define AVFMT_FLAG_IGNDTS       0x0008 ///< Ignore DTS on frames that contain both DTS & PTS
#define AVFMT_FLAG_NOFILLIN     0x0010 ///< Do not infer any values from other values, just return what is stored in the container
#define AVFMT_FLAG_NOPARSE      0x0020 ///< Do not use AVParsers, you also must set AVFMT_FLAG_NOFILLIN as the fillin code works on frames and no parsing -> no frames. Also seeking to frames can not work if parsing to find frame boundaries has been disabled
#define AVFMT_FLAG_RTP_HINT     0x0040 ///< Add RTP hinting to the output file

    int loop_input;
    /** decoding: size of data to probe; encoding: unused. */
    unsigned int probesize;

    /**
     * Maximum time (in AV_TIME_BASE units) during which the input should
     * be analyzed in avformat_find_stream_info().
     */
    int max_analyze_duration;

    const uint8_t *key;
    int keylen;

    unsigned int nb_programs;
    AVProgram **programs;

    /**
     * Forced video codec_id.
     * Demuxing: Set by user.
     */
    enum CodecID video_codec_id;
    /**
     * Forced audio codec_id.
     * Demuxing: Set by user.
     */
    enum CodecID audio_codec_id;
    /**
     * Forced subtitle codec_id.
     * Demuxing: Set by user.
     */
    enum CodecID subtitle_codec_id;

    /**
     * Maximum amount of memory in bytes to use for the index of each stream.
     * If the index exceeds this size, entries will be discarded as
     * needed to maintain a smaller size. This can lead to slower or less
     * accurate seeking (depends on demuxer).
     * Demuxers for which a full in-memory index is mandatory will ignore
     * this.
     * muxing  : unused
     * demuxing: set by user
     */
    unsigned int max_index_size;

    /**
     * Maximum amount of memory in bytes to use for buffering frames
     * obtained from realtime capture devices.
     */
    unsigned int max_picture_buffer;

    unsigned int nb_chapters;
    AVChapter **chapters;

    /**
     * Flags to enable debugging.
     */
    int debug;
#define FF_FDEBUG_TS        0x0001

    /**
     * Raw packets from the demuxer, prior to parsing and decoding.
     * This buffer is used for buffering packets until the codec can
     * be identified, as parsing cannot be done without knowing the
     * codec.
     */
    struct AVPacketList *raw_packet_buffer;
    struct AVPacketList *raw_packet_buffer_end;

    struct AVPacketList *packet_buffer_end;

    AVMetadata *metadata;

    /**
     * Remaining size available for raw_packet_buffer, in bytes.
     * NOT PART OF PUBLIC API
     */
#define RAW_PACKET_BUFFER_SIZE 2500000
    int raw_packet_buffer_remaining_size;

    /**
     * Start time of the stream in real world time, in microseconds
     * since the unix epoch (00:00 1st January 1970). That is, pts=0
     * in the stream was captured at this real world time.
     * - encoding: Set by user.
     * - decoding: Unused.
     */
    int64_t start_time_realtime;
} AVFormatContext;

It is the most fundamental structure in FFmpeg, the root of all the others, and the basic abstraction of a multimedia file or stream. In particular:

    1. nb_streams and the streams array of AVStream pointers describe all the embedded media streams.
    2. iformat and oformat point to the corresponding demuxer and muxer.
    3. pb points to a ByteIOContext structure that controls the underlying data I/O.
    4. start_time and duration are the start time and length of the multimedia file, in microseconds, deduced from the AVStream entries in the streams array.

This structure is normally created internally by avformat_open_input, which initializes some of its members to default values. If the caller wants to create the structure itself, however, it must explicitly assign default values to certain members; missing defaults will cause later operations to fail. The following members deserve attention:

• probesize
• mux_rate
• packet_size
• flags
• max_analyze_duration
• key
• max_index_size
• max_picture_buffer
• max_delay


AVPacket

FFmpeg uses AVPacket to hold media data after demuxing but before decoding (an audio/video frame, a subtitle packet, etc.) along with its metadata (decoding timestamp, presentation timestamp, duration, etc.).

It is defined as follows:

typedef struct AVPacket {
    /**
     * Presentation timestamp in AVStream->time_base units; the time at which
     * the decompressed packet will be presented to the user.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     * pts MUST be larger or equal to dts as presentation cannot happen before
     * decompression, unless one wants to view hex dumps. Some formats misuse
     * the terms dts and pts/cts to mean something different. Such timestamps
     * must be converted to true pts/dts before they are stored in AVPacket.
     */
    int64_t pts;
    /**
     * Decompression timestamp in AVStream->time_base units; the time at which
     * the packet is decompressed.
     * Can be AV_NOPTS_VALUE if it is not stored in the file.
     */
    int64_t dts;
    uint8_t *data;
    int size;
    int stream_index;
    int flags;
    /**
     * Duration of this packet in AVStream->time_base units, 0 if unknown.
     * Equals next_pts - this_pts in presentation order.
     */
    int duration;
    void (*destruct)(struct AVPacket *);
    void *priv;
    int64_t pos; ///< byte position in stream, -1 if unknown

    /**
     * Time difference in AVStream->time_base units from the pts of this
     * packet to the point at which the output from the decoder has converged
     * independent from the availability of previous frames. That is, the
     * frames are virtually identical no matter if decoding started from
     * the very first frame or from this keyframe.
     * Is AV_NOPTS_VALUE if unknown.
     * This field is not the display duration of the current packet.
     *
     * The purpose of this field is to allow seeking in streams that have no
     * keyframes in the conventional sense. It corresponds to the
     * recovery point SEI in H.264 and match_time_delta in NUT. It is also
     * essential for some types of subtitle streams to ensure that all
     * subtitles are correctly displayed after seeking.
     */
    int64_t convergence_duration;
} AVPacket;

Where:

• dts is the decoding timestamp and pts the presentation timestamp; both are in units of the owning stream's time base.
• stream_index is the index of the owning media stream;
• data points to the data buffer, and size is its length;
• duration is the duration of the data, also in units of the owning stream's time base;
• pos is the byte offset of the data within the media stream;
• destruct is a function pointer used to release the data buffer;
• flags is a flags field; if its lowest bit is set, the data is a keyframe.

The AVPacket structure itself is only a container; its data member references the actual data buffer. That buffer is usually created by av_new_packet, but may also be created by other FFmpeg APIs (such as av_read_frame). When a packet's data buffer is no longer needed, release it with av_free_packet, which calls the packet's own destruct function. destruct takes one of two values: 1) av_destruct_packet_nofree (or 0), which merely zeroes data and size; 2) av_destruct_packet, which actually frees the buffer.

FFmpeg builds buffers into AVPacket structures internally and supplies the destruct function accordingly. If FFmpeg intends to keep managing a buffer itself, it sets destruct to av_destruct_packet_nofree, so a user call to av_free_packet will not free it; if FFmpeg hands the buffer over to the caller entirely, it sets destruct to av_destruct_packet, meaning the buffer can be freed. To be safe, a user who wants to use an FFmpeg-created AVPacket freely should call av_dup_packet to clone the buffer, turning the packet into one whose buffer can be freed, and so avoid errors caused by holding on to a buffer improperly. For an AVPacket whose destruct pointer is av_destruct_packet_nofree, av_dup_packet allocates a new buffer, copies the old buffer's data into it, points data at the new buffer, and sets destruct to av_destruct_packet.


There are many more data structures; the ones above are the most important, and I will continue analyzing the rest when I get the chance.

To wrap up, here is a summary of the common data structures and functions, for easy reference later.

1. Data structures:

(1) AVFormatContext

(2) AVOutputFormat

(3) AVInputFormat

(4) AVCodecContext

(5) AVCodec

(6) AVFrame

(7) AVPacket

(8) AVPicture

(9) AVStream

2. Initialization functions:

(1) av_register_all()

(2) avcodec_open()

(3) avcodec_close()

(4) av_open_input_file()

(5) av_find_input_format()

(6) av_find_stream_info()

(7) av_close_input_file()

3. Audio/video codec functions:

(1) avcodec_find_decoder()

(2) avcodec_alloc_frame()

(3) avpicture_get_size()

(4) avpicture_fill()

(5) img_convert()

(6) avcodec_alloc_context()

(7) avcodec_decode_video()

(8) av_free_packet()

(9) av_free()

4. File operations:

(1) av_new_stream()

(2) av_read_frame()

(3) av_write_frame()

(4) dump_format()

5. Other functions:

(1) avpicture_deinterlace()

(2) ImgReSampleContext()

