Analysis of the OGG Audio Format

I. Overview of the OGG Audio Format

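Ogg is a free, open-standard container format maintained by the Xiph.Org Foundation. It is not restricted by software patents and is designed for efficient streaming and manipulation of high-quality digital multimedia.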

"Ogg" refers to a file format that can carry a wide range of free and open-source codecs, covering audio, video, text (such as subtitles), and metadata.

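In the Ogg multimedia framework, Theora provides the lossy video layer, while the music-oriented Vorbis codec usually serves as the audio layer. The speech codec Speex, the lossless audio codec FLAC, and OggPCM may also be used as the audio layer.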

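"Ogg" usually refers to the Ogg Vorbis audio file format, i.e. Vorbis-encoded audio carried in an Ogg container. In the past, the .ogg extension was used for content in any Ogg-supported format, but in 2007 the Xiph.Org Foundation, for backward-compatibility reasons, requested that .ogg be reserved for Vorbis only. It defined new extensions and media types for the different kinds of content: .oga for audio only, .ogv for video with or without sound (including Theora), and .ogx for applications.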

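Ogg Vorbis is a newer audio compression format, comparable to music formats such as MP3. Ogg Vorbis is completely free, open, and unencumbered by patents, and its file extension is .ogg. The format allows file size and sound quality to keep improving without breaking existing encoders or players. One notable feature of Ogg Vorbis is multichannel support.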

II. Dissecting the OGG Audio Format

1. Organization of an OGG file

OGG organizes and links logical streams in units of pages; every page has a page header and page data, as shown in Figure 1:

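A*   B*   C*   ...   A#   B#   C#   D*   ...   D#
bos  bos  bos        eos  eos  eos  bos        eos

(* marks the bos page and # the eos page of each logical stream; "..." stands for the pages in between.)

Figure 1. Organization of an OGG file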

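The file above chains two physical streams: logical streams A, B and C are multiplexed into one physical stream, and logical stream D forms a physical stream on its own. Within a physical stream, the bos pages of all logical streams must be physically adjacent, as the positions of the three bos pages A*, B* and C* in Figure 1 show.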

bos: beginning of stream;

eos: end of stream.
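The bos/eos rules above can be checked mechanically. The following is a minimal Python sketch, not libogg's API: it assumes each page has already been reduced to a hypothetical (serial_number, is_bos, is_eos) tuple in file order, splits a chained file into its physical streams, and rejects a bos page that is not adjacent to the other bos pages of its physical stream.

```python
from typing import List, Tuple

# Hypothetical simplified page record: (serial_number, is_bos, is_eos)
Page = Tuple[int, bool, bool]


def split_physical_streams(pages: List[Page]) -> List[List[Page]]:
    """Split a chained Ogg page sequence into its physical (grouped) streams.

    Checks, in a simplified model of Figure 1:
    - within one physical stream, all bos pages are adjacent and come first;
    - the next physical stream starts only after every logical stream of
      the current one has delivered its eos page.
    """
    streams: List[List[Page]] = []
    current: List[Page] = []
    open_serials = set()   # logical streams that have started but not ended
    in_bos_run = False     # still inside the leading run of bos pages?

    for serial, bos, eos in pages:
        if bos:
            if current and not in_bos_run:
                raise ValueError(f"bos page of stream {serial} is not adjacent to the other bos pages")
            in_bos_run = True
            open_serials.add(serial)
        else:
            in_bos_run = False
            if serial not in open_serials:
                raise ValueError(f"page of stream {serial} appears without a bos page")
        current.append((serial, bos, eos))
        if eos:
            open_serials.discard(serial)
            if not open_serials:          # every logical stream has ended
                streams.append(current)
                current, in_bos_run = [], False
    if current:
        raise ValueError("input ends before every logical stream reached eos")
    return streams
```

For the page order of Figure 1, with serial numbers 1 to 4 standing in for A to D and one data page per stream, the function returns two physical streams of 9 and 3 pages.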

   

2. Structure of an OGG page

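Each page is independent and carries all the information it needs. Page size is variable, usually 4 KB to 8 KB, and cannot exceed 65307 bytes (27 + 255 + 255 × 255 = 65307). The page header format is shown in Figure 2: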

 

Byte offset     Field
0-3             OggS (capture pattern)
4               V (version)
5               Header_type
6-13            Granule_position
14-17           Serial_number
18-21           Page_sequence
22-25           CRC_checksum
26              Num_segments
27 .. 26+N      Segment_table (N = Num_segments bytes)
27+N ..         payload

Figure 2. OGG page header structure

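1) Capture pattern: the ASCII characters 0x4f 'O', 0x67 'g', 0x67 'g', 0x53 'S', 4 bytes in total, marking the start of a page. When demultiplexing an Ogg stream to recover the encoded media, it lets the parser recognize where each new page begins.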

2) Version: 1 byte; the current version is 0.

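3) Header_type: 1 byte of flags identifying the page type:

0x01: the first packet on this page continues a packet of the same logical stream from the previous page; if this bit is not set, the page begins with a new packet;

0x02: this is the first page of the logical stream (bos); if this bit is not set, the page is not the first page;

0x04: this is the last page of the logical stream (eos); if this bit is not set, the page is not the last page.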

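4) Granule_position: 8 bytes, little-endian; codec-specific position information. For an audio stream it holds the number of PCM samples the logical stream has produced up to and including this page, from which a timestamp can be derived. For a video stream it reflects the number of video frames encoded up to this page (the exact encoding is codec-specific). A value of -1 means that no packet finishes on this page, i.e. the packet is still incomplete at the end of this page.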

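5) Serial_number: 4 bytes, little-endian; the ID of the logical stream this page belongs to. It distinguishes this stream from all other logical streams, so pages can be demultiplexed by this value.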

6) Page_sequence: 4 bytes; the sequence number of this page within its logical stream. An Ogg decoder can use it to detect lost pages.

7) CRC_checksum: 4 bytes; the 32-bit CRC of the whole page (header and page data), computed with the CRC field itself set to zero. The generator polynomial is 0x04c11db7.

8) Num_segments: 1 byte; the number of segment entries in segment_table. Its maximum is 255, so a page is at most 65307 bytes, which is less than 64 KB.

9) Segment_table: as the name suggests, a table holding the length of each segment; each entry ranges from 0 to 255.

segment可以得到packet的值,每个packet的大小是以最后一个不等于255segment结束的,从页头中的segment_table可以得到每个packet长度,举例:如果一组segment依次顺序为FF 45 FF FF FF 40 FF 5 FF FF FF66,那么第一个packet的长度为255+69 = 324,第二个packet大小829,同理。

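The page header consists essentially of the fields above, which gives the header length and the total page length:

header_size = 27 + Num_segments (bytes)

page_size = header_size + the sum of all segment lengths in segment_table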
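Putting the header fields and the two size formulas together, the following Python sketch parses one page from a byte buffer using only the standard library. The names (parse_page, OggPage, ogg_crc, granule_to_seconds) are hypothetical, not libogg's API. The CRC uses the polynomial 0x04c11db7 over the whole page with the CRC field zeroed, as item 7 says; the initial value 0, MSB-first processing and lack of a final XOR are taken from Xiph's Ogg framing documentation rather than from this article. granule_to_seconds assumes a codec whose granule position is a PCM sample count, such as Vorbis.

```python
import struct
from dataclasses import dataclass
from typing import Optional

# header_type bit flags (they can be combined, e.g. a one-page stream is bos | eos)
FLAG_CONTINUED = 0x01   # first packet continues from the previous page
FLAG_BOS = 0x02         # first page of a logical stream
FLAG_EOS = 0x04         # last page of a logical stream

# The 27 fixed header bytes, all little-endian: "OggS", version, header_type,
# granule_position (signed, so -1 reads as -1), serial, sequence, CRC, num_segments
_HEADER = struct.Struct("<4sBBqIIIB")


def _crc_table() -> list:
    table = []
    for i in range(256):
        r = i << 24
        for _ in range(8):
            r = ((r << 1) ^ 0x04C11DB7) if r & 0x80000000 else (r << 1)
            r &= 0xFFFFFFFF
        table.append(r)
    return table


_CRC_TABLE = _crc_table()


def ogg_crc(data: bytes) -> int:
    """CRC-32, polynomial 0x04c11db7, initial value 0, MSB first, no final XOR."""
    crc = 0
    for byte in data:
        crc = ((crc << 8) & 0xFFFFFFFF) ^ _CRC_TABLE[(crc >> 24) ^ byte]
    return crc


@dataclass
class OggPage:
    header_type: int
    granule_position: int
    serial_number: int
    page_sequence: int
    segment_table: bytes
    body: bytes
    header_size: int
    page_size: int

    @property
    def is_bos(self) -> bool:
        return bool(self.header_type & FLAG_BOS)

    @property
    def is_eos(self) -> bool:
        return bool(self.header_type & FLAG_EOS)

    @property
    def is_continued(self) -> bool:
        return bool(self.header_type & FLAG_CONTINUED)


def parse_page(buf: bytes, offset: int = 0) -> OggPage:
    (capture, version, header_type, granule, serial,
     sequence, crc, num_segments) = _HEADER.unpack_from(buf, offset)
    if capture != b"OggS":
        raise ValueError("capture pattern 'OggS' not found")
    if version != 0:
        raise ValueError(f"unsupported version {version}")
    header_size = 27 + num_segments                        # header_size = 27 + Num_segments
    segment_table = bytes(buf[offset + 27:offset + header_size])
    page_size = header_size + sum(segment_table)           # + all segment lengths
    page = bytearray(buf[offset:offset + page_size])
    if len(page) != page_size:
        raise ValueError("truncated page")
    page[22:26] = bytes(4)                                 # CRC is computed over a zeroed CRC field
    if ogg_crc(page) != crc:
        raise ValueError("CRC mismatch")
    return OggPage(header_type, granule, serial, sequence, segment_table,
                   bytes(page[header_size:]), header_size, page_size)


def granule_to_seconds(granule: int, sample_rate: int) -> Optional[float]:
    """Timestamp for codecs whose granule is a PCM sample count (e.g. Vorbis)."""
    if granule == -1:                                      # no packet finishes on this page
        return None
    return granule / sample_rate
```

To walk a whole file, one would call parse_page repeatedly, advancing the offset by page_size each time.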

 

3. The OGG encapsulation process (supplementary)

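1) Before being handed to the Ogg encapsulation layer, encoded audio and video are presented as "packets" with packet boundaries; where the boundaries fall depends on the codec. See Figure 3.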

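2) Each packet of the logical stream is split into segments (segmentation). Every segment is 255 bytes except the last one of a packet, which is shorter than 255 bytes (0 bytes if the packet length is a multiple of 255), because packets can be of any length, as determined by the media encoder.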

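3) Page encapsulation: each page gets a page header, and page lengths vary case by case. The segment_table in the page header lists the lacing value (length) of every segment on the page; the last segment of a packet is shorter than 255 bytes (possibly 0). Packets are processed one at a time, and a packet may be spread over one or more pages (implementations commonly aim for pages of about 4 KB). A page may also hold several packets, or the end of one packet followed by the start of the next; the 0x01 bit of header_type marks a page whose first packet continues from the previous page.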

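Multiple logical streams that have been encapsulated into pages (speech, text, images, audio, video, etc.) are then multiplexed into a physical stream according to the timing relationships the application requires.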

                   Logical bitstream with packet boundaries
 -----------------------------------------------------------------
 > |      packet_1            | packet_2         | packet_3 | <
 -----------------------------------------------------------------

                    | segmentation (logically only)
                    v

     packet_1 (5 segments)           packet_2 (4 segs)     p_3 (2 segs)
     ------------------------------  --------------------  ------------
 ..  |seg_1|seg_2|seg_3|seg_4|s_5 |  |seg_1|seg_2|seg_3||  |seg_1|s_2 |  ..
     ------------------------------  --------------------  ------------

                    | page encapsulation
                    v

page_1 (packet_1 data)    page_2 (pket_1 data)  page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| |  ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream

Figure 3. Schematic of the OGG encapsulation process
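The segmentation and page-encapsulation steps can be sketched as follows. This is a simplified Python model with hypothetical function names: it only produces lacing values and page bodies, closes a page when it reaches a segment limit (real muxers also limit pages by byte size and granule position), and omits the remaining header fields, the CRC and multiplexing. In this model a packet may span pages and a page may hold parts of several packets, as the segment_table example in Section 2 implies.

```python
from typing import List, Tuple


def lacing_values(packet_len: int) -> List[int]:
    """Segmentation: 255-byte segments plus a final segment < 255.

    A packet whose length is a multiple of 255 (including 0) ends with a 0.
    """
    full, rest = divmod(packet_len, 255)
    return [255] * full + [rest]


def paginate(packets: List[bytes], max_segments: int = 255) -> List[Tuple[bool, bytes, bytes]]:
    """Hypothetical page encapsulation: returns (continued, segment_table, body) per page.

    `continued` mirrors header_type bit 0x01: the page starts mid-packet.
    """
    pages: List[Tuple[bool, bytes, bytes]] = []
    table: List[int] = []
    body = bytearray()
    continued = False
    for packet in packets:
        pos = 0
        for i, lace in enumerate(lacing_values(len(packet))):
            if len(table) == max_segments:               # page is full: close it
                pages.append((continued, bytes(table), bytes(body)))
                table, body = [], bytearray()
                continued = i > 0                        # page break falls inside this packet
            table.append(lace)
            body += packet[pos:pos + lace]
            pos += lace
    if table:                                            # flush the last, partially filled page
        pages.append((continued, bytes(table), bytes(body)))
    return pages
```

For example, with `max_segments=3`, packets of 300, 510 and 10 bytes produce a first page with lacing values 255, 45, 255 and a second page, flagged as continued, with 255, 0, 10; the 0 closes the 510-byte packet.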

4. Ogg Vorbis bitstream structure

A Vorbis bitstream begins with three header packets. In order, they are the identification header, the comment header, and the setup header. All three are essential for decoding a Vorbis audio file.

1) Common header structure

Every header packet begins with the same structure:

• [packet_type]: 8-bit value (1 = identification header, 3 = comment header, 5 = setup header)

• 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73: the characters 'v', 'o', 'r', 'b', 'i', 's' as six octets
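A decoder can tell the three header packets apart from this common prefix alone. The Python sketch below (the function name is hypothetical) checks the six-octet "vorbis" signature and maps the packet_type byte; the values 1, 3 and 5 come from the Vorbis I specification.

```python
from typing import Optional

# packet_type values of the three Vorbis I header packets
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def vorbis_header_kind(packet: bytes) -> Optional[str]:
    """Hypothetical helper: name the header packet, or None if it is not one."""
    if len(packet) < 7 or packet[1:7] != b"vorbis":
        return None
    return HEADER_TYPES.get(packet[0])
```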

2) The identification header

The identification header identifies the bitstream as Vorbis, gives the Vorbis version, and gives the simple audio characteristics of the stream such as sample rate and number of channels.

• [vorbis_version] = read 32 bits as unsigned integer

• [audio_channels] = read 8 bits as unsigned integer; must be greater than 0

• [audio_sample_rate] = read 32 bits as unsigned integer; must be greater than 0

• [bitrate_maximum] = read 32 bits as signed integer

• [bitrate_nominal] = read 32 bits as signed integer

• [bitrate_minimum] = read 32 bits as signed integer

• [blocksize_0] = 2 exponent (read 4 bits as unsigned integer); must be less than or equal to [blocksize_1]

• [blocksize_1] = 2 exponent (read 4 bits as unsigned integer)

• [framing_flag] = read one bit; must be nonzero
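Because every field is byte-aligned except the two 4-bit blocksize exponents, the identification header can be read with struct. The sketch below uses hypothetical names; it assumes the LSb-first packing rule, so blocksize_0 is the low nibble of its byte, and it also enforces the Vorbis I requirement that vorbis_version be 0, which comes from the specification rather than from the list above.

```python
import struct
from dataclasses import dataclass

# Fields after the 7-byte common header, little-endian: version(u32) channels(u8)
# rate(u32) bitrate max/nominal/min(s32) blocksize exponents(u8) framing(u8)
_ID_FIELDS = struct.Struct("<IBIiiiBB")


@dataclass
class VorbisIdentification:      # hypothetical container type
    vorbis_version: int
    audio_channels: int
    audio_sample_rate: int
    bitrate_maximum: int
    bitrate_nominal: int
    bitrate_minimum: int
    blocksize_0: int
    blocksize_1: int


def parse_identification_header(packet: bytes) -> VorbisIdentification:
    if packet[:7] != b"\x01vorbis":
        raise ValueError("not a Vorbis identification header")
    (version, channels, rate, br_max, br_nom, br_min,
     sizes, framing) = _ID_FIELDS.unpack_from(packet, 7)
    # Bits are packed LSb first: blocksize_0 is the low nibble, blocksize_1 the high one.
    blocksize_0 = 1 << (sizes & 0x0F)
    blocksize_1 = 1 << (sizes >> 4)
    if version != 0:             # Vorbis I requires 0 (from the spec, not the list above)
        raise ValueError(f"unsupported vorbis_version {version}")
    if channels == 0 or rate == 0:
        raise ValueError("audio_channels and audio_sample_rate must be greater than 0")
    if blocksize_0 > blocksize_1:
        raise ValueError("blocksize_0 must not exceed blocksize_1")
    if not framing & 1:
        raise ValueError("framing_flag must be set")
    return VorbisIdentification(version, channels, rate, br_max, br_nom, br_min,
                                blocksize_0, blocksize_1)
```

The packet is 7 + 23 = 30 bytes long in total.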

 

The bitrate fields above are used only as hints. The nominal bitrate field especially may be considerably off in purely VBR streams. The fields are meaningful only when greater than zero.

a) All three fields set to the same value implies a fixed-rate, or tightly bounded, nearly fixed-rate bitstream

b) Only nominal set implies a VBR or ABR stream that averages the nominal bitrate

c) Maximum and/or minimum set implies a VBR bitstream that obeys the bitrate limits

d) None set indicates the encoder does not care to speculate.
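Rules a) to d) translate into a small decision function. In this hypothetical sketch, rule a) is checked before rule c), because three equal positive values also count as "maximum and/or minimum set"; that ordering is my reading of the rules, not something the text states.

```python
def describe_bitrate(maximum: int, nominal: int, minimum: int) -> str:
    """Hypothetical helper applying rules a) to d); values <= 0 count as 'not set'."""
    mx, nom, mn = (v if v > 0 else None for v in (maximum, nominal, minimum))
    if mx is not None and mx == nom == mn:         # rule a)
        return "fixed or nearly fixed rate"
    if mx is not None or mn is not None:           # rule c)
        return "VBR within the given limits"
    if nom is not None:                            # rule b)
        return "VBR/ABR averaging the nominal bitrate"
    return "unspecified"                           # rule d)
```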

3) The comment header

The comment header includes user text comments ("tags") and a vendor string for the application/library that produced the bitstream.

The comment header is logically a list of eight-bit-clean vectors; the number of vectors is bounded to 2^32 - 1 and the length of each vector is limited to 2^32 - 1 bytes. The vector length is encoded; the vector contents themselves are not null terminated. In addition to the vector list, there is a single vector for the vendor name (also 8-bit clean, length encoded in 32 bits). For example, the 1.0 release of libvorbis set the vendor string to "Xiph.Org libVorbis I 20020717".

The vector lengths and number of vectors are stored lsb first, according to the bit-packing conventions of the Vorbis codec. However, since data in the comment header is octet-aligned, they can simply be read as unaligned 32-bit little-endian unsigned integers.
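Since the comment header is octet-aligned, the layout described above can be read with struct alone. This is a hypothetical Python sketch: vendor vector, then a 32-bit count, then that many length-prefixed comment vectors. The final framing-bit check follows the Vorbis I specification and is not described in the text above.

```python
import struct
from typing import List, Tuple


def parse_comment_header(packet: bytes) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (vendor_string, user_comments)."""
    if packet[:7] != b"\x03vorbis":
        raise ValueError("not a Vorbis comment header")
    pos = 7

    def read_u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from("<I", packet, pos)   # unaligned 32-bit little-endian
        pos += 4
        return value

    def read_vector() -> bytes:
        nonlocal pos
        length = read_u32()
        if pos + length > len(packet):
            raise ValueError("vector runs past the end of the packet")
        data = packet[pos:pos + length]                    # contents are not NUL-terminated
        pos += length
        return data

    vendor = read_vector().decode("utf-8", errors="replace")
    count = read_u32()
    comments = [read_vector().decode("utf-8", errors="replace") for _ in range(count)]
    # Framing bit (from the Vorbis I spec): must be set after the last vector.
    if pos >= len(packet) or not packet[pos] & 1:
        raise ValueError("framing bit not set")
    return vendor, comments
```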

 

The comment vectors are structured similarly to a UNIX environment variable. That is, comment fields consist of a field name and a corresponding value and look like:

comment[0]="ARTIST=me";

comment[1]="TITLE=the sound of Vorbis";

The field name is case-insensitive and may consist of ASCII 0x20 through 0x7D, 0x3D ('=') excluded. ASCII 0x41 through 0x5A inclusive (characters A-Z) is to be considered equivalent to ASCII 0x61 through 0x7A inclusive (characters a-z). The field name is immediately followed by ASCII 0x3D ('='); this equals sign is used to terminate the field name. 0x3D is followed by the 8-bit clean UTF-8 encoded value of the field contents to the end of the field.

Field names

Below is a proposed, minimal list of standard field names with a description of intended use. No single or group of field names is mandatory; a comment header may contain one, all or none of the names in this list.

 

• TITLE: Track/Work name

• VERSION: The version field may be used to differentiate multiple versions of the same track title in a single collection (e.g. remix info).

• ALBUM: The collection name to which this track belongs

• TRACKNUMBER: The track number of this piece if part of a specific larger collection or album

• ARTIST: The artist generally considered responsible for the work. In popular music this is usually the performing band or singer. For classical music it would be the composer. For an audio book it would be the author of the original text.

• PERFORMER: The artist(s) who performed the work. In classical music this would be the conductor, orchestra, soloists. In an audio book it would be the actor who did the reading. In popular music this is typically the same as the ARTIST and is omitted.

• COPYRIGHT: Copyright attribution.

• LICENSE: License information, e.g. 'All Rights Reserved', 'Any Use Permitted'.

• ORGANIZATION: Name of the organization producing the track (i.e. the 'record label')

• DESCRIPTION: A short text description of the contents

• GENRE: A short text indication of music genre

• DATE: Date the track was recorded

• LOCATION: Location where track was recorded

• CONTACT: Contact information for the creators or distributors of the track. This could be a URL, an email address, the physical address of the producing label.

• ISRC: International Standard Recording Code for the track; see the ISRC intro page for more information on ISRC numbers.
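The field-name rules stated before this list (ASCII 0x20 to 0x7D, '=' terminates the name, case-insensitive) can be applied when splitting a comment. In this hypothetical sketch, names are normalized to upper case so that "artist" and "ARTIST" compare equal; only the first '=' ends the name, so the value may itself contain '='.

```python
from typing import Tuple


def split_comment(comment: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'NAME=value' into (NAME_IN_UPPER_CASE, value)."""
    name, sep, value = comment.partition("=")    # the first '=' terminates the field name
    if not sep:
        raise ValueError("comment has no '=' separator")
    if not all(0x20 <= ord(ch) <= 0x7D for ch in name):
        raise ValueError(f"illegal character in field name {name!r}")
    return name.upper(), value                   # ASCII a-z treated as A-Z
```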

 

Hint: Field names are not required to be unique (occur once) within a comment header. As an example, assume a track was recorded by three well-known artists; the following is permissible, and encouraged:

ARTIST=Dizzy Gillespie

ARTIST=Sonny Rollins

ARTIST=Sonny Stitt
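Because field names may repeat, a tag reader should collect values into lists rather than a plain name-to-value map. A hypothetical Python sketch that keeps repeated fields in their original order and merges names case-insensitively:

```python
from collections import defaultdict
from typing import Dict, List


def group_comments(comments: List[str]) -> Dict[str, List[str]]:
    """Hypothetical helper: collect 'NAME=value' comments into NAME -> [values]."""
    tags: Dict[str, List[str]] = defaultdict(list)
    for comment in comments:
        name, sep, value = comment.partition("=")
        if sep:                                  # skip malformed entries without '='
            tags[name.upper()].append(value)     # field names are case-insensitive
    return dict(tags)
```

Applied to the three ARTIST lines above, it yields a single ARTIST key whose list holds all three names in order.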

4) Setup Header

The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.

The setup header contains, in order, the lists of codebook configurations, time-domain transform configurations (placeholders in Vorbis I), floor configurations, residue configurations, channel mapping configurations and mode configurations. It finishes with a framing bit of '1'. The structure is shown in the figure below:
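Fully decoding the setup header requires an LSb-first bit reader, but its first few fields are still byte-aligned. The hypothetical sketch below reads only those fields. The layout it assumes comes from the Vorbis I specification, not from this article: an 8-bit codebook count stored as count minus one, then the first codebook's 24-bit sync pattern 0x564342 (the bytes 'B', 'C', 'V'), 16-bit dimensions and 24-bit entry count.

```python
from typing import Tuple


def peek_setup_header(packet: bytes) -> Tuple[int, int, int]:
    """Hypothetical helper: (codebook_count, first_codebook_dimensions, first_codebook_entries).

    Only the byte-aligned start is read; the rest needs a full LSb-first bit reader.
    """
    if packet[:7] != b"\x05vorbis":
        raise ValueError("not a Vorbis setup header")
    if len(packet) < 16:
        raise ValueError("setup header too short")
    codebook_count = packet[7] + 1                          # stored as count - 1
    sync = int.from_bytes(packet[8:11], "little")           # 24-bit codebook sync pattern
    if sync != 0x564342:
        raise ValueError("codebook sync pattern 0x564342 not found")
    dimensions = int.from_bytes(packet[11:13], "little")    # 16 bits
    entries = int.from_bytes(packet[13:16], "little")       # 24 bits
    return codebook_count, dimensions, entries
```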
