Original English document: http://www.webmproject.org/docs/container/#
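I. Overview
WebM is a digital multimedia container file format proposed by the WebM Project. It is based on the Matroska container format.
II. Goals
1. Short-term goals
(1) Choose a container format well suited to the VP8 codec, to improve the open web.
(2) Make it easier for web content providers to create and publish VP8 video.
2. Long-term goal
Foster broad adoption of this open format so that users can use it conveniently anywhere.
III. Naming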
Container format name | WebM
Audio-only MIME type | audio/webm
File extension | .webm
Video codec | VP8
MIME type | video/webm
Audio codec | Vorbis
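IV. HTML5 Video Type Parameters
1. Video codec
VP8: the codec string vp8.X denotes the VP8 codec with bitstream version X. Currently the only VP8 bitstream version is 0. The FourCC takes the form VP8X (VP80 for version 0), which matches the vp8.X convention.
2. Audio codec
Vorbis: the MIME type of audio-only files is "audio/webm".
3. The canPlayType function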
(1)canPlayType('video/webm') should return maybe
(2)canPlayType('audio/webm') should return maybe
(3)canPlayType('video/webm; codecs="vp8, vorbis"') should return probably
(4)canPlayType('video/webm; codecs="vp8.0, vorbis"') should return probably
(5)canPlayType('audio/webm; codecs="vorbis"') should return probably
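The expected answers above can be modeled with a small decision function. The Python sketch below is hypothetical: `can_play_type` and its parsing rules are assumptions for illustration, not a browser's actual implementation. It answers "maybe" when only the container type is given and "probably" when every listed codec is one it recognizes.

```python
import re

# Codec strings this sketch accepts inside WebM. "vp8.0" is included because
# bitstream version 0 is the only VP8 version the guidelines mention.
KNOWN_CODECS = {"vp8", "vp8.0", "vorbis"}
VIDEO_CODECS = {"vp8", "vp8.0"}
CODECS_PARAM = re.compile(r'codecs\s*=\s*"([^"]*)"', re.IGNORECASE)


def can_play_type(mime: str) -> str:
    """Return "probably", "maybe" or "" for a WebM MIME type string."""
    parts = [p.strip() for p in mime.split(";")]
    base = parts[0].lower()
    if base not in ("video/webm", "audio/webm"):
        return ""  # not a WebM type at all
    codecs_value = None
    for param in parts[1:]:
        match = CODECS_PARAM.fullmatch(param)
        if match:
            codecs_value = match.group(1)
    if codecs_value is None:
        return "maybe"  # container is known, codecs are unspecified
    codecs = [c.strip().lower() for c in codecs_value.split(",") if c.strip()]
    if not codecs or any(c not in KNOWN_CODECS for c in codecs):
        return ""  # an unknown codec cannot be played
    if base == "audio/webm" and any(c in VIDEO_CODECS for c in codecs):
        return ""  # the audio-only type must not declare a video codec
    return "probably"
```

For example, `can_play_type('video/webm; codecs="vp8.0, vorbis"')` follows the codecs branch and returns "probably", matching item (4).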
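V. Smart Clients
One of the main goals is to give content creators more advanced playback capabilities, such as fast seeking and fast start-up, using nothing more than an HTTP server. To achieve this, content creators should follow the guidelines below.
VI. WebM Guidelines
These guidelines currently target streaming files over HTTP connections, and they note where they relate to the Matroska Specification.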
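1. Demuxer and Muxer Guidelines
(1) The DocType element should be "webm".
(2) The video codec should be VP8, with CodecID "V_VP8" and no CodecPrivate data.
(3) The audio codec should be Vorbis. The WebM Project will develop detailed guidelines on using Vorbis in WebM, covering profile, bitrate, channels, and so on.
(4) The initial WebM format does not support subtitles. In the near future, a WHATWG/W3C RFC is expected to provide guidance on subtitles and overlays in the HTML5 <video> element.
(5) DocTypeReadVersion should follow the Matroska Specification; for example, a file that contains v2 elements should have a DocTypeReadVersion of 2.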
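2. Muxer Guidelines
(1) WebM files should contain a SeekHead element, so that a client can tell that the file contains a Cues element and locate it.
(2) WebM files should contain a keyframe-only Cues element: it should reference only video keyframes, which keeps the size of the file header small. Placing the Cues element before all Clusters is recommended, so that a client can seek to a point that has not been downloaded yet.
(3) All absolute (block + cluster) timecodes must be monotonically increasing. All timecodes refer to the start time of the block.
(4) TimecodeScale should be set to the default of 1,000,000 nanoseconds (1 ms). With this value, every Cluster can hold blocks with positive relative timecodes of up to 32.767 seconds.
(5) Keyframes should be placed at the beginning of a Cluster, which makes seeking easier and faster for the client.
(6) Audio blocks whose time span contains the timecode of a video keyframe should be in the same Cluster as that keyframe.
(7) Audio blocks with the same absolute timecode as video blocks should be written before those video blocks.
(8) For the DisplayUnit element, WebM files must use pixels only.
(9) VP8 frames should be muxed in SimpleBlock elements.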
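Rules (3) and (4) come down to simple arithmetic. A Block stores its timecode relative to the Cluster Timecode as a signed 16-bit integer, so with one tick equal to 1 ms the largest positive offset is 32,767 ticks, i.e. 32.767 s. The Python sketch below (the function name is hypothetical) checks a sequence of blocks; it reads "monotonically increasing" as "never decreasing", an assumption based on rule (7), which lets audio and video blocks share a timecode.

```python
from typing import Iterable, List, Tuple

TIMECODE_SCALE_NS = 1_000_000          # recommended default: 1 tick = 1 ms
INT16_MIN, INT16_MAX = -32768, 32767   # range of a Block's relative timecode


def absolute_timecodes_ns(blocks: Iterable[Tuple[int, int]]) -> List[int]:
    """Convert (cluster_timecode, block_relative_timecode) pairs, both in
    TimecodeScale ticks, into absolute times in nanoseconds.

    Raises ValueError if a relative timecode does not fit in a signed 16-bit
    field, or if absolute times ever go backwards. Equal times are allowed.
    """
    result: List[int] = []
    for cluster_tc, relative_tc in blocks:
        if not INT16_MIN <= relative_tc <= INT16_MAX:
            raise ValueError(f"relative timecode {relative_tc} overflows int16")
        absolute_ns = (cluster_tc + relative_tc) * TIMECODE_SCALE_NS
        if result and absolute_ns < result[-1]:
            raise ValueError("absolute timecodes must not decrease")
        result.append(absolute_ns)
    return result


# Longest span a Cluster can cover with positive relative timecodes, in ms.
MAX_CLUSTER_SPAN_MS = INT16_MAX * TIMECODE_SCALE_NS // 1_000_000
```

A muxer that keeps every relative timecode inside this range, and starts a new Cluster at each video keyframe per rule (5), will never trip the first check.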
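3. VP8 Alternate Reference Frames
When alternate reference frames are enabled, the VP8 encoder inserts an alternate reference (AR) frame into its output immediately before the frame that depends on it. At most one AR frame is added between two I/P frames. The dependent frame (D) is usually a P frame. The codec SDK marks the AR frame as invisible: it must be decoded before the dependent frame, but it is never output for display.
The timestamp of the AR frame should be set as close as possible to the timestamp of the preceding frame.
Encoding example: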
Input | F0 | | F1
Output | I/P | AR | D
PTS | 0 | 1 | 2
Decoding example:
Input | I/P | AR | D
Output | F0 | | F1
PTS | 0 | | 2
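The two tables can be reproduced with a toy model. In this Python sketch the `Packet` class and both functions are hypothetical, and giving the AR frame "previous timestamp + 1 tick" is an assumed reading of "as close as possible" that also keeps timecodes from decreasing.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Packet:
    label: str                 # e.g. "I/P", "AR", "D"
    invisible: bool            # True for alternate reference frames
    pts: Optional[int] = None  # source timestamp, only for visible frames


def assign_mux_timestamps(packets: List[Packet]) -> List[int]:
    """Muxer side: visible frames keep their pts; an invisible AR frame gets
    the previous frame's timestamp plus one tick (assumed step size)."""
    stamps: List[int] = []
    for pkt in packets:
        if pkt.invisible:
            if not stamps:
                raise ValueError("an AR frame needs a preceding frame")
            stamps.append(stamps[-1] + 1)
        else:
            if pkt.pts is None:
                raise ValueError(f"visible frame {pkt.label} has no pts")
            stamps.append(pkt.pts)
    return stamps


def displayed_frames(packets: List[Packet], stamps: List[int]) -> List[Tuple[str, int]]:
    """Decoder side: every packet is decoded in order, but only visible
    frames are output; the AR frame only updates the reference buffer."""
    return [(p.label, ts) for p, ts in zip(packets, stamps) if not p.invisible]
```

Feeding it the encoder output I/P (pts 0), AR, D (pts 2) yields muxed timestamps 0, 1, 2 and displayed frames at 0 and 2, as in the tables.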
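4. Demuxer Guidelines
(1) A demuxer must only open files whose DocType is "webm".
(2) Once the demuxer has determined that the WebM file's header and metadata are valid and the player has started playback, the demuxer must make a best effort to parse the rest of the file so that playback proceeds as correctly as possible.
(3) Seeking is not supported in WebM files that have no Cues element, although the WebM Project is considering supporting seeking without Cues.
(4) If the Cues element is not at the beginning of the file, reading it should be deferred so that playback can start as quickly as possible.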
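Guideline (1) means the DocType must be read before anything else. A minimal Python sketch, assuming the file starts with a well-formed EBML header and that no element uses the reserved "unknown size" encoding; `read_vint`, `read_doctype` and `can_open` are illustrative names, not part of any WebM library. It walks the EBML header's children using EBML variable-length integers (element IDs keep their length-marker bit, sizes drop it) and compares the DocType (ID 0x4282) with "webm".

```python
import io
from typing import BinaryIO, Tuple

EBML_HEADER_ID = 0x1A45DFA3
DOCTYPE_ID = 0x4282


def read_vint(stream: BinaryIO, keep_marker: bool) -> Tuple[int, int]:
    """Read an EBML variable-length integer; return (value, length)."""
    first = stream.read(1)
    if not first:
        raise EOFError("unexpected end of stream")
    b0 = first[0]
    length, mask = 1, 0x80
    while length <= 8 and not (b0 & mask):  # count leading zero bits
        mask >>= 1
        length += 1
    if length > 8:
        raise ValueError("invalid EBML variable-length integer")
    rest = stream.read(length - 1)
    if len(rest) != length - 1:
        raise EOFError("truncated variable-length integer")
    value = b0 if keep_marker else b0 & (mask - 1)  # IDs keep the marker bit
    for b in rest:
        value = (value << 8) | b
    return value, length


def read_doctype(data: bytes) -> str:
    """Return the DocType string from the EBML header at the start of data."""
    stream = io.BytesIO(data)
    element_id, _ = read_vint(stream, keep_marker=True)
    if element_id != EBML_HEADER_ID:
        raise ValueError("not an EBML file")
    header_size, _ = read_vint(stream, keep_marker=False)
    end = stream.tell() + header_size
    while stream.tell() < end:
        child_id, _ = read_vint(stream, keep_marker=True)
        child_size, _ = read_vint(stream, keep_marker=False)
        payload = stream.read(child_size)
        if child_id == DOCTYPE_ID:
            return payload.rstrip(b"\x00").decode("ascii")
    raise ValueError("EBML header has no DocType")


def can_open(data: bytes) -> bool:
    """Only accept files whose DocType is exactly "webm"."""
    try:
        return read_doctype(data) == "webm"
    except (ValueError, EOFError):
        return False
```

A Matroska file with DocType "matroska" is rejected by `can_open`, even though its EBML structure is identical.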
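5. WebVTT Guidelines
(1) Storing WebVTT data in a WebM track
The contents of a WebVTT file are stored as one track of a WebM file. Information that would appear as attributes of an HTML5 <track> tag can be embedded in the WebM Track element as follows:
a. The TrackType sub-element is 0x11 for WebVTT SUBTITLES and CAPTIONS, and 0x21 for WebVTT DESCRIPTIONS and METADATA.
b. The label attribute is stored in the Name sub-element.
c. The srclang attribute is stored in the Language sub-element.
The CodecID for WebVTT is "D_WEBVTT/kind", where kind is one of SUBTITLES, CAPTIONS, DESCRIPTIONS, or METADATA.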
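A sketch of the mapping above, with `webvtt_track_fields` as a hypothetical helper that returns dict keys named after the TrackEntry sub-elements. It copies srclang unchanged; a real muxer may have to convert it to the language-code form the Language element expects.

```python
from typing import Dict, Union

# TrackType values per WebVTT kind, as listed in the guidelines.
WEBVTT_TRACK_TYPES = {
    "SUBTITLES": 0x11,
    "CAPTIONS": 0x11,
    "DESCRIPTIONS": 0x21,
    "METADATA": 0x21,
}


def webvtt_track_fields(kind: str, label: str, srclang: str) -> Dict[str, Union[int, str]]:
    """Map HTML5 <track> attributes to WebM TrackEntry fields (sketch)."""
    kind = kind.upper()
    if kind not in WEBVTT_TRACK_TYPES:
        raise ValueError(f"unsupported WebVTT kind: {kind}")
    return {
        "TrackType": WEBVTT_TRACK_TYPES[kind],
        "CodecID": f"D_WEBVTT/{kind}",  # kind is upper-cased, as in the CodecID
        "Name": label,                   # label attribute
        "Language": srclang,             # srclang attribute, copied as-is
    }
```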
(2) Storing WebVTT cues in WebM Blocks
Each WebVTT cue is stored in the track as the data of a Block element, in the format described below. All WebVTT data must be UTF-8 encoded, and each cue is stored in a single WebM block. The block's timestamp and duration are derived from the WebVTT cue's start and end times.
If the WebVTT cue has a WebVTT cue identifier, the identifier is written to the WebM block, followed by a WebVTT line terminator. If the cue has no identifier, only a WebVTT line terminator is written. The resulting empty line indicates that the original WebVTT cue had no cue identifier.
WebVTT cue timings are not stored in the WebM block; the cue's start and end times are reconstructed from the WebM Block's start time and duration.
If the WebVTT cue has WebVTT cue settings, the settings are written to the WebM block, followed by a WebVTT line terminator. If the cue has no settings, only a WebVTT line terminator is written.
Finally, the WebVTT cue payload is written to the block.
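The block layout above can be sketched in Python as follows. The function names are hypothetical, and the sketch always writes "\n" as the line terminator (WebVTT also allows "\r\n" and "\r"). The cue timings are deliberately absent from the bytes: they travel as the Block's timestamp and duration.

```python
from typing import Optional, Tuple

LINE_TERMINATOR = "\n"  # the only WebVTT line terminator this sketch emits


def cue_to_block(payload: str, identifier: Optional[str] = None,
                 settings: Optional[str] = None) -> bytes:
    """Serialize one WebVTT cue into WebM Block data: identifier line,
    settings line, then the payload; a missing part becomes an empty line."""
    text = (identifier or "") + LINE_TERMINATOR
    text += (settings or "") + LINE_TERMINATOR
    text += payload
    return text.encode("utf-8")


def block_to_cue(data: bytes) -> Tuple[Optional[str], Optional[str], str]:
    """Inverse of cue_to_block: return (identifier, settings, payload).
    Splitting at most twice keeps line breaks inside the payload intact."""
    identifier, settings, payload = data.decode("utf-8").split(LINE_TERMINATOR, 2)
    return identifier or None, settings or None, payload
```

A cue with neither identifier nor settings therefore starts with two empty lines, which is how a demuxer can tell that both were absent.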
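(3) WebVTT chapter cues
WebVTT chapter cues are used for navigation, so they are handled differently: they must be kept together and be available immediately. For this reason, WebVTT chapter cues should not be embedded in the same way as timed cues; instead, they should be converted into Matroska chapters and embedded as such. Matroska chapters are a superset of WebVTT chapter cues, so the conversion is lossless.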
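A rough sketch of the conversion, assuming the chapter cues have already been parsed into (start, end, title) strings. The output dicts are named after the Matroska ChapterTimeStart, ChapterTimeEnd and ChapString elements but flatten their real nesting (ChapString actually sits inside a ChapterDisplay within a ChapterAtom); chapter times are unscaled nanoseconds. Both function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple, Union

# WebVTT timestamp: optional hours, then mm:ss.ttt
_TS = re.compile(r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})")


def vtt_timestamp_to_ns(ts: str) -> int:
    """Convert a WebVTT timestamp such as "00:01:02.500" to nanoseconds."""
    m = _TS.fullmatch(ts.strip())
    if not m:
        raise ValueError(f"bad WebVTT timestamp: {ts!r}")
    hours = int(m.group(1) or 0)
    minutes, seconds, millis = int(m.group(2)), int(m.group(3)), int(m.group(4))
    total_ms = ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis
    return total_ms * 1_000_000


def chapters_from_cues(cues: List[Tuple[str, str, str]]) -> List[Dict[str, Union[int, str]]]:
    """Turn (start, end, title) chapter cues into flat chapter records."""
    return [
        {"ChapterTimeStart": vtt_timestamp_to_ns(start),
         "ChapterTimeEnd": vtt_timestamp_to_ns(end),
         "ChapString": title}
        for start, end, title in cues
    ]
```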
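VII. Implementation Details
In its initial version, WebM supports a subset of the Matroska Specification; other Matroska features are still under consideration. The currently supported elements and their descriptions are listed below.
1. EBML Basics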
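Support | Element Name | Description
Supported | EBML | Top-level element that contains the file description.
Supported | EBMLVersion | Version of the EBML parser used to create the file.
Supported | EBMLReadVersion | Minimum version of the EBML parser required to read the file.
Supported | EBMLMaxIDLength | Maximum length of the IDs in the file (4 or less in Matroska files).
Supported | EBMLMaxSizeLength | Maximum length of the sizes in the file (8 or less in Matroska files). An element whose size is coded with more octets than EBMLMaxSizeLength is considered invalid.
Supported | DocType | Type of the document that follows the EBML header ('webm' here).
Supported | DocTypeVersion | Version of the DocType interpreter used to create the file.
Supported | DocTypeReadVersion | Minimum version of the DocType interpreter required to read the file.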
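To make the table concrete, here is a hypothetical Python helper that writes a WebM EBML header containing the elements listed above. The element IDs are the standard EBML IDs; the values (EBMLVersion 1, EBMLMaxIDLength 4, EBMLMaxSizeLength 8, DocTypeVersion and DocTypeReadVersion 2) are assumptions chosen for a file that uses v2 elements such as SimpleBlock.

```python
from typing import List, Tuple


def encode_size(n: int) -> bytes:
    """Encode an element data size as the shortest EBML vint."""
    for length in range(1, 9):
        # The all-ones value of each length is reserved for "unknown size".
        if n < (1 << (7 * length)) - 1:
            return ((1 << (7 * length)) | n).to_bytes(length, "big")
    raise ValueError("size too large for an 8-byte vint")


def encode_uint(value: int) -> bytes:
    """Big-endian unsigned integer payload using the fewest bytes (at least 1)."""
    return value.to_bytes(max(1, (value.bit_length() + 7) // 8), "big")


def element(element_id: int, payload: bytes) -> bytes:
    """ID (which already carries its length marker) + size + data."""
    id_bytes = element_id.to_bytes((element_id.bit_length() + 7) // 8, "big")
    return id_bytes + encode_size(len(payload)) + payload


def webm_ebml_header(doc_type_version: int = 2) -> bytes:
    children: List[Tuple[int, bytes]] = [
        (0x4286, encode_uint(1)),                 # EBMLVersion
        (0x42F7, encode_uint(1)),                 # EBMLReadVersion
        (0x42F2, encode_uint(4)),                 # EBMLMaxIDLength
        (0x42F3, encode_uint(8)),                 # EBMLMaxSizeLength
        (0x4282, b"webm"),                        # DocType
        (0x4287, encode_uint(doc_type_version)),  # DocTypeVersion
        (0x4285, encode_uint(doc_type_version)),  # DocTypeReadVersion
    ]
    body = b"".join(element(i, p) for i, p in children)
    return element(0x1A45DFA3, body)              # EBML
```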
2. Global Elements (used throughout the file)
Support | Element Name | Description
Unsupported | CRC-32 | All level 1 elements should include a CRC-32. Placing the CRC at the start of the Master element is recommended, for easier reading.
Supported | Void | Used to void damaged data, to avoid unexpected behavior. It can also be used to reserve space in a sub-element for later use.
| Signature Start |
Unsupported | SignatureSlot | Contains signatures of some elements in the stream.
Unsupported | SignatureAlgo | Signature algorithm used (1=RSA, 2=elliptic).
Unsupported | SignatureHash | Hash algorithm used (1=SHA1-160, 2=MD5).
Unsupported | SignaturePublicKey | Public key used by the algorithm (for PKI-based signatures).
Unsupported | Signature | The signature of the data.
Unsupported | SignatureElements | Contains the elements used to compute the signature.
Unsupported | SignatureElementList | A list of consecutive elements.
Unsupported | SignedElement | An element ID whose data will be used to compute the signature.
| Signature End |
3. Segment
Support | Element Name | Description
Supported | Segment | This element contains all other top-level (level 1) elements. A typical Matroska file contains a single Segment.
4. Meta Seek Information
Support | Element Name | Description
Supported | SeekHead | Contains the positions of other level 1 elements.
Supported | Seek | Contains a single seek entry pointing to an EBML element.
Supported | SeekID | The binary ID corresponding to the element name.
Supported | SeekPosition | Position of the element within the Segment (0 = first level 1 element).
5. Segment Information
Support | Element Name | Description
Supported | Info | Contains miscellaneous general information and statistics on the file.
Unsupported | SegmentUID | A randomly generated unique ID to identify the current segment among many others (128 bits).
Unsupported | SegmentFilename | A filename corresponding to this segment.
Unsupported | PrevUID | A unique ID to identify the previous chained segment (128 bits).
Unsupported | PrevFilename | An escaped filename corresponding to the previous segment.
Unsupported | NextUID | A unique ID to identify the next chained segment (128 bits).
Unsupported | NextFilename | An escaped filename corresponding to the next segment.
Unsupported | SegmentFamily | A randomly generated unique ID that all segments related to each other must use (128 bits).
Unsupported | ChapterTranslate | A tuple of corresponding IDs used by chapter codecs to represent this segment.
Unsupported | ChapterTranslateEditionUID | Specifies an edition UID to which this correspondence applies. When not specified, it applies to all editions found in the segment.
Unsupported | ChapterTranslateCodec | The chapter codec using this ID (0: Matroska Script, 1: DVD-menu).
Unsupported | ChapterTranslateID | The binary value used to represent this segment in the chapter codec data. The format depends on the ChapProcessCodecID used.
Supported | TimecodeScale | Timecode scale in nanoseconds (1,000,000 means all timecodes in the segment are expressed in milliseconds).
Supported | Duration | Duration of the segment (based on TimecodeScale).
Supported | DateUTC | Date of the origin of timecode (value 0), i.e. production date.
Supported | Title | General name of the segment.
Supported | MuxingApp | Muxing application or library (e.g. "libmatroska-0.4.3").
Supported | WritingApp | Writing application (e.g. "mkvmerge-0.3.3").
6. Cluster
Support | Element Name | Description
Supported | Cluster | The lower-level element containing the (monolithic) Block structure.
Supported | Timecode | Absolute timecode of the cluster (based on TimecodeScale).
Unsupported | SilentTracks | The list of tracks that are not used in that part of the stream. It is useful when using overlay tracks during seeking, to decide which track to use.
Unsupported | SilentTrackNumber | One of the track numbers that are not used from now on in the stream. It could change later if not specified as silent in a further Cluster.
Unsupported | Position | Position of the Cluster in the segment (0 in live broadcast streams). It can help resynchronise offsets in damaged streams.
Supported | PrevSize | Size of the previous Cluster, in octets. Can be useful for backward playing.
Supported | BlockGroup | Basic container of information containing a single Block or BlockVirtual, and information specific to that Block/VirtualBlock.
Supported | Block | Block containing the actual data to be rendered and a timecode relative to the Cluster Timecode.
Unsupported | BlockVirtual | A Block with no data. It must be stored in the stream at the place the real Block should be in display order.
Unsupported | BlockAdditions | Contains additional blocks to complete the main one. An EBML parser that has no knowledge of the Block structure can still see and use/skip these data.
Unsupported | BlockMore | Contains the BlockAdditional and some parameters.
Unsupported | BlockAddID | An ID to identify the BlockAdditional level.
Unsupported | BlockAdditional | Interpreted by the codec as it wishes (using the BlockAddID).
Supported | BlockDuration | The duration of the Block (based on TimecodeScale). This element is mandatory when DefaultDuration is set for the track. When not written and with no DefaultDuration, the value is assumed to be the difference between the timecode of this Block and the timecode of the next Block in "display" order (not coding order). This element can be useful at the end of a Track (as there is no other Block available), or when there is a break in a track, as in subtitle tracks.
Unsupported | ReferencePriority | This frame is referenced and has the specified cache priority. In cache only a frame of the same or higher priority can replace this frame. A value of 0 means the frame is not referenced.
Supported | ReferenceBlock | Timecode of another frame used as a reference (i.e. a B or P frame). The timecode is relative to the block it is attached to.
Unsupported | ReferenceVirtual | Relative position of the data that should be in position of the virtual block.
Unsupported | CodecState | The new codec state to use. Data interpretation is private to the codec. This information should always be referenced by a seek entry.
Unsupported | Slices | Contains slice descriptions.
Unsupported | TimeSlice | Contains extra time information about the data contained in the Block. While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
Supported | LaceNumber | The reverse number of the frame in the lace (0 is the last frame, 1 is the next to last, etc). While there are a few files in the wild with this element, it is no longer in use and has been deprecated. Being able to interpret this element is not required for playback.
Unsupported | FrameNumber | The number of the frame to generate from this lace with this delay (allows you to generate many frames from the same Block/Frame).
Unsupported | BlockAdditionID | The ID of the BlockAdditional element (0 is the main Block).
Unsupported | Delay | The (scaled) delay to apply to the element.
Unsupported | Duration | The (scaled) duration to apply to the element.
Supported | SimpleBlock | Similar to Block but without all the extra information; mostly used to reduce overhead when no extra feature is needed.
Unsupported | EncryptedBlock | Similar to SimpleBlock but the data inside the Block are transformed (encrypted and/or signed).
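SimpleBlock, which the muxer guidelines require for VP8 frames, has a compact header: the track number as an EBML variable-length integer, a signed 16-bit timecode relative to the Cluster Timecode, and one flags byte. A Python sketch of a header parser (the class and function names are hypothetical; laced frame data is not unpacked):

```python
import struct
from dataclasses import dataclass

LACING = {0: "none", 1: "xiph", 2: "fixed", 3: "ebml"}


@dataclass
class SimpleBlockHeader:
    track_number: int
    relative_timecode: int  # signed, in TimecodeScale ticks, relative to the Cluster
    keyframe: bool
    invisible: bool
    lacing: str
    discardable: bool
    header_length: int      # bytes before the frame data


def parse_simple_block(data: bytes) -> SimpleBlockHeader:
    """Parse the header of a SimpleBlock payload (the bytes after its size)."""
    if not data:
        raise ValueError("empty SimpleBlock")
    b0 = data[0]
    length, mask = 1, 0x80
    while length <= 8 and not (b0 & mask):  # vint length from leading zeros
        mask >>= 1
        length += 1
    if length > 8 or len(data) < length + 3:
        raise ValueError("truncated or invalid SimpleBlock header")
    track = b0 & (mask - 1)  # track number vint, marker bit removed
    for b in data[1:length]:
        track = (track << 8) | b
    (rel_tc,) = struct.unpack_from(">h", data, length)  # big-endian int16
    flags = data[length + 2]
    return SimpleBlockHeader(
        track_number=track,
        relative_timecode=rel_tc,
        keyframe=bool(flags & 0x80),
        invisible=bool(flags & 0x08),
        lacing=LACING[(flags >> 1) & 0x03],
        discardable=bool(flags & 0x01),
        header_length=length + 3,
    )
```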
7. Track
Support | Element Name | Description
Supported | Tracks | Top-level element.
Supported | TrackEntry | Describes a track.
Supported | TrackNumber | The track number as used in the Block header (using more than 127 tracks is not encouraged, though the design imposes no limit).
Supported | TrackUID | A UID that identifies the track; it must not be 0.
Supported | TrackType | Track type, 8 bits (1: video, 2: audio, 3: complex, 0x10: logo, 0x11: subtitle, 0x12: buttons, 0x20: control).
Supported | FlagEnabled | Set to 1 if the track is enabled.
Supported | FlagDefault | Set to 1 if the track should be selected by default.
Supported | FlagForced | Set to 1 if the track must be played. If several tracks have FlagForced set to 1, the player selects the one whose language matches the user's preference.
Supported | FlagLacing | Set to 1 if the track may contain Blocks that use lacing.
Unsupported | MinCache | The minimum number of frames the player must be able to cache during playback.
Unsupported | MaxCache | The maximum number of frames the player needs to cache during playback; 0 means no cache is needed.
Supported | DefaultDuration | Duration of each frame, in nanoseconds.
Unsupported | TrackTimecodeScale | Block timecodes are multiplied by this value to obtain the actual timecode. Typically used to adjust video speed.
Unsupported | TrackOffset | A value that can be added to the Block timecode. Typically used to adjust the playback position of the track.
Unsupported | MaxBlockAdditionID | The maximum value of BlockAddID. 0 means the track has no BlockAdditions.
Supported | Name | Track name.
Supported | Language | The language of the track, in the Matroska language form.
Supported | CodecID | The codec ID.
Supported | CodecPrivate | Codec private data.
Supported | CodecName | The codec name.
Unsupported | AttachmentLink | The UID of an attachment used by the codec.
Unsupported | CodecSettings | A string describing the encoding settings.
Unsupported | CodecInfoURL | A URL where information about the codec can be found.
Unsupported | CodecDownloadURL | A URL from which the codec can be downloaded.
Unsupported | CodecDecodeAll | The codec can decode potentially damaged data.
Unsupported | TrackOverlay | Specifies that this track is an overlay track for the Track specified (in the u-integer). This means that when this track has a gap (see SilentTracks), the overlay track should be used instead. The order of multiple TrackOverlay elements matters: the first one is the one that should be used; if it is not found, the second one should be used, and so on.
Unsupported | TrackTranslate | The track identification for the given Chapter Codec.
Unsupported | TrackTranslateEditionUID | Specifies an edition UID to which this translation applies. When not specified, it applies to all editions found in the segment.
Unsupported | TrackTranslateCodec | The chapter codec using this ID (0: Matroska Script, 1: DVD-menu).
Unsupported | TrackTranslateTrackID | The binary value used to represent this track in the chapter codec data. The format depends on the ChapProcessCodecID used.
| Video Start |
Supported | Video | Video settings.
Supported | FlagInterlaced | Set to 1 if the video is interlaced.
Supported | StereoMode | Stereo-3D video mode. Supported modes: 0: mono, 1: side by side (left eye first), 2: top-bottom (right eye first), 3: top-bottom (left eye first), 11: side by side (right eye first). Unsupported modes: 4: checkerboard (right first), 5: checkerboard (left first), 6: row interleaved (right first), 7: row interleaved (left first), 8: column interleaved (right first), 9: column interleaved (left first), 10: anaglyph (cyan/red).
Supported | PixelWidth | Width of the encoded video frames, in pixels.
Supported | PixelHeight | Height of the encoded video frames, in pixels.
Supported | PixelCropBottom | Number of pixels to remove at the bottom.
Supported | PixelCropTop | Number of pixels to remove at the top.
Supported | PixelCropLeft | Number of pixels to remove on the left.
Supported | PixelCropRight | Number of pixels to remove on the right.
Supported | DisplayWidth | Width of the video frames to display.
Supported | DisplayHeight | Height of the video frames to display.
Supported | DisplayUnit | Unit of DisplayWidth/DisplayHeight (0: pixels, 1: centimeters, 2: inches). Currently only pixels are supported.
Supported | AspectRatioType | Specifies the allowed changes to the aspect ratio (0: free resizing, 1: keep aspect ratio, 2: fixed).
Unsupported | ColourSpace | Same value as in AVI files (32 bits).
Unsupported | GammaValue | Gamma value.
Supported | FrameRate | Frame rate.
| Video End |
| Audio Start |
Supported | Audio | Audio settings.
Supported | SamplingFrequency | Sampling frequency, in Hz.
Supported | OutputSamplingFrequency | Actual output sampling frequency, in Hz (used for SBR).
Supported | Channels | Number of channels.
Unsupported | ChannelPositions | Table of horizontal angles for each successive channel, see appendix.
Supported | BitDepth | Bits per sample, mostly used for PCM.
| Audio End |
| Content Encoding Start |
Unsupported | All | All elements related to Content Encoding.
| Content Encoding End |
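A few of the WebM constraints from this document can be checked mechanically on a TrackEntry. This Python sketch assumes the track has already been parsed into a dict keyed by element names, with Video as a nested dict; `webm_track_problems` is a hypothetical helper, and "A_VORBIS" is the Matroska CodecID for Vorbis, which this document does not spell out.

```python
from typing import Any, Dict, List

SUPPORTED_STEREO_MODES = {0, 1, 2, 3, 11}  # from the StereoMode row above


def webm_track_problems(track: Dict[str, Any]) -> List[str]:
    """Return guideline violations found in one TrackEntry (partial check)."""
    problems: List[str] = []
    codec = track.get("CodecID")
    if track.get("TrackType") == 1:  # video
        if codec != "V_VP8":
            problems.append(f"video CodecID should be V_VP8, got {codec!r}")
        if track.get("CodecPrivate"):
            problems.append("VP8 tracks should carry no CodecPrivate data")
        video = track.get("Video", {})
        if video.get("DisplayUnit", 0) != 0:
            problems.append("DisplayUnit must be 0 (pixels)")
        if video.get("StereoMode", 0) not in SUPPORTED_STEREO_MODES:
            problems.append(f"StereoMode {video['StereoMode']} is not supported")
    elif track.get("TrackType") == 2 and codec != "A_VORBIS":  # audio
        problems.append(f"audio CodecID should be A_VORBIS, got {codec!r}")
    if track.get("TrackUID") == 0:
        problems.append("TrackUID must not be 0")
    return problems
```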
8. Cueing Data
Support | Element Name | Description
Supported | Cues | Top-level element that speeds up seeking.
Supported | CuePoint | Contains all information related to a seek point.
Supported | CueTime | Absolute timecode according to the segment time base.
Supported | CueTrackPositions | Positions of the different tracks for the given timecode.
Supported | CueTrack | The track for which a position is given.
Supported | CueClusterPosition | Position of the Cluster containing the required Block.
Supported | CueBlockNumber | Number of the Block in the specified Cluster.
Unsupported | CueCodecState | Position of the Codec State corresponding to this Cue element. 0 means the data is taken from the initial Track Entry.
Unsupported | CueReference | The Clusters containing the required referenced Blocks.
Unsupported | CueRefTime | Timecode of the referenced Block.
Unsupported | CueRefCluster | Position of the Cluster containing the referenced Block.
Unsupported | CueRefNumber | Number of the referenced Block of Track X in the specified Cluster.
Unsupported | CueRefCodecState | Position of the Codec State corresponding to this referenced element. 0 means the data is taken from the initial Track Entry.
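With keyframe-only CuePoints (muxer guideline (2)), seeking reduces to finding the last CueTime at or before the target time and jumping to its Cluster. A Python sketch, assuming the Cues have been parsed into (CueTime, CueClusterPosition) pairs sorted by time; CueClusterPosition, like SeekPosition, is measured from the first byte of the Segment's data, which is what `cluster_file_offset` adds back. Both function names are hypothetical.

```python
import bisect
from typing import List, Tuple


def find_cue(cues: List[Tuple[int, int]], target_time: int) -> Tuple[int, int]:
    """Return the (CueTime, CueClusterPosition) to seek to for target_time.

    Picks the last cue point at or before target_time, so decoding starts at
    a keyframe no later than the target; falls back to the first cue point
    if the target precedes all of them.
    """
    if not cues:
        raise ValueError("no cue points")
    times = [t for t, _ in cues]
    index = bisect.bisect_right(times, target_time) - 1
    return cues[max(index, 0)]


def cluster_file_offset(segment_data_start: int, cluster_position: int) -> int:
    """Cue and Seek positions are relative to the Segment's data start."""
    return segment_data_start + cluster_position
```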
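9. Attachment (Unsupported)
10. Chapters (Unsupported)
11. Tagging
(1) The Tags element should be placed at the end of the file, so that small, trivial updates are easy to make.
(2) TagName should come before the Tag data.
(3) A SimpleTag should not contain other SimpleTags.
(4) When concatenating multiple files, avoid merging Tags from different files.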
Support | Element Name | Description
Supported | Tags | Element containing elements specific to Tracks/Chapters.
Supported | Tag | Element containing elements specific to Tracks/Chapters.
Supported | Targets | Contains all UIDs to which the specified metadata apply. Leave it empty to describe everything in the segment.
Supported | TargetTypeValue | A number indicating the logical level of the target.
Supported | TargetType | An informational string that can be used to display the logical level of the target, like "ALBUM", "TRACK", "MOVIE", "CHAPTER", etc.
Supported | TrackUID | This value SHOULD be 0, meaning the tags apply to all tracks in the Segment.
Unsupported | EditionUID | A unique ID to identify the EditionEntry(s) the tags belong to. If the value is 0 at this level, the tags apply to all editions in the Segment.
Unsupported | ChapterUID | A unique ID to identify the Chapter(s) the tags belong to. If the value is 0 at this level, the tags apply to all chapters in the Segment.
Unsupported | AttachmentUID | A unique ID to identify the Attachment(s) the tags belong to. If the value is 0 at this level, the tags apply to all the attachments in the Segment.
Supported | SimpleTag | Contains general information about the target.
Supported | TagName | The name of the Tag that is going to be stored.
Supported | TagLanguage | Specifies the language of the tag.
Supported | TagDefault | Indicates whether this is the default/original language to use for the given tag.
Supported | TagString | The value of the Tag.
Supported | TagBinary | The value of the Tag if it is binary. Note that this cannot be used in the same SimpleTag as TagString.