1. EBML(Extensible Binary Meta Language)
References: http://blog.csdn.net/LBO4031/article/details/7591945
http://www.matroska.org/technical/specs/rfc/index.html
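MKV is built on EBML, so to understand MKV you first need to understand EBML.
1. EBML elements:
Every EBML document is a sequence of EBML elements. Each basic EBML element consists of ID + DATA_SIZE + DATA, that is (in C-like pseudocode):
typedef struct EBML {
    vint ID;          // EBML-ID
    vint size;        // size of element
    char[size] data;  // data
};
The data can be an actual value (VALUE) or another set of EBML elements; which of the two it is follows from the ID.
Both ID and size are vints, i.e. variable-length integers. This type is defined below.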
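2. Value data types:
(All data is stored in big-endian order.)
a. Variable-length unsigned integer (vint):
The length of this type in bytes is: length = 1 + number_of_leading_zero_bits, counted in the first byte. For example, take the data: 1A 45 DF A3 A3 42 86...
The first byte, 1A, is 00011010 in binary. By the rule above it has 3 leading zero bits, so length = 1 + 3 = 4 and the integer occupies 4 bytes: 0x1A45DFA3.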
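b. Signed/unsigned integers:
Signed integers use two's complement representation. The size is 0 to 8 bytes; a size of 0 bytes means the integer value is 0.
c. float:
IEEE floating point, with a size of 0, 4, 8, or 10 bytes. A size of 0 bytes means the value is 0.0.
d. string:
PADDING = 0x00 (terminator)
STRING = *BYTE *PADDING (a string is any number of data bytes followed by any number of terminator bytes; * means zero or more)
Strings are UTF-8 encoded. Note that there may be zero terminator bytes, and the string length may also be 0.
e. date:
A signed 8-byte integer giving the time in nanoseconds relative to the start of the new millennium (2001-01-01 00:00:00 UTC).
f. binary:
Binary data, i.e. data that is not interpreted at the EBML layer.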
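The value types listed above map fairly directly onto Python's standard library. The following is a minimal sketch rather than a reference decoder: the function names (`decode_uint`, `decode_int`, `decode_float`, `decode_string`, `decode_date`) are made up for this illustration, 10-byte floats are not handled because the `struct` module has no format for them, and dates are rounded down to microseconds because `datetime` cannot hold nanoseconds.

```python
import struct
from datetime import datetime, timedelta, timezone

# Date values count nanoseconds from this instant (item e above).
EBML_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def decode_uint(data: bytes) -> int:
    """Big-endian unsigned integer, 0-8 bytes; empty data means 0."""
    if len(data) > 8:
        raise ValueError("integer longer than 8 bytes")
    return int.from_bytes(data, "big", signed=False)


def decode_int(data: bytes) -> int:
    """Big-endian two's complement integer, 0-8 bytes; empty data means 0."""
    if len(data) > 8:
        raise ValueError("integer longer than 8 bytes")
    return int.from_bytes(data, "big", signed=True)


def decode_float(data: bytes) -> float:
    """IEEE float of 0, 4 or 8 bytes (10-byte floats are out of scope here)."""
    if len(data) == 0:
        return 0.0
    if len(data) == 4:
        return struct.unpack(">f", data)[0]
    if len(data) == 8:
        return struct.unpack(">d", data)[0]
    raise ValueError("unsupported float size in this sketch")


def decode_string(data: bytes) -> str:
    """Drop the trailing 0x00 padding, then decode the rest as UTF-8."""
    return data.rstrip(b"\x00").decode("utf-8")


def decode_date(data: bytes) -> datetime:
    """Signed 8-byte nanosecond offset from 2001-01-01 00:00:00 UTC."""
    if len(data) != 8:
        raise ValueError("date must be exactly 8 bytes")
    nanoseconds = int.from_bytes(data, "big", signed=True)
    return EBML_EPOCH + timedelta(microseconds=nanoseconds // 1000)
```

Binary values need no decoding step: the raw bytes are handed to the application unchanged.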
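2-1. EBML element parsing example:
Again take 1A 45 DF A3 A3 42 86... as the example:
From 2.a we know that the first vint, the ID, is 0x1A45DFA3. The next byte is A3, which is 10100011 in binary, so length = 1 + 0 = 1 and this integer occupies 1 byte, A3. Clearing the leading 1 bit gives 00100011, i.e. 0x23 = 35, so size = 35, and data is the 35 bytes starting at 42.
(Note: for the ID we do not need to work out the number it encodes; the raw bytes are used as they are. For size we do need the encoded number: change the highest 1 bit to 0 and read off the value.)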
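The worked example above can be reproduced in a few lines of Python. This is a sketch under the rules given in this section only; the helper names (`vint_length`, `read_vint`, `read_element_header`) and the `keep_marker` parameter are assumptions introduced for the illustration, and vints whose first byte is 0x00 are simply rejected.

```python
def vint_length(first_byte: int) -> int:
    """length = 1 + number of leading zero bits in the first byte."""
    if first_byte == 0:
        raise ValueError("first byte 0x00 is not handled in this sketch")
    return 1 + (8 - first_byte.bit_length())


def read_vint(buf: bytes, pos: int, keep_marker: bool):
    """Read one vint at pos and return (value, next_pos).

    keep_marker=True keeps the raw bytes (used for IDs);
    keep_marker=False clears the leading 1 bit (used for sizes).
    """
    length = vint_length(buf[pos])
    raw = buf[pos:pos + length]
    if len(raw) < length:
        raise ValueError("truncated vint")
    value = int.from_bytes(raw, "big")
    if not keep_marker:
        # The marker is the highest set bit, at bit position 7 * length.
        value &= (1 << (7 * length)) - 1
    return value, pos + length


def read_element_header(buf: bytes, pos: int = 0):
    """Return (element_id, data_size, data_start) for the element at pos."""
    element_id, pos = read_vint(buf, pos, keep_marker=True)
    size, pos = read_vint(buf, pos, keep_marker=False)
    return element_id, size, pos


sample = bytes.fromhex("1A45DFA3A34286")
element_id, size, data_start = read_element_header(sample)
print(hex(element_id), size, data_start)  # 0x1a45dfa3 35 5
```

The returned `data_start` of 5 is the offset of the byte 42, which matches the statement that data begins at 42.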
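3. Semantic interpretation:
With the knowledge above we can split the data structurally into individual EBML elements, but we cannot tell what those elements mean. That semantic interpretation is given by the document type definition:
Each element has several properties defined in the document type definition: name, parent, ID, cardinality, value restrictions, default value, value type, child order.
a. name: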
NAME = [A-Za-z_]1*[A-Za-z_0-9]
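The name is an element's identifier and corresponds one-to-one with the element ID. Only letters, digits, and underscores may be used in a name; it may not start with a digit, and it is case-insensitive.
b. value type:
The EBML data alone does not say what type a value (data) has, so the EBML DTD (document type definition, see section 4) gives the type of each element's value. Besides the data types listed in section 2, an element can also have the type "container", which means its content is more elements.
c. ID:
Every element must have an ID.
d. default value:
Every non-container element can be assigned a default value. If the value of such an element is not given explicitly, the default value is used.
e. parent:
To place elements in the hierarchy we need to know how elements relate to each other. In the EBML DTD this is expressed by the element's parent property (which may be absent). There are two ways to express it:
an explicit list of allowed parents, or a general definition of the allowed insertion depth.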
PARENTS = NAME / (NAME "," PARENTS)
LEVEL = 1*DIGIT *1(".."*DIGIT)
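An element with neither a parent nor a level definition sits at the top level of the EBML file.
f. cardinality:
An element's cardinality states how many times it may occur in the current scope. By default an element may occur at most once. For example, if the element Weight is defined as a child of the Brick element, each Brick element may contain no more than one Weight element. The cardinality property changes this default. The possible values are:
Symbol  Meaning
*       zero or more (>= 0)
?       zero or one (default)
1       exactly one
+       one or more (>= 1)
g. child order:
The child order property applies only to container elements. It states whether an element's children must appear in the order in which they are defined.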
YES = "yes" / "1"
NO = "no" / "0"
ORDERED = YES / NO
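h. value restriction:
Each element can place additional restrictions on its value. These restrictions are only used to validate data when encoding, and to keep encoded data consistent when it is decoded or parsed. Different element types have different restrictions, with different syntax.
There are two kinds of restriction. One is range, which limits the values the element may take. The other is size, which limits the number of bytes the value occupies in its encoded form (so for a string element the size need not equal the number of characters).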
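As an illustration of how a parser might enforce the cardinality, range, and size properties described in this section, here is a small sketch. The function names (`cardinality_ok`, `in_uint_range`, `size_ok`) are hypothetical, and ranges are written with the two-dot `..` notation used by the UINT_RANGE rule of the BNF in the appendix; only unsigned integer ranges are covered.

```python
# Cardinality symbol -> (minimum, maximum) occurrences; None means unbounded.
CARDINALITY = {"*": (0, None), "?": (0, 1), "1": (1, 1), "+": (1, None)}


def cardinality_ok(symbol: str, count: int) -> bool:
    """Check how many times an element occurred against its cardinality."""
    minimum, maximum = CARDINALITY[symbol]
    return count >= minimum and (maximum is None or count <= maximum)


def in_uint_range(value: int, spec: str) -> bool:
    """Check value against a range such as '0..1', '1..' or a single '4'."""
    if ".." not in spec:
        return value == int(spec)
    low, high = spec.split("..", 1)
    return value >= int(low) and (high == "" or value <= int(high))


def size_ok(data: bytes, spec: str) -> bool:
    """The size restriction applies to the encoded length in bytes."""
    return in_uint_range(len(data), spec)


print(cardinality_ok("?", 2))         # False: at most one occurrence allowed
print(in_uint_range(1, "0..1"))       # True
print(size_ok(b"\x00" * 4, "4"))      # True
```

Keeping the range check separate from the size check mirrors the distinction drawn above: range looks at the decoded value, size looks at the number of encoded bytes.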
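4. Document type definition (DTD: Document Type Definition)
The EBML document type definition (EDTD) is an ASCII-based language that describes the parameters and relationships from section 3 in a form readable by both people and computers.
Syntactically it consists of blocks. It is whitespace-insensitive and case-insensitive, and it supports C++-style line comments (LCOMMENT) and C-style block comments (BCOMMENT).
At the top level of an EDTD only three kinds of block are currently defined: header declarations, type definitions, and element definitions: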
COMMENT = LCOMMENT / BCOMMENT
S = *WSP / (*WSP COMMENT *WSP) ; optional whitespace
DTD = *(S / HBLOCK /TBLOCK /EBLOCK)
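a. Header declaration:
The header declaration states which values should be put into the header elements where they differ from the defaults. A DTD may contain only one header declaration block. The format is: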
HBLOCK = "declare" WSP "header" S "{"*(S / STATEMENT)"}"
STATEMENT = NAME S ":=" DEFS S ";"
For example:
declare header {
  DocType := "xhtml";
  EBMLVersion := 1;
}
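b. Type definitions:
Type definitions are a way to create more memorable type names, which makes the DTD smaller and easier to read. The format is:
TBLOCK = "define" S WSP "types" S "{" *( S / DTYPE ) "}"
DTYPE = NAME S ":=" S TYPE S (PROPERTIES S *1";")/";"
The type can be one of the types described earlier, or a NAME already defined earlier in the file: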
TYPE = VTYPE / CTYPE
VTYPE = "int" / "uint" / "float" / "string" / "date" / "binary" / NAME
CTYPE = "container" / NAME
If a type definition has no property list, the declaration ends with ";". If it has a property list, the ";" is optional.
PROPERTIES = "[" S 1*PROPERTY S "]"
PROPERTY = PROP_NAME S ":" S PROP_VALUE S ";"
Example:
crc32 := binary [ size:4; ]
sha1 := binary [ size:20; ]
bool := uint [ range:0..1; ]
us_printable := binary [ range:32..126; ]
c. Element definitions:
Element definitions are the real purpose of a DTD. The format is:
EBLOCK = "define" WSP "elements" S "{" *(S / ELEMENT ) "}"
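DELEMENT = VELEMENT / CELEMENT / "%children;" // (Note: should this be ELEMENT? The BNF in the appendix uses DELEMENT in both EBLOCK and this rule.)
Simple declarations are normally used for value elements: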
VELEMENT = NAME S ":=" S ID WSP S TYPE S (PROPERTIES S *1";") / ";"
Block declarations are used only to express parent-child relationships:
CELEMENT = NAME S ":=" S ID WSP S "container" S *1PROPERTIES S ("{" *DELEMENT "}") / ";"
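To make the declaration syntax more concrete, the sketch below parses a single-line value-element declaration into its name, ID, type, and properties. It is a simplification, not a full EDTD parser: the `DECL` pattern and the `parse_declaration` function are assumptions made for this illustration, and nested container blocks, comments, and quoted strings containing ";" or "]" are not handled.

```python
import re

# NAME ":=" ID TYPE [ "[" properties "]" ] [ ";" ]
DECL = re.compile(
    r"^\s*(?P<name>[A-Za-z_][A-Za-z_0-9]+)\s*:=\s*"
    r"(?P<id>(?:[0-9A-Fa-f]{2})+)\s+"
    r"(?P<type>[A-Za-z_][A-Za-z_0-9]+)\s*"
    r"(?:\[(?P<props>[^\]]*)\])?\s*;?\s*$"
)


def parse_declaration(line: str) -> dict:
    """Parse one simple declaration such as 'Void := ec binary [ card:*; ]'."""
    match = DECL.match(line)
    if match is None:
        raise ValueError(f"not a simple element declaration: {line!r}")
    props = {}
    # Properties are 'key:value' pairs, each terminated by ';'.
    for item in (match.group("props") or "").split(";"):
        if item.strip():
            key, _, value = item.partition(":")
            props[key.strip()] = value.strip()
    return {
        "name": match.group("name"),
        "id": int(match.group("id"), 16),
        "type": match.group("type"),
        "props": props,
    }


decl = parse_declaration("EBMLVersion := 4286 uint [def:1; parent:EBML; ]")
print(decl["name"], hex(decl["id"]), decl["type"], decl["props"])
```

A real parser would tokenize the input according to the grammar instead of relying on one regular expression; the regular expression is used here only to keep the example short.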
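5. EBML standard elements:
EBML defines a small set of elements that can be used in any EBML application. An EBML document MUST begin with an EBML header made up of EBML elements. In general, for a particular application, the document type definition can set defaults for every child of the EBML element, so that the whole header could be expressed without writing a single byte. However, as the definitions below show, DocType has no default value, which means at least DocType has to be written into the EBML data. This also helps tell apart EBML documents intended for different uses.
a. EBML:
The EBML element is the container for the EBML header.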
EBML := 1a45dfa3 container [card:+;]
a-1. EBMLVersion:
The EBML version the document conforms to.
EBMLVersion := 4286 uint [def:1; parent:EBML; ]
a-2. EBMLReadVersion:
The minimum EBML version a parser must support to read this EBML document.
EBMLReadVersion := 42f7 uint [def:1; parent: EBML; ]
a-3. EBMLMaxIDWidth:
The maximum width of IDs in the document. IDs wider than 4 bytes are not recommended.
EBMLMaxIDWidth := 42f2 uint [def:4; parent:EBML; ]
a-4. EBMLMaxSizeWidth:
The maximum width of the SIZE fields used in the document. Widths above 8 bytes are not recommended.
EBMLMaxSizeWidth := 42f3 uint [def:8; parent:EBML; ]
a-5. DocType:
An ASCII string that identifies the document type.
DocType := 4282 binary [range:32..126; parent:EBML; ]
a-6. DocTypeVersion:
The version of the document type.
DocTypeVersion := 4287 uint [def:1; parent:EBML; ]
a-7. DocTypeReadVersion:
The minimum document type version an interpreter must support to read this document.
DocTypeReadVersion := 4285 uint [def:1; parent: EBML; ]
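The header elements and their defaults can be put together into a small header reader. The sketch below is one possible reading of the definitions above, not a complete parser: `HEADER_FIELDS`, `read_vint`, and `parse_ebml_header` are names invented for the example, unknown children are skipped, and the sample bytes at the end were assembled by hand for this illustration.

```python
EBML_ID = 0x1A45DFA3

# Child ID -> (name, default) as listed above; DocType has no default.
HEADER_FIELDS = {
    0x4286: ("EBMLVersion", 1),
    0x42F7: ("EBMLReadVersion", 1),
    0x42F2: ("EBMLMaxIDWidth", 4),
    0x42F3: ("EBMLMaxSizeWidth", 8),
    0x4282: ("DocType", None),
    0x4287: ("DocTypeVersion", 1),
    0x4285: ("DocTypeReadVersion", 1),
}


def read_vint(buf: bytes, pos: int, keep_marker: bool):
    """Return (value, next_pos); keep_marker=True is used for IDs."""
    first = buf[pos]
    if first == 0:
        raise ValueError("first byte 0x00 is not handled in this sketch")
    length = 1 + (8 - first.bit_length())
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1  # clear the leading 1 bit
    return value, pos + length


def parse_ebml_header(buf: bytes) -> dict:
    """Read the EBML header at the start of buf, filling in defaults."""
    element_id, pos = read_vint(buf, 0, keep_marker=True)
    if element_id != EBML_ID:
        raise ValueError("document does not start with an EBML header")
    size, pos = read_vint(buf, pos, keep_marker=False)
    end = pos + size
    header = {name: default for name, default in HEADER_FIELDS.values()}
    while pos < end:
        child_id, pos = read_vint(buf, pos, keep_marker=True)
        child_size, pos = read_vint(buf, pos, keep_marker=False)
        data = buf[pos:pos + child_size]
        pos += child_size
        if child_id not in HEADER_FIELDS:
            continue  # e.g. Void or CRC32; ignored in this sketch
        name, _default = HEADER_FIELDS[child_id]
        if name == "DocType":
            header[name] = data.rstrip(b"\x00").decode("ascii")
        else:
            header[name] = int.from_bytes(data, "big")
    if header["DocType"] is None:
        raise ValueError("DocType has no default and must be present")
    return header


# Hand-built header: DocType = "matroska", DocTypeVersion = 2.
sample = bytes.fromhex("1A45DFA38F428288") + b"matroska" + bytes.fromhex("42878102")
print(parse_ebml_header(sample))
```

Only DocType and DocTypeVersion are stored in the sample; every other field in the result comes from the defaults, which is the space saving described at the start of this section.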
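b. CRC32:
The CRC32 container can be placed around any EBML element or elements. The value stored in CRC32Value is the result of a CRC-32 checksum computed over the other child elements.
CRC32 := c3 container [level:1..; card:*;]{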
%children;
CRC32Value := 42fe binary [ size:4; ]
}
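c. Void:
The Void element can be used as reserved data for future extension. It is also used to fill the space left after data has been deleted. A parser can simply ignore this element.
Void := ec binary [ level:1..; card:*; ]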
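Filling a gap with a Void element means choosing a payload length so that ID, size field, and payload together cover the gap exactly. A possible sketch follows; `encode_size` and `make_void` are hypothetical helper names, and the code avoids size fields whose value bits are all ones, on the assumption (not stated in this text) that such patterns are reserved, as they are in later EBML specifications.

```python
VOID_ID = 0xEC


def encode_size(value: int, length: int) -> bytes:
    """Encode value as a size vint that occupies exactly `length` bytes."""
    if not 1 <= length <= 8:
        raise ValueError("size width must be 1-8 bytes")
    if not 0 <= value < (1 << (7 * length)) - 1:
        raise ValueError("value does not fit in the requested width")
    marker = 1 << (7 * length)  # the leading 1 bit that ends the zero prefix
    return (marker | value).to_bytes(length, "big")


def make_void(total: int) -> bytes:
    """Build a Void element that occupies exactly `total` bytes (>= 2)."""
    if total < 2:
        raise ValueError("a Void element needs at least 2 bytes")
    for size_width in range(1, 9):
        payload = total - 1 - size_width  # 1 byte for the ID
        if 0 <= payload < (1 << (7 * size_width)) - 1:
            return bytes([VOID_ID]) + encode_size(payload, size_width) + bytes(payload)
    raise ValueError("gap too large for this sketch")


print(make_void(10).hex())  # ec880000000000000000
```

Trying the narrowest size width first keeps the Void header small; a wider width is only used when the payload length no longer fits.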
Appendix:
1. EBML BNF:
EBML = *ELEMENT
ELEMENT = ELEMENT_ID SIZE DATA
DATA = VALUE / *ELEMENT
VINT = ( %b0 VINT 7BIT ) / ( %b1 7BIT )

; A more annotated but less correct definition of VINT
;
; VINT = VINT_WIDTH VINT_MARKER VINT_DATA
; VINT_WIDTH = *%b0
; VINT_MARKER = %b1
; VINT_DATA = VINT_ALIGNMENT VINT_TAIL
; VINT_ALIGNMENT = *BIT
; VINT_TAIL = *BYTE

ELEMENT_ID = VINT
SIZE = VINT
VALUE = INT / UINT / FLOAT / STRING / DATE / BINARY
PADDING = %x00
INT = *8BYTE
UINT = *8BYTE
FLOAT = *1( 4BYTE / 8BYTE / 10BYTE )
STRING = *BYTE *PADDING
DATE = 8BYTE
BINARY = *BYTE
2. EDTD BNF:
; NOTE: This BNF is not correct in that it allows for more freedom
; than what is described in the text. That is because a 100%
; correct BNF would be almost unreadable. To be correct CELEMENT
; would be split into one ELEMENT token for every value type, and
; then each and every one of them would have their own PROPERTIES
; definition which points out only the DEF and RANGE for that value
; type. Some other shortcuts are noted in comments.

LCOMMENT = "//" *BYTE (CR / LF) ; *BYTE is string without CR/LF
BCOMMENT = "/*" *BYTE "*/" ; *BYTE is string without "*/"
COMMENT = LCOMMENT / BCOMMENT ; Line comment / Block comment
S = *WSP / ( *WSP COMMENT *WSP ) ; Optional white spaces

DTD = *( S / HBLOCK / TBLOCK / EBLOCK )
HBLOCK = "declare" S WSP "header" S "{" *(S / STATEMENT) "}"
EBLOCK = "define" S WSP "elements" S "{" *(S / DELEMENT) "}"
TBLOCK = "define" S WSP "types" S "{" *(S / DTYPE) "}"

DELEMENT = VELEMENT / CELEMENT / "%children;"
VELEMENT = NAME S ":=" S ID WSP S TYPE S (PROPERTIES S *1";")/";"
CELEMENT = NAME S ":=" S ID WSP S "container" S *1PROPERTIES S ("{" *DELEMENT "}")/";"

NAME = [A-Za-z_] 1*[A-Za-z_0-9]
ID = 1*( 2HEXDIG )
TYPE = VTYPE / CTYPE
VTYPE = "int" / "uint" / "float" / "string" / "date" / "binary" / NAME
CTYPE = "container" / NAME

PROPERTIES = "[" S 1*PROPERTY S "]"
PROPERTY = PARENT / LEVEL / CARD / DEF / RANGE / SIZE
PARENT = "parent" S ":" S PARENTS S ";"
PARENTS = NAME / ( NAME S "," S PARENTS )
LEVEL = "level" S ":" S 1*DIGIT *(".." *DIGIT) S ";"
CARD = "card" S ":" S ( "*" / "?" / "1" / "+" ) S ";"
ORDERED = "ordered" S ":" S ( YES / NO ) S ";"
YES = "yes" / "1"
NO = "no" / "0"
DEF = "def" S ":" S DEFS S ";"
DEFS = ( INT_DEF / UINT_DEF / FLOAT_DEF / STRING_DEF / DATE_DEF / BINARY_DEF / NAME )
RANGE = "range" S ":" S RANGE_LIST S ";"
RANGE_LIST = RANGE_ITEM / ( RANGE_ITEM S "," S RANGE_LIST )
RANGE_ITEM = INT_RANGE / UINT_RANGE / FLOAT_RANGE / STRING_RANGE / DATE_RANGE / BINARY_RANGE
SIZE = "size" S ":" S SIZE_LIST S ";"
SIZE_LIST = UINT_RANGE / ( UINT_RANGE S "," S SIZE_LIST )

; Actual values, but INT_VALUE is too long.
INT_V = *1"-" 1*DIGIT
FLOAT_V = INT "." 1*DIGIT *1( "e" *1( "+"/"-" ) 1*DIGIT )
; DATE uses ISO short format, yyyymmddThh:mm:ss.f
DATE_V = *1DIGIT 2DIGIT 2DIGIT *1(%x54 2DIGIT ":" 2DIGIT ":" 2DIGIT *1( "." *1DIGIT ))

INT_DEF = INT_V
UINT_DEF = 1*DIGIT
FLOAT_DEF = FLOAT_V
DATE_DEF = INT_DEF / DATE_V
STRING_DEF = ("0x" 1*( 2HEXDIG )) / ( %x22 *(%x20-7e) %x22 )
BINARY_DEF = STRING_DEF

INT_RANGE = INT_V / ( INT_V ".." ) / ( ".." INT_V ) / ( INT_V ".." INT_V )
UINT_RANGE = 1*DIGIT *1( ".." *DIGIT )
FLOAT_RANGE = ( ("<" / "<=" / ">" / ">=") FLOAT_DEF ) / ( FLOAT_DEF "<"/"<=" ".." "<"/"<=" FLOAT_DEF )
DATE_RANGE = (1*DIGIT / DATE_V) *1( ".." *(DIGIT / DATE_V) )
BINARY_RANGE = UINT_RANGE
STRING_RANGE = UINT_RANGE

STATEMENT = NAME S ":=" S DEFS S ";"

; TYPE must be defined. PROPERTIES must only use DEF and RANGE.
DTYPE = NAME S ":=" S TYPE S (PROPERTIES S *1";")/";"
3. EBML standard definitions:
define elements {
  EBML := 1a45dfa3 container [ card:+; ] {
    EBMLVersion := 4286 uint [ def:1; ]
    EBMLReadVersion := 42f7 uint [ def:1; ]
    EBMLMaxIDLength := 42f2 uint [ def:4; ]
    EBMLMaxSizeLength := 42f3 uint [ def:8; ]
    DocType := 4282 string [ range:32..126; ]
    DocTypeVersion := 4287 uint [ def:1; ]
    DocTypeReadVersion := 4285 uint [ def:1; ]
  }

  CRC32 := c3 container [ level:1..; card:*; ] {
    %children;
    CRC32Value := 42fe binary [ size:4; ]
  }

  Void := ec binary [ level:1..; card:*; ]
}
4. Matroska DTD:
declare header {
  DocType := "matroska";
  EBMLVersion := 1;
}

define types {
  bool := uint [ range:0..1; ]
  ascii := string [ range:32..126; ]
}

define elements {
  Segment := 18538067 container [ card:*; ] {

    // Meta Seek Information
    SeekHead := 114d9b74 container [ card:*; ] {
      Seek := 4dbb container [ card:*; ] {
        SeekID := 53ab binary;
        SeekPosition := 53ac uint;
      }
    }

    // Segment Information
    Info := 1549a966 container [ card:*; ] {
      SegmentUID := 73a4 binary;
      SegmentFilename := 7384 string;
      PrevUID := 3cb923 binary;
      PrevFilename := 3c83ab string;
      NextUID := 3eb923 binary;
      NextFilename := 3e83bb string;
      TimecodeScale := 2ad7b1 uint [ def:1000000; ]
      Duration := 4489 float [ range:>0.0; ]
      DateUTC := 4461 date;
      Title := 7ba9 string;
      MuxingApp := 4d80 string;
      WritingApp := 5741 string;
    }

    // Cluster
    Cluster := 1f43b675 container [ card:*; ] {
      Timecode := e7 uint;
      Position := a7 uint;
      PrevSize := ab uint;
      BlockGroup := a0 container [ card:*; ] {
        Block := a1 binary;
        BlockVirtual := a2 binary;
        BlockAdditions := 75a1 container {
          BlockMore := a6 container [ card:*; ] {
            BlockAddID := ee uint [ range:1..; ]
            BlockAdditional := a5 binary;
          }
        }
        BlockDuration := 9b uint [ def:TrackDuration; ];
        ReferencePriority := fa uint;
        ReferenceBlock := fb int [ card:*; ]
        ReferenceVirtual := fd int;
        CodecState := a4 binary;
        Slices := 8e container [ card:*; ] {
          TimeSlice := e8 container [ card:*; ] {
            LaceNumber := cc uint [ def:0; ]
            FrameNumber := cd uint [ def:0; ]
            BlockAdditionID := cb uint [ def:0; ]
            Delay := ce uint [ def:0; ]
            Duration := cf uint [ def:TrackDuration; ];
          }
        }
      }
    }

    // Track
    Tracks := 1654ae6b container [ card:*; ] {
      TrackEntry := ae container [ card:*; ] {
        TrackNumber := d7 uint [ range:1..; ]
        TrackUID := 73c5 uint [ range:1..; ]
        TrackType := 83 uint [ range:1..254; ]
        FlagEnabled := b9 uint [ range:0..1; def:1; ]
        FlagDefault := 88 uint [ range:0..1; def:1; ]
        FlagLacing := 9c uint [ range:0..1; def:1; ]
        MinCache := 6de7 uint [ def:0; ]
        MaxCache := 6df8 uint;
        DefaultDuration := 23e383 uint [ range:1..; ]
        TrackTimecodeScale := 23314f float [ range:>0.0; def:1.0; ]
        Name := 536e string;
        Language := 22b59c string [ def:"eng"; range:32..126; ]
        CodecID := 86 string [ range:32..126; ];
        CodecPrivate := 63a2 binary;
        CodecName := 258688 string;
        CodecSettings := 3a9697 string;
        CodecInfoURL := 3b4040 string [ card:*; range:32..126; ]
        CodecDownloadURL := 26b240 string [ card:*; range:32..126; ]
        CodecDecodeAll := aa uint [ range:0..1; def:1; ]
        TrackOverlay := 6fab uint;

        // Video
        Video := e0 container {
          FlagInterlaced := 9a uint [ range:0..1; def:0; ]
          StereoMode := 53b8 uint [ range:0..3; def:0; ]
          PixelWidth := b0 uint [ range:1..; ]
          PixelHeight := ba uint [ range:1..; ]
          DisplayWidth := 54b0 uint [ def:PixelWidth; ]
          DisplayHeight := 54ba uint [ def:PixelHeight; ]
          DisplayUnit := 54b2 uint [ def:0; ]
          AspectRatioType := 54b3 uint [ def:0; ]
          ColourSpace := 2eb524 binary;
          GammaValue := 2fb523 float [ range:>0.0; ]
        }

        // Audio
        Audio := e1 container {
          SamplingFrequency := b5 float [ range:>0.0; def:8000.0; ]
          OutputSamplingFrequency := 78b5 float [ range:>0.0; def:8000.0; ]
          Channels := 94 uint [ range:1..; def:1; ]
          ChannelPositions := 7d7b binary;
          BitDepth := 6264 uint [ range:1..; ]
        }

        // Content Encoding
        ContentEncodings := 6d80 container {
          ContentEncoding := 6240 container [ card:*; ] {
            ContentEncodingOrder := 5031 uint [ def:0; ]
            ContentEncodingScope := 5032 uint [ range:1..; def:1; ]
            ContentEncodingType := 5033 uint;
            ContentCompression := 5034 container {
              ContentCompAlgo := 4254 uint [ def:0; ]
              ContentCompSettings := 4255 binary;
            }
            ContentEncryption := 5035 container {
              ContentEncAlgo := 47e1 uint [ def:0; ]
              ContentEncKeyID := 47e2 binary;
              ContentSignature := 47e3 binary;
              ContentSigKeyID := 47e4 binary;
              ContentSigAlgo := 47e5 uint;
              ContentSigHashAlgo := 47e6 uint;
            }
          }
        }
      }
    }

    // Cueing Data
    Cues := 1c53bb6b container {
      CuePoint := bb container [ card:*; ] {
        CueTime := b3 uint;
        CueTrackPositions := b7 container [ card:*; ] {
          CueTrack := f7 uint [ range:1..; ]
          CueClusterPosition := f1 uint;
          CueBlockNumber := 5378 uint [ range:1..; def:1; ]
          CueCodecState := ea uint [ def:0; ]
          CueReference := db container [ card:*; ] {
            CueRefTime := 96 uint;
            CueRefCluster := 97 uint;
            CueRefNumber := 535f uint [ range:1..; def:1; ]
            CueRefCodecState := eb uint [ def:0; ]
          }
        }
      }
    }

    // Attachment
    Attachments := 1941a469 container {
      AttachedFile := 61a7 container [ card:*; ] {
        FileDescription := 467e string;
        FileName := 466e string;
        FileMimeType := 4660 string [ range:32..126; ]
        FileData := 465c binary;
        FileUID := 46ae uint;
      }
    }

    // Chapters
    Chapters := 1043a770 container {
      EditionEntry := 45b9 container [ card:*; ] {
        ChapterAtom := b6 container [ card:*; ] {
          ChapterUID := 73c4 uint [ range:1..; ]
          ChapterTimeStart := 91 uint;
          ChapterTimeEnd := 92 uint;
          ChapterFlagHidden := 98 uint [ range:0..1; def:0; ]
          ChapterFlagEnabled := 4598 uint [ range:0..1; def:0; ]
          ChapterTrack := 8f container {
            ChapterTrackNumber := 89 uint [ card:*; range:0..1; ]
            ChapterDisplay := 80 container [ card:*; ] {
              ChapString := 85 string;
              ChapLanguage := 437c string [ card:*; def:"eng"; range:32..126; ]
              ChapCountry := 437e string [ card:*; range:32..126; ]
            }
          }
        }
      }
    }

    // Tagging
    Tags := 1254c367 container [ card:*; ] {
      Tag := 7373 container [ card:*; ] {
        Targets := 63c0 container {
          TrackUID := 63c5 uint [ card:*; def:0; ]
          ChapterUID := 63c4 uint [ card:*; def:0; ]
          AttachmentUID := 63c6 uint [ card:*; def:0; ]
        }
        SimpleTag := 67c8 container [ card:*; ] {
          TagName := 45a3 string;
          TagString := 4487 string;
          TagBinary := 4485 binary;
        }
      }
    }
  }
}