Analysis of the WebM EBML File Header

EBML element

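Analyzing the WebM format is mostly a matter of understanding EBML elements. EBML is a hierarchical structure similar to XML. Every element has an ID and a value, and in binary storage each element is laid out as ID, length (data size), value.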

Element IDs (also called EBML IDs) are outlined as follows, beginning with the ID itself, followed by the Data Size, and then the non-interpreted Binary itself:

  • Element ID coded with an UTF-8 like system :
    bits, big-endian
    1xxx xxxx                                  - Class A IDs (2^7 -1 possible values) (base 0x8X)
    01xx xxxx  xxxx xxxx                       - Class B IDs (2^14-1 possible values) (base 0x4X 0xXX)
    001x xxxx  xxxx xxxx  xxxx xxxx            - Class C IDs (2^21-1 possible values) (base 0x2X 0xXX 0xXX)
    0001 xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx - Class D IDs (2^28-1 possible values) (base 0x1X 0xXX 0xXX 0xXX)
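
The class table above can be turned into a few lines of code. The following is a minimal sketch, assuming the ID is kept with its length-marker bits (which is how the IDs later in this post, such as 0x1A45DFA3, are written). The function name `ebml_read_id` is hypothetical and is not part of libvpx or GStreamer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library API): reads an EBML Element ID
 * from buf. Returns the ID length in bytes (1..4), or 0 if the first
 * byte does not match a Class A-D prefix or the buffer is too short.
 * The length-marker bits are kept as part of *id. */
size_t ebml_read_id(const uint8_t *buf, size_t len, uint32_t *id)
{
    assert(buf != NULL && id != NULL);
    if (len == 0)
        return 0;

    /* Count the leading zero bits of the first byte: the position of
     * the first 1 bit gives the total length of the ID. */
    size_t n = 1;
    uint8_t mask = 0x80;
    while (n <= 4 && !(buf[0] & mask)) {
        n++;
        mask >>= 1;
    }
    if (n > 4 || n > len)
        return 0;

    /* Big-endian: the first byte is the most significant one. */
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | buf[i];

    *id = v;
    return n;
}
```

Given the bytes 1a 45 df a3, the first byte is 0001 1010, so the ID is four bytes long (Class D) and the result is 0x1A45DFA3. Given 42 86, the first byte is 0100 0010, so the ID is two bytes long (Class B) and the result is 0x4286.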
    

Data size, in octets, is also coded with an UTF-8 like system :

bits, big-endian
1xxx xxxx                                                                              - value 0 to  2^7-2
01xx xxxx  xxxx xxxx                                                                   - value 0 to 2^14-2
001x xxxx  xxxx xxxx  xxxx xxxx                                                        - value 0 to 2^21-2
0001 xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                                             - value 0 to 2^28-2
0000 1xxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                                  - value 0 to 2^35-2
0000 01xx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx                       - value 0 to 2^42-2
0000 001x  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx            - value 0 to 2^49-2
0000 0001  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx  xxxx xxxx - value 0 to 2^56-2

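For the data size, strip the length-marker prefix (such as 001) and the remaining x bits are the actual value. Element IDs are conventionally written with the marker bits kept, which is why the IDs listed below look like 0x1A45DFA3 and 0x4286.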
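
Decoding a data size can be sketched as follows. This is an illustration under stated assumptions, not code from any library: the name `ebml_read_size` is hypothetical, and the sketch does not treat the all-ones value (which EBML reserves for "unknown size", hence the upper bounds of 2^n-2 in the table) specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a library API): decodes an EBML data size.
 * Returns the number of bytes the size field occupies (1..8), or 0 if
 * the first byte is 0x00 or the buffer is too short. */
size_t ebml_read_size(const uint8_t *buf, size_t len, uint64_t *size)
{
    assert(buf != NULL && size != NULL);
    if (len == 0)
        return 0;

    /* The number of leading zero bits, plus one, is the field length. */
    size_t n = 1;
    uint8_t mask = 0x80;
    while (n <= 8 && !(buf[0] & mask)) {
        n++;
        mask >>= 1;
    }
    if (n > 8 || n > len)
        return 0;

    /* Clear the marker bit and everything above it in the first byte,
     * then append the remaining bytes in big-endian order. */
    uint64_t v = buf[0] & (uint8_t)(mask - 1);
    for (size_t i = 1; i < n; i++)
        v = (v << 8) | buf[i];

    *size = v;
    return n;
}
```

For example, the single byte 84 (1000 0100) decodes to a one-byte field with value 4, and the eight bytes 01 00 00 00 00 00 00 1f decode to an eight-byte field with value 31.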

Data

Both the element ID and the element data size begin with a length-marker prefix such as 001. The element data follows the data size directly and has no such prefix.

The EBML header IDs as defined in Android (external/libvpx/libmkv/EbmlIDs.h):

EBML = 0x1A45DFA3,          
EBMLVersion = 0x4286,       
EBMLReadVersion = 0x42F7,   
EBMLMaxIDLength = 0x42F2,   
EBMLMaxSizeLength = 0x42F3, 
DocType = 0x4282,           
DocTypeVersion = 0x4287,    
DocTypeReadVersion = 0x4285,

//segment            
Segment = 0x18538067,


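So deciding whether a file is a WebM file comes down to two main checks:

  • Does it start with the EBML header ID 0x1A45DFA3?
  • Is the DocType webm?

Analyzing a WebM file header

Below is the EBML header of a file, copied from ghex (after opening a file in ghex, the hex dump can be saved as HTML through the Save As menu):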

1a 45 df a3 01 00 00 00 00 00 00 1f 42 86 81 01 42 f7
81 01 42 f2 81 04 42 f3 81 08 42 82 84 77 65 62 6d 42
87 81 02 42 85 81 02 18 53 80 67 01 00 00 00 00 18 ab

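The EBML header of this file can be read as follows:

Element ID: 1a 45 df a3
Element data size: 01 00 00 00 00 00 00 1f    [the first byte is 0000 0001, so the size field is 8 bytes long; with the marker bit removed the value is 0x1f, decimal 31]
Element data: the 31 bytes that follow, from 42 86 81 01 through 42 85 81 02    [for the level-0 EBML header, the data is the sequence of sub-elements, so the data size is the total length in bytes of all sub-elements in the header]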


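Taking 42 82 as an example, the DocType element reads:

Element ID: 42 82
Element data size: 84    [84 in binary is 1000 0100; dropping the leading 1 leaves 000 0100, decimal 4, so the data occupies the next four bytes]
Element data: 77 65 62 6d    [the corresponding ASCII characters are w e b m]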
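
The manual walk-through above can be automated. The sketch below checks the 0x1A45DFA3 magic, decodes the header's data size, then steps through the sub-elements until it finds DocType (0x4282). All function names are hypothetical, and this is only one possible way to do it; it is not how GStreamer does it, and it only handles a header that is fully present in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Length of a variable-length field, from the leading zero bits of its
 * first byte; 0 if that length would exceed max_len. */
static size_t vint_len(uint8_t first, size_t max_len)
{
    size_t n = 1;
    for (uint8_t mask = 0x80; n <= max_len && !(first & mask); mask >>= 1)
        n++;
    return n <= max_len ? n : 0;
}

/* Decodes a data size (marker bit stripped); returns field length or 0. */
static size_t read_size(const uint8_t *p, size_t avail, uint64_t *out)
{
    if (avail == 0)
        return 0;
    size_t n = vint_len(p[0], 8);
    if (n == 0 || n > avail)
        return 0;
    uint64_t v = p[0] & (0xFFu >> n);
    for (size_t i = 1; i < n; i++)
        v = (v << 8) | p[i];
    *out = v;
    return n;
}

/* Hypothetical helper: returns 1 and stores the NUL-terminated DocType
 * in doctype if buf starts with a complete EBML header that contains a
 * DocType element; returns 0 otherwise. */
int ebml_find_doctype(const uint8_t *buf, size_t len,
                      char *doctype, size_t cap)
{
    static const uint8_t magic[4] = {0x1A, 0x45, 0xDF, 0xA3};
    assert(buf != NULL && doctype != NULL);
    if (len < 5 || memcmp(buf, magic, 4) != 0)
        return 0;

    size_t pos = 4;
    uint64_t hdr_size = 0;
    size_t n = read_size(buf + pos, len - pos, &hdr_size);
    if (n == 0)
        return 0;
    pos += n;
    if (hdr_size > len - pos)
        return 0;                       /* header is truncated */
    size_t end = pos + (size_t)hdr_size;

    while (pos < end) {
        /* Sub-element ID, marker bits kept. */
        size_t id_len = vint_len(buf[pos], 4);
        if (id_len == 0 || id_len > end - pos)
            return 0;
        uint32_t id = 0;
        for (size_t i = 0; i < id_len; i++)
            id = (id << 8) | buf[pos + i];
        pos += id_len;

        /* Sub-element data size. */
        uint64_t sz = 0;
        n = read_size(buf + pos, end - pos, &sz);
        if (n == 0)
            return 0;
        pos += n;
        if (sz > end - pos)
            return 0;

        if (id == 0x4282) {             /* DocType */
            if (sz >= cap)
                return 0;
            memcpy(doctype, buf + pos, (size_t)sz);
            doctype[sz] = '\0';
            return 1;
        }
        pos += (size_t)sz;              /* skip this sub-element's data */
    }
    return 0;
}
```

Called on the 54 bytes of the hex dump above, the function skips EBMLVersion, EBMLReadVersion, EBMLMaxIDLength and EBMLMaxSizeLength, then stops at 42 82 and stores "webm". Comparing the result against "webm" covers the second of the two checks listed earlier.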


The part of gsttypefindfunctions.c in GStreamer that parses the EBML header is as follows:

/* EBML typefind helper */                                                      
static gboolean                                                                 
ebml_check_header (GstTypeFind * tf, const gchar * doctype, int doctype_len)    
{                                                                               
  /* 4 bytes for EBML ID, 1 byte for header length identifier */                
  guint8 *data = gst_type_find_peek (tf, 0, 4 + 1);                             
  gint len_mask = 0x80, size = 1, n = 1, total;                                 
                                                                                
  if (!data)                                                                    
    return FALSE;                                                               
                                                                                
  /* ebml header? */                                                            
  if (data[0] != 0x1A || data[1] != 0x45 || data[2] != 0xDF || data[3] != 0xA3) 
    return FALSE;                                                               
                                                                                
  /* length of header */                                                        
  total = data[4];  
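  /*
   * len_mask in binary is 1000 0000. The while loop tests total & len_mask
   * to count the leading zero bits and stops at the first 1 bit. size then
   * equals the number of bytes in the EBML header's data size field.
   */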
  while (size <= 8 && !(total & len_mask)) {                                    
    size++;                                                                     
    len_mask >>= 1;                                                             
  }                                                                             
  if (size > 8)
    return FALSE;

  /* strip the length marker and read the remaining size bytes to get
   * the byte count of the EBML header (level 0) data */
  total &= (len_mask - 1);
  while (n < size)                                                              
    total = (total << 8) | data[4 + n++];                                       
                                                                                
  /* get new data for full header, 4 bytes for EBML ID,                         
   * EBML length tag and the actual header */                                   
  data = gst_type_find_peek (tf, 0, 4 + size + total);                          
  if (!data)                                                                    
    return FALSE;                                                               
                                                                                
  /* only check doctype if asked to do so */                                    
  if (doctype == NULL || doctype_len == 0)                                      
    return TRUE;                                                                
                                                                                
  /* the header must contain the doctype. For now, we don't parse the           
   * whole header but simply check for the availability of that array           
   * of characters inside the header. Not fully fool-proof, but good            
   * enough. */                                                                 
  for (n = 4 + size; n <= 4 + size + total - doctype_len; n++)                  
    if (!memcmp (&data[n], doctype, doctype_len))                               
      return TRUE;                                                              
                                                                                
  return FALSE;                                                                 
} 

When calling ebml_check_header, pass "matroska" or "webm" as the doctype argument:

static void                                                                 
matroska_type_find (GstTypeFind * tf, gpointer ununsed)                     
{                                                                           
  if (ebml_check_header (tf, "matroska", 8))                                
    gst_type_find_suggest (tf, GST_TYPE_FIND_MAXIMUM, MATROSKA_CAPS);       
  else if (ebml_check_header (tf, NULL, 0))                                 
    gst_type_find_suggest (tf, GST_TYPE_FIND_LIKELY, MATROSKA_CAPS);        
}                                                                           
                                                                            


References:

Multimedia container formats in detail: MKV (parts 1, 2, 3)

http://blog.csdn.net/tx3344/article/details/8162656
http://blog.csdn.net/tx3344/article/details/8176288
http://blog.csdn.net/tx3344/article/details/8203260

The EBML format in MKV
http://tigersoldier.is-programmer.com/2008/6/30/ebml-in-mkv.4052.html


The MKV file format
http://blog.chinaunix.net/uid-12845622-id-311943.html


Parsing Matroska files: SimpleBlock
http://www.cnblogs.com/tangdoudou/archive/2012/05/14/2499063.html


The MKVToolNix tool:
http://www.cinker.com/2009/01/13/mkv-movie-split-merge-mkvtoolnix/

Official MKV/EBML documentation:
http://www.matroska.org/technical/specs/index.html
