[HBase 10] Anatomy of the HBase Storage File: HFile

1. A first look at the contents of a file stored by HBase

Run the following commands in the HBase shell to add test data:

create 'table3', 'colfam1', { SPLITS => ['row-300', 'row-500', 'row-700' , 'row-900'] }

 

for i in '0'..'9' do for j in '0'..'9' do for k in '0'..'9' do put 'table3', "row-#{i}#{j}#{k}", "colfam1:#{j}#{k}", "#{j}#{k}" end end end

 

Flush the data from the MemStore to disk:

flush 'table3'

 

Then run the same loop once more:

for i in '0'..'9' do for j in '0'..'9' do for k in '0'..'9' do put 'table3', "row-#{i}#{j}#{k}", "colfam1:#{j}#{k}", "#{j}#{k}" end end end
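The split points passed to create decide which region each of the 1000 generated rows lands in. The following is a minimal Python sketch of that assignment, assuming plain lexicographic comparison of the ASCII row keys; the function names are illustrative and are not part of HBase.

```python
from bisect import bisect_right
from collections import Counter

# Split points passed to `create` in the shell command above.
SPLITS = ["row-300", "row-500", "row-700", "row-900"]


def region_index(row_key, splits=SPLITS):
    """Return the index of the region whose [start, end) range holds row_key."""
    # A split key is the inclusive start key of the region to its right,
    # so bisect_right places "row-500" in region 2, not region 1.
    return bisect_right(splits, row_key)


def rows_per_region(splits=SPLITS):
    """Count how many of the 1000 generated row keys fall in each region."""
    keys = [f"row-{i}{j}{k}"
            for i in range(10) for j in range(10) for k in range(10)]
    return Counter(region_index(key, splits) for key in keys)


if __name__ == "__main__":
    for idx, count in sorted(rows_per_region().items()):
        print(f"region {idx}: {count} rows")
```

Under these assumptions the region that starts at row-500 and ends before row-700 receives 200 rows, which is consistent with the entries=200 value reported for the HFile examined below.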

 

 

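Then, from the bin directory of the HBase installation (an operating-system shell, not the HBase shell), run the following command: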

[hadoop@hadoop bin]$ ./hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4 -v -m -p

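 where:

1fa2e49c7404d3cd39afc39a99cc1c26 is the region name and 0f6fc234c3014b6e9d84d3cae065d1b4 is the name of one HFile. The -f option names the file to read, -v enables verbose output, -m prints the file's meta information, and -p prints the KeyValue pairs.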
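The path given to -f follows a fixed directory layout. As a rough sketch, the components can be pulled apart like this in Python, assuming the layout <root>/data/<namespace>/<table>/<region>/<family>/<hfile> seen in the command above; HFileLocation and parse_hfile_path are hypothetical helper names.

```python
from typing import NamedTuple


class HFileLocation(NamedTuple):
    namespace: str
    table: str
    region: str
    family: str
    hfile: str


def parse_hfile_path(path):
    """Split <root>/data/<namespace>/<table>/<region>/<family>/<hfile>."""
    parts = path.strip("/").split("/")
    # The five components of interest follow the "data" directory.
    if len(parts) < 6 or parts[-6] != "data":
        raise ValueError(f"unexpected HFile path layout: {path}")
    return HFileLocation(*parts[-5:])


if __name__ == "__main__":
    loc = parse_hfile_path(
        "/hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26"
        "/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4"
    )
    print(loc)
```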

Output:

Scanning -> /hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4
2015-04-09 22:53:01,918 INFO  [main] hfile.CacheConfig: CacheConfig:disabled

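///Note: K: and V: are the key and the value of each KeyValue pair in the HFile. As the output below shows, every K takes a fairly large number of bytes, because it is made up of the rowKey, the column (family:columnName), ...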
///The actual data stored as serialized KeyValue instances
K: row-500/colfam1:00/1428632364152/Put/vlen=2/seqid=5 V: 00
K: row-501/colfam1:01/1428632364177/Put/vlen=2/seqid=7 V: 01
K: row-502/colfam1:02/1428632364204/Put/vlen=2/seqid=9 V: 02
K: row-503/colfam1:03/1428632364287/Put/vlen=2/seqid=11 V: 03
K: row-504/colfam1:04/1428632364309/Put/vlen=2/seqid=13 V: 04
K: row-505/colfam1:05/1428632364318/Put/vlen=2/seqid=15 V: 05
K: row-506/colfam1:06/1428632364330/Put/vlen=2/seqid=17 V: 06
K: row-507/colfam1:07/1428632364351/Put/vlen=2/seqid=19 V: 07
K: row-508/colfam1:08/1428632364361/Put/vlen=2/seqid=21 V: 08
K: row-509/colfam1:09/1428632364381/Put/vlen=2/seqid=23 V: 09
K: row-510/colfam1:10/1428632364400/Put/vlen=2/seqid=25 V: 10
K: row-511/colfam1:11/1428632364411/Put/vlen=2/seqid=27 V: 11
K: row-512/colfam1:12/1428632364426/Put/vlen=2/seqid=29 V: 12
K: row-513/colfam1:13/1428632364440/Put/vlen=2/seqid=31 V: 13
K: row-514/colfam1:14/1428632364474/Put/vlen=2/seqid=33 V: 14
K: row-515/colfam1:15/1428632364496/Put/vlen=2/seqid=35 V: 15
K: row-516/colfam1:16/1428632364521/Put/vlen=2/seqid=37 V: 16
K: row-517/colfam1:17/1428632364528/Put/vlen=2/seqid=39 V: 17
K: row-518/colfam1:18/1428632364539/Put/vlen=2/seqid=41 V: 18
K: row-519/colfam1:19/1428632364551/Put/vlen=2/seqid=43 V: 19
K: row-520/colfam1:20/1428632364561/Put/vlen=2/seqid=45 V: 20
K: row-521/colfam1:21/1428632364574/Put/vlen=2/seqid=47 V: 21
K: row-522/colfam1:22/1428632364589/Put/vlen=2/seqid=49 V: 22
K: row-523/colfam1:23/1428632364602/Put/vlen=2/seqid=51 V: 23
K: row-524/colfam1:24/1428632364617/Put/vlen=2/seqid=53 V: 24
K: row-525/colfam1:25/1428632364634/Put/vlen=2/seqid=55 V: 25
K: row-526/colfam1:26/1428632364647/Put/vlen=2/seqid=57 V: 26
K: row-527/colfam1:27/1428632364653/Put/vlen=2/seqid=59 V: 27
K: row-528/colfam1:28/1428632364665/Put/vlen=2/seqid=61 V: 28
K: row-529/colfam1:29/1428632364734/Put/vlen=2/seqid=63 V: 29
K: row-530/colfam1:30/1428632364746/Put/vlen=2/seqid=65 V: 30
K: row-531/colfam1:31/1428632364760/Put/vlen=2/seqid=67 V: 31
K: row-532/colfam1:32/1428632364777/Put/vlen=2/seqid=69 V: 32
K: row-533/colfam1:33/1428632364819/Put/vlen=2/seqid=71 V: 33
K: row-534/colfam1:34/1428632364831/Put/vlen=2/seqid=73 V: 34
K: row-535/colfam1:35/1428632364837/Put/vlen=2/seqid=75 V: 35
K: row-536/colfam1:36/1428632364846/Put/vlen=2/seqid=77 V: 36
K: row-537/colfam1:37/1428632364852/Put/vlen=2/seqid=79 V: 37
K: row-538/colfam1:38/1428632364861/Put/vlen=2/seqid=81 V: 38
K: row-539/colfam1:39/1428632364872/Put/vlen=2/seqid=83 V: 39
K: row-540/colfam1:40/1428632364880/Put/vlen=2/seqid=85 V: 40
K: row-541/colfam1:41/1428632364886/Put/vlen=2/seqid=87 V: 41
K: row-542/colfam1:42/1428632364897/Put/vlen=2/seqid=89 V: 42
K: row-543/colfam1:43/1428632364909/Put/vlen=2/seqid=91 V: 43
K: row-544/colfam1:44/1428632364924/Put/vlen=2/seqid=93 V: 44
K: row-545/colfam1:45/1428632364937/Put/vlen=2/seqid=95 V: 45
K: row-546/colfam1:46/1428632364946/Put/vlen=2/seqid=97 V: 46
K: row-547/colfam1:47/1428632364955/Put/vlen=2/seqid=99 V: 47
K: row-548/colfam1:48/1428632364964/Put/vlen=2/seqid=101 V: 48
K: row-549/colfam1:49/1428632364976/Put/vlen=2/seqid=103 V: 49
K: row-550/colfam1:50/1428632364982/Put/vlen=2/seqid=105 V: 50
K: row-551/colfam1:51/1428632364992/Put/vlen=2/seqid=107 V: 51
K: row-552/colfam1:52/1428632365001/Put/vlen=2/seqid=109 V: 52
K: row-553/colfam1:53/1428632365011/Put/vlen=2/seqid=111 V: 53
K: row-554/colfam1:54/1428632365020/Put/vlen=2/seqid=113 V: 54
K: row-555/colfam1:55/1428632365035/Put/vlen=2/seqid=115 V: 55
K: row-556/colfam1:56/1428632365048/Put/vlen=2/seqid=117 V: 56
K: row-557/colfam1:57/1428632365056/Put/vlen=2/seqid=119 V: 57
K: row-558/colfam1:58/1428632365064/Put/vlen=2/seqid=121 V: 58
K: row-559/colfam1:59/1428632365080/Put/vlen=2/seqid=123 V: 59
K: row-560/colfam1:60/1428632365095/Put/vlen=2/seqid=125 V: 60
K: row-561/colfam1:61/1428632365111/Put/vlen=2/seqid=127 V: 61
K: row-562/colfam1:62/1428632365123/Put/vlen=2/seqid=129 V: 62
K: row-563/colfam1:63/1428632365133/Put/vlen=2/seqid=131 V: 63
K: row-564/colfam1:64/1428632365142/Put/vlen=2/seqid=133 V: 64
K: row-565/colfam1:65/1428632365151/Put/vlen=2/seqid=135 V: 65
K: row-566/colfam1:66/1428632365159/Put/vlen=2/seqid=137 V: 66
K: row-567/colfam1:67/1428632365169/Put/vlen=2/seqid=139 V: 67
K: row-568/colfam1:68/1428632365179/Put/vlen=2/seqid=141 V: 68
K: row-569/colfam1:69/1428632365192/Put/vlen=2/seqid=143 V: 69
K: row-570/colfam1:70/1428632365200/Put/vlen=2/seqid=145 V: 70
K: row-571/colfam1:71/1428632365209/Put/vlen=2/seqid=147 V: 71
K: row-572/colfam1:72/1428632365217/Put/vlen=2/seqid=149 V: 72
K: row-573/colfam1:73/1428632365226/Put/vlen=2/seqid=151 V: 73
K: row-574/colfam1:74/1428632365237/Put/vlen=2/seqid=153 V: 74
K: row-575/colfam1:75/1428632365245/Put/vlen=2/seqid=155 V: 75
K: row-576/colfam1:76/1428632365253/Put/vlen=2/seqid=157 V: 76
K: row-577/colfam1:77/1428632365265/Put/vlen=2/seqid=159 V: 77
K: row-578/colfam1:78/1428632365279/Put/vlen=2/seqid=161 V: 78
K: row-579/colfam1:79/1428632365287/Put/vlen=2/seqid=163 V: 79
K: row-580/colfam1:80/1428632365294/Put/vlen=2/seqid=165 V: 80
K: row-581/colfam1:81/1428632365305/Put/vlen=2/seqid=167 V: 81
K: row-582/colfam1:82/1428632365314/Put/vlen=2/seqid=169 V: 82
K: row-583/colfam1:83/1428632365321/Put/vlen=2/seqid=171 V: 83
K: row-584/colfam1:84/1428632365343/Put/vlen=2/seqid=173 V: 84
K: row-585/colfam1:85/1428632365352/Put/vlen=2/seqid=175 V: 85
K: row-586/colfam1:86/1428632365375/Put/vlen=2/seqid=177 V: 86
K: row-587/colfam1:87/1428632365535/Put/vlen=2/seqid=179 V: 87
K: row-588/colfam1:88/1428632365560/Put/vlen=2/seqid=181 V: 88
K: row-589/colfam1:89/1428632365569/Put/vlen=2/seqid=183 V: 89
K: row-590/colfam1:90/1428632365582/Put/vlen=2/seqid=185 V: 90
K: row-591/colfam1:91/1428632365594/Put/vlen=2/seqid=187 V: 91
K: row-592/colfam1:92/1428632365620/Put/vlen=2/seqid=189 V: 92
K: row-593/colfam1:93/1428632365633/Put/vlen=2/seqid=191 V: 93
K: row-594/colfam1:94/1428632365642/Put/vlen=2/seqid=193 V: 94
K: row-595/colfam1:95/1428632365651/Put/vlen=2/seqid=195 V: 95
K: row-596/colfam1:96/1428632365671/Put/vlen=2/seqid=197 V: 96
K: row-597/colfam1:97/1428632365679/Put/vlen=2/seqid=199 V: 97
K: row-598/colfam1:98/1428632365684/Put/vlen=2/seqid=201 V: 98
K: row-599/colfam1:99/1428632365689/Put/vlen=2/seqid=203 V: 99
K: row-600/colfam1:00/1428632365694/Put/vlen=2/seqid=205 V: 00
K: row-601/colfam1:01/1428632365702/Put/vlen=2/seqid=207 V: 01
K: row-602/colfam1:02/1428632365709/Put/vlen=2/seqid=209 V: 02
K: row-603/colfam1:03/1428632365717/Put/vlen=2/seqid=211 V: 03
K: row-604/colfam1:04/1428632365722/Put/vlen=2/seqid=213 V: 04
K: row-605/colfam1:05/1428632365729/Put/vlen=2/seqid=215 V: 05
K: row-606/colfam1:06/1428632365752/Put/vlen=2/seqid=217 V: 06
K: row-607/colfam1:07/1428632365758/Put/vlen=2/seqid=219 V: 07
K: row-608/colfam1:08/1428632365765/Put/vlen=2/seqid=221 V: 08
K: row-609/colfam1:09/1428632365773/Put/vlen=2/seqid=223 V: 09
K: row-610/colfam1:10/1428632365778/Put/vlen=2/seqid=225 V: 10
K: row-611/colfam1:11/1428632365785/Put/vlen=2/seqid=227 V: 11
K: row-612/colfam1:12/1428632365791/Put/vlen=2/seqid=229 V: 12
K: row-613/colfam1:13/1428632365798/Put/vlen=2/seqid=231 V: 13
K: row-614/colfam1:14/1428632365803/Put/vlen=2/seqid=233 V: 14
K: row-615/colfam1:15/1428632365811/Put/vlen=2/seqid=235 V: 15
K: row-616/colfam1:16/1428632365820/Put/vlen=2/seqid=237 V: 16
K: row-617/colfam1:17/1428632365834/Put/vlen=2/seqid=239 V: 17
K: row-618/colfam1:18/1428632365840/Put/vlen=2/seqid=241 V: 18
K: row-619/colfam1:19/1428632365850/Put/vlen=2/seqid=243 V: 19
K: row-620/colfam1:20/1428632365856/Put/vlen=2/seqid=245 V: 20
K: row-621/colfam1:21/1428632365864/Put/vlen=2/seqid=247 V: 21
K: row-622/colfam1:22/1428632365874/Put/vlen=2/seqid=249 V: 22
K: row-623/colfam1:23/1428632365882/Put/vlen=2/seqid=251 V: 23
K: row-624/colfam1:24/1428632365896/Put/vlen=2/seqid=253 V: 24
K: row-625/colfam1:25/1428632365903/Put/vlen=2/seqid=255 V: 25
K: row-626/colfam1:26/1428632365908/Put/vlen=2/seqid=257 V: 26
K: row-627/colfam1:27/1428632365917/Put/vlen=2/seqid=259 V: 27
K: row-628/colfam1:28/1428632365928/Put/vlen=2/seqid=261 V: 28
K: row-629/colfam1:29/1428632365934/Put/vlen=2/seqid=263 V: 29
K: row-630/colfam1:30/1428632365940/Put/vlen=2/seqid=265 V: 30
K: row-631/colfam1:31/1428632365945/Put/vlen=2/seqid=267 V: 31
K: row-632/colfam1:32/1428632365957/Put/vlen=2/seqid=269 V: 32
K: row-633/colfam1:33/1428632365967/Put/vlen=2/seqid=271 V: 33
K: row-634/colfam1:34/1428632365982/Put/vlen=2/seqid=273 V: 34
K: row-635/colfam1:35/1428632365999/Put/vlen=2/seqid=275 V: 35
K: row-636/colfam1:36/1428632366004/Put/vlen=2/seqid=277 V: 36
K: row-637/colfam1:37/1428632366020/Put/vlen=2/seqid=279 V: 37
K: row-638/colfam1:38/1428632366031/Put/vlen=2/seqid=281 V: 38
K: row-639/colfam1:39/1428632366038/Put/vlen=2/seqid=283 V: 39
K: row-640/colfam1:40/1428632366048/Put/vlen=2/seqid=285 V: 40
K: row-641/colfam1:41/1428632366057/Put/vlen=2/seqid=287 V: 41
K: row-642/colfam1:42/1428632366240/Put/vlen=2/seqid=289 V: 42
K: row-643/colfam1:43/1428632366249/Put/vlen=2/seqid=291 V: 43
K: row-644/colfam1:44/1428632366256/Put/vlen=2/seqid=293 V: 44
K: row-645/colfam1:45/1428632366264/Put/vlen=2/seqid=295 V: 45
K: row-646/colfam1:46/1428632366270/Put/vlen=2/seqid=297 V: 46
K: row-647/colfam1:47/1428632366276/Put/vlen=2/seqid=299 V: 47
K: row-648/colfam1:48/1428632366284/Put/vlen=2/seqid=301 V: 48
K: row-649/colfam1:49/1428632366290/Put/vlen=2/seqid=303 V: 49
K: row-650/colfam1:50/1428632366300/Put/vlen=2/seqid=305 V: 50
K: row-651/colfam1:51/1428632366305/Put/vlen=2/seqid=307 V: 51
K: row-652/colfam1:52/1428632366313/Put/vlen=2/seqid=309 V: 52
K: row-653/colfam1:53/1428632366321/Put/vlen=2/seqid=311 V: 53
K: row-654/colfam1:54/1428632366330/Put/vlen=2/seqid=313 V: 54
K: row-655/colfam1:55/1428632366337/Put/vlen=2/seqid=315 V: 55
K: row-656/colfam1:56/1428632366343/Put/vlen=2/seqid=317 V: 56
K: row-657/colfam1:57/1428632366350/Put/vlen=2/seqid=319 V: 57
K: row-658/colfam1:58/1428632366363/Put/vlen=2/seqid=321 V: 58
K: row-659/colfam1:59/1428632366370/Put/vlen=2/seqid=323 V: 59
K: row-660/colfam1:60/1428632366384/Put/vlen=2/seqid=325 V: 60
K: row-661/colfam1:61/1428632366392/Put/vlen=2/seqid=327 V: 61
K: row-662/colfam1:62/1428632366397/Put/vlen=2/seqid=329 V: 62
K: row-663/colfam1:63/1428632366403/Put/vlen=2/seqid=331 V: 63
K: row-664/colfam1:64/1428632366410/Put/vlen=2/seqid=333 V: 64
K: row-665/colfam1:65/1428632366421/Put/vlen=2/seqid=335 V: 65
K: row-666/colfam1:66/1428632366430/Put/vlen=2/seqid=337 V: 66
K: row-667/colfam1:67/1428632366437/Put/vlen=2/seqid=339 V: 67
K: row-668/colfam1:68/1428632366444/Put/vlen=2/seqid=341 V: 68
K: row-669/colfam1:69/1428632366461/Put/vlen=2/seqid=343 V: 69
K: row-670/colfam1:70/1428632366477/Put/vlen=2/seqid=345 V: 70
K: row-671/colfam1:71/1428632366487/Put/vlen=2/seqid=347 V: 71
K: row-672/colfam1:72/1428632366498/Put/vlen=2/seqid=349 V: 72
K: row-673/colfam1:73/1428632366507/Put/vlen=2/seqid=351 V: 73
K: row-674/colfam1:74/1428632366520/Put/vlen=2/seqid=353 V: 74
K: row-675/colfam1:75/1428632366530/Put/vlen=2/seqid=355 V: 75
K: row-676/colfam1:76/1428632366542/Put/vlen=2/seqid=357 V: 76
K: row-677/colfam1:77/1428632366555/Put/vlen=2/seqid=359 V: 77
K: row-678/colfam1:78/1428632366578/Put/vlen=2/seqid=361 V: 78
K: row-679/colfam1:79/1428632366588/Put/vlen=2/seqid=363 V: 79
K: row-680/colfam1:80/1428632366596/Put/vlen=2/seqid=365 V: 80
K: row-681/colfam1:81/1428632366604/Put/vlen=2/seqid=367 V: 81
K: row-682/colfam1:82/1428632366617/Put/vlen=2/seqid=369 V: 82
K: row-683/colfam1:83/1428632366629/Put/vlen=2/seqid=371 V: 83
K: row-684/colfam1:84/1428632366640/Put/vlen=2/seqid=373 V: 84
K: row-685/colfam1:85/1428632366649/Put/vlen=2/seqid=375 V: 85
K: row-686/colfam1:86/1428632366658/Put/vlen=2/seqid=377 V: 86
K: row-687/colfam1:87/1428632366664/Put/vlen=2/seqid=379 V: 87
K: row-688/colfam1:88/1428632366673/Put/vlen=2/seqid=381 V: 88
K: row-689/colfam1:89/1428632366680/Put/vlen=2/seqid=383 V: 89
K: row-690/colfam1:90/1428632366686/Put/vlen=2/seqid=385 V: 90
K: row-691/colfam1:91/1428632366693/Put/vlen=2/seqid=387 V: 91
K: row-692/colfam1:92/1428632366701/Put/vlen=2/seqid=389 V: 92
K: row-693/colfam1:93/1428632366857/Put/vlen=2/seqid=391 V: 93
K: row-694/colfam1:94/1428632366868/Put/vlen=2/seqid=393 V: 94
K: row-695/colfam1:95/1428632366873/Put/vlen=2/seqid=395 V: 95
K: row-696/colfam1:96/1428632366881/Put/vlen=2/seqid=397 V: 96
K: row-697/colfam1:97/1428632366890/Put/vlen=2/seqid=399 V: 97
K: row-698/colfam1:98/1428632366896/Put/vlen=2/seqid=401 V: 98
K: row-699/colfam1:99/1428632366902/Put/vlen=2/seqid=403 V: 99
Block index size as per heapsize: 400

///dumps the internal HFile.Reader properties
reader=/hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4,
    compression=none,
    cacheConf=CacheConfig:disabled,
    firstKey=row-500/colfam1:00/1428632364152/Put,
    lastKey=row-699/colfam1:99/1428632366902/Put,
    avgKeyLen=28,
    avgValueLen=2,
    entries=200,
    length=13581

///Trailer block information
Trailer:
    fileinfoOffset=8857,
    loadOnOpenDataOffset=8742,
    dataIndexCount=1,
    metaIndexCount=0,
    totalUncomressedBytes=13483,
    entryCount=200,
    compressionCodec=NONE,
    uncompressedDataIndexSize=41,
    numDataIndexLevels=1,
    firstDataBlockOffset=0,
    lastDataBlockOffset=0,
    comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
    encryptionKey=NONE,
    majorVersion=3,
    minorVersion=0

///FileInfo block information
Fileinfo:
    BLOOM_FILTER_TYPE = ROW
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x01L\xA1\x1F\xE4x
    KEY_VALUE_VERSION = \x00\x00\x00\x01
    LAST_BLOOM_KEY = row-699
    MAJOR_COMPACTION_KEY = \x00
    MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x01\x93
    MAX_SEQ_ID_KEY = 404
    TIMERANGE = 1428632364152....1428632366902
    hfile.AVG_KEY_LEN = 28
    hfile.AVG_VALUE_LEN = 2
    hfile.LASTKEY = \x00\x07row-699\x07colfam199\x00\x00\x01L\xA1\x1F\xEF6\x04
    hfile.MAX_TAGS_LEN = \x00\x00\x00\x00
    hfile.TAGS_COMPRESSED = \x00
Mid-key: \x00\x07row-500\x07colfam100\x00\x00\x01L\xA1\x1F\xE4x\x04
Bloom filter:
    BloomSize: 256
    No of Keys in bloom: 200
    Max Keys for bloom: 213
    Percentage filled: 94%
    Number of chunks: 1
    Comparator: RawBytesComparator
Delete Family Bloom filter:
    Not present
///Total number of KeyValues scanned
Scanned kv count -> 200
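Several Fileinfo values in the dump are printed as escaped bytes rather than numbers. The Python sketch below decodes three of them, assuming they are big-endian signed integers and that printable characters in the dump stand for their ASCII byte; decode_long is an illustrative helper, not an HBase API.

```python
def decode_long(raw):
    """Interpret raw bytes as a big-endian signed integer."""
    return int.from_bytes(raw, byteorder="big", signed=True)


# Values copied from the Fileinfo section of the dump above.
EARLIEST_PUT_TS = b"\x00\x00\x01L\xA1\x1F\xE4x"
MAX_MEMSTORE_TS_KEY = b"\x00\x00\x00\x00\x00\x00\x01\x93"
KEY_VALUE_VERSION = b"\x00\x00\x00\x01"

if __name__ == "__main__":
    print("EARLIEST_PUT_TS     =", decode_long(EARLIEST_PUT_TS))
    print("MAX_MEMSTORE_TS_KEY =", decode_long(MAX_MEMSTORE_TS_KEY))
    print("KEY_VALUE_VERSION   =", decode_long(KEY_VALUE_VERSION))
```

Decoded this way, EARLIEST_PUT_TS equals the timestamp of the first key in the file (1428632364152) and MAX_MEMSTORE_TS_KEY equals the largest seqid in the listing (403).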

 

2. KeyValue format

In an HFile, a KeyValue is a byte array made up of the following fields:

 
[Figure 1: KeyValue format]
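To make the key part of the layout concrete, here is a small Python sketch that encodes and decodes a key. It assumes the layout that the hfile.LASTKEY bytes in the dump suggest: a 2-byte row length, the row, a 1-byte family length, the family, the qualifier, an 8-byte timestamp and a 1-byte type. The function names are illustrative, and the length prefixes that precede the key in a full KeyValue are left out.

```python
import struct

PUT_TYPE = 4  # the type byte printed as \x04 in the dump


def encode_key(row, family, qualifier, timestamp, key_type):
    """Serialize the key part of a KeyValue (all integers big-endian)."""
    return (
        struct.pack(">H", len(row)) + row          # 2-byte row length + row
        + struct.pack(">B", len(family)) + family  # 1-byte family length + family
        + qualifier                                # qualifier has no length prefix
        + struct.pack(">q", timestamp)             # 8-byte timestamp
        + struct.pack(">B", key_type)              # 1-byte key type
    )


def decode_key(key):
    """Return (row, family, qualifier, timestamp, key_type) from key bytes."""
    (row_len,) = struct.unpack_from(">H", key, 0)
    pos = 2
    row = key[pos:pos + row_len]
    pos += row_len
    fam_len = key[pos]
    pos += 1
    family = key[pos:pos + fam_len]
    pos += fam_len
    # The qualifier is whatever remains before the trailing 9 bytes.
    qualifier = key[pos:len(key) - 9]
    (timestamp,) = struct.unpack_from(">q", key, len(key) - 9)
    return row, family, qualifier, timestamp, key[-1]


# hfile.LASTKEY as printed in the Fileinfo section of the dump.
LASTKEY = b"\x00\x07row-699\x07colfam199\x00\x00\x01L\xA1\x1F\xEF6\x04"

if __name__ == "__main__":
    print(decode_key(LASTKEY))
    print(len(LASTKEY), "bytes")
```

Each key in the test table has a 7-byte row, a 7-byte family and a 2-byte qualifier, so the key takes 2 + 7 + 1 + 7 + 2 + 8 + 1 = 28 bytes, which agrees with avgKeyLen=28 in the dump while the value is only 2 bytes.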
 

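3. HFile data structure

[Figure 2: HFile data structure]
3.1 Trailer block
      The Trailer is fixed-length. As the figure shows, it holds pointers to the start of the other blocks. When an HFile is read, the Trailer is read first, and then the Data Block Index is loaded into memory. A lookup for a key therefore does not need to scan the whole HFile: the block that contains the key is located from the in-memory index, the whole block is read into memory with a single disk I/O, and the key is then found inside that block.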
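The lookup described above amounts to a binary search over the first keys of the blocks. A minimal Python sketch follows, assuming a single-level index held as a sorted list of (first key, offset) pairs; the index contents and the function name are made up for illustration, since the file dumped earlier has only one data block.

```python
from bisect import bisect_right

# Hypothetical in-memory index: (first key of block, block offset in file),
# sorted by first key. The offsets are invented for this example.
BLOCK_INDEX = [
    ("row-500", 0),
    ("row-550", 4096),
    ("row-600", 8192),
    ("row-650", 12288),
]


def find_block_offset(index, key):
    """Return the offset of the only block that could contain key, or None."""
    first_keys = [first for first, _ in index]
    # The candidate block is the last one whose first key is <= key.
    pos = bisect_right(first_keys, key) - 1
    if pos < 0:
        return None  # key sorts before the first key in the file
    return index[pos][1]


if __name__ == "__main__":
    print(find_block_offset(BLOCK_INDEX, "row-575"))
```

Only the block at the returned offset has to be read from disk; whether the key is actually present is decided by scanning that one block.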

 

3.2 File Info block
The File Info block is fixed-length and records meta information about the file, for example AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR and MAX_SEQ_ID_KEY.

 

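3.3 Data Block
Data Blocks hold the table's data and are the basic unit of HBase I/O. To improve efficiency, the HRegionServer keeps an LRU-based block cache. The size of a Data block can be set with a parameter when the table is created: large blocks favour sequential scans, small blocks favour random lookups.

Apart from the Magic at its start, each Data block is simply a sequence of concatenated KeyValue pairs. The Magic is an arbitrary marker value placed there to guard against data corruption; every block starts with one.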

 

 

On the Data Block size:

Minimum block size. We recommend a setting of minimum block size between 8KB to 1MB for general usage. Larger block size is preferred if files are primarily for sequential
access. However, it would lead to inefficient random access (because there are more data to decompress). Smaller blocks are good for random access, but require more memory
to hold the block index, and may be slower to create (because we must flush the compressor stream at the conclusion of each data block, which leads to an FS I/O flush).
Further, due to the internal caching in Compression codec, the smallest possible block size would be around 20KB-30KB.

 

 

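3.4 Meta Block section (optional)
Holds user-defined KeyValue pairs; it can be compressed.
3.5 Data Block Index section
The index of the Data Blocks. The key of each index entry is the key of the first record in the block it points to. The index blocks record the offsets of the data and meta blocks.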

 

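4. Relationship between HFile blocks and HDFS blocks

The default HFile block size is 64 KB and the default HDFS block size is 64 MB (the Hadoop 1.x default; Hadoop 2.x and later default to 128 MB), so an HDFS block is 1024 times the size of an HFile block. The figure below shows how the blocks of a 232 MB HFile are stored in HDFS blocks.

[Figure 3: HFile blocks stored in HDFS blocks]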
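The arithmetic behind these numbers can be checked with a few lines of Python. This is only an estimate, assuming every HFile block is exactly 64 KB and every HDFS block exactly 64 MB; real HFile blocks vary in size because a block is closed only after its contents reach the configured size. block_counts is an illustrative helper name.

```python
KB = 1024
MB = 1024 * KB


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def block_counts(file_size, hfile_block=64 * KB, hdfs_block=64 * MB):
    """Return (approximate HFile block count, HDFS block count) for a file."""
    return ceil_div(file_size, hfile_block), ceil_div(file_size, hdfs_block)


if __name__ == "__main__":
    hfile_blocks, hdfs_blocks = block_counts(232 * MB)
    print("HFile blocks per HDFS block:", (64 * MB) // (64 * KB))
    print("HFile blocks in a 232 MB file:", hfile_blocks)
    print("HDFS blocks in a 232 MB file:", hdfs_blocks)
```

With these defaults a 232 MB file occupies 4 HDFS blocks (three full ones and a partial fourth) and holds roughly 3712 HFile blocks.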

 

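5. HFile Compact

Write path: the client writes -> the data goes into the MemStore until the MemStore is full -> the MemStore is flushed to a StoreFile, and StoreFiles accumulate until their number reaches a threshold -> a Compact (merge) operation is triggered -> several StoreFiles are merged into one, and versions are merged and deleted data is removed at the same time -> as StoreFiles are compacted, progressively larger StoreFiles form -> once a single StoreFile exceeds a size threshold, a Split is triggered: the current Region is split into 2 Regions, the parent Region goes offline, and the 2 new child Regions are assigned by the HMaster to HRegionServers, so that the load of the original Region is spread over 2 Regions.

This process shows that HBase only ever appends data; all updates and deletes are actually applied during the Compact phase. A user write therefore returns as soon as it has reached memory (and, by default, the write-ahead log), which keeps I/O performance high.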
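The merge step can be sketched in Python as a k-way merge of sorted files. This is a simplified model and not HBase's implementation: it assumes each StoreFile is a list of cells sorted by row, column and descending timestamp, represents a delete as a cell whose value is None, and lets that marker hide every older version of the same column. Cell and compact are illustrative names.

```python
import heapq
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    column: str
    timestamp: int
    value: Optional[str]  # None marks a delete in this sketch


def _sort_key(cell):
    # Newest version first within the same row and column.
    return (cell.row, cell.column, -cell.timestamp)


def compact(store_files, max_versions=1):
    """Merge sorted store files, keeping at most max_versions per column."""
    result = []
    current = None   # (row, column) currently being processed
    kept = 0         # versions already kept for the current column
    deleted = False  # True once a delete marker was seen for the column
    for cell in heapq.merge(*store_files, key=_sort_key):
        coord = (cell.row, cell.column)
        if coord != current:
            current, kept, deleted = coord, 0, False
        if deleted:
            continue  # hidden by a newer delete marker
        if cell.value is None:
            deleted = True  # drop the marker and everything older
            continue
        if kept < max_versions:
            result.append(cell)
            kept += 1
    return result


if __name__ == "__main__":
    older = [Cell("row-500", "colfam1:00", 1, "00"),
             Cell("row-501", "colfam1:01", 1, "01")]
    newer = [Cell("row-500", "colfam1:00", 2, "00b"),
             Cell("row-501", "colfam1:01", 2, None)]
    for cell in compact([older, newer]):
        print(cell)
```

Because every input is already sorted, the merge reads each file sequentially once and writes a single sorted output, which is why updates and deletes can be deferred to this phase.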

 


[Figures 4-6]
 

 

 

 

 

 Reference: http://blog.csdn.net/john_f_lau/article/details/18899311
