1. First, a look at the file contents stored in HBase
Run the following commands in the HBase shell to add test data:
create 'table3', 'colfam1', { SPLITS => ['row-300', 'row-500', 'row-700' , 'row-900'] }
for i in '0'..'9' do for j in '0'..'9' do for k in '0'..'9' do put 'table3', "row-#{i}#{j}#{k}", "colfam1:#{j}#{k}", "#{j}#{k}" end end end
Flush the data from the MemStore to disk:
flush 'table3'
Run the same puts once more:
for i in '0'..'9' do for j in '0'..'9' do for k in '0'..'9' do put 'table3', "row-#{i}#{j}#{k}", "colfam1:#{j}#{k}", "#{j}#{k}" end end end
Then run the following command from the HBase bin directory (an OS shell prompt, not the HBase shell):
[hadoop@hadoop bin]$ ./hbase org.apache.hadoop.hbase.io.hfile.HFile -f /hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4 -v -m -p
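Where:
1fa2e49c7404d3cd39afc39a99cc1c26 is the region name and 0f6fc234c3014b6e9d84d3cae065d1b4 is the name of one HFile. Of the options, -f names the file to read, -v turns on verbose output, -m prints the file's metadata, and -p prints the key/value pairs.
Output: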
Scanning -> /hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4
2015-04-09 22:53:01,918 INFO [main] hfile.CacheConfig: CacheConfig:disabled
///Note: K: and V: are the KV pairs stored in the HFile. As the output below shows, every K takes up a fairly large number of bytes; it is made up of the rowKey, the column (family:columnName), ...
///The actual data stored as serialized KeyValue instances
K: row-500/colfam1:00/1428632364152/Put/vlen=2/seqid=5 V: 00
K: row-501/colfam1:01/1428632364177/Put/vlen=2/seqid=7 V: 01
K: row-502/colfam1:02/1428632364204/Put/vlen=2/seqid=9 V: 02
K: row-503/colfam1:03/1428632364287/Put/vlen=2/seqid=11 V: 03
K: row-504/colfam1:04/1428632364309/Put/vlen=2/seqid=13 V: 04
K: row-505/colfam1:05/1428632364318/Put/vlen=2/seqid=15 V: 05
K: row-506/colfam1:06/1428632364330/Put/vlen=2/seqid=17 V: 06
K: row-507/colfam1:07/1428632364351/Put/vlen=2/seqid=19 V: 07
K: row-508/colfam1:08/1428632364361/Put/vlen=2/seqid=21 V: 08
K: row-509/colfam1:09/1428632364381/Put/vlen=2/seqid=23 V: 09
K: row-510/colfam1:10/1428632364400/Put/vlen=2/seqid=25 V: 10
K: row-511/colfam1:11/1428632364411/Put/vlen=2/seqid=27 V: 11
K: row-512/colfam1:12/1428632364426/Put/vlen=2/seqid=29 V: 12
K: row-513/colfam1:13/1428632364440/Put/vlen=2/seqid=31 V: 13
K: row-514/colfam1:14/1428632364474/Put/vlen=2/seqid=33 V: 14
K: row-515/colfam1:15/1428632364496/Put/vlen=2/seqid=35 V: 15
K: row-516/colfam1:16/1428632364521/Put/vlen=2/seqid=37 V: 16
K: row-517/colfam1:17/1428632364528/Put/vlen=2/seqid=39 V: 17
K: row-518/colfam1:18/1428632364539/Put/vlen=2/seqid=41 V: 18
K: row-519/colfam1:19/1428632364551/Put/vlen=2/seqid=43 V: 19
K: row-520/colfam1:20/1428632364561/Put/vlen=2/seqid=45 V: 20
K: row-521/colfam1:21/1428632364574/Put/vlen=2/seqid=47 V: 21
K: row-522/colfam1:22/1428632364589/Put/vlen=2/seqid=49 V: 22
K: row-523/colfam1:23/1428632364602/Put/vlen=2/seqid=51 V: 23
K: row-524/colfam1:24/1428632364617/Put/vlen=2/seqid=53 V: 24
K: row-525/colfam1:25/1428632364634/Put/vlen=2/seqid=55 V: 25
K: row-526/colfam1:26/1428632364647/Put/vlen=2/seqid=57 V: 26
K: row-527/colfam1:27/1428632364653/Put/vlen=2/seqid=59 V: 27
K: row-528/colfam1:28/1428632364665/Put/vlen=2/seqid=61 V: 28
K: row-529/colfam1:29/1428632364734/Put/vlen=2/seqid=63 V: 29
K: row-530/colfam1:30/1428632364746/Put/vlen=2/seqid=65 V: 30
K: row-531/colfam1:31/1428632364760/Put/vlen=2/seqid=67 V: 31
K: row-532/colfam1:32/1428632364777/Put/vlen=2/seqid=69 V: 32
K: row-533/colfam1:33/1428632364819/Put/vlen=2/seqid=71 V: 33
K: row-534/colfam1:34/1428632364831/Put/vlen=2/seqid=73 V: 34
K: row-535/colfam1:35/1428632364837/Put/vlen=2/seqid=75 V: 35
K: row-536/colfam1:36/1428632364846/Put/vlen=2/seqid=77 V: 36
K: row-537/colfam1:37/1428632364852/Put/vlen=2/seqid=79 V: 37
K: row-538/colfam1:38/1428632364861/Put/vlen=2/seqid=81 V: 38
K: row-539/colfam1:39/1428632364872/Put/vlen=2/seqid=83 V: 39
K: row-540/colfam1:40/1428632364880/Put/vlen=2/seqid=85 V: 40
K: row-541/colfam1:41/1428632364886/Put/vlen=2/seqid=87 V: 41
K: row-542/colfam1:42/1428632364897/Put/vlen=2/seqid=89 V: 42
K: row-543/colfam1:43/1428632364909/Put/vlen=2/seqid=91 V: 43
K: row-544/colfam1:44/1428632364924/Put/vlen=2/seqid=93 V: 44
K: row-545/colfam1:45/1428632364937/Put/vlen=2/seqid=95 V: 45
K: row-546/colfam1:46/1428632364946/Put/vlen=2/seqid=97 V: 46
K: row-547/colfam1:47/1428632364955/Put/vlen=2/seqid=99 V: 47
K: row-548/colfam1:48/1428632364964/Put/vlen=2/seqid=101 V: 48
K: row-549/colfam1:49/1428632364976/Put/vlen=2/seqid=103 V: 49
K: row-550/colfam1:50/1428632364982/Put/vlen=2/seqid=105 V: 50
K: row-551/colfam1:51/1428632364992/Put/vlen=2/seqid=107 V: 51
K: row-552/colfam1:52/1428632365001/Put/vlen=2/seqid=109 V: 52
K: row-553/colfam1:53/1428632365011/Put/vlen=2/seqid=111 V: 53
K: row-554/colfam1:54/1428632365020/Put/vlen=2/seqid=113 V: 54
K: row-555/colfam1:55/1428632365035/Put/vlen=2/seqid=115 V: 55
K: row-556/colfam1:56/1428632365048/Put/vlen=2/seqid=117 V: 56
K: row-557/colfam1:57/1428632365056/Put/vlen=2/seqid=119 V: 57
K: row-558/colfam1:58/1428632365064/Put/vlen=2/seqid=121 V: 58
K: row-559/colfam1:59/1428632365080/Put/vlen=2/seqid=123 V: 59
K: row-560/colfam1:60/1428632365095/Put/vlen=2/seqid=125 V: 60
K: row-561/colfam1:61/1428632365111/Put/vlen=2/seqid=127 V: 61
K: row-562/colfam1:62/1428632365123/Put/vlen=2/seqid=129 V: 62
K: row-563/colfam1:63/1428632365133/Put/vlen=2/seqid=131 V: 63
K: row-564/colfam1:64/1428632365142/Put/vlen=2/seqid=133 V: 64
K: row-565/colfam1:65/1428632365151/Put/vlen=2/seqid=135 V: 65
K: row-566/colfam1:66/1428632365159/Put/vlen=2/seqid=137 V: 66
K: row-567/colfam1:67/1428632365169/Put/vlen=2/seqid=139 V: 67
K: row-568/colfam1:68/1428632365179/Put/vlen=2/seqid=141 V: 68
K: row-569/colfam1:69/1428632365192/Put/vlen=2/seqid=143 V: 69
K: row-570/colfam1:70/1428632365200/Put/vlen=2/seqid=145 V: 70
K: row-571/colfam1:71/1428632365209/Put/vlen=2/seqid=147 V: 71
K: row-572/colfam1:72/1428632365217/Put/vlen=2/seqid=149 V: 72
K: row-573/colfam1:73/1428632365226/Put/vlen=2/seqid=151 V: 73
K: row-574/colfam1:74/1428632365237/Put/vlen=2/seqid=153 V: 74
K: row-575/colfam1:75/1428632365245/Put/vlen=2/seqid=155 V: 75
K: row-576/colfam1:76/1428632365253/Put/vlen=2/seqid=157 V: 76
K: row-577/colfam1:77/1428632365265/Put/vlen=2/seqid=159 V: 77
K: row-578/colfam1:78/1428632365279/Put/vlen=2/seqid=161 V: 78
K: row-579/colfam1:79/1428632365287/Put/vlen=2/seqid=163 V: 79
K: row-580/colfam1:80/1428632365294/Put/vlen=2/seqid=165 V: 80
K: row-581/colfam1:81/1428632365305/Put/vlen=2/seqid=167 V: 81
K: row-582/colfam1:82/1428632365314/Put/vlen=2/seqid=169 V: 82
K: row-583/colfam1:83/1428632365321/Put/vlen=2/seqid=171 V: 83
K: row-584/colfam1:84/1428632365343/Put/vlen=2/seqid=173 V: 84
K: row-585/colfam1:85/1428632365352/Put/vlen=2/seqid=175 V: 85
K: row-586/colfam1:86/1428632365375/Put/vlen=2/seqid=177 V: 86
K: row-587/colfam1:87/1428632365535/Put/vlen=2/seqid=179 V: 87
K: row-588/colfam1:88/1428632365560/Put/vlen=2/seqid=181 V: 88
K: row-589/colfam1:89/1428632365569/Put/vlen=2/seqid=183 V: 89
K: row-590/colfam1:90/1428632365582/Put/vlen=2/seqid=185 V: 90
K: row-591/colfam1:91/1428632365594/Put/vlen=2/seqid=187 V: 91
K: row-592/colfam1:92/1428632365620/Put/vlen=2/seqid=189 V: 92
K: row-593/colfam1:93/1428632365633/Put/vlen=2/seqid=191 V: 93
K: row-594/colfam1:94/1428632365642/Put/vlen=2/seqid=193 V: 94
K: row-595/colfam1:95/1428632365651/Put/vlen=2/seqid=195 V: 95
K: row-596/colfam1:96/1428632365671/Put/vlen=2/seqid=197 V: 96
K: row-597/colfam1:97/1428632365679/Put/vlen=2/seqid=199 V: 97
K: row-598/colfam1:98/1428632365684/Put/vlen=2/seqid=201 V: 98
K: row-599/colfam1:99/1428632365689/Put/vlen=2/seqid=203 V: 99
K: row-600/colfam1:00/1428632365694/Put/vlen=2/seqid=205 V: 00
K: row-601/colfam1:01/1428632365702/Put/vlen=2/seqid=207 V: 01
K: row-602/colfam1:02/1428632365709/Put/vlen=2/seqid=209 V: 02
K: row-603/colfam1:03/1428632365717/Put/vlen=2/seqid=211 V: 03
K: row-604/colfam1:04/1428632365722/Put/vlen=2/seqid=213 V: 04
K: row-605/colfam1:05/1428632365729/Put/vlen=2/seqid=215 V: 05
K: row-606/colfam1:06/1428632365752/Put/vlen=2/seqid=217 V: 06
K: row-607/colfam1:07/1428632365758/Put/vlen=2/seqid=219 V: 07
K: row-608/colfam1:08/1428632365765/Put/vlen=2/seqid=221 V: 08
K: row-609/colfam1:09/1428632365773/Put/vlen=2/seqid=223 V: 09
K: row-610/colfam1:10/1428632365778/Put/vlen=2/seqid=225 V: 10
K: row-611/colfam1:11/1428632365785/Put/vlen=2/seqid=227 V: 11
K: row-612/colfam1:12/1428632365791/Put/vlen=2/seqid=229 V: 12
K: row-613/colfam1:13/1428632365798/Put/vlen=2/seqid=231 V: 13
K: row-614/colfam1:14/1428632365803/Put/vlen=2/seqid=233 V: 14
K: row-615/colfam1:15/1428632365811/Put/vlen=2/seqid=235 V: 15
K: row-616/colfam1:16/1428632365820/Put/vlen=2/seqid=237 V: 16
K: row-617/colfam1:17/1428632365834/Put/vlen=2/seqid=239 V: 17
K: row-618/colfam1:18/1428632365840/Put/vlen=2/seqid=241 V: 18
K: row-619/colfam1:19/1428632365850/Put/vlen=2/seqid=243 V: 19
K: row-620/colfam1:20/1428632365856/Put/vlen=2/seqid=245 V: 20
K: row-621/colfam1:21/1428632365864/Put/vlen=2/seqid=247 V: 21
K: row-622/colfam1:22/1428632365874/Put/vlen=2/seqid=249 V: 22
K: row-623/colfam1:23/1428632365882/Put/vlen=2/seqid=251 V: 23
K: row-624/colfam1:24/1428632365896/Put/vlen=2/seqid=253 V: 24
K: row-625/colfam1:25/1428632365903/Put/vlen=2/seqid=255 V: 25
K: row-626/colfam1:26/1428632365908/Put/vlen=2/seqid=257 V: 26
K: row-627/colfam1:27/1428632365917/Put/vlen=2/seqid=259 V: 27
K: row-628/colfam1:28/1428632365928/Put/vlen=2/seqid=261 V: 28
K: row-629/colfam1:29/1428632365934/Put/vlen=2/seqid=263 V: 29
K: row-630/colfam1:30/1428632365940/Put/vlen=2/seqid=265 V: 30
K: row-631/colfam1:31/1428632365945/Put/vlen=2/seqid=267 V: 31
K: row-632/colfam1:32/1428632365957/Put/vlen=2/seqid=269 V: 32
K: row-633/colfam1:33/1428632365967/Put/vlen=2/seqid=271 V: 33
K: row-634/colfam1:34/1428632365982/Put/vlen=2/seqid=273 V: 34
K: row-635/colfam1:35/1428632365999/Put/vlen=2/seqid=275 V: 35
K: row-636/colfam1:36/1428632366004/Put/vlen=2/seqid=277 V: 36
K: row-637/colfam1:37/1428632366020/Put/vlen=2/seqid=279 V: 37
K: row-638/colfam1:38/1428632366031/Put/vlen=2/seqid=281 V: 38
K: row-639/colfam1:39/1428632366038/Put/vlen=2/seqid=283 V: 39
K: row-640/colfam1:40/1428632366048/Put/vlen=2/seqid=285 V: 40
K: row-641/colfam1:41/1428632366057/Put/vlen=2/seqid=287 V: 41
K: row-642/colfam1:42/1428632366240/Put/vlen=2/seqid=289 V: 42
K: row-643/colfam1:43/1428632366249/Put/vlen=2/seqid=291 V: 43
K: row-644/colfam1:44/1428632366256/Put/vlen=2/seqid=293 V: 44
K: row-645/colfam1:45/1428632366264/Put/vlen=2/seqid=295 V: 45
K: row-646/colfam1:46/1428632366270/Put/vlen=2/seqid=297 V: 46
K: row-647/colfam1:47/1428632366276/Put/vlen=2/seqid=299 V: 47
K: row-648/colfam1:48/1428632366284/Put/vlen=2/seqid=301 V: 48
K: row-649/colfam1:49/1428632366290/Put/vlen=2/seqid=303 V: 49
K: row-650/colfam1:50/1428632366300/Put/vlen=2/seqid=305 V: 50
K: row-651/colfam1:51/1428632366305/Put/vlen=2/seqid=307 V: 51
K: row-652/colfam1:52/1428632366313/Put/vlen=2/seqid=309 V: 52
K: row-653/colfam1:53/1428632366321/Put/vlen=2/seqid=311 V: 53
K: row-654/colfam1:54/1428632366330/Put/vlen=2/seqid=313 V: 54
K: row-655/colfam1:55/1428632366337/Put/vlen=2/seqid=315 V: 55
K: row-656/colfam1:56/1428632366343/Put/vlen=2/seqid=317 V: 56
K: row-657/colfam1:57/1428632366350/Put/vlen=2/seqid=319 V: 57
K: row-658/colfam1:58/1428632366363/Put/vlen=2/seqid=321 V: 58
K: row-659/colfam1:59/1428632366370/Put/vlen=2/seqid=323 V: 59
K: row-660/colfam1:60/1428632366384/Put/vlen=2/seqid=325 V: 60
K: row-661/colfam1:61/1428632366392/Put/vlen=2/seqid=327 V: 61
K: row-662/colfam1:62/1428632366397/Put/vlen=2/seqid=329 V: 62
K: row-663/colfam1:63/1428632366403/Put/vlen=2/seqid=331 V: 63
K: row-664/colfam1:64/1428632366410/Put/vlen=2/seqid=333 V: 64
K: row-665/colfam1:65/1428632366421/Put/vlen=2/seqid=335 V: 65
K: row-666/colfam1:66/1428632366430/Put/vlen=2/seqid=337 V: 66
K: row-667/colfam1:67/1428632366437/Put/vlen=2/seqid=339 V: 67
K: row-668/colfam1:68/1428632366444/Put/vlen=2/seqid=341 V: 68
K: row-669/colfam1:69/1428632366461/Put/vlen=2/seqid=343 V: 69
K: row-670/colfam1:70/1428632366477/Put/vlen=2/seqid=345 V: 70
K: row-671/colfam1:71/1428632366487/Put/vlen=2/seqid=347 V: 71
K: row-672/colfam1:72/1428632366498/Put/vlen=2/seqid=349 V: 72
K: row-673/colfam1:73/1428632366507/Put/vlen=2/seqid=351 V: 73
K: row-674/colfam1:74/1428632366520/Put/vlen=2/seqid=353 V: 74
K: row-675/colfam1:75/1428632366530/Put/vlen=2/seqid=355 V: 75
K: row-676/colfam1:76/1428632366542/Put/vlen=2/seqid=357 V: 76
K: row-677/colfam1:77/1428632366555/Put/vlen=2/seqid=359 V: 77
K: row-678/colfam1:78/1428632366578/Put/vlen=2/seqid=361 V: 78
K: row-679/colfam1:79/1428632366588/Put/vlen=2/seqid=363 V: 79
K: row-680/colfam1:80/1428632366596/Put/vlen=2/seqid=365 V: 80
K: row-681/colfam1:81/1428632366604/Put/vlen=2/seqid=367 V: 81
K: row-682/colfam1:82/1428632366617/Put/vlen=2/seqid=369 V: 82
K: row-683/colfam1:83/1428632366629/Put/vlen=2/seqid=371 V: 83
K: row-684/colfam1:84/1428632366640/Put/vlen=2/seqid=373 V: 84
K: row-685/colfam1:85/1428632366649/Put/vlen=2/seqid=375 V: 85
K: row-686/colfam1:86/1428632366658/Put/vlen=2/seqid=377 V: 86
K: row-687/colfam1:87/1428632366664/Put/vlen=2/seqid=379 V: 87
K: row-688/colfam1:88/1428632366673/Put/vlen=2/seqid=381 V: 88
K: row-689/colfam1:89/1428632366680/Put/vlen=2/seqid=383 V: 89
K: row-690/colfam1:90/1428632366686/Put/vlen=2/seqid=385 V: 90
K: row-691/colfam1:91/1428632366693/Put/vlen=2/seqid=387 V: 91
K: row-692/colfam1:92/1428632366701/Put/vlen=2/seqid=389 V: 92
K: row-693/colfam1:93/1428632366857/Put/vlen=2/seqid=391 V: 93
K: row-694/colfam1:94/1428632366868/Put/vlen=2/seqid=393 V: 94
K: row-695/colfam1:95/1428632366873/Put/vlen=2/seqid=395 V: 95
K: row-696/colfam1:96/1428632366881/Put/vlen=2/seqid=397 V: 96
K: row-697/colfam1:97/1428632366890/Put/vlen=2/seqid=399 V: 97
K: row-698/colfam1:98/1428632366896/Put/vlen=2/seqid=401 V: 98
K: row-699/colfam1:99/1428632366902/Put/vlen=2/seqid=403 V: 99
Block index size as per heapsize: 400
///dumps the internal HFile.Reader properties
reader=/hbase/data/default/table3/1fa2e49c7404d3cd39afc39a99cc1c26/colfam1/0f6fc234c3014b6e9d84d3cae065d1b4, compression=none, cacheConf=CacheConfig:disabled, firstKey=row-500/colfam1:00/1428632364152/Put, lastKey=row-699/colfam1:99/1428632366902/Put, avgKeyLen=28, avgValueLen=2, entries=200, length=13581
///Trailer block information
Trailer: fileinfoOffset=8857, loadOnOpenDataOffset=8742, dataIndexCount=1, metaIndexCount=0, totalUncomressedBytes=13483, entryCount=200, compressionCodec=NONE, uncompressedDataIndexSize=41, numDataIndexLevels=1, firstDataBlockOffset=0, lastDataBlockOffset=0, comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator, encryptionKey=NONE, majorVersion=3, minorVersion=0
///FileInfo block information
Fileinfo:
    BLOOM_FILTER_TYPE = ROW
    DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
    EARLIEST_PUT_TS = \x00\x00\x01L\xA1\x1F\xE4x
    KEY_VALUE_VERSION = \x00\x00\x00\x01
    LAST_BLOOM_KEY = row-699
    MAJOR_COMPACTION_KEY = \x00
    MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x01\x93
    MAX_SEQ_ID_KEY = 404
    TIMERANGE = 1428632364152....1428632366902
    hfile.AVG_KEY_LEN = 28
    hfile.AVG_VALUE_LEN = 2
    hfile.LASTKEY = \x00\x07row-699\x07colfam199\x00\x00\x01L\xA1\x1F\xEF6\x04
    hfile.MAX_TAGS_LEN = \x00\x00\x00\x00
    hfile.TAGS_COMPRESSED = \x00
Mid-key: \x00\x07row-500\x07colfam100\x00\x00\x01L\xA1\x1F\xE4x\x04
Bloom filter:
    BloomSize: 256
    No of Keys in bloom: 200
    Max Keys for bloom: 213
    Percentage filled: 94%
    Number of chunks: 1
    Comparator: RawBytesComparator
Delete Family Bloom filter:
    Not present
///Total number of KVs found by the scan
Scanned kv count -> 200
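The firstKey, lastKey and entries=200 values in the output follow from the split points passed to create. The sketch below is a plain-Python model of that mapping, not HBase code: region_index is a hypothetical helper, and it assumes each split key is the inclusive start key of the next region and that row keys compare byte-wise (which matches Python string comparison for these ASCII keys).

```python
from bisect import bisect_right
from collections import Counter

# Split points taken from the create 'table3' command above.
SPLITS = ["row-300", "row-500", "row-700", "row-900"]

def region_index(row_key, splits=SPLITS):
    """Return the index of the region whose [start, end) range holds row_key.

    Hypothetical helper: a split key is treated as the first key of the
    region to its right, so "row-500" lands in region 2.
    """
    return bisect_right(splits, row_key)

# Same 1000 row keys as the nested for loops in the HBase shell.
digits = "0123456789"
rows = [f"row-{i}{j}{k}" for i in digits for j in digits for k in digits]

rows_per_region = Counter(region_index(r) for r in rows)
region2 = [r for r in rows if region_index(r) == 2]

print(dict(rows_per_region))
print(region2[0], region2[-1], len(region2))
```

Under these assumptions the region starting at row-500 receives exactly the 200 rows row-500 through row-699, which is what the single flushed HFile of that region contains.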
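2. KeyValue format
In an HFile, a KeyValue is a byte array made up of the following information: the key length and the value length, followed by the key (row length, row, column family length, column family, column qualifier, timestamp, key type) and then the value.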
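As an illustration of the key part of this layout, the sketch below decodes the hfile.LASTKEY bytes printed in section 1. It is a minimal Python sketch, not HBase's own parser: parse_key is a hypothetical helper, and it assumes a 2-byte row length, a 1-byte family length, and a trailing 8-byte big-endian timestamp plus 1-byte key type, which is what the printed bytes suggest.

```python
import struct

# hfile.LASTKEY exactly as printed by the HFile tool in section 1.
LAST_KEY = b"\x00\x07row-699\x07colfam199\x00\x00\x01L\xa1\x1f\xef6\x04"

def parse_key(key: bytes) -> dict:
    """Split the key part of a KeyValue into its fields (hypothetical helper)."""
    (row_len,) = struct.unpack_from(">H", key, 0)   # 2-byte row length
    pos = 2
    row = key[pos:pos + row_len]
    pos += row_len
    fam_len = key[pos]                               # 1-byte family length
    pos += 1
    family = key[pos:pos + fam_len]
    pos += fam_len
    # The last 9 bytes are the timestamp (8) and key type (1);
    # whatever sits between the family and those bytes is the qualifier.
    qualifier = key[pos:len(key) - 9]
    timestamp, key_type = struct.unpack_from(">qB", key, len(key) - 9)
    return {
        "row": row,
        "family": family,
        "qualifier": qualifier,
        "timestamp": timestamp,
        "type": key_type,
    }

parsed = parse_key(LAST_KEY)
print(len(LAST_KEY), parsed)
```

The key is 28 bytes long (2 + 7 + 1 + 7 + 2 + 8 + 1), which agrees with avgKeyLen=28 in the output, and the decoded timestamp matches the one shown in lastKey. The type byte 4 corresponds to Put.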
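3. HFile data structure
3.1 Trailer block
The Trailer has a fixed length. As shown in the figure, it holds pointers to the starting points of the other blocks. When an HFile is read, the Trailer is read first, and then the Data Block Index is loaded into memory. Looking up a key therefore does not require scanning the whole HFile: the block containing the key is found from the in-memory index, that whole block is read into memory with a single disk I/O, and the key is then located inside it.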
3.2 File Info block
The File Info block is fixed-length and records meta information about the file, for example AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR and MAX_SEQ_ID_KEY.
3.3 Data Block
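Data Blocks hold the table's data and are the basic unit of HBase I/O. To improve efficiency, the HRegionServer has an LRU-based block cache. The size of each data block can be set with a parameter when the table is created: large blocks favor sequential scans, small blocks favor random lookups.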
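To make the LRU idea concrete, here is a toy block cache in Python. It only illustrates least-recently-used eviction; it is not how HBase's block cache is implemented, and ToyLruBlockCache, load_block and the block offsets are all made up for the example.

```python
from collections import OrderedDict

class ToyLruBlockCache:
    """Toy LRU cache keyed by block offset (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity          # max number of cached blocks
        self._blocks = OrderedDict()      # least recently used entry first
        self.hits = 0
        self.misses = 0

    def get(self, offset, load_block):
        if offset in self._blocks:
            self._blocks.move_to_end(offset)   # mark as most recently used
            self.hits += 1
            return self._blocks[offset]
        self.misses += 1
        block = load_block(offset)             # stands in for a disk read
        self._blocks[offset] = block
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)   # evict least recently used
        return block

    def cached_offsets(self):
        return list(self._blocks)

def load_block(offset):
    # Hypothetical loader: fabricates a block body instead of reading a file.
    return f"block@{offset}"

cache = ToyLruBlockCache(capacity=2)
for offset in [0, 65536, 0, 131072, 65536]:
    cache.get(offset, load_block)

print(cache.hits, cache.misses, cache.cached_offsets())
```

With a capacity of two blocks, only the second access to offset 0 is a hit; every other access has to load the block and, once the cache is full, pushes out the block that was used longest ago.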
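Apart from the Magic at its start, each data block is simply a concatenation of KeyValue pairs. The Magic is a sequence of random numbers whose purpose is to guard against data corruption; every block has one.
About the Data Block size: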
Minimum block size. We recommend a setting of minimum block size between 8KB to 1MB for general usage. Larger block size is preferred if files are primarily for sequential access. However, it would lead to inefficient random access (because there are more data to decompress). Smaller blocks are good for random access, but require more memory to hold the block index, and may be slower to create (because we must flush the compressor stream at the conclusion of each data block, which leads to an FS I/O flush). Further, due to the internal caching in Compression codec, the smallest possible block size would be around 20KB-30KB.
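3.4 Meta Block section (optional):
Stores user-defined KeyValue pairs; it can be compressed.
Data Block Index section:
The index of the data blocks. The key of each index entry is the key of the first record in the block it indexes. The index blocks record the offsets of the data and meta blocks.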
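A lookup through such an index can be sketched as a binary search over the first keys. The Python below is a simplified model with assumptions: the index entries and offsets are invented (the HFile dumped in section 1 has only one data block), row keys stand in for full KeyValue keys, and find_block is a hypothetical helper.

```python
from bisect import bisect_right

# Hypothetical in-memory index: (first key in block, block offset in file).
BLOCK_INDEX = [
    ("row-500", 0),
    ("row-550", 4096),
    ("row-600", 8192),
    ("row-650", 12288),
]
FIRST_KEYS = [first_key for first_key, _ in BLOCK_INDEX]

def find_block(key):
    """Return the offset of the only block that could contain key.

    The candidate is the last block whose first key is <= key; None means
    the key sorts before the first block and cannot be in this file.
    """
    i = bisect_right(FIRST_KEYS, key) - 1
    if i < 0:
        return None
    return BLOCK_INDEX[i][1]

for key in ["row-499", "row-500", "row-575", "row-699"]:
    print(key, find_block(key))
```

Only the block at the returned offset then needs to be read from disk and scanned, which is the single I/O described in section 3.1.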
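4. Relationship between HFile and HDFS blocks
The HFile block size defaults to 64 KB, while the HDFS block size defaults to 64 MB (the Hadoop 1.x default; later releases default to 128 MB), so an HDFS block is 1024 times the size of an HFile block. The figure below shows 232M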
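The arithmetic can be checked with a few lines of Python. The block sizes are the defaults quoted above; hdfs_blocks_needed is a hypothetical helper, and treating 232 MB as a file size is only an illustrative assumption.

```python
KB = 1024
MB = 1024 * KB

HFILE_BLOCK_SIZE = 64 * KB   # HFile block size quoted above
HDFS_BLOCK_SIZE = 64 * MB    # HDFS block size quoted above

def hdfs_blocks_needed(file_size, hdfs_block_size=HDFS_BLOCK_SIZE):
    """Number of HDFS blocks a file of file_size bytes occupies (ceiling division)."""
    return -(-file_size // hdfs_block_size)

ratio = HDFS_BLOCK_SIZE // HFILE_BLOCK_SIZE

# Illustrative assumption: a 232 MB file.
file_size = 232 * MB
blocks = hdfs_blocks_needed(file_size)
last_block_mb = (file_size - (blocks - 1) * HDFS_BLOCK_SIZE) // MB

print(ratio, blocks, last_block_mb)
```

Under that assumption the file spans four HDFS blocks, three full ones and a last one holding 40 MB, and each full HDFS block has room for 1024 HFile blocks.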
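5. HFile compaction
Write path: client write -> data goes into the MemStore until the MemStore is full -> the MemStore is flushed into a StoreFile, and the number of StoreFiles grows until it reaches a threshold -> a compaction is triggered -> multiple StoreFiles are merged into one StoreFile, with versions merged and deleted data removed at the same time -> as StoreFiles are compacted, progressively larger StoreFiles form -> once a single StoreFile exceeds a size threshold, a split is triggered: the current Region is split into 2 Regions, the parent Region goes offline, and the 2 new child Regions are assigned by the HMaster to the appropriate HRegionServers, so that the load of the original Region is spread over 2 Regions. As this process shows, HBase only ever appends data; all updates and deletes are actually carried out in the compaction phase, so a user write only has to reach memory before it returns, which keeps I/O performance high.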
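The flush and compaction steps of this flow can be modelled in a few lines of Python. This is a toy sketch under stated assumptions, not HBase's implementation: ToyStore, its thresholds (counted in entries and files rather than bytes) and the TOMBSTONE marker are invented for illustration, only one version per key is kept, and the split step is left out.

```python
TOMBSTONE = None  # hypothetical delete marker written instead of removing data

class ToyStore:
    """Toy model of one store: a MemStore plus a list of flushed StoreFiles."""

    def __init__(self, flush_threshold=2, compact_threshold=3):
        self.flush_threshold = flush_threshold      # entries before a flush
        self.compact_threshold = compact_threshold  # files before a compaction
        self.memstore = {}
        self.store_files = []                       # oldest file first

    def put(self, key, value):
        self.memstore[key] = value                  # writes only touch memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)                    # a delete is just another write

    def flush(self):
        if not self.memstore:
            return
        # A StoreFile is immutable and sorted by key.
        self.store_files.append(sorted(self.memstore.items()))
        self.memstore = {}
        if len(self.store_files) >= self.compact_threshold:
            self.compact()

    def compact(self):
        merged = {}
        for store_file in self.store_files:         # oldest to newest
            merged.update(store_file)               # newer values win
        # Deleted keys are physically dropped only here.
        live = sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)
        self.store_files = [live]

store = ToyStore()
store.put("row-1", "a")
store.put("row-2", "b")      # first flush
store.put("row-1", "a2")     # update: appended, not applied in place
store.delete("row-2")        # second flush, contains a tombstone
store.put("row-3", "c")
store.put("row-4", "d")      # third flush triggers the compaction

print(store.store_files)
```

After the third flush the three StoreFiles collapse into one that holds the newest value of row-1 and no trace of row-2, which is the sense in which updates and deletes only take effect at compaction time.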
Reference: http://blog.csdn.net/john_f_lau/article/details/18899311