As the title says, HBase keeps in-memory buffers called 'memstores' that are filled on writes. This design makes writes asynchronous with respect to disk (ignoring the WAL, of course) and makes reads of recently written data fast. Both the memstore and the block cache rely on a neat 'double buffer' trick:
The HBase component structure is shown below.
Components such as the WAL and the memstore can be flushed in two ways: manually, or by a periodic check.
how to
Memstore flushing is a bit complex, because it has to stay consistent with other components such as the WAL and MVCC. It is therefore important to complete the operation as quickly as possible, so that writes are not blocked.
Here I consider only the memstore-related steps and leave the WAL and MVCC details aside:
1. take a snapshot of each memstore
2. flush all underlying mutations (data, meta, index, trailer, etc.) to an hfile
3. bring the newly flushed hfile online and clear the snapshot (i.e. swap the snapshot for the hfile)
4. append a 'COMPLETE_CACHE_FLUSH' marker to the WAL, meaning that if a failure occurs later, the hlog replay can skip the edits written before this point
5. notify the threads that are waiting on this region so they can continue their mutations
The 'snapshot' here lets reads be served without interruption while the flush is in progress.
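The steps above can be sketched in a few lines of Java. This is only a minimal model under stated assumptions, not HBase's actual implementation: the class `MemStoreFlushSketch`, its methods, the use of `TreeMap` as a stand-in for an hfile, and the list standing in for the WAL are all hypothetical names introduced for illustration. It shows why the snapshot matters: the buffers are swapped under a short lock, the slow write in step 2 runs without that lock, and reads keep seeing the frozen data until the hfile takes over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical, simplified model of one memstore. Not HBase's real classes.
class MemStoreFlushSketch {
    private TreeMap<String, String> active = new TreeMap<>();   // receives new writes
    private TreeMap<String, String> snapshot = new TreeMap<>(); // frozen buffer being flushed
    private final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // stand-in for files on disk
    private final List<String> wal = new ArrayList<>();         // stand-in for the WAL
    private final Object flushLock = new Object();              // allows one flush at a time

    synchronized void put(String row, String value) {
        active.put(row, value);
    }

    // Reads check the active buffer, then the snapshot, then hfiles (newest first).
    synchronized String get(String row) {
        if (active.containsKey(row)) return active.get(row);
        if (snapshot.containsKey(row)) return snapshot.get(row);
        for (int i = hfiles.size() - 1; i >= 0; i--) {
            if (hfiles.get(i).containsKey(row)) return hfiles.get(i).get(row);
        }
        return null;
    }

    // Step 1: swap buffers under the lock; new writes go to a fresh active buffer.
    synchronized TreeMap<String, String> takeSnapshot() {
        snapshot = active;
        active = new TreeMap<>();
        return snapshot;
    }

    // Steps 3 to 5: publish the hfile, clear the snapshot, mark the WAL, wake waiters.
    synchronized void commit(TreeMap<String, String> hfile) {
        hfiles.add(hfile);                 // 3. the new hfile becomes visible to reads
        snapshot = new TreeMap<>();        // 3. the snapshot is no longer needed
        wal.add("COMPLETE_CACHE_FLUSH");   // 4. flush marker
        notifyAll();                       // 5. wakes threads waiting on this monitor, if any
    }

    void flush() {
        synchronized (flushLock) {
            TreeMap<String, String> frozen = takeSnapshot();
            // Step 2: the slow part. The memstore lock is not held here,
            // so put() and get() keep working while the copy is made.
            TreeMap<String, String> hfile = new TreeMap<>(frozen);
            commit(hfile);
        }
    }

    synchronized int hfileCount() { return hfiles.size(); }

    synchronized String lastWalEntry() {
        return wal.isEmpty() ? null : wal.get(wal.size() - 1);
    }
}
```

A caller would `put` a few rows, call `flush()`, and still be able to `get` every row before, during, and after the flush. The two-buffer swap in `takeSnapshot` is the 'double buffer' idea in its smallest form.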
trigger conditions
no | case | meaning
1 | memstore size > hbase.hregion.memstore.flush.size | the total memstore size of one region is bigger than flush.size
2 | over the global memstore lower water mark | TODO
3 | too many hlogs | TODO
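The three conditions in the table can be read as a simple check. The sketch below is only an illustration under assumptions: `FlushTriggerSketch`, its fields, and the order in which the conditions are tested are hypothetical and are not taken from HBase's code. Only `hbase.hregion.memstore.flush.size` is a name from the table; the other two thresholds are plain constructor arguments because their meaning is still marked TODO above.

```java
// Hypothetical helper that mirrors the three rows of the table. Not HBase's API.
class FlushTriggerSketch {
    enum Reason { NONE, REGION_MEMSTORE_SIZE, GLOBAL_LOWER_WATER_MARK, TOO_MANY_HLOGS }

    private final long regionFlushSize;      // models hbase.hregion.memstore.flush.size, in bytes
    private final long globalLowerWaterMark; // assumed: bytes summed over all memstores on the server
    private final int maxHlogs;              // assumed: hlog count above which a flush is forced

    FlushTriggerSketch(long regionFlushSize, long globalLowerWaterMark, int maxHlogs) {
        this.regionFlushSize = regionFlushSize;
        this.globalLowerWaterMark = globalLowerWaterMark;
        this.maxHlogs = maxHlogs;
    }

    // Returns the first condition that holds, or NONE if no flush is needed.
    Reason check(long regionMemstoreBytes, long globalMemstoreBytes, int hlogCount) {
        if (regionMemstoreBytes > regionFlushSize) {
            return Reason.REGION_MEMSTORE_SIZE;      // case 1
        }
        if (globalMemstoreBytes > globalLowerWaterMark) {
            return Reason.GLOBAL_LOWER_WATER_MARK;   // case 2
        }
        if (hlogCount > maxHlogs) {
            return Reason.TOO_MANY_HLOGS;            // case 3
        }
        return Reason.NONE;
    }
}
```

For example, with illustrative thresholds of 128 MB per region, 1 GB globally, and 32 hlogs, a region holding 129 MB would report `REGION_MEMSTORE_SIZE`, while a region holding exactly 128 MB would not, since the table uses a strict `>`.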
After a memstore flush, I noticed that the sizes relate roughly as follows:
memstore:uncompressed-file:compressed-file = 4:2:1
for my 'page' table.
TODO: the first ratio, memstore:uncompressed-file, looks a bit abnormal to me.
ref:
hbase-hfile format
hbase-hlog sync flow
hbase-mvcc principle
hbase guide