I don't want to reinvent the wheels of open source projects; rather, I just want to explore their underlying features and use cases as far as possible. So I will write some notes here as a summary or memo.
Agenda
1.what is
2.how to
3.hadoop snapshot vs hbase snapshot
4.demos to use snapshot
1.what is
A long time ago, the term 'snapshot' was introduced to describe 'the state of something at a point in time', e.g. a memory snapshot, a db snapshot, or even google's page snapshot. They all have a similar meaning: a certain view/image of one thing at some moment in its history.
Hadoop's snapshot is akin to these: we want to use this 'view' to capture the files as they were at a point in time. So its typical usages look like this:
a. periodic backups
b. restoring key data after mistaken deletions
c. isolating important data from production for testing, comparison, etc.
This kind of snapshot also has some notable features:
- no data is moved or copied, so network bandwidth is not affected
- it does not create much extra work for the namenode or datanodes, so reliability is preserved
2.how to
Hadoop snapshots rely on the write-once, read-many nature of hdfs files to work properly. Once a dir has been registered with the namenode as a snapshottable dir and a snapshot has been created on it, the snapshot protects its contents: all later operations, including deletion, move, or creation of files and dirs, only affect the 'metadata' in the namenode, so the actual data captured by the snapshot is not removed right away. So after a while, if you want to restore some files/dirs, you can copy the snapshotted files or dirs from the '.snapshot' dir to anywhere you want. When you delete the snapshot created before, the deferred operations finally take effect, and data that nothing else references can be reclaimed.
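To make the 'metadata only' idea more concrete, here is a minimal toy sketch in Python. It is not hdfs code: the class `ToyNamespace` and all of its methods are hypothetical names invented for illustration, and the real namenode records diffs rather than copying the whole name mapping. It only shows that a snapshot pins blocks without copying them, and that blocks are reclaimed once no live file and no snapshot refers to them.

```python
class ToyNamespace:
    """Toy model of one snapshottable dir (hypothetical, not hdfs code)."""

    def __init__(self):
        self.live = {}        # file name -> block id (current namespace)
        self.snapshots = {}   # snapshot name -> frozen {file name: block id}
        self.blocks = {}      # block id -> bytes (stands in for datanode storage)
        self._next_block = 0

    def create_file(self, name, data):
        # Write-once: every file gets a fresh block that is never changed in place.
        block_id = self._next_block
        self._next_block += 1
        self.blocks[block_id] = data
        self.live[name] = block_id

    def create_snapshot(self, snap):
        # Only the name -> block mapping is recorded; no block data is copied.
        self.snapshots[snap] = dict(self.live)

    def delete_file(self, name):
        # A metadata change; the block survives if a snapshot still refers to it.
        del self.live[name]
        self._reclaim_unreferenced_blocks()

    def delete_snapshot(self, snap):
        del self.snapshots[snap]
        self._reclaim_unreferenced_blocks()

    def read_from_snapshot(self, snap, name):
        return self.blocks[self.snapshots[snap][name]]

    def _reclaim_unreferenced_blocks(self):
        referenced = set(self.live.values())
        for frozen in self.snapshots.values():
            referenced.update(frozen.values())
        for block_id in list(self.blocks):
            if block_id not in referenced:
                del self.blocks[block_id]


ns = ToyNamespace()
ns.create_file("a.txt", b"hello")
ns.create_snapshot("s1")
ns.delete_file("a.txt")
print("a.txt" in ns.live)                    # False
print(ns.read_from_snapshot("s1", "a.txt"))  # b'hello'
ns.delete_snapshot("s1")
print(len(ns.blocks))                        # 0
```

The deleted file is still readable through the snapshot, and its block only disappears after the snapshot itself is deleted.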
For a deeper study of the 'linked data structure' behind this, you can check out the paper 'Making Data Structures Persistent'.
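As a rough taste of what 'persistent' means here, the sketch below keeps old versions of a small dir tree readable while a new version is built. It uses plain path copying with nested dicts, which is just one simple way to get this effect and not necessarily the technique used by the paper or by hdfs. The function name `put` and the paths are made up for illustration.

```python
def put(tree, path, value):
    """Return a new tree with `path` set to `value`; `tree` itself is not modified."""
    head, *rest = path
    new_tree = dict(tree)  # copy only this level, siblings stay shared
    if rest:
        new_tree[head] = put(tree.get(head, {}), rest, value)
    else:
        new_tree[head] = value
    return new_tree


v1 = put({}, ["data", "a.txt"], "block-1")
v1 = put(v1, ["logs", "x.log"], "block-2")
v2 = put(v1, ["data", "b.txt"], "block-3")  # new version, v1 stays valid

print(sorted(v1["data"]))        # ['a.txt']
print(sorted(v2["data"]))        # ['a.txt', 'b.txt']
print(v1["logs"] is v2["logs"])  # True
```

Only the dicts along the changed path are copied; the untouched 'logs' subtree is shared by both versions, which is why keeping an old version around is cheap.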
3.hadoop snapshot vs hbase snapshot
Judging by the release timelines of hadoop and hbase, I think hadoop's snapshot was inspired by hbase's one :), so their underlying implementations are similar. Here are some differences between the two snapshots:
feature | hadoop | hbase | supplement |
copy/move data | n | n | |
gen new files referring to original files | n | y | hbase will gen many temp files to point to the real hdfs files |
So for an hbase cluster, I think it's unnecessary to back up (snapshot) the underlying hdfs again if hbase snapshots are already in use, because the two kinds of snapshot largely overlap; otherwise hdfs should be snapshotted.
4.demos to use snapshot
There are some usage demos on the official apache site [2], but I want to point out that this snapshot is 'read-only' (RO) rather than RW; hence, if you try to make changes inside the '.snapshot' dir, you will get errors. In addition, if you want to see how the commands really work, see the details in 'NameNodeRpcServer.java'.
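For quick reference, here is a small Python sketch that only builds the usual hdfs command lines for one snapshot round trip (allow, create, list, copy back, delete) and prints them. The helper name `snapshot_demo_commands` and the example paths and names are hypothetical. Nothing is executed against a cluster here.

```python
import shlex


def snapshot_demo_commands(directory, snapshot, file_name, restore_to):
    """Build (but do not run) hdfs CLI commands for one snapshot round trip."""
    snap_path = f"{directory}/.snapshot/{snapshot}"
    return [
        # An admin must first mark the dir as snapshottable.
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot],
        # The snapshot shows up as a read-only dir under '.snapshot'.
        ["hdfs", "dfs", "-ls", snap_path],
        # Restore by copying out of the snapshot (it cannot be modified in place).
        ["hdfs", "dfs", "-cp", f"{snap_path}/{file_name}", restore_to],
        ["hdfs", "dfs", "-deleteSnapshot", directory, snapshot],
    ]


# Example paths and names below are placeholders, not from a real cluster.
for argv in snapshot_demo_commands("/data/warehouse", "s1", "orders.csv", "/tmp/restore"):
    print(shlex.join(argv))
```

To actually run them on a machine with an hdfs client, each list could be passed to `subprocess.run(argv, check=True)`.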
ref:
jira:Support for RW/RO snapshots in HDFS
hbase -tables replication/snapshot/backup within/cross clusters