Hadoop 2.x: HDFS snapshot

  I don't want to reinvent the wheel of open-source projects; I'm just curious about the features behind them and their possible use cases, so I'm writing things down here as a summary and memo.

Agenda

 1. what is it

 2. how it works

 3. hadoop snapshot vs hbase snapshot

 4. demos of using snapshots

 

  1. what is it

  A long time ago, the term 'snapshot' was introduced to describe the state of something at a point in time, e.g. a memory snapshot, a database snapshot, or even Google's cached-page snapshot. They all mean much the same thing: a view/image of something as it was at some moment in the past.

  Hadoop's snapshot follows the same idea: we want a 'view' that captures the files as they were at a point in time. Typical uses are:

  a. periodic backups

  b. restoring key data after an accidental deletion

  c. isolating important data from production for testing, comparison, etc.
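
  For use (a), a scheduled job needs little more than a naming rule and a retention rule. The sketch below is only my assumption of how that could look: the name format and the helpers `snapshot_name` and `expired_snapshots` are made up for illustration and are not part of Hadoop.

```python
from datetime import datetime, timedelta

# Hypothetical naming rule for periodic snapshots, e.g. "s20140110-123000".
NAME_FORMAT = "s%Y%m%d-%H%M%S"


def snapshot_name(now):
    """Build a snapshot name from the time the backup job runs."""
    return now.strftime(NAME_FORMAT)


def expired_snapshots(names, now, keep_days):
    """Return the snapshot names that are older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    return [n for n in names if datetime.strptime(n, NAME_FORMAT) < cutoff]


if __name__ == "__main__":
    now = datetime.now()
    print("create:", snapshot_name(now))
    print("delete:", expired_snapshots(["s20140101-000000"], now, keep_days=7))
```

  The job would create one snapshot with the generated name, then delete every snapshot that `expired_snapshots` returns.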

 

  The snapshot also has some notable properties:

  - no data is moved or copied, so network bandwidth is not affected

  - it adds very little work for the namenode or datanodes, so the cluster's reliability is preserved

 

  2. how it works

  The snapshot relies on the write-once, read-many character of HDFS files. A directory must first be marked as snapshottable; when a snapshot is then created on it, the namenode records the directory's state at that moment and protects it. Later deletions, moves, or creations of files and directories only change the metadata in the namenode, and the data that the snapshot still refers to is not removed right away. So if, after a while, you want to restore some files or directories, you can copy the snapshotted versions from the '.snapshot' directory to wherever you want. When you delete the snapshot, the earlier operations finally take full effect: data that nothing refers to any more can be reclaimed.

  For a deeper study of the underlying 'linked data structure', see the paper 'Making Data Structures Persistent'.
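
  To make the idea concrete, here is a toy model of a snapshottable directory. It is only a sketch under my own assumptions: `ToyNamespace` and its methods are hypothetical names, and it copies the whole name map on each snapshot for simplicity, whereas HDFS records only the changes made relative to a snapshot. What it does show is the key point: a snapshot stores metadata, and a block stays alive as long as the live view or any snapshot refers to it.

```python
class ToyNamespace:
    """Toy model: file names map to immutable tuples of block ids."""

    def __init__(self):
        self.files = {}      # live view: file name -> tuple of block ids
        self.snapshots = {}  # snapshot name -> frozen copy of the name map

    def create_file(self, name, blocks):
        self.files[name] = tuple(blocks)

    def delete_file(self, name):
        # Only the live metadata changes; snapshots keep their own entry.
        del self.files[name]

    def create_snapshot(self, snap):
        # Records the name -> blocks mapping; no block data is copied.
        self.snapshots[snap] = dict(self.files)

    def delete_snapshot(self, snap):
        del self.snapshots[snap]

    def restore(self, snap, name):
        # Like copying a file back out of '.snapshot/<snap>/'.
        self.files[name] = self.snapshots[snap][name]

    def referenced_blocks(self):
        # A block can be reclaimed only when no view refers to it.
        views = [self.files, *self.snapshots.values()]
        return {b for view in views for blocks in view.values() for b in blocks}


if __name__ == "__main__":
    ns = ToyNamespace()
    ns.create_file("a.log", ["blk_1", "blk_2"])
    ns.create_snapshot("s0")
    ns.delete_file("a.log")          # mistaken deletion
    print(ns.referenced_blocks())    # blocks are still held by snapshot s0
    ns.restore("s0", "a.log")        # the file is back in the live view
```

  Deleting "a.log" again and then deleting snapshot "s0" leaves `referenced_blocks()` empty, which mirrors how the space is released only after the snapshot is gone.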

 

  3. hadoop snapshot vs hbase snapshot

  Judging by the release history of Hadoop and HBase, I think Hadoop's snapshot was inspired by HBase's one :), so their underlying implementations are similar. Here are some differences between the two:

                                                       | hadoop | hbase | supplement
  copy/move data                                       | n      | n     |
  generate new files that refer to the original files  | n      | y     | hbase generates many temp files that point to the real hdfs files

       

  So for an HBase cluster, I think it is unnecessary to back up (snapshot) HDFS again if HBase snapshots are already in use, because the two snapshots largely overlap; otherwise an HDFS snapshot should be taken.

 

  4. demos of using snapshots

  There are some usage demos on the official Apache site [2]. I want to point out that a snapshot is read-only (RO) rather than read-write (RW), so trying to change anything under the '.snapshot' directory will fail with an error. In addition, if you want to see how the commands really work, look at 'NameNodeRpcServer.java'.
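
  As a quick memo of the round trip, the sketch below builds the command lines for: allow snapshots on a directory, create one, list it, copy a file back out, and delete the snapshot. The hdfs commands themselves are the ones described in [2]; the helper names `snapshot_round_trip` and `run_all` and all the paths are my own hypothetical examples. By default nothing is executed, the commands are only printed.

```python
import subprocess


def snapshot_round_trip(directory, snapshot, lost_file, restore_to):
    """Return the hdfs command lines for a snapshot/restore round trip."""
    snap_dir = f"{directory.rstrip('/')}/.snapshot/{snapshot}"
    return [
        # Mark the directory as snapshottable (needs admin rights).
        ["hdfs", "dfsadmin", "-allowSnapshot", directory],
        # Take the snapshot.
        ["hdfs", "dfs", "-createSnapshot", directory, snapshot],
        # Browse the read-only view.
        ["hdfs", "dfs", "-ls", snap_dir],
        # Restore a file by copying it out of the snapshot.
        ["hdfs", "dfs", "-cp", f"{snap_dir}/{lost_file}", restore_to],
        # Drop the snapshot once it is no longer needed.
        ["hdfs", "dfs", "-deleteSnapshot", directory, snapshot],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run_all(snapshot_round_trip("/data/important", "s0",
                                "report.csv", "/data/restored"))
```

  Note that the restore step is a copy, not a move: since the snapshot is read-only, nothing under '.snapshot' can be moved or modified.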

 

 

ref:

jira: Support for RW/RO snapshots in HDFS

[2] HDFS Snapshots

hbase: tables replication/snapshot/backup within/cross clusters

hadoop-2.x: new features
