1. First check the cluster's Average block replication. If it is greater than 2, even pulling a node out directly should not lose data.
hadoop fsck /    # check the health of the cluster's file system
The replication factor of a file can also be set manually; as a safety backup, critical data can be re-replicated with hadoop fs -setrep -w 3 -R <path>
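The replication check above can be sketched as a small script. The fsck report text below is a hypothetical sample (real output varies by Hadoop version); on a live cluster you would capture it with `fsck_out=$(hadoop fsck /)` instead:

```shell
#!/bin/sh
# Sketch: extract the "Average block replication" value from `hadoop fsck /`
# output and decide whether a node can be removed directly.
# NOTE: the sample report below is made up for illustration; replace with
#   fsck_out=$(hadoop fsck /)
fsck_out="Total blocks (validated): 1024
 Minimally replicated blocks: 1024 (100.0 %)
 Default replication factor: 3
 Average block replication: 2.8
The filesystem under path '/' is HEALTHY"

# Pull out the number after "Average block replication:"
avg=$(printf '%s\n' "$fsck_out" | awk -F': *' '/Average block replication/ {print $2}')
echo "Average block replication: $avg"

# Compare as a float via awk; shell [ ] only handles integers.
if awk "BEGIN {exit !($avg > 2)}"; then
    echo "replication > 2: losing one node should not lose data"
else
    echo "replication <= 2: raise replication (hadoop fs -setrep) before removing nodes"
fi
```

This only confirms the cluster-wide average; individual under-replicated files are what `hadoop fs -setrep` is for.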
2. Add the following properties to the two files below, and list the names of the nodes to be decommissioned in the excludes file
mapred-site.xml
<property>
<name>mapred.hosts</name>
<value></value>
<description>Names a file that contains the list of nodes that may
connect to the jobtracker. If the value is empty, all hosts are
permitted.</description>
</property>
<property>
<name>mapred.hosts.exclude</name>
<value>HADOOP_HOME/conf/excludes</value>
<description>Names a file that contains the list of hosts that
should be excluded by the jobtracker. If the value is empty, no
hosts are excluded.</description>
</property>
hdfs-site.xml
<property>
<name>dfs.hosts</name>
<value></value>
<description>Names a file that contains a list of hosts that are
permitted to connect to the namenode. The full pathname of the file
must be specified. If the value is empty, all hosts are
permitted.</description>
</property>
<property>
<name>dfs.hosts.exclude</name>
<value>HADOOP_HOME/conf/excludes</value>
<description>Names a file that contains a list of hosts that are
not permitted to connect to the namenode. The full pathname of the
file must be specified. If the value is empty, no hosts are
excluded.</description>
</property>
The excludes file simply lists the hostnames of the machines to be decommissioned, one per line.
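For example, an excludes file might look like this (the hostnames here are hypothetical placeholders):

```
datanode-05.example.com
datanode-07.example.com
```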
Run on the namenode:   hadoop dfsadmin -refreshNodes
Run on the jobtracker: hadoop mradmin -refreshNodes
Running "hadoop dfsadmin -refreshNodes" triggers the decommission process, during which the cluster re-replicates the data held on the decommissioning node to other nodes.
Decommission is not instant since it requires replication of potentially a large number of blocks and we do not want the cluster to be overwhelmed with just this one job.
The decommission progress can be monitored on the name-node Web UI. Until all blocks are replicated the node will be in "Decommission In Progress" state.
When decommission is done the state will change to "Decommissioned". The nodes can be removed whenever decommission is finished.
The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the -refreshNodes command.
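Besides the name-node Web UI, the state can also be checked from the command line by grepping `hadoop dfsadmin -report`. The report text below is a hypothetical sample (field wording differs slightly across Hadoop versions); on a live cluster use `report=$(hadoop dfsadmin -report)`:

```shell
#!/bin/sh
# Sketch: count nodes still decommissioning from `hadoop dfsadmin -report`.
# NOTE: the sample report below is made up for illustration; replace with
#   report=$(hadoop dfsadmin -report)
report="Name: 10.0.0.5:50010
Decommission Status : Decommission in progress
Name: 10.0.0.6:50010
Decommission Status : Normal"

# Count datanodes whose status line still says "Decommission in progress".
in_progress=$(printf '%s\n' "$report" | grep -c 'Decommission in progress')
echo "nodes still decommissioning: $in_progress"

if [ "$in_progress" -eq 0 ]; then
    echo "decommission finished: nodes can be removed"
else
    echo "decommission still in progress: wait before removing nodes"
fi
```

A loop around this check (with a sleep) makes a simple unattended monitor until the count drops to zero.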
Decommission reference:
http://wiki.apache.org/hadoop/FAQ
Cloudera CDH3u5 version reference:
http://blog.csdn.net/rzhzhz/article/details/7577352