Hadoop configuration - adding and removing nodes

http://www.cnblogs.com/rilley/archive/2012/02/13/2349858.html

http://www.cnblogs.com/licheng/archive/2011/11/08/2241854.html

http://www.blogjava.net/ivanwan/archive/2011/01/21/343328.html

Today I needed to remove two datanodes from a Hadoop cluster. To avoid disrupting the workloads still running, the nodes had to be removed dynamically. The procedure is recorded below:

1. Before nodes can be removed from the cluster, their data must be re-replicated to the remaining nodes (decommissioning):

Add the following to core-site.xml on the master node (note that dfs.hosts.exclude is an HDFS property; the setup shown later in this article places it in hdfs-site.xml instead):

<property>
          <name>dfs.hosts.exclude</name>
          <value>/home/hadoop/hadoop/conf/excludes</value>
</property>

Notes:

dfs.hosts.exclude: names a file listing the nodes to be removed.

/home/hadoop/hadoop/conf/excludes: the full path of that file; here it is named excludes.

 

2. In the directory configured in step 1, touch excludes and list the nodes to remove, one per line:

cloud4

cloud5
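The two steps above can be sketched in shell. This is only an illustration: /tmp/excludes stands in for the real path /home/hadoop/hadoop/conf/excludes so the snippet is self-contained, and cloud4/cloud5 are the article's example hostnames.

```shell
# Create the excludes file, one hostname per line.
# /tmp/excludes stands in for /home/hadoop/hadoop/conf/excludes here.
EXCLUDES=/tmp/excludes
printf '%s\n' cloud4 cloud5 > "$EXCLUDES"

# Sanity-check: the file should contain exactly the two hosts.
cat "$EXCLUDES"
# prints:
# cloud4
# cloud5
```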

 

3. Go into the hadoop directory and run: hadoop dfsadmin -refreshNodes (I installed via yum; the hadoop directory lives in different places depending on the install method). This command reloads the dfs.hosts and dfs.hosts.exclude settings on the fly, with no NameNode restart.

Once decommissioning completes, the removed nodes disappear from the datanode list, but their TaskTracker processes keep running and have to be stopped by hand.

 

4. Then check with bin/hadoop dfsadmin -report; the output looks like this:

Configured Capacity: 17721082527744 (16.12 TB)
Present Capacity: 16806607028262 (15.29 TB)
DFS Remaining: 14996775104512 (13.64 TB)
DFS Used: 1809831923750 (1.65 TB)
DFS Used%: 10.77%
Under replicated blocks: 6788
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 6 (6 total, 0 dead)

Name: 192.168.1.5:50010
Decommission Status : Normal
Configured Capacity: 2953511657472 (2.69 TB)
DFS Used: 265079108972 (246.87 GB)
Non DFS Used: 150286670484 (139.97 GB)
DFS Remaining: 2538145878016(2.31 TB)
DFS Used%: 8.98%
DFS Remaining%: 85.94%
Last contact: Thu Sep 08 10:12:45 CST 2011

Name: 192.168.1.8:50010
Decommission Status : Decommission in progress
Configured Capacity: 2953511657472 (2.69 TB)
DFS Used: 228590288896 (212.89 GB)
Non DFS Used: 150240718848 (139.92 GB)
DFS Remaining: 2574680649728(2.34 TB)
DFS Used%: 7.74%
DFS Remaining%: 87.17%
Last contact: Thu Sep 08 10:12:45 CST 2011

Name: 192.168.1.7:50010
Decommission Status : Normal
Configured Capacity: 2953511657472 (2.69 TB)
DFS Used: 266826599821 (248.5 GB)
Non DFS Used: 150259458675 (139.94 GB)
DFS Remaining: 2536425598976(2.31 TB)
DFS Used%: 9.03%
DFS Remaining%: 85.88%
Last contact: Thu Sep 08 10:12:46 CST 2011

Name: 192.168.1.9:50010
Decommission Status : Decommission in progress
Configured Capacity: 2953511657472 (2.69 TB)
DFS Used: 226060701696 (210.54 GB)
Non DFS Used: 150240718848 (139.92 GB)
DFS Remaining: 2577210236928(2.34 TB)
DFS Used%: 7.65%
DFS Remaining%: 87.26%
Last contact: Thu Sep 08 10:12:45 CST 2011

Name: 192.168.1.4:50010
Decommission Status : Normal
Configured Capacity: 2953524240384 (2.69 TB)
DFS Used: 553202110857 (515.21 GB)
Non DFS Used: 163197603447 (151.99 GB)
DFS Remaining: 2237124526080(2.03 TB)
DFS Used%: 18.73%
DFS Remaining%: 75.74%
Last contact: Thu Sep 08 10:12:46 CST 2011

Name: 192.168.1.6:50010
Decommission Status : Normal
Configured Capacity: 2953511657472 (2.69 TB)
DFS Used: 270073113508 (251.53 GB)
Non DFS Used: 150250329180 (139.93 GB)
DFS Remaining: 2533188214784(2.3 TB)
DFS Used%: 9.14%
DFS Remaining%: 85.77%
Last contact: Thu Sep 08 10:12:44 CST 2011

5. The command in step 4 shows the status of each node being removed. For a node such as 192.168.1.9, once it reports:

Decommission Status : Decommissioned

replication of its data to the other nodes has finished. As long as the status still reads "Decommission Status : Decommission in progress", the process is still running.
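A small awk filter can pull each node's status out of that report. The snippet below is fed a saved two-node excerpt (my sample data, not the full report above) so it runs standalone; in practice you would pipe hadoop dfsadmin -report into the same awk program.

```shell
# Saved excerpt (in practice: hadoop dfsadmin -report > /tmp/report.txt).
cat > /tmp/report.txt <<'EOF'
Name: 192.168.1.8:50010
Decommission Status : Decommission in progress
Name: 192.168.1.9:50010
Decommission Status : Decommissioned
EOF

# Remember the most recent "Name:" line, then print it next to each status line.
awk '/^Name:/ {node=$2}
     /^Decommission Status/ {sub(/.*: /, ""); print node, "->", $0}' /tmp/report.txt
# prints:
# 192.168.1.8:50010 -> Decommission in progress
# 192.168.1.9:50010 -> Decommissioned
```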

At this point the node removal is complete.

 

Pitfalls

Before pulling nodes, first stop any programs that write data into Hadoop; otherwise new blocks keep being written toward the nodes being removed, and the decommission never finishes.

 

Adding a node

1. Edit /etc/hosts
   Same as for an ordinary datanode: add the namenode's IP.
2. Edit conf/slaves on the namenode
   Add the new node's IP or hostname.
3. On the new node, start the services:

[root@slave-004 hadoop]# ./bin/hadoop-daemon.sh start datanode
[root@slave-004 hadoop]# ./bin/hadoop-daemon.sh start tasktracker

4. Rebalance blocks

[root@slave-004 hadoop]# ./bin/start-balancer.sh

1) Without balancing, the cluster writes all new data onto the new node, which hurts MapReduce data locality and efficiency.
2) The balance threshold defaults to 10%; the lower the value, the more evenly balanced the nodes, but the longer the balancing takes:

[root@slave-004 hadoop]# ./bin/start-balancer.sh -threshold 5

3) Set the balancer bandwidth; the default is only 1 MB/s:

<property>
   <name>dfs.balance.bandwidthPerSec</name>
   <value>1048576</value>
   <description>
     Specifies the maximum amount of bandwidth that each datanode
     can utilize for the balancing purpose in term of
     the number of bytes per second.
   </description>
</property>
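The value is plain bytes per second, so the 1048576 in the snippet above is exactly 1 MB/s. A quick sketch of the arithmetic (the 10 MB/s target is my own example, not from the article):

```shell
# dfs.balance.bandwidthPerSec is expressed in bytes per second.
echo $((1 * 1024 * 1024))    # prints 1048576  - the 1 MB/s default above
# To give the balancer 10 MB/s per datanode you would set the value to:
echo $((10 * 1024 * 1024))   # prints 10485760
```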

Note:
1. Make sure the firewall on each slave is turned off;
2. Make sure the new slave's IP has been added to /etc/hosts on the master and all other slaves, and conversely add the master's and the other slaves' IPs to the new slave's /etc/hosts.
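A minimal sanity check for note 2 might look like this. Everything here is a stand-in: /tmp/hosts plays the role of /etc/hosts, and slave-004 is the example hostname used above.

```shell
# Stand-in hosts file (in production this is /etc/hosts on every machine).
cat > /tmp/hosts <<'EOF'
192.168.1.1  master
192.168.1.10 slave-004
EOF

# Fail loudly if the new slave is missing from the hosts file.
if grep -qw slave-004 /tmp/hosts; then
  echo "slave-004 resolvable"
else
  echo "slave-004 MISSING from hosts file" >&2
fi
```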


Removing a node

1. Cluster configuration
   Edit the conf/hdfs-site.xml file:

<property>
   <name>dfs.hosts.exclude</name>
   <value>/data/soft/hadoop/conf/excludes</value>
   <description>Names a file that contains a list of hosts that are
   not permitted to connect to the namenode. The full pathname of the
   file must be specified. If the value is empty, no hosts are
   excluded.</description>
</property>


2. Decide which machines to take down
The file named by dfs.hosts.exclude lists the machines to decommission, one per line. This blocks them from connecting to the NameNode. For example:

slave-003
slave-004

 
3. Force a configuration reload

[root@master hadoop]# ./bin/hadoop dfsadmin -refreshNodes

This moves the blocks off those nodes in the background.


4. Shut down the nodes
Once the operation above finishes, the machines being retired can be shut down safely.

[root@master hadoop]# ./bin/hadoop dfsadmin -report


This shows the nodes currently connected to the cluster.

While a node is still decommissioning, it shows:
Decommission Status : Decommission in progress

Once it has finished, it shows:
Decommission Status : Decommissioned
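Waiting for that status flip can be scripted as a simple poll. In the sketch below the report is simulated with a saved file (my sample data) so it terminates immediately; in production the report function would run hadoop dfsadmin -report instead.

```shell
# Simulated report: every excluded node already shows "Decommissioned".
cat > /tmp/report.txt <<'EOF'
Decommission Status : Decommissioned
Decommission Status : Decommissioned
EOF
report() { cat /tmp/report.txt; }   # production: hadoop dfsadmin -report

# Poll until no node reports "Decommission in progress" any more.
while report | grep -q 'Decommission in progress'; do
  echo "still decommissioning, sleeping..."
  sleep 30
done
echo "no nodes left in progress - safe to shut them down"
# prints: no nodes left in progress - safe to shut them down
```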

 
5. Edit the excludes file again
Once the machines have been retired, they can be removed from the excludes file.
Log in to one of the retired machines: the DataNode process is gone, but the TaskTracker is still running and must be stopped by hand.
