How DataNode Works

Configuring the DataNode Timeout Parameters

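The NameNode treats a DataNode as dead once no heartbeat has arrived for the timeout period, which is calculated as: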

2 * dfs.namenode.heartbeat.recheck.interval + 10 * dfs.heartbeat.interval

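By default, dfs.namenode.heartbeat.recheck.interval is 5 minutes and dfs.heartbeat.interval is 3 seconds.

Note that in the hdfs-site.xml configuration file, dfs.namenode.heartbeat.recheck.interval is specified in milliseconds, while dfs.heartbeat.interval is specified in seconds.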
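As a worked example, the formula can be evaluated with the default values. This is only a sketch of the arithmetic; the variable names are made up for illustration, and the key point is converting the millisecond value to seconds before adding:

```shell
#!/usr/bin/env bash
# Default values quoted in the text (variable names are illustrative only).
recheck_interval_ms=300000   # dfs.namenode.heartbeat.recheck.interval: 5 minutes, in milliseconds
heartbeat_interval_s=3       # dfs.heartbeat.interval: 3 seconds

# timeout = 2 * recheck interval + 10 * heartbeat interval, expressed in seconds
timeout_s=$(( 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s ))

echo "DataNode timeout: ${timeout_s} s ($(( timeout_s / 60 )) min $(( timeout_s % 60 )) s)"
```

With the defaults this gives 630 seconds, that is, 10 minutes 30 seconds.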

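Adding a New Node

1) Prepare the environment

2) Change the IP address and hostname

3) Delete the files left over from the previous HDFS file system (/opt/module/hadoop-2.7.2/data and the log directory)

4) Reload the environment configuration with source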

$ source /etc/profile

5) Start the DataNode directly; it joins the cluster automatically

$ sbin/hadoop-daemon.sh start datanode

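6) Upload a file and check it in the web UI to test the new node

7) To start the whole cluster with a single command later, configure the slaves file and passwordless SSH login; to distribute files to all nodes, configure xsync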
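For the slaves file mentioned in step 7, a minimal sketch of registering the new host is shown here. The helper name add_slave and the hostname hadoop105 are assumptions made for illustration; the function simply appends a hostname unless that exact line is already present, so it is safe to run more than once:

```shell
#!/usr/bin/env bash
# add_slave FILE HOST (hypothetical helper):
# append HOST to FILE unless an identical line already exists.
add_slave() {
  local file="$1" host="$2"
  touch "$file"
  # -x: match whole lines only, -F: treat the hostname as a fixed string
  grep -qxF "$host" "$file" || echo "$host" >> "$file"
}

# Example call (path follows the Hadoop 2.7.2 layout used in this document):
# add_slave /opt/module/hadoop-2.7.2/etc/hadoop/slaves hadoop105
```

After editing the file, it still has to be distributed to the other nodes, for example with xsync.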

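Adding Nodes via a Whitelist

Hosts that are not on the whitelist are not allowed to join the cluster.

1) On the NameNode, create the dfs.hosts file in the /opt/module/hadoop-2.7.2/etc/hadoop directory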

$ touch dfs.hosts

$ vim dfs.hosts

# Add the hostnames

hadoop102

hadoop103

hadoop104

2) Add the dfs.hosts property to the hdfs-site.xml configuration file on the NameNode

<property>
  <name>dfs.hosts</name>
  <value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts</value>
</property>

3) Distribute the configuration file

$ xsync hdfs-site.xml

4) Refresh the NameNode

$ hdfs dfsadmin -refreshNodes

5) Refresh the ResourceManager

$ yarn rmadmin -refreshNodes

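6) Wait for the refresh to complete, then check the result in the web UI

7) If the data is unevenly distributed, rebalance the cluster with the following command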

$ sbin/start-balancer.sh

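Decommissioning Nodes via a Blacklist

Every host on the blacklist is forced out of the cluster.

1) On the NameNode, create the dfs.hosts.exclude file in the /opt/module/hadoop-2.7.2/etc/hadoop directory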

$ touch dfs.hosts.exclude

$ vim dfs.hosts.exclude

# Add the hostnames of the nodes to decommission

hadoop105

2) Add the dfs.hosts.exclude property to the hdfs-site.xml configuration file on the NameNode

<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts.exclude</value>
</property>

3) Distribute the configuration file

$ xsync hdfs-site.xml

4) Refresh the NameNode and the ResourceManager

$ hdfs dfsadmin -refreshNodes

$ yarn rmadmin -refreshNodes

5) In the web UI, wait until decommissioning has completed, then stop the services on that node alone

$ sbin/hadoop-daemon.sh stop datanode

$ sbin/yarn-daemon.sh stop nodemanager

6) If the data is unevenly distributed, rebalance the cluster with the following command

$ sbin/start-balancer.sh
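Step 5 waits for decommissioning to finish before the node is stopped. Besides the web UI, the state can also be read from the command line. The sketch below filters the output of hdfs dfsadmin -report for one host. The helper name decommission_status is hypothetical, and the script assumes the report prints a "Hostname:" line followed by a "Decommission Status :" line for each DataNode, which may differ between Hadoop versions:

```shell
#!/usr/bin/env bash
# decommission_status HOST (hypothetical helper):
# read `hdfs dfsadmin -report` output on stdin and print the decommission
# status reported for HOST.
# Assumption: each DataNode entry contains "Hostname: <name>" followed by
# "Decommission Status : <state>".
decommission_status() {
  local host="$1"
  awk -v host="$host" '
    $1 == "Hostname:" { current = $2 }            # remember which node this entry describes
    /^Decommission Status/ && current == host {
      sub(/^Decommission Status *: */, "")        # keep only the state text
      print
      exit
    }
  '
}

# Example call on a running cluster:
# hdfs dfsadmin -report | decommission_status hadoop105
```

Once the printed state shows that the node has been decommissioned, it should be safe to stop the DataNode and NodeManager on it.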

Note: the same hostname must not appear in both the whitelist and the blacklist.
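The rule in this note can be checked before refreshing the nodes. The following is a minimal sketch; the helper name hosts_in_both is hypothetical, and it assumes both files contain one hostname per line:

```shell
#!/usr/bin/env bash
# hosts_in_both INCLUDE_FILE EXCLUDE_FILE (hypothetical helper):
# print every hostname that appears in both files, one per line.
hosts_in_both() {
  # -F: fixed strings, -x: whole-line match, -f: read patterns from the first file;
  # blank lines are dropped so empty lines in both files do not count as a match
  grep -Fxf "$1" "$2" | grep -v '^$' | sort -u
}

# Example call (paths follow the layout used in this document):
# hosts_in_both /opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts \
#               /opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts.exclude
```

If the function prints nothing, no hostname is listed in both files.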

Configuring Multiple DataNode Directories

In hdfs-site.xml:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:///${hadoop.tmp.dir}/dfs/data1,file:///${hadoop.tmp.dir}/dfs/data2</value>
</property>

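Note:

With multiple DataNode directories, each directory stores different data; the blocks are simply spread across the paths.

With multiple NameNode directories, each directory holds a full copy of the same data.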

11）DataNode