Hadoop study notes

1. Adding a new DataNode
1.1 Configure all settings relevant to the DataNode (slaves, masters, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, etc.)
1.2 Update /etc/hosts
1.3 Set /etc/hostname
1.4 Start the DataNode: bin/hadoop-daemon.sh start datanode
1.5 Start the TaskTracker: bin/hadoop-daemon.sh start tasktracker

Check the relevant logs to confirm that the node joined the cluster successfully (a minimal sketch follows).
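A minimal end-to-end sketch of the steps above, assuming the new node's hostname is node05 with IP 192.168.1.15 (both hypothetical) and that commands are run from the Hadoop installation directory:

# On the NameNode: register the new host (hostname and IP are hypothetical examples)
echo "192.168.1.15 node05" >> /etc/hosts
echo "node05" >> conf/slaves

# On the new node: bring up the DataNode and TaskTracker daemons
bin/hadoop-daemon.sh start datanode
bin/hadoop-daemon.sh start tasktracker

# Verify: jps should list DataNode/TaskTracker, and the report should show the new node
jps
bin/hadoop dfsadmin -report | grep -A 5 node05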

2. Load balancing
Start the balancer: bin/start-balancer.sh -threshold 15
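The threshold is the maximum allowed difference, in percent, between a DataNode's disk usage and the cluster average; a smaller value gives a more even distribution but takes longer to reach. A running balancer can be interrupted at any time:

# Stop a balancer that is still running; blocks already moved stay where they are
bin/stop-balancer.sh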

3. Decommissioning a DataNode

Perform the following on the NameNode.

  1. Add the following to conf/hdfs-site.xml:
<property>
<name>dfs.hosts.exclude</name>
<value>[FULL_PATH_TO_THE_EXCLUDE_FILE]</value>
<description>Names a file that contains a list of hosts that are
not permitted to connect to the namenode. The full pathname of
the file must be specified. If the value is empty, no hosts are
excluded.</description>
</property>
  2. Run bin/hadoop dfsadmin -refreshNodes

  3. Monitor the decommission progress (see the sketch below).
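A minimal sketch of steps 2 and 3, assuming the exclude file configured above is conf/excludes and the node being retired is node03 (both hypothetical):

# Add the retiring node to the exclude file referenced by dfs.hosts.exclude
echo "node03" >> conf/excludes

# Make the NameNode re-read its include/exclude lists
bin/hadoop dfsadmin -refreshNodes

# Watch the node's status move from "Decommission in progress" to "Decommissioned"
bin/hadoop dfsadmin -report | grep -B 2 "Decommission"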

4. Using multiple disks/volumes and limiting HDFS disk usage
4.1 Specify multiple data directories
In conf/hdfs-site.xml:

<property>
<name>dfs.data.dir</name>
<value>/u1/hadoop/data,/u2/hadoop/data</value>
</property>
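Each comma-separated entry should live on a separate physical disk, and every directory must exist and be writable by the user running the DataNode. A small preparation sketch, assuming the directories from the property above and a service user/group named hadoop (the user/group name is an assumption):

# Create the per-volume data directories and hand them to the DFS user ("hadoop" is an assumed user/group)
mkdir -p /u1/hadoop/data /u2/hadoop/data
chown -R hadoop:hadoop /u1/hadoop/data /u2/hadoop/data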

4.2 Limit HDFS disk usage (reserve space per volume), in conf/hdfs-site.xml:

<property>
<name>dfs.datanode.du.reserved</name>
<value>6000000000</value>
<description>Reserved space in bytes per volume. Always leave
this much space free for non dfs use.
</description>
</property>
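The value above keeps roughly 6 GB (6,000,000,000 bytes) free for non-DFS use on every volume. After restarting the DataNodes, the reduced HDFS capacity should show up in the report:

# The per-node "Configured Capacity" reflects the reserved space
bin/hadoop dfsadmin -report | grep "Configured Capacity"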

5. Setting HDFS block size

Method 1: edit conf/hdfs-site.xml.

<property>
<name>dfs.block.size</name>
<value>134217728</value>
</property>

Method 2: set it per operation in certain scenarios, for example when uploading a file:
bin/hadoop fs -Ddfs.blocksize=134217728 -put data.in /user/foo
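To confirm the block size that was actually applied (134217728 bytes = 128 MB), fsck can list the blocks of the uploaded file; the path below reuses the example above:

# List the files and blocks under the uploaded path; each block's length is printed
bin/hadoop fsck /user/foo/data.in -files -blocks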

6. Setting the file replication factor
Method 1: edit conf/hdfs-site.xml

<property>
<name>dfs.replication</name>
<value>2</value>
</property>

Method 2: set it when uploading a file:
bin/hadoop fs -D dfs.replication=1 -copyFromLocal non-critical-file.txt /user/foo

Method 3: change the replication factor of an existing file:
bin/hadoop fs -setrep 2 non-critical-file.txt
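setrep also accepts -R to apply to a directory recursively and -w to wait until the target replication is actually reached; the current factor appears in the second column of fs -ls. A small sketch, reusing the /user/foo path from the examples above:

# Apply a replication factor of 2 to everything under /user/foo and wait for completion
bin/hadoop fs -setrep -R -w 2 /user/foo

# The second column of the listing shows each file's replication factor
bin/hadoop fs -ls /user/foo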
