(1) NameNode memory sizing
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"
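To double-check the heap that actually takes effect after restarting the daemons, one option (a side note, assuming JDK 8's jps/jmap tools are available and are run as the user that started the daemon) is:
jps                        # note the NameNode / DataNode process ids
jmap -heap <NameNode-pid>  # MaxHeapSize should match the -Xmx value above (1024 MB)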
fs.trash.interval = 1    (enables the HDFS trash; deleted files are kept for 1 minute)
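With the trash enabled, a deleted file is moved into the current user's trash directory and can be moved back before the interval expires; a minimal sketch (the user name and file path here are only illustrative):
hadoop fs -mv /user/<username>/.Trash/Current/input/word.txt /input/word.txt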
# cd into the Hadoop installation directory first
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128M
# Write-performance test: writes 10 files, 128 MB each
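The matching read test and the cleanup step use the same test jar (same file count and size as the write test above); the read test needs the data produced by the write test to still be present:
# read-performance test
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128M
# delete the benchmark data when finished
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -clean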
yarn.nodemanager.vmem-check-enabled = false    (set in yarn-site.xml)
# After changing the config, distribute the file to every node, then restart YARN from the Hadoop directory: sbin/stop-yarn.sh followed by sbin/start-yarn.sh
The result is over 200 MB/s, which exceeds the network limit; that is because one replica is written locally and never goes over the network, so this number reflects the disks rather than the network.
Throughput is bounded by the network and the disks: if the network is the bottleneck, find ways to increase network bandwidth; if the disks are, add or upgrade disks.
dfs.namenode.name.dir = file://${hadoop.tmp.dir}/dfs/name1,file://${hadoop.tmp.dir}/dfs/name2
rm -rf data/ logs/
dfs.datanode.data.dir = file://${hadoop.tmp.dir}/dfs/data1,file://${hadoop.tmp.dir}/dfs/data2
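In hdfs-site.xml these two settings look roughly like the following (a sketch; the values simply repeat the directory lists above):
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name1,file://${hadoop.tmp.dir}/dfs/name2</value>
</property>
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data1,file://${hadoop.tmp.dir}/dfs/data2</value>
</property>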
Distribute the configuration files.
dfs.hosts = /opt/module/hadoop-3.1.3/etc/hadoop/whitelist
dfs.hosts.exclude = /opt/module/hadoop-3.1.3/etc/hadoop/blacklist
With only two hosts in the whitelist, 104 now acts only as a client: it can access the cluster, but no data will be stored on it.
Next, add 104 to the whitelist file as well so it is allowed, then distribute the file.
Checking again, all 3 DataNodes show as normal.
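After editing and distributing the whitelist, the NameNode has to be told to re-read it; the standard commands are:
hdfs dfsadmin -refreshNodes   # reload dfs.hosts / dfs.hosts.exclude on the NameNode
yarn rmadmin -refreshNodes    # only needed if YARN is also configured with include/exclude files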
1) As the company's business grows, the data volume keeps increasing and the existing DataNodes can no longer meet the storage demand, so new DataNodes have to be added dynamically to the existing cluster.
vim /etc/sysconfig/network-scripts/ifcfg-ens33
Change the IP address.
vim /etc/hostname
Change the hostname, then reboot.
- On 102, copy hadoop and java over to 105
- scp -r module/* 192.168.116.135:/opt/module/
- Copy the environment variables as well
- sudo scp /etc/profile.d/my_env.sh 192.168.116.135:/etc/profile.d
- On 105, run source /etc/profile
- Then run vim /etc/hosts on 102, 103 and 104 and update the mapping on all of them
- On 102, 103 and 104, configure passwordless ssh (as the normal user)
- cd .ssh/
- ssh-copy-id hapool105
- Once that is done, on 105 remove the old data under the Hadoop directory: rm -rf data/ logs/
- On 102, add hapool105 to the whitelist, then distribute it and refresh: hdfs dfsadmin -refreshNodes
[liuxingyu@hapool103 hadoop-3.1.3]$ sbin/start-balancer.sh -threshold 10
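If the rebalance needs to be stopped before it finishes, the matching script can be run from the same directory:
sbin/stop-balancer.sh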
dfs.hosts.exclude = /opt/module/hadoop-3.1.3/etc/hadoop/blacklist
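A sketch of the decommission flow implied here: add the host to the blacklist file, distribute it, refresh, then wait until the web UI shows the node as decommissioned before stopping its daemons:
hdfs dfsadmin -refreshNodes   # run after distributing the blacklist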
- Finally, run the following commands to shut down node 105:
yarn --daemon stop nodemanager
hdfs --daemon stop datanode
If the process merely died, restart it with: hdfs --daemon start namenode
Next, kill the process again and delete everything under /opt/module/hadoop-3.1.3/data/dfs.
- Restarting again, the NameNode fails to come up.
Check the NameNode log to find out why.
rm -rf blk_1073741825 blk_1073741825_1001.meta blk_1073741826 blk_1073741826_1002.meta
By default a DataNode only sends a full block report to the NameNode every 6 hours, so at this point the NameNode does not yet know the blocks are damaged; restart the cluster so a report is sent.
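The 6-hour figure corresponds to the DataNode block-report interval; for reference, the relevant hdfs-site.xml settings with their default values (not something this experiment changes):
<property>
    <name>dfs.blockreport.intervalMsec</name>
    <value>21600000</value>  <!-- full block report every 6 hours, in milliseconds -->
</property>
<property>
    <name>dfs.datanode.directoryscan.interval</name>
    <value>21600</value>     <!-- DataNode rescans its own data directories every 6 hours, in seconds -->
</property>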
Roughly, what this says is: the NameNode expects 5 blocks in total but only 3 are being reported, so the reported fraction stays below the 0.999 threshold (dfs.namenode.safemode.threshold-pct) and the NameNode gets stuck in safe mode, never leaving it.
Fix: 1. have a professional repair the disk and recover the blocks; otherwise the next startup will end up in safe mode again.
Simulating waiting on safe mode:
If the cluster has entered safe mode,
and you run hdfs dfsadmin -safemode wait and then try to upload a file,
the client blocks and the file cannot be uploaded; the block is only released once the command to leave safe mode is executed.
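For reference, the safe-mode subcommands used in this experiment are all standard hdfs dfsadmin options:
hdfs dfsadmin -safemode get     # query the current state
hdfs dfsadmin -safemode enter   # manually enter safe mode
hdfs dfsadmin -safemode leave   # leave safe mode (this is what unblocks the waiting upload)
hdfs dfsadmin -safemode wait    # block until the NameNode leaves safe mode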
# Install fio:
sudo yum install -y fio
(1) Sequential read test
[atguigu@hadoop102 ~]# sudo fio -filename=/home/atguigu/test.log -direct=1 -iodepth 1 -thread -rw=read -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_r
Run status group 0 (all jobs):
   READ: bw=360MiB/s (378MB/s), 360MiB/s-360MiB/s (378MB/s-378MB/s), io=20.0GiB (21.5GB), run=56885-56885msec
The result shows an overall sequential read speed of 360 MiB/s.
(2) Sequential write test
[atguigu@hadoop102 ~]# sudo fio -filename=/home/atguigu/test.log -direct=1 -iodepth 1 -thread -rw=write -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_w
Run status group 0 (all jobs):
   WRITE: bw=341MiB/s (357MB/s), 341MiB/s-341MiB/s (357MB/s-357MB/s), io=19.0GiB (21.4GB), run=60001-60001msec
The result shows an overall sequential write speed of 341 MiB/s.
(3) Random write test
[atguigu@hadoop102 ~]# sudo fio -filename=/home/atguigu/test.log -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_randw
Run status group 0 (all jobs):
   WRITE: bw=309MiB/s (324MB/s), 309MiB/s-309MiB/s (324MB/s-324MB/s), io=18.1GiB (19.4GB), run=60001-60001msec
The result shows an overall random write speed of 309 MiB/s.
(4) Mixed random read/write test
[atguigu@hadoop102 ~]# sudo fio -filename=/home/atguigu/test.log -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=test_r_w -ioscheduler=noop
Run status group 0 (all jobs):
   READ: bw=220MiB/s (231MB/s), 220MiB/s-220MiB/s (231MB/s-231MB/s), io=12.9GiB (13.9GB), run=60001-60001msec
   WRITE: bw=94.6MiB/s (99.2MB/s), 94.6MiB/s-94.6MiB/s (99.2MB/s-99.2MB/s), io=5674MiB (5950MB), run=60001-60001msec
hadoop archive -archiveName input.har -p /input /output
(3) View the archive
hadoop fs -ls /output/input.har
hadoop fs -ls har:///output/input.har
(4) Extract files from the archive
hadoop fs -cp har:///output/input.har /
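For large archives, Hadoop's archive documentation also allows unarchiving in parallel with DistCp; a sketch using the archive above (the destination directory name is only illustrative):
hadoop distcp har:///output/input.har /output-unarchived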
# Run a job and observe that uber mode is disabled by default
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount /input /output2
(4) Enable uber mode by adding the following to mapred-site.xml:
- mapreduce.job.ubertask.enable = true
- mapreduce.job.ubertask.maxmaps = 9
- mapreduce.job.ubertask.maxreduces = 1
- mapreduce.job.ubertask.maxbytes = (value left empty)
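As XML in mapred-site.xml this would look roughly as follows (values copied from the list above; leaving maxbytes empty makes it fall back to the block size):
<property>
    <name>mapreduce.job.ubertask.enable</name>
    <value>true</value>
</property>
<property>
    <name>mapreduce.job.ubertask.maxmaps</name>
    <value>9</value>
</property>
<property>
    <name>mapreduce.job.ubertask.maxreduces</name>
    <value>1</value>
</property>
<property>
    <name>mapreduce.job.ubertask.maxbytes</name>
    <value></value>
</property>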
(1) Modify hadoop-env.sh
export HDFS_NAMENODE_OPTS="-Dhadoop.security.logger=INFO,RFAS -Xmx1024m"
export HDFS_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS -Xmx1024m"
(2) Modify hdfs-site.xml
- dfs.namenode.handler.count = 21
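The value 21 presumably comes from the 20 × ln(cluster size) rule of thumb (an assumption here, not stated in these notes); for the 3-node cluster used above, 20 × ln 3 ≈ 21.97, rounded down to 21. A quick check, assuming python3 is installed:
python3 -c 'import math; print(int(20 * math.log(3)))'   # prints 21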
(3) Modify core-site.xml
- fs.trash.interval = 60    (keep deleted files in the trash for 60 minutes)
(1) Modify mapred-site.xml
- mapreduce.task.io.sort.mb = 100
- mapreduce.map.sort.spill.percent = 0.80
- mapreduce.task.io.sort.factor = 10
- mapreduce.map.memory.mb = -1
  The amount of memory to request from the scheduler for each map task. If this is not specified or is non-positive, it is inferred from mapreduce.map.java.opts and mapreduce.job.heap.memory-mb.ratio. If java-opts are also not specified, we set it to 1024.
- mapreduce.map.cpu.vcores = 1
- mapreduce.map.maxattempts = 4
- mapreduce.reduce.shuffle.parallelcopies = 5
- mapreduce.reduce.shuffle.input.buffer.percent = 0.70
- mapreduce.reduce.shuffle.merge.percent = 0.66
- mapreduce.reduce.memory.mb = -1
  The amount of memory to request from the scheduler for each reduce task. If this is not specified or is non-positive, it is inferred from mapreduce.reduce.java.opts and mapreduce.job.heap.memory-mb.ratio. If java-opts are also not specified, we set it to 1024.
- mapreduce.reduce.cpu.vcores = 2
- mapreduce.reduce.maxattempts = 4
- mapreduce.job.reduce.slowstart.completedmaps = 0.05
- mapreduce.task.timeout = 600000
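Each entry above becomes a <property> block in mapred-site.xml; a sketch for the first two (the rest follow the same pattern):
<property>
    <name>mapreduce.task.io.sort.mb</name>
    <value>100</value>
</property>
<property>
    <name>mapreduce.map.sort.spill.percent</name>
    <value>0.80</value>
</property>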
(1) Modify yarn-site.xml with the following parameters:
- yarn.resourcemanager.scheduler.class = org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
  The class to use as the resource scheduler.
- yarn.resourcemanager.scheduler.client.thread-count = 8
  Number of threads to handle scheduler interface.
- yarn.nodemanager.resource.detect-hardware-capabilities = false
  Enable auto-detection of node capabilities such as memory and CPU.
- yarn.nodemanager.resource.count-logical-processors-as-cores = false
  Flag to determine if logical processors (such as hyperthreads) should be counted as cores. Only applicable on Linux when yarn.nodemanager.resource.cpu-vcores is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true.
- yarn.nodemanager.resource.pcores-vcores-multiplier = 1.0
  Multiplier to determine how to convert physical cores to vcores. This value is used if yarn.nodemanager.resource.cpu-vcores is set to -1 (which implies auto-calculate vcores) and yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The number of vcores will be calculated as number of CPUs * multiplier.
- yarn.nodemanager.resource.memory-mb = 4096
  Amount of physical memory, in MB, that can be allocated for containers. If set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically calculated (in case of Windows and Linux). In other cases, the default is 8192MB.
- yarn.nodemanager.resource.cpu-vcores = 4
  Number of vcores that can be allocated for containers. This is used by the RM scheduler when allocating resources for containers. This is not used to limit the number of CPUs used by YARN containers. If it is set to -1 and yarn.nodemanager.resource.detect-hardware-capabilities is true, it is automatically determined from the hardware in case of Windows and Linux. In other cases, number of vcores is 8 by default.
- yarn.scheduler.minimum-allocation-mb = 1024
  The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.
- yarn.scheduler.maximum-allocation-mb = 2048
  The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.
- yarn.scheduler.minimum-allocation-vcores = 1
  The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.
- yarn.scheduler.maximum-allocation-vcores = 2
  The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an InvalidResourceRequestException.
- yarn.nodemanager.vmem-check-enabled = false
  Whether virtual memory limits will be enforced for containers.
- yarn.nodemanager.vmem-pmem-ratio = 2.1
  Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.
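Likewise, each entry becomes a <property> block in yarn-site.xml, and YARN has to be restarted afterwards (sbin/stop-yarn.sh then sbin/start-yarn.sh, as earlier in these notes); a sketch for two of the entries:
<property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
</property>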