First, set every virtual machine's network speed uniformly to 100 Mbps.
100 Mbps = 12.5 MB/s, because 1 byte = 8 bits.
[atguigu@hadoop102 ~]$ cd /opt/software/
[atguigu@hadoop102 software]$ python -m SimpleHTTPServer
Serving HTTP on 0.0.0.0 port 8000 ...
This exposes the contents of the current directory so they can be downloaded from outside.
Download URL: http://hadoop102:8000
Double-click a file to download it; the download speed will be slightly below 12.5 MB/s, around 10 MB/s.
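Note that the `SimpleHTTPServer` module exists only in Python 2. On a host with Python 3 the equivalent is `python3 -m http.server`. The sketch below is a self-contained check (it assumes `python3` and `curl` are available and port 8000 is free; it serves /tmp for the demo, whereas on hadoop102 you would run it from /opt/software):

```shell
# Python 3 replacement for the Python-2-only SimpleHTTPServer module.
python3 -m http.server 8000 --directory /tmp &
SERVER_PID=$!
sleep 1
# Expect HTTP 200 if the server came up and is serving the directory listing
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:8000/
kill $SERVER_PID
```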
[atguigu@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 128MB
2021-02-09 10:43:16,853 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: Date & time: Tue Feb 09 10:43:16 CST 2021
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: Number of files: 10
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: Total MBytes processed: 1280
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: Throughput mb/sec: 1.61
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: Average IO rate mb/sec: 1.9
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: IO rate std deviation: 0.76
2021-02-09 10:43:16,854 INFO fs.TestDFSIO: Test exec time sec: 133.05
2021-02-09 10:43:16,854 INFO fs.TestDFSIO:
[atguigu@hadoop102 mapreduce]$ hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.1.3-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 128MB
-read measures read performance
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: Date & time: Sun Oct 30 19:22:01 CST 2022
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: Number of files: 10
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: Total MBytes processed: 1280
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: Throughput mb/sec: 1921.92
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: Average IO rate mb/sec: 1956.22
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: IO rate std deviation: 266.42
2022-10-30 19:22:01,467 INFO fs.TestDFSIO: Test exec time sec: 22.07
2022-10-30 19:22:01,468 INFO fs.TestDFSIO:
Measured read throughput: 1921.92 MB/s. This is far above the 12.5 MB/s network cap because, with three replicas on three nodes, each map task reads a local copy of the test data (often straight from the OS page cache), so little or no data crosses the network.
hdfs-site.xml
<property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name1,file://${hadoop.tmp.dir}/dfs/name2</value>
</property>
core-default.xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
</property>
By default, hadoop.tmp.dir points into the Linux /tmp directory (whose contents are kept for about one month by default).
core-site.xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/data</value>
</property>
When we installed Hadoop earlier, we already changed this temporary-file storage path.
Note: if the disk layout differs across the server nodes, you can choose not to distribute this configuration after setting it.
[atguigu@hadoop102 hadoop]$ myhadoop.sh stop
[atguigu@hadoop102 hadoop-3.1.3]$ rm -rf data/ logs/
[atguigu@hadoop103 hadoop-3.1.3]$ rm -rf data/ logs/
[atguigu@hadoop104 hadoop-3.1.3]$ rm -rf data/ logs/
[atguigu@hadoop102 hadoop-3.1.3]$ bin/hdfs namenode -format
[atguigu@hadoop102 hadoop]$ myhadoop.sh start
[atguigu@hadoop102 dfs]$ cd /opt/module/hadoop-3.1.3/data/dfs
[atguigu@hadoop102 dfs]$ ll
total 12
drwx------ 3 atguigu atguigu 4096 Oct 31 11:22 data
drwxrwxr-x 3 atguigu atguigu 4096 Oct 31 11:22 name1
drwxrwxr-x 3 atguigu atguigu 4096 Oct 31 11:22 name2
[atguigu@hadoop102 dfs]$ ll name1/current/
total 1052
-rw-rw-r-- 1 atguigu atguigu 690 Oct 31 11:23 edits_0000000000000000001-0000000000000000009
-rw-rw-r-- 1 atguigu atguigu 1048576 Oct 31 11:23 edits_inprogress_0000000000000000010
-rw-rw-r-- 1 atguigu atguigu 394 Oct 31 11:21 fsimage_0000000000000000000
-rw-rw-r-- 1 atguigu atguigu 62 Oct 31 11:21 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 atguigu atguigu 804 Oct 31 11:23 fsimage_0000000000000000009
-rw-rw-r-- 1 atguigu atguigu 62 Oct 31 11:23 fsimage_0000000000000000009.md5
-rw-rw-r-- 1 atguigu atguigu 3 Oct 31 11:23 seen_txid
-rw-rw-r-- 1 atguigu atguigu 216 Oct 31 11:21 VERSION
[atguigu@hadoop102 dfs]$ ll name2/current/
total 1052
-rw-rw-r-- 1 atguigu atguigu 690 Oct 31 11:23 edits_0000000000000000001-0000000000000000009
-rw-rw-r-- 1 atguigu atguigu 1048576 Oct 31 11:23 edits_inprogress_0000000000000000010
-rw-rw-r-- 1 atguigu atguigu 394 Oct 31 11:21 fsimage_0000000000000000000
-rw-rw-r-- 1 atguigu atguigu 62 Oct 31 11:21 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 atguigu atguigu 804 Oct 31 11:23 fsimage_0000000000000000009
-rw-rw-r-- 1 atguigu atguigu 62 Oct 31 11:23 fsimage_0000000000000000009.md5
-rw-rw-r-- 1 atguigu atguigu 3 Oct 31 11:23 seen_txid
-rw-rw-r-- 1 atguigu atguigu 216 Oct 31 11:21 VERSION
Inspect the contents of name1 and name2: they are identical.
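The identity of the two directories can be verified mechanically with `diff -r`, which prints nothing and exits 0 when two trees match. The sketch below demonstrates this on throwaway directories; on hadoop102 the arguments would be `name1/current` and `name2/current` (run from /opt/module/hadoop-3.1.3/data/dfs):

```shell
# diff -r is silent and exits 0 when two directory trees are identical.
mkdir -p /tmp/name1_demo /tmp/name2_demo
echo "10" > /tmp/name1_demo/seen_txid
echo "10" > /tmp/name2_demo/seen_txid
diff -r /tmp/name1_demo /tmp/name2_demo && echo "identical"    # prints identical
```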
hdfs-site.xml
<property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data1,file://${hadoop.tmp.dir}/dfs/data2</value>
</property>
core-site.xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/module/hadoop-3.1.3/data</value>
</property>
When we installed Hadoop earlier, we already changed this temporary-file storage path.
core-default.xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
    <description>A base for other temporary directories.</description>
</property>
By default, hadoop.tmp.dir points into the Linux /tmp directory (whose contents are kept for about one month by default).
Note: if the disk layout differs across the server nodes, this configuration is generally not distributed.
Stop the cluster
[atguigu@hadoop102 hadoop]$ myhadoop.sh stop
Start the cluster
[atguigu@hadoop102 hadoop]$ myhadoop.sh start
[atguigu@hadoop102 dfs]$ cd /opt/module/hadoop-3.1.3/data/dfs
[atguigu@hadoop102 dfs]$ ll
total 20
drwx------ 3 atguigu atguigu 4096 Oct 31 13:27 data
drwx------ 3 atguigu atguigu 4096 Oct 31 13:27 data1
drwx------ 3 atguigu atguigu 4096 Oct 31 13:27 data2
drwxrwxr-x 3 atguigu atguigu 4096 Oct 31 13:27 name1
drwxrwxr-x 3 atguigu atguigu 4096 Oct 31 13:27 name2
Two new directories, data1 and data2, now exist.
Upload a file to the cluster
hadoop fs -put wcinput/word.txt /
You will find that the two directories do not hold the same data: data1 has data while data2 is empty. This imbalance is addressed in the next section, which covers how to balance data across disks.
[atguigu@hadoop102 dfs]$ ll data1/current/BP-97718552-192.168.10.102-1667186515016/current/finalized/subdir0/subdir0/
total 8
-rw-rw-r-- 1 atguigu atguigu 45 Oct 31 13:30 blk_1073741825
-rw-rw-r-- 1 atguigu atguigu 11 Oct 31 13:30 blk_1073741825_1001.meta
[atguigu@hadoop102 dfs]$ ll data2/current/BP-97718552-192.168.10.102-1667186515016/current/finalized/
total 0
(1) Generate a balancing plan
hdfs diskbalancer -plan hadoop102
(2) Execute the balancing plan
hdfs diskbalancer -execute hadoop102.plan.json
(3) Check the status of the current balancing task
hdfs diskbalancer -query hadoop102
(4) Cancel a balancing task
hdfs diskbalancer -cancel hadoop102.plan.json
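Note: these commands rely on the disk balancer feature being enabled via dfs.disk.balancer.enabled; in Hadoop 3.x it defaults to true, so no change is normally needed. If plan generation ever reports the feature as disabled, it can be switched on explicitly in hdfs-site.xml (a sketch, not part of the original setup):

```xml
<property>
    <name>dfs.disk.balancer.enabled</name>
    <value>true</value>
</property>
```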
[atguigu@hadoop102 ~]$ hdfs diskbalancer -plan hadoop102
2022-10-31 17:25:15,422 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2022-10-31 17:25:15,576 INFO planner.GreedyPlanner: Starting plan for Node : hadoop102:9867
2022-10-31 17:25:15,576 INFO planner.GreedyPlanner: Compute Plan for Node : hadoop102:9867 took 9 ms
2022-10-31 17:25:15,576 INFO command.Command: No plan generated. DiskBalancing not needed for node: hadoop102 threshold used: 10.0
No plan generated. DiskBalancing not needed for node: hadoop102 threshold used: 10.0
Since this node has only one disk, no plan is generated.