Article reposted from: https://blog.csdn.net/lizongti/article/details/102756472
Contents

Environment Preparation
Dependencies
Install Docker
Standalone Mode (Without Docker)
Installation
Install the JDK
Install Hadoop
Configuration
Environment Variables
Set Up Passwordless Login
Edit hadoop-env.sh
HDFS
Create Directories
Edit core-site.xml
Edit hdfs-site.xml
Format HDFS
Start HDFS
HDFS Web
HDFS Test
YARN
Edit mapred-site.xml
Edit yarn-site.xml
Start YARN
YARN Web
YARN Test
Cluster Setup (Without Docker)
Preparation
Configure the Master
Apply the Standalone Configuration
Edit hdfs-site.xml
Edit masters
Delete slaves
Copy to the Slave Nodes
Edit slaves
HDFS
Clear the Directories
Format HDFS
Start HDFS
Test HDFS
YARN
Start YARN
Test YARN
CentOS 7.6

Install Docker by following the installation guide (linked in the original post).

Download the JDK 1.8 tar.gz from the official site. Installing via yum or an rpm package leaves out some of the files that Scala 2.11 needs.
mkdir -p /usr/lib/jvm
tar xf jdk-8u221-linux-x64.tar.gz -C /usr/lib/jvm
rm -rf /usr/bin/java
ln -s /usr/lib/jvm/jdk1.8.0_221/bin/java /usr/bin/java
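A quick sanity check that the symlinked JDK is the one in use; the version line should report 1.8.0_221:

```shell
java -version
```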
Edit the file
vim /etc/profile.d/java.sh
and add:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_221
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=${JAVA_HOME}/lib:${JRE_HOME}/lib:$CLASSPATH
export PATH=${JAVA_HOME}/bin:$PATH
Then reload the environment variables:
source /etc/profile
Run the following command to verify:
[root@vm1 bin]# echo $JAVA_HOME
/usr/lib/jvm/jdk1.8.0_221
To stay version-compatible with the Spark setup covered in a companion article, use Hadoop 2.7 from the official archive:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Extract it:
tar xf hadoop-2.7.7.tar.gz -C /opt/
Edit the file
vim /etc/profile.d/hadoop.sh
and add:
export HADOOP_HOME=/opt/hadoop-2.7.7
export PATH=$PATH:$HADOOP_HOME/bin
Then reload the environment variables:
source /etc/profile
Passwordless SSH login must also be configured on the local machine (see the guide linked in the original post).
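A minimal sketch of that setup, assuming the root user and that the hostname vm1 resolves to this machine:

```shell
# Generate a key pair with an empty passphrase
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# Authorize the key for logins to this host
ssh-copy-id root@vm1
# Verify: this should log in without a password prompt
ssh root@vm1 exit
```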
Configure JAVA_HOME inside the startup script:
vi /opt/hadoop-2.7.7/etc/hadoop/hadoop-env.sh
Replace
export JAVA_HOME=${JAVA_HOME}
with
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_221
Create the HDFS name and data directories:
mkdir -p /opt/hadoop-2.7.7/hdfs/name
mkdir -p /opt/hadoop-2.7.7/hdfs/data
Configure the NameNode access address:
vi /opt/hadoop-2.7.7/etc/hadoop/core-site.xml
Replace the empty <configuration> element with:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.7.7/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://vm1:9000</value>
  </property>
</configuration>
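This assumes the hostname vm1 resolves on every machine involved. If it does not, a hosts-file entry is the usual fix; a sketch using the addresses that appear in the cluster report later in this article:

```shell
# Append name-to-address mappings (adjust the addresses to your network)
cat >> /etc/hosts <<EOF
192.168.1.101 vm1
192.168.1.102 vm2
192.168.1.103 vm3
EOF
```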
Configure the replica count:
vi /opt/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
Replace the empty <configuration> element with:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-2.7.7/hdfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-2.7.7/hdfs/data</value>
  </property>
</configuration>
Format HDFS:
cd /opt/hadoop-2.7.7/bin
hdfs namenode -format
Start HDFS:
cd /opt/hadoop-2.7.7/sbin
./start-dfs.sh
Output:
Starting namenodes on [vm1]
vm1: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-vm1.out
localhost: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-vm1.out
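Before opening the web UI, jps can confirm the daemons are up (process IDs will differ):

```shell
jps
# Expect NameNode, DataNode and SecondaryNameNode, plus Jps itself
```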
Visit the HDFS web UI at http://vm1:50070.
Generate test data:
mkdir -p /tmp/input
vi /tmp/input/1
and add:
a
b
a
Upload the file to HDFS and list it:
hadoop fs -mkdir -p /tmp/input
hadoop fs -put /tmp/input/1 /tmp/input
hadoop fs -ls /tmp/input
Found 1 items
-rw-r--r-- 1 root supergroup 6 2019-10-28 11:34 /tmp/input/1
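A quick round-trip check that the file arrived intact:

```shell
hadoop fs -cat /tmp/input/1
# Expect the three lines: a, b, a
```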
Set YARN as the MapReduce execution framework:
cp /opt/hadoop-2.7.7/etc/hadoop/mapred-site.xml.template /opt/hadoop-2.7.7/etc/hadoop/mapred-site.xml
vi /opt/hadoop-2.7.7/etc/hadoop/mapred-site.xml
Replace the empty <configuration> element with:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>http://vm1:9001</value>
  </property>
</configuration>

Note that mapred.job.tracker is a legacy MRv1 setting; with mapreduce.framework.name set to yarn it is not used.
vi /opt/hadoop-2.7.7/etc/hadoop/yarn-site.xml
Replace the empty <configuration> element with:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>vm1</value>
  </property>
</configuration>
Start YARN:
./start-yarn.sh
Output:
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-vm1.out
localhost: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm1.out
Visit the YARN web UI at http://vm1:8088.
Run the job:
```shell
hadoop jar /opt/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /tmp/input /tmp/result
```
Job log:
19/10/28 11:34:52 INFO client.RMProxy: Connecting to ResourceManager at vm1/192.168.1.101:8032
19/10/28 11:34:53 INFO input.FileInputFormat: Total input paths to process : 1
19/10/28 11:34:53 INFO mapreduce.JobSubmitter: number of splits:1
19/10/28 11:34:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1572232474055_0002
19/10/28 11:34:54 INFO impl.YarnClientImpl: Submitted application application_1572232474055_0002
19/10/28 11:34:54 INFO mapreduce.Job: The url to track the job: http://vm1:8088/proxy/application_1572232474055_0002/
19/10/28 11:34:54 INFO mapreduce.Job: Running job: job_1572232474055_0002
19/10/28 11:35:06 INFO mapreduce.Job: Job job_1572232474055_0002 running in uber mode : false
19/10/28 11:35:06 INFO mapreduce.Job: map 0% reduce 0%
19/10/28 11:35:11 INFO mapreduce.Job: map 100% reduce 0%
19/10/28 11:35:16 INFO mapreduce.Job: map 100% reduce 100%
19/10/28 11:35:17 INFO mapreduce.Job: Job job_1572232474055_0002 completed successfully
19/10/28 11:35:18 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=22
FILE: Number of bytes written=245617
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=98
HDFS: Number of bytes written=8
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2576
Total time spent by all reduces in occupied slots (ms)=3148
Total time spent by all map tasks (ms)=2576
Total time spent by all reduce tasks (ms)=3148
Total vcore-milliseconds taken by all map tasks=2576
Total vcore-milliseconds taken by all reduce tasks=3148
Total megabyte-milliseconds taken by all map tasks=2637824
Total megabyte-milliseconds taken by all reduce tasks=3223552
Map-Reduce Framework
Map input records=3
Map output records=3
Map output bytes=18
Map output materialized bytes=22
Input split bytes=92
Combine input records=3
Combine output records=2
Reduce input groups=2
Reduce shuffle bytes=22
Reduce input records=2
Reduce output records=2
Spilled Records=4
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=425
CPU time spent (ms)=1400
Physical memory (bytes) snapshot=432537600
Virtual memory (bytes) snapshot=4235526144
Total committed heap usage (bytes)=304087040
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=6
File Output Format Counters
Bytes Written=8
View the result:
hadoop fs -cat /tmp/result/part-r-00000
Output:
a 2
b 1
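One caveat: MapReduce refuses to write into an existing output directory, so rerunning the job requires deleting /tmp/result first:

```shell
hadoop fs -rm -r /tmp/result
```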
First, perform exactly the same configuration on vm1 as in the standalone setup.
vi /opt/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
Replace

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

with

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

Here the replica count dfs.replication is set to 2.
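Once the cluster is running and the test file has been uploaded, the effective replication can be checked with fsck (a sketch; the path assumes the test file used later in this article):

```shell
hdfs fsck /tmp/input/1 -files -blocks
# Each block line should show repl=2
```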
echo "vm1" > /opt/hadoop-2.7.7/etc/hadoop/masters
rm /opt/hadoop-2.7.7/etc/hadoop/slaves
Copy the configured installation to the slave nodes (the JDK and the /etc/profile.d scripts from the standalone setup must already be in place on vm2 and vm3 as well):
scp -r /opt/hadoop-2.7.7 root@vm2:/opt/
scp -r /opt/hadoop-2.7.7 root@vm3:/opt/
Write the slave hostnames into the slaves file:

cat > /opt/hadoop-2.7.7/etc/hadoop/slaves <<EOF
vm2
vm3
EOF

This puts vm2 and vm3 into slaves. If you also want a DataNode on vm1, as the startup output below shows, add vm1 to this file as well.
Clear the HDFS directories (do this on every node, since the scp above copied the old standalone data along with the installation):
rm -rf /opt/hadoop-2.7.7/hdfs/data/*
rm -rf /opt/hadoop-2.7.7/hdfs/name/*
Reformat HDFS:
hadoop namenode -format
Start HDFS:
cd /opt/hadoop-2.7.7/sbin
./start-dfs.sh
Output:
Starting namenodes on [vm1]
vm1: starting namenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-namenode-vm1.out
vm3: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm3.out
vm2: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm2.out
vm1: starting datanode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-datanode-vm1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop-2.7.7/logs/hadoop-root-secondarynamenode-vm1.out
For example, the NameNode log is in /opt/hadoop-2.7.7/logs/hadoop-root-namenode-vm1.log.
Check the processes on the master:
$ jps
75991 DataNode
76408 Jps
76270 SecondaryNameNode
Check the processes on a slave:
$ jps
29379 DataNode
29494 Jps
Check the cluster status:
hdfs dfsadmin -report
Output:
Configured Capacity: 160982630400 (149.93 GB)
Present Capacity: 101929107456 (94.93 GB)
DFS Remaining: 101929095168 (94.93 GB)
DFS Used: 12288 (12 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (3):
Name: 192.168.1.103:50010 (vm3)
Hostname: vm3
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 10321448960 (9.61 GB)
DFS Remaining: 43339423744 (40.36 GB)
DFS Used%: 0.00%
DFS Remaining%: 80.77%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Oct 28 13:42:20 CST 2019
Name: 192.168.1.102:50010 (vm2)
Hostname: vm2
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 13661077504 (12.72 GB)
DFS Remaining: 39999795200 (37.25 GB)
DFS Used%: 0.00%
DFS Remaining%: 74.54%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Oct 28 13:42:21 CST 2019
Name: 192.168.1.101:50010 (vm1)
Hostname: vm1
Decommission Status : Normal
Configured Capacity: 53660876800 (49.98 GB)
DFS Used: 4096 (4 KB)
Non DFS Used: 35070996480 (32.66 GB)
DFS Remaining: 18589876224 (17.31 GB)
DFS Used%: 0.00%
DFS Remaining%: 34.64%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Oct 28 13:42:21 CST 2019
Upload the test file to HDFS again and list it:
hadoop fs -mkdir -p /tmp/input
hadoop fs -put /tmp/input/1 /tmp/input
hadoop fs -ls /tmp/input
Found 1 items
-rw-r--r-- 1 root supergroup 6 2019-10-28 11:34 /tmp/input/1
Start YARN:
cd /opt/hadoop-2.7.7/sbin
./start-yarn.sh
Output:
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-vm1.out
vm2: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm2.out
vm3: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm3.out
vm1: starting nodemanager, logging to /opt/hadoop-2.7.7/logs/yarn-root-nodemanager-vm1.out
For example, the ResourceManager log is in /opt/hadoop-2.7.7/logs/yarn-root-resourcemanager-vm1.log.
Check the processes on the master:
$ jps
100464 ResourceManager
101746 Jps
53786 QuorumPeerMain
Check the processes on a slave:
$ jps
36893 NodeManager
37181 Jps
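Beyond jps, the ResourceManager can list the NodeManagers that have registered (run on vm1):

```shell
yarn node -list
# Expect three RUNNING nodes: vm1, vm2 and vm3
```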
Run the same wordcount job:
hadoop jar /opt/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar wordcount /tmp/input /tmp/result
The result is the same as above.
Further reading:
https://cloud.tencent.com/developer/article/1084166
https://www.jianshu.com/p/7ab2b6168cc9
https://www.cnblogs.com/onetwo/p/6419925.html
This is a reposted article and has not been independently verified; if verification turns up problems, they will be corrected.